Hi,

I have applied this patch to the same QEMU source tree where I reproduced
this issue. I changed the queue size to 3 on the OSv side to trigger this
bug, but unfortunately, I still see the same behavior of the OSv guest
hanging.

Regards,
Waldemar Kozaczuk

On Fri, Oct 25, 2024 at 6:51 AM Klaus Jensen <[email protected]> wrote:

> From: Klaus Jensen <[email protected]>
>
> If a host chooses to use the SQHD "hint" in the CQE to know if there is
> room in the submission queue for additional commands, it may result in a
> situation where there are not enough internal resources (struct
> NvmeRequest) available to process the command. For a lack of a better
> term, the host may "over-commit" the device (i.e., it may have more
> inflight commands than the queue size).
>
> For example, assume a queue with N entries. The host submits N commands
> and all are picked up for processing, advancing the head and emptying
> the queue. Regardless of which of these N commands complete first, the
> SQHD field of that CQE will indicate to the host that the queue is
> empty, which allows the host to issue N commands again. However, if the
> device has not posted CQEs for all the previous commands yet, the device
> will have less than N resources available to process the commands, so
> queue processing is suspended.
>
> And here lies an 11 year latent bug. In the absense of any additional
> tail updates on the submission queue, we never schedule the processing
> bottom-half again unless we observe a head update on an associated full
> completion queue. This has been sufficient to handle N-to-1 SQ/CQ setups
> (in the absense of over-commit of course). Incidentially, that "kick all
> associated SQs" mechanism can now be killed since we now just schedule
> queue processing when we return a processing resource to a non-empty
> submission queue, which happens to cover both edge cases.
>
> So, apparently, no previous driver tested with hw/nvme has ever used
> SQHD (e.g., neither the Linux NVMe driver or SPDK uses it). But then OSv
> shows up with the driver that actually does. I salute you.
>
> Fixes: f3c507adcd7b ("NVMe: Initial commit for new storage interface")
> Cc: [email protected]
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2388
> Reported-by: Waldemar Kozaczuk <[email protected]>
> Signed-off-by: Klaus Jensen <[email protected]>
> ---
>  hw/nvme/ctrl.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)
>
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index
> f4e89203c1a6e3b051fd7185cbf01ec9bae9684a..b13585c4da911b9e8ae4a722761fd85dfa24be4d
> 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -1520,9 +1520,16 @@ static void nvme_post_cqes(void *opaque)
>              stl_le_p(&n->bar.csts, NVME_CSTS_FAILED);
>              break;
>          }
> +
>          QTAILQ_REMOVE(&cq->req_list, req, entry);
> +
>          nvme_inc_cq_tail(cq);
>          nvme_sg_unmap(&req->sg);
> +
> +        if (QTAILQ_EMPTY(&sq->req_list) && !nvme_sq_empty(sq)) {
> +            qemu_bh_schedule(sq->bh);
> +        }
> +
>          QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
>      }
>      if (cq->tail != cq->head) {
> @@ -7950,7 +7957,6 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr
> addr, int val)
>          /* Completion queue doorbell write */
>
>          uint16_t new_head = val & 0xffff;
> -        int start_sqs;
>          NvmeCQueue *cq;
>
>          qid = (addr - (0x1000 + (1 << 2))) >> 3;
> @@ -8001,18 +8007,10 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr
> addr, int val)
>
>          trace_pci_nvme_mmio_doorbell_cq(cq->cqid, new_head);
>
> -        start_sqs = nvme_cq_full(cq) ? 1 : 0;
>          cq->head = new_head;
>          if (!qid && n->dbbuf_enabled) {
>              stl_le_pci_dma(pci, cq->db_addr, cq->head,
> MEMTXATTRS_UNSPECIFIED);
>          }
> -        if (start_sqs) {
> -            NvmeSQueue *sq;
> -            QTAILQ_FOREACH(sq, &cq->sq_list, entry) {
> -                qemu_bh_schedule(sq->bh);
> -            }
> -            qemu_bh_schedule(cq->bh);
> -        }
>
>          if (cq->tail == cq->head) {
>              if (cq->irq_enabled) {
>
> ---
> base-commit: e67b7aef7c7f67ecd0282e903e0daff806d5d680
> change-id: 20241025-issue-2388-bd047487f74c
>
> Best regards,
> --
> Klaus Jensen <[email protected]>
>
>

Reply via email to