Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-19 Thread Guenter Roeck
On 1/19/23 00:02, Klaus Jensen wrote: On Jan 19 08:28, Klaus Jensen wrote: On Jan 18 21:03, Keith Busch wrote: On Thu, Jan 19, 2023 at 01:10:57PM +1000, Alistair Francis wrote: On Thu, Jan 19, 2023 at 12:44 PM Keith Busch wrote: Further up, it says the "interrupt gateway" is responsible

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-19 Thread Peter Maydell
On Thu, 19 Jan 2023 at 04:03, Keith Busch wrote: > > On Thu, Jan 19, 2023 at 01:10:57PM +1000, Alistair Francis wrote: > > On Thu, Jan 19, 2023 at 12:44 PM Keith Busch wrote: > > > > > > Further up, it says the "interrupt gateway" is responsible for > > > forwarding new interrupt requests while

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-19 Thread Klaus Jensen
On Jan 19 08:28, Klaus Jensen wrote: > On Jan 18 21:03, Keith Busch wrote: > > On Thu, Jan 19, 2023 at 01:10:57PM +1000, Alistair Francis wrote: > > > On Thu, Jan 19, 2023 at 12:44 PM Keith Busch wrote: > > > > > > > > Further up, it says the "interrupt gateway" is responsible for > > > >

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-18 Thread Klaus Jensen
On Jan 18 21:03, Keith Busch wrote: > On Thu, Jan 19, 2023 at 01:10:57PM +1000, Alistair Francis wrote: > > On Thu, Jan 19, 2023 at 12:44 PM Keith Busch wrote: > > > > > > Further up, it says the "interrupt gateway" is responsible for > > > forwarding new interrupt requests while the level

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-18 Thread Klaus Jensen
On Jan 18 15:26, Keith Busch wrote: > Klaus, > > This isn't going to help your issue, but there are at least two legacy > irq bugs in the nvme qemu implementation. > > 1. The admin queue breaks if you start with legacy and later initialize > msix. > Hmm. Interesting that we have not encountered

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-18 Thread Keith Busch
On Thu, Jan 19, 2023 at 01:10:57PM +1000, Alistair Francis wrote: > On Thu, Jan 19, 2023 at 12:44 PM Keith Busch wrote: > > > > Further up, it says the "interrupt gateway" is responsible for > > forwarding new interrupt requests while the level remains asserted, but > > it doesn't look like

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-18 Thread Alistair Francis
On Thu, Jan 19, 2023 at 12:44 PM Keith Busch wrote: > > On Thu, Jan 19, 2023 at 10:41:42AM +1000, Alistair Francis wrote: > > On Thu, Jan 19, 2023 at 9:07 AM Keith Busch wrote: > > > --- > > > diff --git a/hw/intc/sifive_plic.c b/hw/intc/sifive_plic.c > > > index c2dfacf028..f8f7af08dc 100644 >

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-18 Thread Keith Busch
On Thu, Jan 19, 2023 at 10:41:42AM +1000, Alistair Francis wrote: > On Thu, Jan 19, 2023 at 9:07 AM Keith Busch wrote: > > --- > > diff --git a/hw/intc/sifive_plic.c b/hw/intc/sifive_plic.c > > index c2dfacf028..f8f7af08dc 100644 > > --- a/hw/intc/sifive_plic.c > > +++ b/hw/intc/sifive_plic.c > >

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-18 Thread Alistair Francis
On Thu, Jan 19, 2023 at 9:07 AM Keith Busch wrote: > > On Wed, Jan 18, 2023 at 09:33:05AM -0700, Keith Busch wrote: > > On Wed, Jan 18, 2023 at 03:04:06PM +0000, Peter Maydell wrote: > > > On Tue, 17 Jan 2023 at 19:21, Guenter Roeck wrote: > > > > Anyway - any idea what to do to help figuring

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-18 Thread Keith Busch
On Wed, Jan 18, 2023 at 09:33:05AM -0700, Keith Busch wrote: > On Wed, Jan 18, 2023 at 03:04:06PM +0000, Peter Maydell wrote: > > On Tue, 17 Jan 2023 at 19:21, Guenter Roeck wrote: > > > Anyway - any idea what to do to help figuring out what is happening ? > > > Add tracing support to pci

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-18 Thread Keith Busch
Klaus, This isn't going to help your issue, but there are at least two legacy irq bugs in the nvme qemu implementation. 1. The admin queue breaks if you start with legacy and later initialize msix. 2. The legacy vector is shared among all queues, but it's being deasserted when the first queue's

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-18 Thread Keith Busch
On Wed, Jan 18, 2023 at 03:04:06PM +0000, Peter Maydell wrote: > On Tue, 17 Jan 2023 at 19:21, Guenter Roeck wrote: > > Anyway - any idea what to do to help figuring out what is happening ? > > Add tracing support to pci interrupt handling, maybe ? > > For intermittent bugs, I like recording the

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-18 Thread Peter Maydell
On Tue, 17 Jan 2023 at 19:21, Guenter Roeck wrote: > Anyway - any idea what to do to help figuring out what is happening ? > Add tracing support to pci interrupt handling, maybe ? For intermittent bugs, I like recording the QEMU session under rr (using its chaos mode to provoke the failure if

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-17 Thread Keith Busch
On Thu, Jan 12, 2023 at 02:10:51PM +0100, Klaus Jensen wrote: > Hi all (linux-nvme, qemu-devel, maintainers), > > On QEMU riscv64, which does not use MSI/MSI-X and thus relies on > pin-based interrupts, I'm seeing occasional completion timeouts, i.e. > > nvme nvme0: I/O 333 QID 1 timeout,

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-17 Thread Guenter Roeck
On Tue, Jan 17, 2023 at 04:18:14PM +0000, Peter Maydell wrote: > On Tue, 17 Jan 2023 at 16:10, Guenter Roeck wrote: > > > > On Mon, Jan 16, 2023 at 09:58:13PM -0700, Keith Busch wrote: > > > On Mon, Jan 16, 2023 at 10:14:07PM +0100, Klaus Jensen wrote: > > > > I noticed that the Linux driver does

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-17 Thread Peter Maydell
On Tue, 17 Jan 2023 at 16:10, Guenter Roeck wrote: > > On Mon, Jan 16, 2023 at 09:58:13PM -0700, Keith Busch wrote: > > On Mon, Jan 16, 2023 at 10:14:07PM +0100, Klaus Jensen wrote: > > > I noticed that the Linux driver does not use the INTMS/INTMC registers > > > to mask interrupts on the

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-17 Thread Guenter Roeck
On Mon, Jan 16, 2023 at 09:58:13PM -0700, Keith Busch wrote: > On Mon, Jan 16, 2023 at 10:14:07PM +0100, Klaus Jensen wrote: > > I noticed that the Linux driver does not use the INTMS/INTMC registers > > to mask interrupts on the controller while processing CQEs. While not > > required by the

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-17 Thread Guenter Roeck
On Mon, Jan 16, 2023 at 10:14:07PM +0100, Klaus Jensen wrote: [ ... ] > > I noticed that the Linux driver does not use the INTMS/INTMC registers > to mask interrupts on the controller while processing CQEs. While not > required by the spec, it is *recommended* in setups not using MSI-X to >

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-16 Thread Keith Busch
On Mon, Jan 16, 2023 at 10:14:07PM +0100, Klaus Jensen wrote: > I noticed that the Linux driver does not use the INTMS/INTMC registers > to mask interrupts on the controller while processing CQEs. While not > required by the spec, it is *recommended* in setups not using MSI-X to > reduce the risk

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-16 Thread Klaus Jensen
On Jan 12 14:10, Klaus Jensen wrote: > Hi all (linux-nvme, qemu-devel, maintainers), > > On QEMU riscv64, which does not use MSI/MSI-X and thus relies on > pin-based interrupts, I'm seeing occasional completion timeouts, i.e. > > nvme nvme0: I/O 333 QID 1 timeout, completion polled > > To

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-13 Thread Keith Busch
On Fri, Jan 13, 2023 at 12:32:29PM +0000, Peter Maydell wrote: > On Fri, 13 Jan 2023 at 08:55, Klaus Jensen wrote: > > > > +CC qemu pci maintainers > > > > Michael, Marcel, > > > > Do you have any comments on this thread? As you can see one solution is > > to simply deassert prior to asserting,

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-13 Thread Klaus Jensen
On Jan 13 12:42, Peter Maydell wrote: > On Fri, 13 Jan 2023 at 12:37, Klaus Jensen wrote: > > There are a fair amount of uses of pci_irq_pulse() still left in the > > tree. > > Are there? I feel like I'm missing something here: > $ git grep pci_irq_pulse > include/hw/pci/pci.h:static inline void

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-13 Thread Peter Maydell
On Fri, 13 Jan 2023 at 12:37, Klaus Jensen wrote: > There are a fair amount of uses of pci_irq_pulse() still left in the > tree. Are there? I feel like I'm missing something here: $ git grep pci_irq_pulse include/hw/pci/pci.h:static inline void pci_irq_pulse(PCIDevice *pci_dev) $ ...looks at

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-13 Thread Klaus Jensen
On Jan 13 12:32, Peter Maydell wrote: > On Fri, 13 Jan 2023 at 08:55, Klaus Jensen wrote: > > > > +CC qemu pci maintainers > > > > Michael, Marcel, > > > > Do you have any comments on this thread? As you can see one solution is > > to simply deassert prior to asserting, the other is to

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-13 Thread Peter Maydell
On Fri, 13 Jan 2023 at 08:55, Klaus Jensen wrote: > > +CC qemu pci maintainers > > Michael, Marcel, > > Do you have any comments on this thread? As you can see one solution is > to simply deassert prior to asserting, the other is to reintroduce a > pci_irq_pulse(). Both seem to solve the issue.

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-13 Thread Klaus Jensen
+CC qemu pci maintainers Michael, Marcel, Do you have any comments on this thread? As you can see one solution is to simply deassert prior to asserting, the other is to reintroduce a pci_irq_pulse(). Both seem to solve the issue. On Jan 12 14:10, Klaus Jensen wrote: > Hi all (linux-nvme,

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-12 Thread Guenter Roeck
On 1/12/23 11:27, Keith Busch wrote: On Thu, Jan 12, 2023 at 06:45:55PM +0100, Klaus Jensen wrote: On Jan 12 09:34, Keith Busch wrote: On Thu, Jan 12, 2023 at 02:10:51PM +0100, Klaus Jensen wrote: The pin-based interrupt logic in hw/nvme seems sound enough to me, so I am wondering if there

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-12 Thread Guenter Roeck
On 1/12/23 09:45, Klaus Jensen wrote: On Jan 12 09:34, Keith Busch wrote: On Thu, Jan 12, 2023 at 02:10:51PM +0100, Klaus Jensen wrote: The pin-based interrupt logic in hw/nvme seems sound enough to me, so I am wondering if there is something going on with the kernel driver (but I certainly

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-12 Thread Keith Busch
On Thu, Jan 12, 2023 at 06:45:55PM +0100, Klaus Jensen wrote: > On Jan 12 09:34, Keith Busch wrote: > > On Thu, Jan 12, 2023 at 02:10:51PM +0100, Klaus Jensen wrote: > > > > > > The pin-based interrupt logic in hw/nvme seems sound enough to me, so I > > > am wondering if there is something going

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-12 Thread Guenter Roeck
On 1/12/23 09:45, Klaus Jensen wrote: On Jan 12 09:34, Keith Busch wrote: On Thu, Jan 12, 2023 at 02:10:51PM +0100, Klaus Jensen wrote: The pin-based interrupt logic in hw/nvme seems sound enough to me, so I am wondering if there is something going on with the kernel driver (but I certainly

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-12 Thread Keith Busch
On Thu, Jan 12, 2023 at 06:45:55PM +0100, Klaus Jensen wrote: > On Jan 12 09:34, Keith Busch wrote: > > On Thu, Jan 12, 2023 at 02:10:51PM +0100, Klaus Jensen wrote: > > > > > > The pin-based interrupt logic in hw/nvme seems sound enough to me, so I > > > am wondering if there is something going

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-12 Thread Klaus Jensen
On Jan 12 09:34, Keith Busch wrote: > On Thu, Jan 12, 2023 at 02:10:51PM +0100, Klaus Jensen wrote: > > > > The pin-based interrupt logic in hw/nvme seems sound enough to me, so I > > am wondering if there is something going on with the kernel driver (but > > I certainly do not rule out that

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-12 Thread Keith Busch
On Thu, Jan 12, 2023 at 02:10:51PM +0100, Klaus Jensen wrote: > > The pin-based interrupt logic in hw/nvme seems sound enough to me, so I > am wondering if there is something going on with the kernel driver (but > I certainly do not rule out that hw/nvme is at fault here, since > pin-based

Re: completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-12 Thread Guenter Roeck
On 1/12/23 05:10, Klaus Jensen wrote: Hi all (linux-nvme, qemu-devel, maintainers), On QEMU riscv64, which does not use MSI/MSI-X and thus relies on pin-based interrupts, I'm seeing occasional completion timeouts, i.e. nvme nvme0: I/O 333 QID 1 timeout, completion polled To rule out issues

completion timeouts with pin-based interrupts in QEMU hw/nvme

2023-01-12 Thread Klaus Jensen
Hi all (linux-nvme, qemu-devel, maintainers), On QEMU riscv64, which does not use MSI/MSI-X and thus relies on pin-based interrupts, I'm seeing occasional completion timeouts, i.e. nvme nvme0: I/O 333 QID 1 timeout, completion polled To rule out issues with shadow doorbells (which have been a