Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
On Wed, Nov 14, 2018 at 02:30:14PM +0100, Lukas Wunner wrote: > On Wed, Nov 14, 2018 at 11:52:25AM +0200, mika.westerb...@linux.intel.com > wrote: > > On Tue, Nov 13, 2018 at 03:57:47PM +, Shameerali Kolothum Thodi wrote: > > > > The smb_mb() thing is not that clear (at least to me) because it is used > > > > in two places in the driver and both seem to be making write to > > > > ctrl->cmd_busy visible to other CPUs but I don't see where we deal with > > > > the read part. > > > > > > > > I may be missing something, though. > > > > > > I think the read part is in wait_event_timeout() which evaluates the > > > condition. The wake_up is called from the pciehp_isr(). Since the flag > > > is being updated in both process level and interrupt handler context, > > > smp_mb() is used. I think the same now applies to ctrl->slot_ctrl now > > > as this being used in process context and interrupt context as well. > > > > Right, but that would require to use another read/general barrier in the > > pciehp_isr() before we read the variable in case interrupt happens > > immediately on another CPU (at least that's my understanding). > > In pcie_do_write_cmd(), please just move the > > ctrl->slot_ctrl = slot_ctrl; > > above the call to pcie_capability_write_word(). > > AFAICS an explicit memory barrier isn't needed here because of the call to > pcie_capability_write_word(), which "will [ordinarily] be guaranteed to be > fully ordered and uncombined" (Documentation/memory-barriers.txt, section > "KERNEL I/O BARRIER EFFECTS"). > > The memory barrier in pciehp_isr() is also bogus because the following > wake_up() implies a memory barrier if a task was woken. (And if none > was woken, who cares.) > > > > Since I'm > > not too comfortable with all these barriers to be honest I would prefer > > reading the slot control register directly in pciehp_isr() :-) > > That is an approach I'd strongly object to: While pciehp itself only > signals very few interrupts (making an additional mmio read appear to > be negligible), it may share its interrupt with other devices. On my > MacBookPro9,1, a hotplug port of the Thunderbolt controller shares > its interrupt line with the Wifi card and SD card reader, and those > may signal a huge number of interrupts. On such a machine an additional > mmio read per interrupt becomes a problem. OK. I just sent a patch moving ctrl->slot_ctrl assignment to happen before pcie_capability_write_word().
Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
On Wed, Nov 14, 2018 at 02:30:14PM +0100, Lukas Wunner wrote: > On Wed, Nov 14, 2018 at 11:52:25AM +0200, mika.westerb...@linux.intel.com > wrote: > > On Tue, Nov 13, 2018 at 03:57:47PM +, Shameerali Kolothum Thodi wrote: > > > > The smb_mb() thing is not that clear (at least to me) because it is used > > > > in two places in the driver and both seem to be making write to > > > > ctrl->cmd_busy visible to other CPUs but I don't see where we deal with > > > > the read part. > > > > > > > > I may be missing something, though. > > > > > > I think the read part is in wait_event_timeout() which evaluates the > > > condition. The wake_up is called from the pciehp_isr(). Since the flag > > > is being updated in both process level and interrupt handler context, > > > smp_mb() is used. I think the same now applies to ctrl->slot_ctrl now > > > as this being used in process context and interrupt context as well. > > > > Right, but that would require to use another read/general barrier in the > > pciehp_isr() before we read the variable in case interrupt happens > > immediately on another CPU (at least that's my understanding). > > In pcie_do_write_cmd(), please just move the > > ctrl->slot_ctrl = slot_ctrl; > > above the call to pcie_capability_write_word(). > > AFAICS an explicit memory barrier isn't needed here because of the call to > pcie_capability_write_word(), which "will [ordinarily] be guaranteed to be > fully ordered and uncombined" (Documentation/memory-barriers.txt, section > "KERNEL I/O BARRIER EFFECTS"). > > The memory barrier in pciehp_isr() is also bogus because the following > wake_up() implies a memory barrier if a task was woken. (And if none > was woken, who cares.) > > > > Since I'm > > not too comfortable with all these barriers to be honest I would prefer > > reading the slot control register directly in pciehp_isr() :-) > > That is an approach I'd strongly object to: While pciehp itself only > signals very few interrupts (making an additional mmio read appear to > be negligible), it may share its interrupt with other devices. On my > MacBookPro9,1, a hotplug port of the Thunderbolt controller shares > its interrupt line with the Wifi card and SD card reader, and those > may signal a huge number of interrupts. On such a machine an additional > mmio read per interrupt becomes a problem. OK. I just sent a patch moving ctrl->slot_ctrl assignment to happen before pcie_capability_write_word().
Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
On Wed, Nov 14, 2018 at 11:52:25AM +0200, mika.westerb...@linux.intel.com wrote: > On Tue, Nov 13, 2018 at 03:57:47PM +, Shameerali Kolothum Thodi wrote: > > > The smb_mb() thing is not that clear (at least to me) because it is used > > > in two places in the driver and both seem to be making write to > > > ctrl->cmd_busy visible to other CPUs but I don't see where we deal with > > > the read part. > > > > > > I may be missing something, though. > > > > I think the read part is in wait_event_timeout() which evaluates the > > condition. The wake_up is called from the pciehp_isr(). Since the flag > > is being updated in both process level and interrupt handler context, > > smp_mb() is used. I think the same now applies to ctrl->slot_ctrl now > > as this being used in process context and interrupt context as well. > > Right, but that would require to use another read/general barrier in the > pciehp_isr() before we read the variable in case interrupt happens > immediately on another CPU (at least that's my understanding). In pcie_do_write_cmd(), please just move the ctrl->slot_ctrl = slot_ctrl; above the call to pcie_capability_write_word(). AFAICS an explicit memory barrier isn't needed here because of the call to pcie_capability_write_word(), which "will [ordinarily] be guaranteed to be fully ordered and uncombined" (Documentation/memory-barriers.txt, section "KERNEL I/O BARRIER EFFECTS"). The memory barrier in pciehp_isr() is also bogus because the following wake_up() implies a memory barrier if a task was woken. (And if none was woken, who cares.) > Since I'm > not too comfortable with all these barriers to be honest I would prefer > reading the slot control register directly in pciehp_isr() :-) That is an approach I'd strongly object to: While pciehp itself only signals very few interrupts (making an additional mmio read appear to be negligible), it may share its interrupt with other devices. On my MacBookPro9,1, a hotplug port of the Thunderbolt controller shares its interrupt line with the Wifi card and SD card reader, and those may signal a huge number of interrupts. On such a machine an additional mmio read per interrupt becomes a problem. Thanks, Lukas
Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
On Wed, Nov 14, 2018 at 11:52:25AM +0200, mika.westerb...@linux.intel.com wrote: > On Tue, Nov 13, 2018 at 03:57:47PM +, Shameerali Kolothum Thodi wrote: > > > The smb_mb() thing is not that clear (at least to me) because it is used > > > in two places in the driver and both seem to be making write to > > > ctrl->cmd_busy visible to other CPUs but I don't see where we deal with > > > the read part. > > > > > > I may be missing something, though. > > > > I think the read part is in wait_event_timeout() which evaluates the > > condition. The wake_up is called from the pciehp_isr(). Since the flag > > is being updated in both process level and interrupt handler context, > > smp_mb() is used. I think the same now applies to ctrl->slot_ctrl now > > as this being used in process context and interrupt context as well. > > Right, but that would require to use another read/general barrier in the > pciehp_isr() before we read the variable in case interrupt happens > immediately on another CPU (at least that's my understanding). In pcie_do_write_cmd(), please just move the ctrl->slot_ctrl = slot_ctrl; above the call to pcie_capability_write_word(). AFAICS an explicit memory barrier isn't needed here because of the call to pcie_capability_write_word(), which "will [ordinarily] be guaranteed to be fully ordered and uncombined" (Documentation/memory-barriers.txt, section "KERNEL I/O BARRIER EFFECTS"). The memory barrier in pciehp_isr() is also bogus because the following wake_up() implies a memory barrier if a task was woken. (And if none was woken, who cares.) > Since I'm > not too comfortable with all these barriers to be honest I would prefer > reading the slot control register directly in pciehp_isr() :-) That is an approach I'd strongly object to: While pciehp itself only signals very few interrupts (making an additional mmio read appear to be negligible), it may share its interrupt with other devices. On my MacBookPro9,1, a hotplug port of the Thunderbolt controller shares its interrupt line with the Wifi card and SD card reader, and those may signal a huge number of interrupts. On such a machine an additional mmio read per interrupt becomes a problem. Thanks, Lukas
Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
On Tue, Nov 13, 2018 at 03:57:47PM +, Shameerali Kolothum Thodi wrote: > > The smb_mb() thing is not that clear (at least to me) because it is used > > in two places in the driver and both seem to be making write to > > ctrl->cmd_busy visible to other CPUs but I don't see where we deal with > > the read part. > > > > I may be missing something, though. > > I think the read part is in wait_event_timeout() which evaluates the > condition. > The wake_up is called from the pciehp_isr(). Since the flag is being updated > in both process level and interrupt handler context, smp_mb() is used. I think > the same now applies to ctrl->slot_ctrl now as this being used in process > context and interrupt context as well. Right, but that would require to use another read/general barrier in the pciehp_isr() before we read the variable in case interrupt happens immediately on another CPU (at least that's my understanding). Since I'm not too comfortable with all these barriers to be honest I would prefer reading the slot control register directly in pciehp_isr() :-) I wonder if the below works in your case? I think it is still easier to understand than adding another barrier there. diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 7dd443aea5a5..575da1005836 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -518,11 +518,9 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) u16 status, events; /* -* Interrupts only occur in D3hot or shallower and only if enabled -* in the Slot Control register (PCIe r4.0, sec 6.7.3.4). +* Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4). */ - if (pdev->current_state == PCI_D3cold || - (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode)) + if (pdev->current_state == PCI_D3cold) return IRQ_NONE; /* @@ -548,6 +546,22 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) return IRQ_NONE; } + if (!pciehp_poll_mode) { + u16 ctrl; + + /* +* Check that the hotplug interrupt was enabled. It may +* be that the interrupt was meant for PME instead as +* they share the MSI vector. +*/ + pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, ); + if (ctrl == (u16) ~0 || !(ctrl & PCI_EXP_SLTCTL_HPIE)) { + if (parent) + pm_runtime_put(parent); + return IRQ_NONE; + } + } + /* * Slot Status contains plain status bits as well as event * notification bits; right now we only want the event bits.
Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
On Tue, Nov 13, 2018 at 03:57:47PM +, Shameerali Kolothum Thodi wrote: > > The smb_mb() thing is not that clear (at least to me) because it is used > > in two places in the driver and both seem to be making write to > > ctrl->cmd_busy visible to other CPUs but I don't see where we deal with > > the read part. > > > > I may be missing something, though. > > I think the read part is in wait_event_timeout() which evaluates the > condition. > The wake_up is called from the pciehp_isr(). Since the flag is being updated > in both process level and interrupt handler context, smp_mb() is used. I think > the same now applies to ctrl->slot_ctrl now as this being used in process > context and interrupt context as well. Right, but that would require to use another read/general barrier in the pciehp_isr() before we read the variable in case interrupt happens immediately on another CPU (at least that's my understanding). Since I'm not too comfortable with all these barriers to be honest I would prefer reading the slot control register directly in pciehp_isr() :-) I wonder if the below works in your case? I think it is still easier to understand than adding another barrier there. diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 7dd443aea5a5..575da1005836 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -518,11 +518,9 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) u16 status, events; /* -* Interrupts only occur in D3hot or shallower and only if enabled -* in the Slot Control register (PCIe r4.0, sec 6.7.3.4). +* Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4). */ - if (pdev->current_state == PCI_D3cold || - (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode)) + if (pdev->current_state == PCI_D3cold) return IRQ_NONE; /* @@ -548,6 +546,22 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) return IRQ_NONE; } + if (!pciehp_poll_mode) { + u16 ctrl; + + /* +* Check that the hotplug interrupt was enabled. It may +* be that the interrupt was meant for PME instead as +* they share the MSI vector. +*/ + pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, ); + if (ctrl == (u16) ~0 || !(ctrl & PCI_EXP_SLTCTL_HPIE)) { + if (parent) + pm_runtime_put(parent); + return IRQ_NONE; + } + } + /* * Slot Status contains plain status bits as well as event * notification bits; right now we only want the event bits.
RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
> -Original Message- > From: mika.westerb...@linux.intel.com > [mailto:mika.westerb...@linux.intel.com] > Sent: 13 November 2018 15:08 > To: Shameerali Kolothum Thodi > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Wangzhou (B) > ; Linuxarm ; Lukas > Wunner > Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue [...] > > Right. As I mentioned in my previous mail, I missed the fact that you are > updating > > the ctrl->slot_ctrl with cmd value while in my test I did my update with the > value > > returned by pcie_capability_read_word(). > > OK, I see. > > > > However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in > > > pciehp_isr(). > > > > Ok. > > > > > Here's an updated patch, can you try and see if it makes any difference? > > > > I just tried this and it works. Thanks. > > Can you still check that the previous one (without _CCIE check) works? Yes, it works for me without _CCIE. > > See few comments below. > > > > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c > > > b/drivers/pci/hotplug/pciehp_hpc.c > > > index 7dd443aea5a5..da2cbe892444 100644 > > > --- a/drivers/pci/hotplug/pciehp_hpc.c > > > +++ b/drivers/pci/hotplug/pciehp_hpc.c > > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller > *ctrl, > > > u16 cmd, > > > slot_ctrl |= (cmd & mask); > > > ctrl->cmd_busy = 1; > > > smp_mb(); > > > + ctrl->slot_ctrl = slot_ctrl; > > > > Does it make more sense if we can move this before smp_mb()?. Also I am > not > > sure updating the ctrl->slot_ctrl before actually the hardware is > programmed > > with that value will result in any other race conditions? TBH, I am not that > familiar > > with this code and I leave that to you :) > > Both are good questions :) > > For the moving ctrl->slot_ctrl before pcie_capability_write_word(), I > think we should be fine and this is actually more correct because if we > are unmasking interrupts they may trigger immediately making > pciehp_isr() find wrong values in ctrl->slot_ctrl (as can be seen in the > issue you reported). Ok. I was more concerned about an unsolicited event triggering the _isr while we are modifying the ctrl->slot_ctrl. But that's ok I think as the _isr reads the hw status anyway. > The smb_mb() thing is not that clear (at least to me) because it is used > in two places in the driver and both seem to be making write to > ctrl->cmd_busy visible to other CPUs but I don't see where we deal with > the read part. > > I may be missing something, though. I think the read part is in wait_event_timeout() which evaluates the condition. The wake_up is called from the pciehp_isr(). Since the flag is being updated in both process level and interrupt handler context, smp_mb() is used. I think the same now applies to ctrl->slot_ctrl now as this being used in process context and interrupt context as well. Thanks, Shameer
RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
> -Original Message- > From: mika.westerb...@linux.intel.com > [mailto:mika.westerb...@linux.intel.com] > Sent: 13 November 2018 15:08 > To: Shameerali Kolothum Thodi > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Wangzhou (B) > ; Linuxarm ; Lukas > Wunner > Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue [...] > > Right. As I mentioned in my previous mail, I missed the fact that you are > updating > > the ctrl->slot_ctrl with cmd value while in my test I did my update with the > value > > returned by pcie_capability_read_word(). > > OK, I see. > > > > However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in > > > pciehp_isr(). > > > > Ok. > > > > > Here's an updated patch, can you try and see if it makes any difference? > > > > I just tried this and it works. Thanks. > > Can you still check that the previous one (without _CCIE check) works? Yes, it works for me without _CCIE. > > See few comments below. > > > > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c > > > b/drivers/pci/hotplug/pciehp_hpc.c > > > index 7dd443aea5a5..da2cbe892444 100644 > > > --- a/drivers/pci/hotplug/pciehp_hpc.c > > > +++ b/drivers/pci/hotplug/pciehp_hpc.c > > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller > *ctrl, > > > u16 cmd, > > > slot_ctrl |= (cmd & mask); > > > ctrl->cmd_busy = 1; > > > smp_mb(); > > > + ctrl->slot_ctrl = slot_ctrl; > > > > Does it make more sense if we can move this before smp_mb()?. Also I am > not > > sure updating the ctrl->slot_ctrl before actually the hardware is > programmed > > with that value will result in any other race conditions? TBH, I am not that > familiar > > with this code and I leave that to you :) > > Both are good questions :) > > For the moving ctrl->slot_ctrl before pcie_capability_write_word(), I > think we should be fine and this is actually more correct because if we > are unmasking interrupts they may trigger immediately making > pciehp_isr() find wrong values in ctrl->slot_ctrl (as can be seen in the > issue you reported). Ok. I was more concerned about an unsolicited event triggering the _isr while we are modifying the ctrl->slot_ctrl. But that's ok I think as the _isr reads the hw status anyway. > The smb_mb() thing is not that clear (at least to me) because it is used > in two places in the driver and both seem to be making write to > ctrl->cmd_busy visible to other CPUs but I don't see where we deal with > the read part. > > I may be missing something, though. I think the read part is in wait_event_timeout() which evaluates the condition. The wake_up is called from the pciehp_isr(). Since the flag is being updated in both process level and interrupt handler context, smp_mb() is used. I think the same now applies to ctrl->slot_ctrl now as this being used in process context and interrupt context as well. Thanks, Shameer
Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
On Tue, Nov 13, 2018 at 01:28:04PM +, Shameerali Kolothum Thodi wrote: > > > -Original Message- > > From: mika.westerb...@linux.intel.com > > [mailto:mika.westerb...@linux.intel.com] > > Sent: 13 November 2018 12:59 > > To: Shameerali Kolothum Thodi > > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Wangzhou (B) > > ; Linuxarm ; Lukas > > Wunner > > Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue > > > > On Tue, Nov 13, 2018 at 12:36:20PM +, Shameerali Kolothum Thodi wrote: > > > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller > > > > *ctrl, > > > > u16 cmd, > > > > slot_ctrl |= (cmd & mask); > > > > ctrl->cmd_busy = 1; > > > > smp_mb(); > > > > + ctrl->slot_ctrl = slot_ctrl; > > > > > > Actually I tried this one, but it doesn't help in this case as the > > > initial > > > pcie_capability_read_word() returns the slot_ctrl without > > > PCI_EXP_SLTCTL_HPIE bit set. It looks to me > > > pcie_enable_notification() function enables this, > > > > > > if (!pciehp_poll_mode) > > > cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE; > > > > > > I don't know this is as per the spec or not as the initial cap read > > > doesn't seems to have the PCI_EXP_SLTCTL_HPIE bit set. > > > > If I read the code right cmd value should end up in ctrl->slot_ctrl > > properly from > > pcie_enable_notification(). > > Right. As I mentioned in my previous mail, I missed the fact that you are > updating > the ctrl->slot_ctrl with cmd value while in my test I did my update with the > value > returned by pcie_capability_read_word(). OK, I see. > > However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in > > pciehp_isr(). > > Ok. > > > Here's an updated patch, can you try and see if it makes any difference? > > I just tried this and it works. Thanks. Can you still check that the previous one (without _CCIE check) works? > See few comments below. > > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c > > b/drivers/pci/hotplug/pciehp_hpc.c > > index 7dd443aea5a5..da2cbe892444 100644 > > --- a/drivers/pci/hotplug/pciehp_hpc.c > > +++ b/drivers/pci/hotplug/pciehp_hpc.c > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, > > u16 cmd, > > slot_ctrl |= (cmd & mask); > > ctrl->cmd_busy = 1; > > smp_mb(); > > + ctrl->slot_ctrl = slot_ctrl; > > Does it make more sense if we can move this before smp_mb()?. Also I am not > sure updating the ctrl->slot_ctrl before actually the hardware is programmed > with that value will result in any other race conditions? TBH, I am not that > familiar > with this code and I leave that to you :) Both are good questions :) For the moving ctrl->slot_ctrl before pcie_capability_write_word(), I think we should be fine and this is actually more correct because if we are unmasking interrupts they may trigger immediately making pciehp_isr() find wrong values in ctrl->slot_ctrl (as can be seen in the issue you reported). The smb_mb() thing is not that clear (at least to me) because it is used in two places in the driver and both seem to be making write to ctrl->cmd_busy visible to other CPUs but I don't see where we deal with the read part. I may be missing something, though.
Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
On Tue, Nov 13, 2018 at 01:28:04PM +, Shameerali Kolothum Thodi wrote: > > > -Original Message- > > From: mika.westerb...@linux.intel.com > > [mailto:mika.westerb...@linux.intel.com] > > Sent: 13 November 2018 12:59 > > To: Shameerali Kolothum Thodi > > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Wangzhou (B) > > ; Linuxarm ; Lukas > > Wunner > > Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue > > > > On Tue, Nov 13, 2018 at 12:36:20PM +, Shameerali Kolothum Thodi wrote: > > > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller > > > > *ctrl, > > > > u16 cmd, > > > > slot_ctrl |= (cmd & mask); > > > > ctrl->cmd_busy = 1; > > > > smp_mb(); > > > > + ctrl->slot_ctrl = slot_ctrl; > > > > > > Actually I tried this one, but it doesn't help in this case as the > > > initial > > > pcie_capability_read_word() returns the slot_ctrl without > > > PCI_EXP_SLTCTL_HPIE bit set. It looks to me > > > pcie_enable_notification() function enables this, > > > > > > if (!pciehp_poll_mode) > > > cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE; > > > > > > I don't know this is as per the spec or not as the initial cap read > > > doesn't seems to have the PCI_EXP_SLTCTL_HPIE bit set. > > > > If I read the code right cmd value should end up in ctrl->slot_ctrl > > properly from > > pcie_enable_notification(). > > Right. As I mentioned in my previous mail, I missed the fact that you are > updating > the ctrl->slot_ctrl with cmd value while in my test I did my update with the > value > returned by pcie_capability_read_word(). OK, I see. > > However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in > > pciehp_isr(). > > Ok. > > > Here's an updated patch, can you try and see if it makes any difference? > > I just tried this and it works. Thanks. Can you still check that the previous one (without _CCIE check) works? > See few comments below. > > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c > > b/drivers/pci/hotplug/pciehp_hpc.c > > index 7dd443aea5a5..da2cbe892444 100644 > > --- a/drivers/pci/hotplug/pciehp_hpc.c > > +++ b/drivers/pci/hotplug/pciehp_hpc.c > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, > > u16 cmd, > > slot_ctrl |= (cmd & mask); > > ctrl->cmd_busy = 1; > > smp_mb(); > > + ctrl->slot_ctrl = slot_ctrl; > > Does it make more sense if we can move this before smp_mb()?. Also I am not > sure updating the ctrl->slot_ctrl before actually the hardware is programmed > with that value will result in any other race conditions? TBH, I am not that > familiar > with this code and I leave that to you :) Both are good questions :) For the moving ctrl->slot_ctrl before pcie_capability_write_word(), I think we should be fine and this is actually more correct because if we are unmasking interrupts they may trigger immediately making pciehp_isr() find wrong values in ctrl->slot_ctrl (as can be seen in the issue you reported). The smb_mb() thing is not that clear (at least to me) because it is used in two places in the driver and both seem to be making write to ctrl->cmd_busy visible to other CPUs but I don't see where we deal with the read part. I may be missing something, though.
RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
> -Original Message- > From: mika.westerb...@linux.intel.com > [mailto:mika.westerb...@linux.intel.com] > Sent: 13 November 2018 12:59 > To: Shameerali Kolothum Thodi > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Wangzhou (B) > ; Linuxarm ; Lukas > Wunner > Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue > > On Tue, Nov 13, 2018 at 12:36:20PM +, Shameerali Kolothum Thodi wrote: > > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller > > > *ctrl, > > > u16 cmd, > > > slot_ctrl |= (cmd & mask); > > > ctrl->cmd_busy = 1; > > > smp_mb(); > > > + ctrl->slot_ctrl = slot_ctrl; > > > > Actually I tried this one, but it doesn't help in this case as the > > initial > > pcie_capability_read_word() returns the slot_ctrl without > > PCI_EXP_SLTCTL_HPIE bit set. It looks to me > > pcie_enable_notification() function enables this, > > > > if (!pciehp_poll_mode) > > cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE; > > > > I don't know this is as per the spec or not as the initial cap read > > doesn't seems to have the PCI_EXP_SLTCTL_HPIE bit set. > > If I read the code right cmd value should end up in ctrl->slot_ctrl properly > from > pcie_enable_notification(). Right. As I mentioned in my previous mail, I missed the fact that you are updating the ctrl->slot_ctrl with cmd value while in my test I did my update with the value returned by pcie_capability_read_word(). > However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in > pciehp_isr(). Ok. > Here's an updated patch, can you try and see if it makes any difference? I just tried this and it works. Thanks. See few comments below. > diff --git a/drivers/pci/hotplug/pciehp_hpc.c > b/drivers/pci/hotplug/pciehp_hpc.c > index 7dd443aea5a5..da2cbe892444 100644 > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, > u16 cmd, > slot_ctrl |= (cmd & mask); > ctrl->cmd_busy = 1; > smp_mb(); > + ctrl->slot_ctrl = slot_ctrl; Does it make more sense if we can move this before smp_mb()?. Also I am not sure updating the ctrl->slot_ctrl before actually the hardware is programmed with that value will result in any other race conditions? TBH, I am not that familiar with this code and I leave that to you :) Thanks, Shameer > pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl); > ctrl->cmd_started = jiffies; > - ctrl->slot_ctrl = slot_ctrl; > > /* >* Controllers with the Intel CF118 and similar errata advertise @@ - > 522,7 +522,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) >* in the Slot Control register (PCIe r4.0, sec 6.7.3.4). >*/ > if (pdev->current_state == PCI_D3cold || > - (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode)) > + (!(ctrl->slot_ctrl & (PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE)) > +&& !pciehp_poll_mode)) > return IRQ_NONE; > > /*
RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
> -Original Message- > From: mika.westerb...@linux.intel.com > [mailto:mika.westerb...@linux.intel.com] > Sent: 13 November 2018 12:59 > To: Shameerali Kolothum Thodi > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Wangzhou (B) > ; Linuxarm ; Lukas > Wunner > Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue > > On Tue, Nov 13, 2018 at 12:36:20PM +, Shameerali Kolothum Thodi wrote: > > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller > > > *ctrl, > > > u16 cmd, > > > slot_ctrl |= (cmd & mask); > > > ctrl->cmd_busy = 1; > > > smp_mb(); > > > + ctrl->slot_ctrl = slot_ctrl; > > > > Actually I tried this one, but it doesn't help in this case as the > > initial > > pcie_capability_read_word() returns the slot_ctrl without > > PCI_EXP_SLTCTL_HPIE bit set. It looks to me > > pcie_enable_notification() function enables this, > > > > if (!pciehp_poll_mode) > > cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE; > > > > I don't know this is as per the spec or not as the initial cap read > > doesn't seems to have the PCI_EXP_SLTCTL_HPIE bit set. > > If I read the code right cmd value should end up in ctrl->slot_ctrl properly > from > pcie_enable_notification(). Right. As I mentioned in my previous mail, I missed the fact that you are updating the ctrl->slot_ctrl with cmd value while in my test I did my update with the value returned by pcie_capability_read_word(). > However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in > pciehp_isr(). Ok. > Here's an updated patch, can you try and see if it makes any difference? I just tried this and it works. Thanks. See few comments below. > diff --git a/drivers/pci/hotplug/pciehp_hpc.c > b/drivers/pci/hotplug/pciehp_hpc.c > index 7dd443aea5a5..da2cbe892444 100644 > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, > u16 cmd, > slot_ctrl |= (cmd & mask); > ctrl->cmd_busy = 1; > smp_mb(); > + ctrl->slot_ctrl = slot_ctrl; Does it make more sense if we can move this before smp_mb()?. Also I am not sure updating the ctrl->slot_ctrl before actually the hardware is programmed with that value will result in any other race conditions? TBH, I am not that familiar with this code and I leave that to you :) Thanks, Shameer > pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl); > ctrl->cmd_started = jiffies; > - ctrl->slot_ctrl = slot_ctrl; > > /* >* Controllers with the Intel CF118 and similar errata advertise @@ - > 522,7 +522,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) >* in the Slot Control register (PCIe r4.0, sec 6.7.3.4). >*/ > if (pdev->current_state == PCI_D3cold || > - (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode)) > + (!(ctrl->slot_ctrl & (PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE)) > +&& !pciehp_poll_mode)) > return IRQ_NONE; > > /*
Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
On Tue, Nov 13, 2018 at 12:36:20PM +, Shameerali Kolothum Thodi wrote: > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, > > u16 cmd, > > slot_ctrl |= (cmd & mask); > > ctrl->cmd_busy = 1; > > smp_mb(); > > + ctrl->slot_ctrl = slot_ctrl; > > Actually I tried this one, but it doesn't help in this case as the initial > pcie_capability_read_word() returns the slot_ctrl without PCI_EXP_SLTCTL_HPIE > bit set. It looks to me pcie_enable_notification() function enables this, > > if (!pciehp_poll_mode) > cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE; > > I don't know this is as per the spec or not as the initial cap read doesn't > seems to > have the PCI_EXP_SLTCTL_HPIE bit set. If I read the code right cmd value should end up in ctrl->slot_ctrl properly from pcie_enable_notification(). However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in pciehp_isr(). Here's an updated patch, can you try and see if it makes any difference? diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 7dd443aea5a5..da2cbe892444 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd, slot_ctrl |= (cmd & mask); ctrl->cmd_busy = 1; smp_mb(); + ctrl->slot_ctrl = slot_ctrl; pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl); ctrl->cmd_started = jiffies; - ctrl->slot_ctrl = slot_ctrl; /* * Controllers with the Intel CF118 and similar errata advertise @@ -522,7 +522,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) * in the Slot Control register (PCIe r4.0, sec 6.7.3.4). */ if (pdev->current_state == PCI_D3cold || - (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode)) + (!(ctrl->slot_ctrl & (PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE)) && !pciehp_poll_mode)) return IRQ_NONE; /*
Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
On Tue, Nov 13, 2018 at 12:36:20PM +, Shameerali Kolothum Thodi wrote: > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, > > u16 cmd, > > slot_ctrl |= (cmd & mask); > > ctrl->cmd_busy = 1; > > smp_mb(); > > + ctrl->slot_ctrl = slot_ctrl; > > Actually I tried this one, but it doesn't help in this case as the initial > pcie_capability_read_word() returns the slot_ctrl without PCI_EXP_SLTCTL_HPIE > bit set. It looks to me pcie_enable_notification() function enables this, > > if (!pciehp_poll_mode) > cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE; > > I don't know this is as per the spec or not as the initial cap read doesn't > seems to > have the PCI_EXP_SLTCTL_HPIE bit set. If I read the code right cmd value should end up in ctrl->slot_ctrl properly from pcie_enable_notification(). However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in pciehp_isr(). Here's an updated patch, can you try and see if it makes any difference? diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 7dd443aea5a5..da2cbe892444 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd, slot_ctrl |= (cmd & mask); ctrl->cmd_busy = 1; smp_mb(); + ctrl->slot_ctrl = slot_ctrl; pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl); ctrl->cmd_started = jiffies; - ctrl->slot_ctrl = slot_ctrl; /* * Controllers with the Intel CF118 and similar errata advertise @@ -522,7 +522,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) * in the Slot Control register (PCIe r4.0, sec 6.7.3.4). */ if (pdev->current_state == PCI_D3cold || - (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode)) + (!(ctrl->slot_ctrl & (PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE)) && !pciehp_poll_mode)) return IRQ_NONE; /*
RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
> -Original Message- > From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of > Shameerali Kolothum Thodi > Sent: 13 November 2018 12:36 > To: mika.westerb...@linux.intel.com > Cc: linux-...@vger.kernel.org; Lukas Wunner ; linux- > ker...@vger.kernel.org; Linuxarm > Subject: RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue > > > > > -Original Message- > > From: mika.westerb...@linux.intel.com > > [mailto:mika.westerb...@linux.intel.com] > > Sent: 13 November 2018 12:25 > > To: Shameerali Kolothum Thodi > > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Wangzhou (B) > > ; Linuxarm ; Lukas > > Wunner > > Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue > > > > +Lukas > > > > On Tue, Nov 13, 2018 at 11:45:42AM +, Shameerali Kolothum Thodi > wrote: > > > Hi Mika, > > > > Hi, > > > > > Since the commit commit 720d6a671a6e("PCI: pciehp: Do not handle events > > > if interrupts are masked"), the hotplug support on Qemu Guest(4.120-rc1) > > > with a vfio passthrough device seems to be broken. This is on an ARM64 > > platform. > > > > > > I am booting a Guest with below command line options with the intention of > > > hot add a ixgbevf dev later, > > > > > > ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu > > host \ > > > -kernel Image_4.20-rc1 \ > > > -initrd rootfs-iperf.cpio \ > > > -device ioh3420,id=rp1 \ > > > -net none \ > > > -m 4096 \ > > > -nographic -D -d -enable-kvm \ > > > -append "console=ttyAMA0 root=/dev/vda -m 4096 rw > > pciehp.pciehp_debug=1 > > > pcie_ports=native searlycon=pl011,0x900" > > > > > > But receives this on boot, > > > > > > [1.327852] pciehp :00:01.0:pcie004: Timeout > > > on hotplug command 0x03f1 (issued 1016 msec ago) > > > [1.335842] pciehp :00:01.0:pcie004: Timeout on hotplug command > > > 0x03f1 (issued 1024 msec ago) > > > [3.847843] pciehp :00:01.0:pcie004: Failed to check link status > > > [3.855843] pciehp :00:01.0:pcie004: Timeout on hotplug command > > > 0x02f1 (issued 2520 msec ago) > > > [4.879846] pciehp :00:01.0:pcie004: Timeout on hotplug command > > > 0x06f1 (issued 1024 msec ago) > > > [5.911840] pciehp :00:01.0:pcie004: Timeout on hotplug command > > > 0x06f1 (issued 2056 msec ago) > > > [6.927844] pciehp :00:01.0:pcie004: Timeout on hotplug command > > > 0x07f1 (issued 1016 msec ago) > > > [7.951843] pciehp :00:01.0:pcie004: Timeout on hotplug command > > > 0x0771 (issued 1024 msec ago) > > > > > > Trying to hot add using "device_addvfio- > > pci,host=:01:10.1,id=net0,bus=rp1" > > > doesn't work either. And if I boot the guest with an assigned device > > > (-device vfio-pci,host=:01:10.1,id=net0,bus=rp1), I can see the dev > listed > > in > > > the Guest but then hot remove doesn't work. > > > > > > This all works on 4.19 and bisect points to the above mentioned commit, > > where an > > > additional check is added in pciehp_isr(), > > > > > > - * Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4). > > > + * Interrupts only occur in D3hot or shallower and only if enabled > > > + * in the Slot Control register (PCIe r4.0, sec 6.7.3.4). > > >*/ > > > - if (pdev->current_state == PCI_D3cold) > > > + if (pdev->current_state == PCI_D3cold || > > > + (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode)) > > > return IRQ_NONE; > > > > > > I think this doesn't work for the first time, where the cmd with > > PCI_EXP_SLTCTL_HPIE bit set > > > is written, > > > pciehp_probe() > > > pcie_init_notification() > > > pcie_enable_notification() > > >pcie_do_write_cmd() > > > > > > to begin with, ctrl->slot_ctrl = 0 in pciehp_isr() as this is only set > > > once the > > write > > > is returned. > > > > > > Or else I am missing something here. Please take a look and let me know. > > > > Thanks for the detailed report and analysis. I think you are right and > > the ctrl->slot_ctrl is only set after the actual value has been > > programmed to the hardware, so if there was interrupt "pending" it will >
RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
> -Original Message- > From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of > Shameerali Kolothum Thodi > Sent: 13 November 2018 12:36 > To: mika.westerb...@linux.intel.com > Cc: linux-...@vger.kernel.org; Lukas Wunner ; linux- > ker...@vger.kernel.org; Linuxarm > Subject: RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue > > > > > -Original Message- > > From: mika.westerb...@linux.intel.com > > [mailto:mika.westerb...@linux.intel.com] > > Sent: 13 November 2018 12:25 > > To: Shameerali Kolothum Thodi > > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Wangzhou (B) > > ; Linuxarm ; Lukas > > Wunner > > Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue > > > > +Lukas > > > > On Tue, Nov 13, 2018 at 11:45:42AM +, Shameerali Kolothum Thodi > wrote: > > > Hi Mika, > > > > Hi, > > > > > Since the commit commit 720d6a671a6e("PCI: pciehp: Do not handle events > > > if interrupts are masked"), the hotplug support on Qemu Guest(4.120-rc1) > > > with a vfio passthrough device seems to be broken. This is on an ARM64 > > platform. > > > > > > I am booting a Guest with below command line options with the intention of > > > hot add a ixgbevf dev later, > > > > > > ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu > > host \ > > > -kernel Image_4.20-rc1 \ > > > -initrd rootfs-iperf.cpio \ > > > -device ioh3420,id=rp1 \ > > > -net none \ > > > -m 4096 \ > > > -nographic -D -d -enable-kvm \ > > > -append "console=ttyAMA0 root=/dev/vda -m 4096 rw > > pciehp.pciehp_debug=1 > > > pcie_ports=native searlycon=pl011,0x900" > > > > > > But receives this on boot, > > > > > > [1.327852] pciehp :00:01.0:pcie004: Timeout > > > on hotplug command 0x03f1 (issued 1016 msec ago) > > > [1.335842] pciehp :00:01.0:pcie004: Timeout on hotplug command > > > 0x03f1 (issued 1024 msec ago) > > > [3.847843] pciehp :00:01.0:pcie004: Failed to check link status > > > [3.855843] pciehp :00:01.0:pcie004: Timeout on hotplug command > > > 0x02f1 (issued 2520 msec ago) > > > [4.879846] pciehp :00:01.0:pcie004: Timeout on hotplug command > > > 0x06f1 (issued 1024 msec ago) > > > [5.911840] pciehp :00:01.0:pcie004: Timeout on hotplug command > > > 0x06f1 (issued 2056 msec ago) > > > [6.927844] pciehp :00:01.0:pcie004: Timeout on hotplug command > > > 0x07f1 (issued 1016 msec ago) > > > [7.951843] pciehp :00:01.0:pcie004: Timeout on hotplug command > > > 0x0771 (issued 1024 msec ago) > > > > > > Trying to hot add using "device_addvfio- > > pci,host=:01:10.1,id=net0,bus=rp1" > > > doesn't work either. And if I boot the guest with an assigned device > > > (-device vfio-pci,host=:01:10.1,id=net0,bus=rp1), I can see the dev > listed > > in > > > the Guest but then hot remove doesn't work. > > > > > > This all works on 4.19 and bisect points to the above mentioned commit, > > where an > > > additional check is added in pciehp_isr(), > > > > > > - * Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4). > > > + * Interrupts only occur in D3hot or shallower and only if enabled > > > + * in the Slot Control register (PCIe r4.0, sec 6.7.3.4). > > >*/ > > > - if (pdev->current_state == PCI_D3cold) > > > + if (pdev->current_state == PCI_D3cold || > > > + (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode)) > > > return IRQ_NONE; > > > > > > I think this doesn't work for the first time, where the cmd with > > PCI_EXP_SLTCTL_HPIE bit set > > > is written, > > > pciehp_probe() > > > pcie_init_notification() > > > pcie_enable_notification() > > >pcie_do_write_cmd() > > > > > > to begin with, ctrl->slot_ctrl = 0 in pciehp_isr() as this is only set > > > once the > > write > > > is returned. > > > > > > Or else I am missing something here. Please take a look and let me know. > > > > Thanks for the detailed report and analysis. I think you are right and > > the ctrl->slot_ctrl is only set after the actual value has been > > programmed to the hardware, so if there was interrupt "pending" it will >
RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
> -Original Message- > From: mika.westerb...@linux.intel.com > [mailto:mika.westerb...@linux.intel.com] > Sent: 13 November 2018 12:25 > To: Shameerali Kolothum Thodi > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Wangzhou (B) > ; Linuxarm ; Lukas > Wunner > Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue > > +Lukas > > On Tue, Nov 13, 2018 at 11:45:42AM +, Shameerali Kolothum Thodi wrote: > > Hi Mika, > > Hi, > > > Since the commit commit 720d6a671a6e("PCI: pciehp: Do not handle events > > if interrupts are masked"), the hotplug support on Qemu Guest(4.120-rc1) > > with a vfio passthrough device seems to be broken. This is on an ARM64 > platform. > > > > I am booting a Guest with below command line options with the intention of > > hot add a ixgbevf dev later, > > > > ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu > host \ > > -kernel Image_4.20-rc1 \ > > -initrd rootfs-iperf.cpio \ > > -device ioh3420,id=rp1 \ > > -net none \ > > -m 4096 \ > > -nographic -D -d -enable-kvm \ > > -append "console=ttyAMA0 root=/dev/vda -m 4096 rw > pciehp.pciehp_debug=1 > > pcie_ports=native searlycon=pl011,0x900" > > > > But receives this on boot, > > > > [1.327852] pciehp :00:01.0:pcie004: Timeout > > on hotplug command 0x03f1 (issued 1016 msec ago) > > [1.335842] pciehp :00:01.0:pcie004: Timeout on hotplug command > > 0x03f1 (issued 1024 msec ago) > > [3.847843] pciehp :00:01.0:pcie004: Failed to check link status > > [3.855843] pciehp :00:01.0:pcie004: Timeout on hotplug command > > 0x02f1 (issued 2520 msec ago) > > [4.879846] pciehp :00:01.0:pcie004: Timeout on hotplug command > > 0x06f1 (issued 1024 msec ago) > > [5.911840] pciehp :00:01.0:pcie004: Timeout on hotplug command > > 0x06f1 (issued 2056 msec ago) > > [6.927844] pciehp :00:01.0:pcie004: Timeout on hotplug command > > 0x07f1 (issued 1016 msec ago) > > [7.951843] pciehp :00:01.0:pcie004: Timeout on hotplug command > > 0x0771 (issued 1024 msec ago) > > > > Trying to hot add using "device_addvfio- > pci,host=:01:10.1,id=net0,bus=rp1" > > doesn't work either. And if I boot the guest with an assigned device > > (-device vfio-pci,host=:01:10.1,id=net0,bus=rp1), I can see the dev > > listed > in > > the Guest but then hot remove doesn't work. > > > > This all works on 4.19 and bisect points to the above mentioned commit, > where an > > additional check is added in pciehp_isr(), > > > > -* Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4). > > +* Interrupts only occur in D3hot or shallower and only if enabled > > +* in the Slot Control register (PCIe r4.0, sec 6.7.3.4). > > */ > > - if (pdev->current_state == PCI_D3cold) > > + if (pdev->current_state == PCI_D3cold || > > + (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode)) > > return IRQ_NONE; > > > > I think this doesn't work for the first time, where the cmd with > PCI_EXP_SLTCTL_HPIE bit set > > is written, > > pciehp_probe() > > pcie_init_notification() > > pcie_enable_notification() > >pcie_do_write_cmd() > > > > to begin with, ctrl->slot_ctrl = 0 in pciehp_isr() as this is only set once > > the > write > > is returned. > > > > Or else I am missing something here. Please take a look and let me know. > > Thanks for the detailed report and analysis. I think you are right and > the ctrl->slot_ctrl is only set after the actual value has been > programmed to the hardware, so if there was interrupt "pending" it will > trigger immediately (just to find ctrl->slot_ctrl == 0). > > I wonder if the following change helps here? > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c > b/drivers/pci/hotplug/pciehp_hpc.c > index 7dd443aea5a5..cd9eae650aa5 100644 > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, > u16 cmd, > slot_ctrl |= (cmd & mask); > ctrl->cmd_busy = 1; > smp_mb(); > + ctrl->slot_ctrl = slot_ctrl; Actually I tried this one, but it doesn't help in this case as the initial pcie_capability_read_word() returns the slot_ctrl without PCI_EXP_SLTCTL_HPIE bit set. It looks to me pcie_enable_notification() function enables this, if (!pciehp_poll_mode) cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE; I don't know this is as per the spec or not as the initial cap read doesn't seems to have the PCI_EXP_SLTCTL_HPIE bit set. Thanks, Shameer > pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl); > ctrl->cmd_started = jiffies; > - ctrl->slot_ctrl = slot_ctrl; > > /* >* Controllers with the Intel CF118 and similar errata advertise
RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
> -Original Message- > From: mika.westerb...@linux.intel.com > [mailto:mika.westerb...@linux.intel.com] > Sent: 13 November 2018 12:25 > To: Shameerali Kolothum Thodi > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Wangzhou (B) > ; Linuxarm ; Lukas > Wunner > Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue > > +Lukas > > On Tue, Nov 13, 2018 at 11:45:42AM +, Shameerali Kolothum Thodi wrote: > > Hi Mika, > > Hi, > > > Since the commit commit 720d6a671a6e("PCI: pciehp: Do not handle events > > if interrupts are masked"), the hotplug support on Qemu Guest(4.120-rc1) > > with a vfio passthrough device seems to be broken. This is on an ARM64 > platform. > > > > I am booting a Guest with below command line options with the intention of > > hot add a ixgbevf dev later, > > > > ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu > host \ > > -kernel Image_4.20-rc1 \ > > -initrd rootfs-iperf.cpio \ > > -device ioh3420,id=rp1 \ > > -net none \ > > -m 4096 \ > > -nographic -D -d -enable-kvm \ > > -append "console=ttyAMA0 root=/dev/vda -m 4096 rw > pciehp.pciehp_debug=1 > > pcie_ports=native searlycon=pl011,0x900" > > > > But receives this on boot, > > > > [1.327852] pciehp :00:01.0:pcie004: Timeout > > on hotplug command 0x03f1 (issued 1016 msec ago) > > [1.335842] pciehp :00:01.0:pcie004: Timeout on hotplug command > > 0x03f1 (issued 1024 msec ago) > > [3.847843] pciehp :00:01.0:pcie004: Failed to check link status > > [3.855843] pciehp :00:01.0:pcie004: Timeout on hotplug command > > 0x02f1 (issued 2520 msec ago) > > [4.879846] pciehp :00:01.0:pcie004: Timeout on hotplug command > > 0x06f1 (issued 1024 msec ago) > > [5.911840] pciehp :00:01.0:pcie004: Timeout on hotplug command > > 0x06f1 (issued 2056 msec ago) > > [6.927844] pciehp :00:01.0:pcie004: Timeout on hotplug command > > 0x07f1 (issued 1016 msec ago) > > [7.951843] pciehp :00:01.0:pcie004: Timeout on hotplug command > > 0x0771 (issued 1024 msec ago) > > > > Trying to hot add using "device_addvfio- > pci,host=:01:10.1,id=net0,bus=rp1" > > doesn't work either. And if I boot the guest with an assigned device > > (-device vfio-pci,host=:01:10.1,id=net0,bus=rp1), I can see the dev > > listed > in > > the Guest but then hot remove doesn't work. > > > > This all works on 4.19 and bisect points to the above mentioned commit, > where an > > additional check is added in pciehp_isr(), > > > > -* Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4). > > +* Interrupts only occur in D3hot or shallower and only if enabled > > +* in the Slot Control register (PCIe r4.0, sec 6.7.3.4). > > */ > > - if (pdev->current_state == PCI_D3cold) > > + if (pdev->current_state == PCI_D3cold || > > + (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode)) > > return IRQ_NONE; > > > > I think this doesn't work for the first time, where the cmd with > PCI_EXP_SLTCTL_HPIE bit set > > is written, > > pciehp_probe() > > pcie_init_notification() > > pcie_enable_notification() > >pcie_do_write_cmd() > > > > to begin with, ctrl->slot_ctrl = 0 in pciehp_isr() as this is only set once > > the > write > > is returned. > > > > Or else I am missing something here. Please take a look and let me know. > > Thanks for the detailed report and analysis. I think you are right and > the ctrl->slot_ctrl is only set after the actual value has been > programmed to the hardware, so if there was interrupt "pending" it will > trigger immediately (just to find ctrl->slot_ctrl == 0). > > I wonder if the following change helps here? > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c > b/drivers/pci/hotplug/pciehp_hpc.c > index 7dd443aea5a5..cd9eae650aa5 100644 > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, > u16 cmd, > slot_ctrl |= (cmd & mask); > ctrl->cmd_busy = 1; > smp_mb(); > + ctrl->slot_ctrl = slot_ctrl; Actually I tried this one, but it doesn't help in this case as the initial pcie_capability_read_word() returns the slot_ctrl without PCI_EXP_SLTCTL_HPIE bit set. It looks to me pcie_enable_notification() function enables this, if (!pciehp_poll_mode) cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE; I don't know this is as per the spec or not as the initial cap read doesn't seems to have the PCI_EXP_SLTCTL_HPIE bit set. Thanks, Shameer > pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl); > ctrl->cmd_started = jiffies; > - ctrl->slot_ctrl = slot_ctrl; > > /* >* Controllers with the Intel CF118 and similar errata advertise
Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
+Lukas On Tue, Nov 13, 2018 at 11:45:42AM +, Shameerali Kolothum Thodi wrote: > Hi Mika, Hi, > Since the commit commit 720d6a671a6e("PCI: pciehp: Do not handle events > if interrupts are masked"), the hotplug support on Qemu Guest(4.120-rc1) > with a vfio passthrough device seems to be broken. This is on an ARM64 > platform. > > I am booting a Guest with below command line options with the intention of > hot add a ixgbevf dev later, > > ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu host > \ > -kernel Image_4.20-rc1 \ > -initrd rootfs-iperf.cpio \ > -device ioh3420,id=rp1 \ > -net none \ > -m 4096 \ > -nographic -D -d -enable-kvm \ > -append "console=ttyAMA0 root=/dev/vda -m 4096 rw pciehp.pciehp_debug=1 > pcie_ports=native searlycon=pl011,0x900" > > But receives this on boot, > > [1.327852] pciehp :00:01.0:pcie004: Timeout > on hotplug command 0x03f1 (issued 1016 msec ago) > [1.335842] pciehp :00:01.0:pcie004: Timeout on hotplug command > 0x03f1 (issued 1024 msec ago) > [3.847843] pciehp :00:01.0:pcie004: Failed to check link status > [3.855843] pciehp :00:01.0:pcie004: Timeout on hotplug command > 0x02f1 (issued 2520 msec ago) > [4.879846] pciehp :00:01.0:pcie004: Timeout on hotplug command > 0x06f1 (issued 1024 msec ago) > [5.911840] pciehp :00:01.0:pcie004: Timeout on hotplug command > 0x06f1 (issued 2056 msec ago) > [6.927844] pciehp :00:01.0:pcie004: Timeout on hotplug command > 0x07f1 (issued 1016 msec ago) > [7.951843] pciehp :00:01.0:pcie004: Timeout on hotplug command > 0x0771 (issued 1024 msec ago) > > Trying to hot add using "device_addvfio-pci,host=:01:10.1,id=net0,bus=rp1" > doesn't work either. And if I boot the guest with an assigned device > (-device vfio-pci,host=:01:10.1,id=net0,bus=rp1), I can see the dev > listed in > the Guest but then hot remove doesn't work. > > This all works on 4.19 and bisect points to the above mentioned commit, where > an > additional check is added in pciehp_isr(), > > - * Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4). > + * Interrupts only occur in D3hot or shallower and only if enabled > + * in the Slot Control register (PCIe r4.0, sec 6.7.3.4). >*/ > - if (pdev->current_state == PCI_D3cold) > + if (pdev->current_state == PCI_D3cold || > + (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode)) > return IRQ_NONE; > > I think this doesn't work for the first time, where the cmd with > PCI_EXP_SLTCTL_HPIE bit set > is written, > pciehp_probe() > pcie_init_notification() > pcie_enable_notification() >pcie_do_write_cmd() > > to begin with, ctrl->slot_ctrl = 0 in pciehp_isr() as this is only set once > the write > is returned. > > Or else I am missing something here. Please take a look and let me know. Thanks for the detailed report and analysis. I think you are right and the ctrl->slot_ctrl is only set after the actual value has been programmed to the hardware, so if there was interrupt "pending" it will trigger immediately (just to find ctrl->slot_ctrl == 0). I wonder if the following change helps here? diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 7dd443aea5a5..cd9eae650aa5 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd, slot_ctrl |= (cmd & mask); ctrl->cmd_busy = 1; smp_mb(); + ctrl->slot_ctrl = slot_ctrl; pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl); ctrl->cmd_started = jiffies; - ctrl->slot_ctrl = slot_ctrl; /* * Controllers with the Intel CF118 and similar errata advertise
Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
+Lukas On Tue, Nov 13, 2018 at 11:45:42AM +, Shameerali Kolothum Thodi wrote: > Hi Mika, Hi, > Since the commit commit 720d6a671a6e("PCI: pciehp: Do not handle events > if interrupts are masked"), the hotplug support on Qemu Guest(4.120-rc1) > with a vfio passthrough device seems to be broken. This is on an ARM64 > platform. > > I am booting a Guest with below command line options with the intention of > hot add a ixgbevf dev later, > > ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu host > \ > -kernel Image_4.20-rc1 \ > -initrd rootfs-iperf.cpio \ > -device ioh3420,id=rp1 \ > -net none \ > -m 4096 \ > -nographic -D -d -enable-kvm \ > -append "console=ttyAMA0 root=/dev/vda -m 4096 rw pciehp.pciehp_debug=1 > pcie_ports=native searlycon=pl011,0x900" > > But receives this on boot, > > [1.327852] pciehp :00:01.0:pcie004: Timeout > on hotplug command 0x03f1 (issued 1016 msec ago) > [1.335842] pciehp :00:01.0:pcie004: Timeout on hotplug command > 0x03f1 (issued 1024 msec ago) > [3.847843] pciehp :00:01.0:pcie004: Failed to check link status > [3.855843] pciehp :00:01.0:pcie004: Timeout on hotplug command > 0x02f1 (issued 2520 msec ago) > [4.879846] pciehp :00:01.0:pcie004: Timeout on hotplug command > 0x06f1 (issued 1024 msec ago) > [5.911840] pciehp :00:01.0:pcie004: Timeout on hotplug command > 0x06f1 (issued 2056 msec ago) > [6.927844] pciehp :00:01.0:pcie004: Timeout on hotplug command > 0x07f1 (issued 1016 msec ago) > [7.951843] pciehp :00:01.0:pcie004: Timeout on hotplug command > 0x0771 (issued 1024 msec ago) > > Trying to hot add using "device_addvfio-pci,host=:01:10.1,id=net0,bus=rp1" > doesn't work either. And if I boot the guest with an assigned device > (-device vfio-pci,host=:01:10.1,id=net0,bus=rp1), I can see the dev > listed in > the Guest but then hot remove doesn't work. > > This all works on 4.19 and bisect points to the above mentioned commit, where > an > additional check is added in pciehp_isr(), > > - * Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4). > + * Interrupts only occur in D3hot or shallower and only if enabled > + * in the Slot Control register (PCIe r4.0, sec 6.7.3.4). >*/ > - if (pdev->current_state == PCI_D3cold) > + if (pdev->current_state == PCI_D3cold || > + (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode)) > return IRQ_NONE; > > I think this doesn't work for the first time, where the cmd with > PCI_EXP_SLTCTL_HPIE bit set > is written, > pciehp_probe() > pcie_init_notification() > pcie_enable_notification() >pcie_do_write_cmd() > > to begin with, ctrl->slot_ctrl = 0 in pciehp_isr() as this is only set once > the write > is returned. > > Or else I am missing something here. Please take a look and let me know. Thanks for the detailed report and analysis. I think you are right and the ctrl->slot_ctrl is only set after the actual value has been programmed to the hardware, so if there was interrupt "pending" it will trigger immediately (just to find ctrl->slot_ctrl == 0). I wonder if the following change helps here? diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 7dd443aea5a5..cd9eae650aa5 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd, slot_ctrl |= (cmd & mask); ctrl->cmd_busy = 1; smp_mb(); + ctrl->slot_ctrl = slot_ctrl; pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl); ctrl->cmd_started = jiffies; - ctrl->slot_ctrl = slot_ctrl; /* * Controllers with the Intel CF118 and similar errata advertise