RE: [PATCH] PCI: Probe bridge window attributes once at enumeration-time
> On Tue, Jan 29, 2019 at 05:02:26PM -0600, Bjorn Helgaas wrote: > > On Tue, Jan 29, 2019 at 05:47:32PM -0500, Michael S. Tsirkin wrote: > > > On Tue, Jan 29, 2019 at 04:43:33PM -0600, Bjorn Helgaas wrote: > > > > On Tue, Jan 22, 2019 at 01:02:54PM -0600, Bjorn Helgaas wrote: > > > > > From: Bjorn Helgaas > > > > > > > > > > pci_bridge_check_ranges() determines whether a bridge supports > > > > > the optional I/O and prefetchable memory windows and sets the > > > > > flag bits in the bridge resources. This *could* be done once > > > > > during enumeration except that the resource allocation code > > > > > completely clears the flag bits, e.g., in the > > > > > pci_assign_unassigned_bridge_resources() path. > > > > > > > > > > The problem with pci_bridge_check_ranges() in the resource > > > > > allocation path is that we may allocate resources after devices > > > > > have been claimed by drivers, and pci_bridge_check_ranges() > > > > > *changes* the window registers to determine whether they're > > > > > writable. This may break concurrent accesses to devices behind the > bridge. > > > > > > > > > > Add a new pci_read_bridge_windows() to determine whether a > > > > > bridge supports the optional windows, call it once during > > > > > enumeration, remember the results, and change > > > > > pci_bridge_check_ranges() so it doesn't touch the bridge windows but > sets the flag bits based on those remembered results. > > > > > > > > > > Link: > > > > > https://lore.kernel.org/linux-pci/1506151482-113560-1-git-send-e > > > > > mail-wangzh...@hisilicon.com > > > > > Link: > > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.h > > > > > tml > > > > > Reported-by: xuyandong > > > > > Tested-by: xuyandong > > > > > Cc: Sagi Grimberg > > > > > Cc: Ofer Hayut > > > > > Cc: Roy Shterman > > > > > Cc: Keith Busch > > > > > Cc: Zhou Wang > > > > > > > > Applied to pci/enumeration for v5.1. > > > > > > > > This is fairly simple in concept, but doesn't meet the letter of > > > > the restrictions in Documentation/process/stable-kernel-rules.rst, > > > > so I didn't tag it for stable. > > > > > > > > Is there a compelling argument to mark it for stable? > > > > > > Well it's needed downstream. > > > It's a bit above 100 lines with context. Is this what you mean? > > > > Yep, I should have been explicit about that. > > > > > If yes how about using my patch for stable as an alternative? > > > The rules allow for equivalent patches. > > > > Well, yes, that would be a possibility. > > > > I would rather develop an argument for marking *this* patch for > > stable. Even though it is technically too large, I think we could > > make it work if we have compelling reasons. > > > > Those would need to be fleshed out a little more than "needed > > downstream" -- something about the scenarios where this happens, what > > the serious problem is, etc. > > > > The *ideal* thing would be to have an actual bugzilla.kernel.org > > report of the problem with a way to reproduce it and dmesg > > illustrating the problem. > > xuyandong would you like to take a stub at creating the bugzilla entry? > If not let me know and I'll do it. > Thanks! > Thanks, but I do not want to create the bugzilla entry. > > > > > > --- > > > > > drivers/pci/probe.c | 52 > +++ > > > > > drivers/pci/setup-bus.c | 45 > > > > > - > > > > > include/linux/pci.h |3 +++ > > > > > 3 files changed, 59 insertions(+), 41 deletions(-) > > > > > > > > > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index > > > > > 257b9f6f2ebb..2ef8b954c65a 100644 > > > > > --- a/drivers/pci/probe.c > > > > > +++ b/drivers/pci/probe.c > > > > > @@ -348,6 +348,57 @@ static void pci_read_bases(struct pci_dev *dev, > unsigned int howmany, int rom) > > > > > } > > > > > }
RE: [PATCH v3] PCI: avoid bridge feature re-probing on hotplug
Hi Bjorn and Michael After trying to reproduce the problem for a whole day, the bug did not show up any more. So I think the new patch does solve this problem. > -Original Message- > From: Bjorn Helgaas [mailto:helg...@kernel.org] > Sent: Sunday, January 20, 2019 4:13 AM > To: Michael S. Tsirkin > Cc: linux-kernel@vger.kernel.org; xuyandong ; > Yinghai Lu ; Jesse Barnes ; > linux-...@vger.kernel.org; Sagi Grimberg ; Ofer Hayut > ; Roy Shterman ; Keith > Busch ; Wangzhou (B) > Subject: [PATCH v3] PCI: avoid bridge feature re-probing on hotplug > > I gave up trying to reproduce the problem and test this patch with qemu; can > you guys (Michael and Xu (sorry if I mangled your name)) give this a try? > > I cc'd a few other people who have noticed this issue in the past, so just > FYI for > them. > > Bjorn > > > commit dd21b922db366ba069291b6fef2a8ce6768756a2 > Author: Bjorn Helgaas > Date: Sat Jan 19 11:35:04 2019 -0600 > > PCI: Probe bridge window attributes once at enumeration-time > > pci_bridge_check_ranges() determines whether a bridge supports the > optional > I/O and prefetchable memory windows and sets the flag bits in the bridge > resources. This could be done once during enumeration except that the > resource allocation code completely clears the flag bits, e.g., in the > pci_assign_unassigned_bridge_resources() path. > > The problem was that in some cases pci_bridge_check_ranges() *changes* > the > window registers to determine whether they're writable, and this may break > concurrent accesses to devices behind the bridge. > > Add a new pci_read_bridge_windows() to determine whether a bridge > supports > the optional windows, call it once during enumeration, remember the > results, and change pci_bridge_check_ranges() to set the flag bits based > on > those remembered results. > > Link: > https://lore.kernel.org/linux-pci/1506151482-113560-1-git-send-email- > wangzh...@hisilicon.com > Link: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html > Reported-by: xuyandong > Cc: Sagi Grimberg > Cc: Ofer Hayut > Cc: Roy Shterman > Cc: Keith Busch > Cc: Zhou Wang > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index > 257b9f6f2ebb..2ef8b954c65a 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -348,6 +348,57 @@ static void pci_read_bases(struct pci_dev *dev, > unsigned int howmany, int rom) > } > } > > +static void pci_read_bridge_windows(struct pci_dev *bridge) { > + u16 io; > + u32 pmem, tmp; > + > + pci_read_config_word(bridge, PCI_IO_BASE, &io); > + if (!io) { > + pci_write_config_word(bridge, PCI_IO_BASE, 0xe0f0); > + pci_read_config_word(bridge, PCI_IO_BASE, &io); > + pci_write_config_word(bridge, PCI_IO_BASE, 0x0); > + } > + if (io) > + bridge->io_window = 1; > + > + /* > + * DECchip 21050 pass 2 errata: the bridge may miss an address > + * disconnect boundary by one PCI data phase. Workaround: do not > + * use prefetching on this device. > + */ > + if (bridge->vendor == PCI_VENDOR_ID_DEC && bridge->device == > 0x0001) > + return; > + > + pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem); > + if (!pmem) { > + pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE, > +0xffe0fff0); > + pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, > &pmem); > + pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE, > 0x0); > + } > + if (!pmem) > + return; > + > + bridge->pref_window = 1; > + > + if ((pmem & PCI_PREF_RANGE_TYPE_MASK) == > PCI_PREF_RANGE_TYPE_64) { > + > + /* > + * Bridge claims to have a 64-bit prefetchable memory > + * window; verify that the upper bits are actually > + * writable. > + */ > + pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, > &pmem); > + pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, > +0x); > + pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, > &tmp); > + pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, > pmem); > + if (tmp) > + bridge->pref_64_window = 1; > + } > +} > + > static void pci_read_bridge_io(struct pci_bus *child) { > st
RE: [PATCH] pci: avoid bridge feature re-probing on hotplug
> -Original Message- > From: Michael S. Tsirkin [mailto:m...@redhat.com] > Sent: Tuesday, December 11, 2018 10:19 AM > To: linux-kernel@vger.kernel.org > Cc: xuyandong ; sta...@vger.kernel.org; Yinghai > Lu ; Jesse Barnes ; Bjorn > Helgaas ; linux-...@vger.kernel.org > Subject: [PATCH] pci: avoid bridge feature re-probing on hotplug > > commit 1f82de10d6 ("PCI/x86: don't assume prefetchable ranges are > 64bit") added probing of bridge support for 64 bit memory each time bridge is > re-enumerated. > > Unfortunately this probing is destructive if any device behind the bridge is > in > use at this time. > > There's no real need to re-probe the bridge features as the regiters in > question > never change - detect that using the memory flag being set and skip the > probing. > Avoiding repeated calls to pci_bridge_check_ranges might be even nicer would > be a bigger patch and probably not appropriate on stable. > > Reported-by: xuyandong > Cc: sta...@vger.kernel.org > Cc: Yinghai Lu > Cc: Jesse Barnes > Signed-off-by: Michael S. Tsirkin Tested-by: xuyandong > --- > > This issue has been reported on upstream Linux and Centos. > > drivers/pci/setup-bus.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index > ed960436df5e..7ab42f76579e 100644 > --- a/drivers/pci/setup-bus.c > +++ b/drivers/pci/setup-bus.c > @@ -741,6 +741,13 @@ static void pci_bridge_check_ranges(struct pci_bus > *bus) > struct resource *b_res; > > b_res = &bridge->resource[PCI_BRIDGE_RESOURCES]; > + > + /* Don't re-check after this was called once already: > + * important since bridge might be in use. > + */ > + if (b_res[1].flags & IORESOURCE_MEM) > + return; > + > b_res[1].flags |= IORESOURCE_MEM; > > pci_read_config_word(bridge, PCI_IO_BASE, &io); > -- > MST
RE: [PATCH] vfio iommu type1: no need to check task->mm if task has been destroyed
Hi Alex, After days of intensive test, the bug did not show up any more. So I think the new patch does solve this problem. > -Original Message- > From: Alex Williamson [mailto:alex.william...@redhat.com] > Sent: Friday, April 20, 2018 3:55 AM > To: xuyandong > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org; Zhanghailiang > ; wangxin (U) > ; Kirti Wankhede > > Subject: Re: [PATCH] vfio iommu type1: no need to check task->mm if task > has been destroyed > > On Thu, 19 Apr 2018 10:19:26 -0600 > Alex Williamson wrote: > > > [cc +Kirti] > > > > On Wed, 18 Apr 2018 18:55:45 +0800 > > Xu Yandong wrote: > > > > > The task structure in vfio_dma struct used to identify the same task > > > who map it or other task who shares same adress space is allowed to > > > unmap. But if the task who map it has exited, mm of the task has > > > been set to null, we should unmap the vfio dma directly. > > > > > > Signed-off-by: Xu Yandong > > > --- > > > Hi all, > > > When I unplug a vcpu from a VM lanched with a VFIO hostdev device, I > > > found that the *vfio_dma* mapped by this vcpu task could not be > > > unmaped in the future, so I send this patch to unmap vfio_dma > > > directly if the task who mapped it has exited. > > > > > > Howerver this patch may introduce a new security risk because any task > can > > > unmap the *vfio_dma* if the mapper task has exited. > > > > Well that's unexpected, but adding some debugging code I can clearly > > see that the map and unmap ioctls are typically called by the various > > processor threads, which all share the same mm_struct (so accounting > > is correct regardless of which CPU does the unmap). I don't think the > > fix below is correct though, it's not for a security risk, but for > > accounting issue and correctness issues. The pages are mapped and > > accounted against the users locked memory limits, if we simply bail > > out, both the IOMMU mappings and the limit accounting are wrong. > > Perhaps rather than referencing the calling task_struct in the > > vfio_dma on mapping, we should traverse to the highest parent task > > sharing the same mm_struct. Kirti, any thoughts since this code > > originated for mdev support? Thanks, > > I think something like below is a better solution. More research required on > group_leader and if it needs to be sanity tested or if testing mm_struct is > redundant, but I think it should resolve the failing test case, all mappings > reference the same task regardless of which vCPU triggers it. Thanks, > > Alex > > diff --git a/drivers/vfio/vfio_iommu_type1.c > b/drivers/vfio/vfio_iommu_type1.c index 5c212bf29640..3a1d3695c3fb > 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -1093,6 +1093,7 @@ static int vfio_dma_do_map(struct vfio_iommu > *iommu, > int ret = 0, prot = 0; > uint64_t mask; > struct vfio_dma *dma; > + struct task_struct *task; > > /* Verify that none of our __u64 fields overflow */ > if (map->size != size || map->vaddr != vaddr || map->iova != iova) > @@ -1131,8 +1132,12 @@ static int vfio_dma_do_map(struct vfio_iommu > *iommu, > dma->iova = iova; > dma->vaddr = vaddr; > dma->prot = prot; > - get_task_struct(current); > - dma->task = current; > + > + task = (current->mm == current->group_leader->mm ? > + current->group_leader : current); > + get_task_struct(task); > + dma->task = task; > + > dma->pfn_list = RB_ROOT; > > /* Insert zero-sized and grow as we map chunks of it */ > >