RE: [PATCH] PCI: Probe bridge window attributes once at enumeration-time

2019-02-10 Thread Xuyandong (Yandong Xu, Euler5)
> On Tue, Jan 29, 2019 at 05:02:26PM -0600, Bjorn Helgaas wrote:
> > On Tue, Jan 29, 2019 at 05:47:32PM -0500, Michael S. Tsirkin wrote:
> > > On Tue, Jan 29, 2019 at 04:43:33PM -0600, Bjorn Helgaas wrote:
> > > > On Tue, Jan 22, 2019 at 01:02:54PM -0600, Bjorn Helgaas wrote:
> > > > > From: Bjorn Helgaas 
> > > > >
> > > > > pci_bridge_check_ranges() determines whether a bridge supports
> > > > > the optional I/O and prefetchable memory windows and sets the
> > > > > flag bits in the bridge resources.  This *could* be done once
> > > > > during enumeration except that the resource allocation code
> > > > > completely clears the flag bits, e.g., in the
> > > > > pci_assign_unassigned_bridge_resources() path.
> > > > >
> > > > > The problem with pci_bridge_check_ranges() in the resource
> > > > > allocation path is that we may allocate resources after devices
> > > > > have been claimed by drivers, and pci_bridge_check_ranges()
> > > > > *changes* the window registers to determine whether they're
> > > > > writable.  This may break concurrent accesses to devices behind the
> bridge.
> > > > >
> > > > > Add a new pci_read_bridge_windows() to determine whether a
> > > > > bridge supports the optional windows, call it once during
> > > > > enumeration, remember the results, and change
> > > > > pci_bridge_check_ranges() so it doesn't touch the bridge windows but
> sets the flag bits based on those remembered results.
> > > > >
> > > > > Link:
> > > > > https://lore.kernel.org/linux-pci/1506151482-113560-1-git-send-e
> > > > > mail-wangzh...@hisilicon.com
> > > > > Link:
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.h
> > > > > tml
> > > > > Reported-by: xuyandong 
> > > > > Tested-by: xuyandong 
> > > > > Cc: Sagi Grimberg 
> > > > > Cc: Ofer Hayut 
> > > > > Cc: Roy Shterman 
> > > > > Cc: Keith Busch 
> > > > > Cc: Zhou Wang 
> > > >
> > > > Applied to pci/enumeration for v5.1.
> > > >
> > > > This is fairly simple in concept, but doesn't meet the letter of
> > > > the restrictions in Documentation/process/stable-kernel-rules.rst,
> > > > so I didn't tag it for stable.
> > > >
> > > > Is there a compelling argument to mark it for stable?
> > >
> > > Well it's needed downstream.
> > > It's a bit above 100 lines with context. Is this what you mean?
> >
> > Yep, I should have been explicit about that.
> >
> > > If yes how about using my patch for stable as an alternative?
> > > The rules allow for equivalent patches.
> >
> > Well, yes, that would be a possibility.
> >
> > I would rather develop an argument for marking *this* patch for
> > stable.  Even though it is technically too large, I think we could
> > make it work if we have compelling reasons.
> >
> > Those would need to be fleshed out a little more than "needed
> > downstream" -- something about the scenarios where this happens, what
> > the serious problem is, etc.
> >
> > The *ideal* thing would be to have an actual bugzilla.kernel.org
> > report of the problem with a way to reproduce it and dmesg
> > illustrating the problem.
> 
> xuyandong would you like to take a stub at creating the bugzilla entry?
> If not let me know and I'll do it.
> Thanks!
> 
Thanks, but I  do not want to create the bugzilla entry.


> 
> > > > > ---
> > > > >  drivers/pci/probe.c |   52
> +++
> > > > >  drivers/pci/setup-bus.c |   45 
> > > > > -
> > > > >  include/linux/pci.h |3 +++
> > > > >  3 files changed, 59 insertions(+), 41 deletions(-)
> > > > >
> > > > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index
> > > > > 257b9f6f2ebb..2ef8b954c65a 100644
> > > > > --- a/drivers/pci/probe.c
> > > > > +++ b/drivers/pci/probe.c
> > > > > @@ -348,6 +348,57 @@ static void pci_read_bases(struct pci_dev *dev,
> unsigned int howmany, int rom)
> > > > >   }
> > > > >  }

RE: [PATCH v3] PCI: avoid bridge feature re-probing on hotplug

2019-01-21 Thread xuyandong
Hi Bjorn and Michael

After trying to reproduce the problem for a whole day, the bug did not show up
any more. So I think the new patch does solve this problem.

> -Original Message-
> From: Bjorn Helgaas [mailto:helg...@kernel.org]
> Sent: Sunday, January 20, 2019 4:13 AM
> To: Michael S. Tsirkin 
> Cc: linux-kernel@vger.kernel.org; xuyandong ;
> Yinghai Lu ; Jesse Barnes ;
> linux-...@vger.kernel.org; Sagi Grimberg ; Ofer Hayut
> ; Roy Shterman ; Keith
> Busch ; Wangzhou (B) 
> Subject: [PATCH v3] PCI: avoid bridge feature re-probing on hotplug
> 
> I gave up trying to reproduce the problem and test this patch with qemu; can
> you guys (Michael and Xu (sorry if I mangled your name)) give this a try?
> 
> I cc'd a few other people who have noticed this issue in the past, so just 
> FYI for
> them.
> 
> Bjorn
> 
> 
> commit dd21b922db366ba069291b6fef2a8ce6768756a2
> Author: Bjorn Helgaas 
> Date:   Sat Jan 19 11:35:04 2019 -0600
> 
> PCI: Probe bridge window attributes once at enumeration-time
> 
> pci_bridge_check_ranges() determines whether a bridge supports the
> optional
> I/O and prefetchable memory windows and sets the flag bits in the bridge
> resources.  This could be done once during enumeration except that the
> resource allocation code completely clears the flag bits, e.g., in the
> pci_assign_unassigned_bridge_resources() path.
> 
> The problem was that in some cases pci_bridge_check_ranges() *changes*
> the
> window registers to determine whether they're writable, and this may break
> concurrent accesses to devices behind the bridge.
> 
> Add a new pci_read_bridge_windows() to determine whether a bridge
> supports
> the optional windows, call it once during enumeration, remember the
> results, and change pci_bridge_check_ranges() to set the flag bits based 
> on
> those remembered results.
> 
> Link: 
> https://lore.kernel.org/linux-pci/1506151482-113560-1-git-send-email-
> wangzh...@hisilicon.com
> Link: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
> Reported-by: xuyandong 
> Cc: Sagi Grimberg 
> Cc: Ofer Hayut 
> Cc: Roy Shterman 
> Cc: Keith Busch 
> Cc: Zhou Wang 
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index
> 257b9f6f2ebb..2ef8b954c65a 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -348,6 +348,57 @@ static void pci_read_bases(struct pci_dev *dev,
> unsigned int howmany, int rom)
>   }
>  }
> 
> +static void pci_read_bridge_windows(struct pci_dev *bridge) {
> + u16 io;
> + u32 pmem, tmp;
> +
> + pci_read_config_word(bridge, PCI_IO_BASE, &io);
> + if (!io) {
> + pci_write_config_word(bridge, PCI_IO_BASE, 0xe0f0);
> + pci_read_config_word(bridge, PCI_IO_BASE, &io);
> + pci_write_config_word(bridge, PCI_IO_BASE, 0x0);
> + }
> + if (io)
> + bridge->io_window = 1;
> +
> + /*
> +  * DECchip 21050 pass 2 errata: the bridge may miss an address
> +  * disconnect boundary by one PCI data phase.  Workaround: do not
> +  * use prefetching on this device.
> +  */
> + if (bridge->vendor == PCI_VENDOR_ID_DEC && bridge->device ==
> 0x0001)
> + return;
> +
> + pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem);
> + if (!pmem) {
> + pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> +0xffe0fff0);
> + pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> &pmem);
> + pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> 0x0);
> + }
> + if (!pmem)
> + return;
> +
> + bridge->pref_window = 1;
> +
> + if ((pmem & PCI_PREF_RANGE_TYPE_MASK) ==
> PCI_PREF_RANGE_TYPE_64) {
> +
> + /*
> +  * Bridge claims to have a 64-bit prefetchable memory
> +  * window; verify that the upper bits are actually
> +  * writable.
> +  */
> + pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> &pmem);
> + pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> +0x);
> + pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> &tmp);
> + pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> pmem);
> + if (tmp)
> + bridge->pref_64_window = 1;
> + }
> +}
> +
>  static void pci_read_bridge_io(struct pci_bus *child)  {
>   st

RE: [PATCH] pci: avoid bridge feature re-probing on hotplug

2018-12-10 Thread xuyandong


> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Tuesday, December 11, 2018 10:19 AM
> To: linux-kernel@vger.kernel.org
> Cc: xuyandong ; sta...@vger.kernel.org; Yinghai
> Lu ; Jesse Barnes ; Bjorn
> Helgaas ; linux-...@vger.kernel.org
> Subject: [PATCH] pci: avoid bridge feature re-probing on hotplug
> 
> commit 1f82de10d6 ("PCI/x86: don't assume prefetchable ranges are
> 64bit") added probing of bridge support for 64 bit memory each time bridge is
> re-enumerated.
> 
> Unfortunately this probing is destructive if any device behind the bridge is 
> in
> use at this time.
> 
> There's no real need to re-probe the bridge features as the regiters in 
> question
> never change - detect that using the memory flag being set and skip the
> probing.
> Avoiding repeated calls to pci_bridge_check_ranges might be even nicer would
> be a bigger patch and probably not appropriate on stable.
> 
> Reported-by: xuyandong 
> Cc: sta...@vger.kernel.org
> Cc: Yinghai Lu 
> Cc: Jesse Barnes 
> Signed-off-by: Michael S. Tsirkin 

Tested-by: xuyandong 

> ---
> 
> This issue has been reported on upstream Linux and Centos.
> 
>  drivers/pci/setup-bus.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index
> ed960436df5e..7ab42f76579e 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -741,6 +741,13 @@ static void pci_bridge_check_ranges(struct pci_bus
> *bus)
>   struct resource *b_res;
> 
>   b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
> +
> + /* Don't re-check after this was called once already:
> +  * important since bridge might be in use.
> +  */
> + if (b_res[1].flags & IORESOURCE_MEM)
> + return;
> +
>   b_res[1].flags |= IORESOURCE_MEM;
> 
>   pci_read_config_word(bridge, PCI_IO_BASE, &io);
> --
> MST


RE: [PATCH] vfio iommu type1: no need to check task->mm if task has been destroyed

2018-04-20 Thread xuyandong
Hi Alex,
 After days of intensive test, the bug did not show up any more.
 So I think the new patch does solve this problem.

> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Friday, April 20, 2018 3:55 AM
> To: xuyandong 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org; Zhanghailiang
> ; wangxin (U)
> ; Kirti Wankhede
> 
> Subject: Re: [PATCH] vfio iommu type1: no need to check task->mm if task
> has been destroyed
> 
> On Thu, 19 Apr 2018 10:19:26 -0600
> Alex Williamson  wrote:
> 
> > [cc +Kirti]
> >
> > On Wed, 18 Apr 2018 18:55:45 +0800
> > Xu Yandong  wrote:
> >
> > > The task structure in vfio_dma struct used to identify the same task
> > > who map it or other task who shares same adress space is allowed to
> > > unmap. But if the task who map it has exited, mm of the task has
> > > been set to null, we should unmap the vfio dma directly.
> > >
> > > Signed-off-by: Xu Yandong 
> > > ---
> > > Hi all,
> > > When I unplug a vcpu from a VM lanched with a VFIO hostdev device, I
> > > found that the *vfio_dma* mapped by this vcpu task could not be
> > > unmaped in the future, so I send this patch to unmap vfio_dma
> > > directly if the task who mapped it has exited.
> > >
> > > Howerver this patch may introduce a new security risk because any task
> can
> > > unmap the *vfio_dma* if the mapper task has exited.
> >
> > Well that's unexpected, but adding some debugging code I can clearly
> > see that the map and unmap ioctls are typically called by the various
> > processor threads, which all share the same mm_struct (so accounting
> > is correct regardless of which CPU does the unmap).  I don't think the
> > fix below is correct though, it's not for a security risk, but for
> > accounting issue and correctness issues.  The pages are mapped and
> > accounted against the users locked memory limits, if we simply bail
> > out, both the IOMMU mappings and the limit accounting are wrong.
> > Perhaps rather than referencing the calling task_struct in the
> > vfio_dma on mapping, we should traverse to the highest parent task
> > sharing the same mm_struct.  Kirti, any thoughts since this code
> > originated for mdev support?  Thanks,
> 
> I think something like below is a better solution.  More research required on
> group_leader and if it needs to be sanity tested or if testing mm_struct is
> redundant, but I think it should resolve the failing test case, all mappings
> reference the same task regardless of which vCPU triggers it.  Thanks,
> 
> Alex
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c
> b/drivers/vfio/vfio_iommu_type1.c index 5c212bf29640..3a1d3695c3fb
> 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1093,6 +1093,7 @@ static int vfio_dma_do_map(struct vfio_iommu
> *iommu,
>   int ret = 0, prot = 0;
>   uint64_t mask;
>   struct vfio_dma *dma;
> + struct task_struct *task;
> 
>   /* Verify that none of our __u64 fields overflow */
>   if (map->size != size || map->vaddr != vaddr || map->iova != iova)
> @@ -1131,8 +1132,12 @@ static int vfio_dma_do_map(struct vfio_iommu
> *iommu,
>   dma->iova = iova;
>   dma->vaddr = vaddr;
>   dma->prot = prot;
> - get_task_struct(current);
> - dma->task = current;
> +
> + task = (current->mm == current->group_leader->mm ?
> + current->group_leader : current);
> + get_task_struct(task);
> + dma->task = task;
> +
>   dma->pfn_list = RB_ROOT;
> 
>   /* Insert zero-sized and grow as we map chunks of it */
> 
>