RE: [PATCH] drm: add func to better detect wether swiotlb is needed

2019-02-21 Thread Paul Durrant
> -Original Message-
> From: Michael D Labriola [mailto:michael.d.labri...@gmail.com]
> Sent: 19 February 2019 23:08
> To: dri-devel@lists.freedesktop.org; Alex Deucher
> ; Christian Koenig ;
> Chunming Zhou ; amd-...@lists.freedesktop.org; Monk
> Liu 
> Cc: Juergen Gross ; Christoph Hellwig
> ; Andrew Cooper ; Paul
> Durrant ; xen-de...@lists.xen.org; Jan Beulich
> ; Konrad Rzeszutek Wilk ;
> Michael D Labriola 
> Subject: [PATCH] drm: add func to better detect wether swiotlb is needed
> 
> This commit fixes DRM failures on Xen PV systems that were introduced in
> v4.17 by the following commits:
> 
> 82626363 drm: add func to get max iomem address v2
> fd5fd480 drm/amdgpu: only enable swiotlb alloc when need v2
> 1bc3d3cc drm/radeon: only enable swiotlb path when need v2
> 
> The introduction of ->need_swiotlb to the ttm_dma_populate() conditionals
> in the radeon and amdgpu device drivers causes Gnome to immediately crash
> on Xen PV systems, returning the user to the login screen.  The following
> kernel errors get logged:
> 
> [   28.554259] radeon_dp_aux_transfer_native: 200 callbacks suppressed
> [   31.219821] radeon :01:00.0: swiotlb buffer is full (sz: 2097152
> bytes)
> [   31.220030] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to
> allocate GEM object (16384000, 2, 4096, -14)
> [   31.226109] radeon :01:00.0: swiotlb buffer is full (sz: 2097152
> bytes)
> [   31.226300] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to
> allocate GEM object (16384000, 2, 4096, -14)
> [   31.300734] gnome-shell[1935]: segfault at 88 ip 7f39151cd904 sp
> 7ffc97611ad8 error 4 in libmutter-cogl.so[7f3915178000+aa000]
> [   31.300745] Code: 5f c3 0f 1f 40 00 48 8b 47 78 48 8b 40 40 ff e0 66 0f
> 1f 44 00 00 48 8b 47 78 48 8b 40 48 ff e0 66 0f 1f 44 00 00 48 8b 47 78
> <48> 8b 80 88 00 00 00 ff e0 0f 1f 00 48 8b 47 78 48 8b 40 68 ff e0
> [   38.193302] radeon_dp_aux_transfer_native: 116 callbacks suppressed
> [   40.009317] radeon :01:00.0: swiotlb buffer is full (sz: 2097152
> bytes)
> [   40.009488] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to
> allocate GEM object (16384000, 2, 4096, -14)
> [   40.015114] radeon :01:00.0: swiotlb buffer is full (sz: 2097152
> bytes)
> [   40.015297] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to
> allocate GEM object (16384000, 2, 4096, -14)
> [   40.028302] gnome-shell[2431]: segfault at 2dadf40 ip 02dadf40
> sp 7ffcd24ea5f8 error 15
> [   40.028306] Code: 20 6e 31 00 00 00 00 00 00 00 00 37 e3 3d 2d 7f 00 00
> 80 f4 e6 3d 2d 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> <00> 00 00 00 00 00 00 00 c1 00 00 00 00 00 00 00 80 e1 d2 03 00 00
> 
> This commit renames drm_get_max_iomem() to drm_need_swiotlb(), adds a
> xen_pv_domain() check to it, and moves the bit shifting comparison that
> always follows its usage into the function (simplifying the drm driver
> code).
> 
> Signed-off-by: Michael D Labriola 
> ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  2 +-
>  drivers/gpu/drm/drm_memory.c   | 19 ---
>  drivers/gpu/drm/radeon/radeon_device.c |  2 +-
>  include/drm/drm_cache.h|  2 +-
>  6 files changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> index 910c4ce..6bc0266 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> @@ -1029,7 +1029,7 @@ static int gmc_v7_0_sw_init(void *handle)
>   pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
>   pr_warn("amdgpu: No coherent DMA available\n");
>   }
> - adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
> + adev->need_swiotlb = drm_need_swiotlb(dma_bits);
> 
>   r = gmc_v7_0_init_microcode(adev);
>   if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> index 747c068..8638adf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> @@ -1155,7 +1155,7 @@ static int gmc_v8_0_sw_init(void *handle)
>   pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
>   pr_warn("amdgpu: No coherent DMA available\n");
>   }
> - adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
> + adev->need_swiotlb = drm_need_swiotlb(dma_bits);
> 
>   r = gmc_v8_0_init_microcode(adev);
>   if (r) {
> diff --git

RE: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-19 Thread Paul Durrant
> -Original Message-
> From: Oleksandr Andrushchenko [mailto:andr2...@gmail.com]
> Sent: 18 April 2018 11:21
> To: Paul Durrant ; Roger Pau Monne
> 
> Cc: jgr...@suse.com; Artem Mygaiev ;
> Dongwon Kim ; airl...@linux.ie;
> oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri-
> de...@lists.freedesktop.org; Potrola, MateuszX
> ; xen-de...@lists.xenproject.org;
> daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper
> 
> Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
> helper DRM driver
> 
> On 04/18/2018 01:18 PM, Paul Durrant wrote:
> >> -Original Message-
> >> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On
> Behalf
> >> Of Roger Pau Monné
> >> Sent: 18 April 2018 11:11
> >> To: Oleksandr Andrushchenko 
> >> Cc: jgr...@suse.com; Artem Mygaiev ;
> >> Dongwon Kim ; airl...@linux.ie;
> >> oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org;
> dri-
> >> de...@lists.freedesktop.org; Potrola, MateuszX
> >> ; xen-de...@lists.xenproject.org;
> >> daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper
> >> 
> >> Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
> >> helper DRM driver
> >>
> >> On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko
> >> wrote:
> >>> On 04/18/2018 10:35 AM, Roger Pau Monné wrote:
> >>>> On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko
> >> wrote:
> >>>>> On 04/17/2018 11:57 PM, Dongwon Kim wrote:
> >>>>>> On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:
> >>>>>>> On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:
> >>>>> 3.2 Backend exports dma-buf to xen-front
> >>>>>
> >>>>> In this case Dom0 pages are shared with DomU. As before, DomU can
> >> only write
> >>>>> to these pages, not any other page from Dom0, so it can be still
> >> considered
> >>>>> safe.
> >>>>> But, the following must be considered (highlighted in xen-front's
> Kernel
> >>>>> documentation):
> >>>>>    - If guest domain dies then pages/grants received from the backend
> >> cannot
> >>>>>      be claimed back - think of it as memory lost to Dom0 (won't be used
> >> for
> >>>>> any
> >>>>>      other guest)
> >>>>>    - Misbehaving guest may send too many requests to the backend
> >> exhausting
> >>>>>      its grant references and memory (consider this from security POV).
> >> As the
> >>>>>      backend runs in the trusted domain we also assume that it is
> trusted
> >> as
> >>>>> well,
> >>>>>      e.g. must take measures to prevent DDoS attacks.
> >>>> I cannot parse the above sentence:
> >>>>
> >>>> "As the backend runs in the trusted domain we also assume that it is
> >>>> trusted as well, e.g. must take measures to prevent DDoS attacks."
> >>>>
> >>>> What's the relation between being trusted and protecting from DoS
> >>>> attacks?
> >>> I mean that we trust the backend that it can prevent Dom0
> >>> from crashing in case DomU's frontend misbehaves, e.g.
> >>> if the frontend sends too many memory requests etc.
> >>>> In any case, all? PV protocols are implemented with the frontend
> >>>> sharing pages to the backend, and I think there's a reason why this
> >>>> model is used, and it should continue to be used.
> >>> This is the first use-case above. But there are real-world
> >>> use-cases (embedded in my case) when physically contiguous memory
> >>> needs to be shared, one of the possible ways to achieve this is
> >>> to share contiguous memory from Dom0 to DomU (the second use-case
> >> above)
> >>>> Having to add logic in the backend to prevent such attacks means
> >>>> that:
> >>>>
> >>>>- We need more code in the backend, which increases complexity and
> >>>>  chances of bugs.
> >>>>- Such code/logic could be wrong, thus allowing DoS.
> >>> You can live without this code at all, but this is then up to
> >>> backend which may make Dom0 down because of DomU's frontend
> doin

RE: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-19 Thread Paul Durrant
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
> Of Roger Pau Monné
> Sent: 18 April 2018 11:11
> To: Oleksandr Andrushchenko 
> Cc: jgr...@suse.com; Artem Mygaiev ;
> Dongwon Kim ; airl...@linux.ie;
> oleksandr_andrushche...@epam.com; linux-ker...@vger.kernel.org; dri-
> de...@lists.freedesktop.org; Potrola, MateuszX
> ; xen-de...@lists.xenproject.org;
> daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper
> 
> Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
> helper DRM driver
> 
> On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko
> wrote:
> > On 04/18/2018 10:35 AM, Roger Pau Monné wrote:
> > > On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko
> wrote:
> > > > On 04/17/2018 11:57 PM, Dongwon Kim wrote:
> > > > > On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:
> > > > > > On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:
> > > > 3.2 Backend exports dma-buf to xen-front
> > > >
> > > > In this case Dom0 pages are shared with DomU. As before, DomU can
> only write
> > > > to these pages, not any other page from Dom0, so it can be still
> considered
> > > > safe.
> > > > But, the following must be considered (highlighted in xen-front's Kernel
> > > > documentation):
> > > >   - If guest domain dies then pages/grants received from the backend
> cannot
> > > >     be claimed back - think of it as memory lost to Dom0 (won't be used
> for
> > > > any
> > > >     other guest)
> > > >   - Misbehaving guest may send too many requests to the backend
> exhausting
> > > >     its grant references and memory (consider this from security POV).
> As the
> > > >     backend runs in the trusted domain we also assume that it is trusted
> as
> > > > well,
> > > >     e.g. must take measures to prevent DDoS attacks.
> > > I cannot parse the above sentence:
> > >
> > > "As the backend runs in the trusted domain we also assume that it is
> > > trusted as well, e.g. must take measures to prevent DDoS attacks."
> > >
> > > What's the relation between being trusted and protecting from DoS
> > > attacks?
> > I mean that we trust the backend that it can prevent Dom0
> > from crashing in case DomU's frontend misbehaves, e.g.
> > if the frontend sends too many memory requests etc.
> > > In any case, all? PV protocols are implemented with the frontend
> > > sharing pages to the backend, and I think there's a reason why this
> > > model is used, and it should continue to be used.
> > This is the first use-case above. But there are real-world
> > use-cases (embedded in my case) when physically contiguous memory
> > needs to be shared, one of the possible ways to achieve this is
> > to share contiguous memory from Dom0 to DomU (the second use-case
> above)
> > > Having to add logic in the backend to prevent such attacks means
> > > that:
> > >
> > >   - We need more code in the backend, which increases complexity and
> > > chances of bugs.
> > >   - Such code/logic could be wrong, thus allowing DoS.
> > You can live without this code at all, but this is then up to
> > backend which may make Dom0 down because of DomU's frontend doing
> evil
> > things
> 
> IMO we should design protocols that do not allow such attacks instead
> of having to defend against them.
> 
> > > > 4. xen-front/backend/xen-zcopy synchronization
> > > >
> > > > 4.1. As I already said in 2) all the inter VM communication happens
> between
> > > > xen-front and the backend, xen-zcopy is NOT involved in that.
> > > > When xen-front wants to destroy a display buffer (dumb/dma-buf) it
> issues a
> > > > XENDISPL_OP_DBUF_DESTROY command (opposite to
> XENDISPL_OP_DBUF_CREATE).
> > > > This call is synchronous, so xen-front expects that backend does free
> the
> > > > buffer pages on return.
> > > >
> > > > 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY:
> > > >    - closes all dumb handles/fd's of the buffer according to [3]
> > > >    - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-
> zcopy to make
> > > > sure
> > > >      the buffer is freed (think of it as it waits for dma-buf->release
> > > > callback)
> > > So this zcopy thing keeps some kind of track of the memory usage? Why
> > > can't the user-space backend keep track of the buffer usage?
> > Because there is no dma-buf UAPI which allows to track the buffer life cycle
> > (e.g. wait until dma-buf's .release callback is called)
> > > >    - replies to xen-front that the buffer can be destroyed.
> > > > This way deletion of the buffer happens synchronously on both Dom0
> and DomU
> > > > sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns
> with time-out
> > > > error
> > > > (BTW, wait time is a parameter of this IOCTL), Xen will defer grant
> > > > reference
> > > > removal and will retry later until those are free.
> > > >
> > > > Hope this helps understand how buffers are synchronously deleted in
> case
> > > > of xen-zcopy with a single protocol co