Re: [PATCH 2/8] drm/panfrost: Fix a race in panfrost_ioctl_madvise()

2019-12-06 Thread Boris Brezillon
On Fri, 6 Dec 2019 08:53:27 +0100
Boris Brezillon  wrote:

> On Thu, 5 Dec 2019 17:08:02 -0600
> Rob Herring  wrote:
> 
> > On Fri, Nov 29, 2019 at 8:33 AM Boris Brezillon
> >  wrote:  
> > >
> > > On Fri, 29 Nov 2019 14:24:48 +
> > > Steven Price  wrote:
> > >
> > > > On 29/11/2019 13:59, Boris Brezillon wrote:
> > > > > If 2 threads change the MADVISE property of the same BO in parallel we
> > > > > might end up with an shmem->madv value that's inconsistent with the
> > > > > presence of the BO in the shrinker list.
> > > >
> > > > I'm a bit worried from the point of view of user space sanity that you
> > > > observed this - but clearly the kernel should be robust!
> > >
> > > It's not something I observed; I just found the race by inspecting
> > > the code and thought it was worth fixing.
> > 
> > I'm not so sure there's a race.  
> 
> I'm pretty sure there's one:
> 
> T0                      T1
>
> lock(pages)
> madv = 1
> unlock(pages)
>
>                         lock(pages)
>                         madv = 0
>                         unlock(pages)
>
>                         lock(shrinker)
>                         remove_from_list(bo)
>                         unlock(shrinker)
>
> lock(shrinker)
> add_to_list(bo)
> unlock(shrinker)
> 
> You end up with madv = 0 and the BO is added to the list.
> 
> > If there is, we still check the madv value
> > when purging, so it would be harmless even if the state is
> > inconsistent.  
> 
> Indeed. Note that you could also have this other situation where the BO
> is marked purgeable but not present in the list. In that case it will
> never be purged, but it's kind of user space's fault anyway. I agree, none
> of these problems are critical, and I'm fine leaving it unfixed as long
> as it's documented somewhere that the race exists and is harmless.
> 
> >   
> > > > > The easiest solution to fix that is to protect the
> > > > > drm_gem_shmem_madvise() call with the shrinker lock.
> > > > >
> > > > > Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support")
> > > > > Cc: 
> > > > > Signed-off-by: Boris Brezillon 
> > > >
> > > > Reviewed-by: Steven Price 
> > >
> > > Thanks.
> > >
> > > >
> > > > > ---
> > > > >  drivers/gpu/drm/panfrost/panfrost_drv.c | 9 ++++-----
> > > > >  1 file changed, 4 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > > > index f21bc8a7ee3a..efc0a24d1f4c 100644
> > > > > --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > > > +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > > > @@ -347,20 +347,19 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
> > > > >  		return -ENOENT;
> > > > >  	}
> > > > >  
> > > > > +	mutex_lock(&pfdev->shrinker_lock);
> > > > >  	args->retained = drm_gem_shmem_madvise(gem_obj, args->madv);
> > 
> > This means we now hold the shrinker_lock while we take the pages_lock.
> > Is lockdep happy with this change? I suspect not given all the fun I
> > had getting lockdep happy.  
> 
> I have tested with lockdep enabled and it's all good from lockdep's PoV
> because the locks are taken in the same order in the madvise() and
> shrinker_scan() paths (first the shrinker lock, then the pages lock).
> 
> Note that patch 7 introduces a deadlock in the shrinker path, but this
> is unrelated to this shrinker lock being taken earlier in madvise
> (drm_gem_put_pages() is called while the pages lock is already held).

My bad, there's no deadlock in this version, because we don't use
->pages_use_count to retain the page table (we just use a gpu_usecount
in patch 8 to prevent the purge). But I started working on a version
that uses ->pages_use_count instead of introducing yet another
refcount, and in this version I take/release a ref on the page table in
the mmu_map()/mmu_unmap() path. This causes a deadlock when GEM mappings
are torn down by the shrinker logic (because the pages lock is already
taken in panfrost_gem_purge())...
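To make the self-deadlock concrete, here is a minimal userspace sketch, assuming a non-recursive lock like the kernel mutex: a purge path holds a BO's pages lock and then calls a put-pages helper that tries to take the same lock again. The names model_bo, model_purge() and model_put_pages() are hypothetical, and PTHREAD_MUTEX_ERRORCHECK is used only so the relock fails with EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace model of a BO; not the panfrost or shmem structs. */
struct model_bo {
	pthread_mutex_t pages_lock;
	int pages_use_count;
};

static int model_bo_init(struct model_bo *bo)
{
	pthread_mutexattr_t attr;
	int ret;

	pthread_mutexattr_init(&attr);
	/* ERRORCHECK: relocking from the owning thread returns EDEADLK
	 * instead of blocking forever, so the bug becomes observable. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	ret = pthread_mutex_init(&bo->pages_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	bo->pages_use_count = 1;
	return ret;
}

/* Drops one pages reference and takes the pages lock itself,
 * like an unmap path that releases a ref on the pages. */
static int model_put_pages(struct model_bo *bo)
{
	int ret = pthread_mutex_lock(&bo->pages_lock);

	if (ret != 0)
		return ret; /* EDEADLK: the caller already holds the lock */
	assert(bo->pages_use_count > 0);
	bo->pages_use_count--;
	pthread_mutex_unlock(&bo->pages_lock);
	return 0;
}

/* Purge path: holds the pages lock while tearing down the mapping,
 * which calls back into model_put_pages() on the same lock. */
static int model_purge(struct model_bo *bo)
{
	int ret;

	pthread_mutex_lock(&bo->pages_lock);
	ret = model_put_pages(bo);
	pthread_mutex_unlock(&bo->pages_lock);
	return ret;
}
```

In the kernel the second mutex_lock() would simply block forever; a common way out is a variant of the helper that expects the caller to already hold the lock.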


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 2/8] drm/panfrost: Fix a race in panfrost_ioctl_madvise()

2019-12-05 Thread Boris Brezillon
On Thu, 5 Dec 2019 17:08:02 -0600
Rob Herring  wrote:

> On Fri, Nov 29, 2019 at 8:33 AM Boris Brezillon
>  wrote:
> >
> > On Fri, 29 Nov 2019 14:24:48 +
> > Steven Price  wrote:
> >  
> > > On 29/11/2019 13:59, Boris Brezillon wrote:  
> > > > If 2 threads change the MADVISE property of the same BO in parallel we
> > > > might end up with an shmem->madv value that's inconsistent with the
> > > > presence of the BO in the shrinker list.  
> > >
> > > I'm a bit worried from the point of view of user space sanity that you
> > > observed this - but clearly the kernel should be robust!  
> >
> > It's not something I observed; I just found the race by inspecting
> > the code and thought it was worth fixing.
> 
> I'm not so sure there's a race.

I'm pretty sure there's one:

T0                      T1

lock(pages)
madv = 1
unlock(pages)

                        lock(pages)
                        madv = 0
                        unlock(pages)

                        lock(shrinker)
                        remove_from_list(bo)
                        unlock(shrinker)

lock(shrinker)
add_to_list(bo)
unlock(shrinker)

You end up with madv = 0 and the BO is added to the list.
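The interleaving above can be replayed deterministically in userspace. This is a sketch, not kernel code: model_bo, set_madv() and update_list() are hypothetical stand-ins for shmem->madv, the drm_gem_shmem_madvise() step and the shrinker-list update, with MODEL_WILLNEED = 0 and MODEL_DONTNEED = 1 matching the madv values in the diagram.

```c
#include <assert.h>
#include <stdbool.h>

/* Same numbering as the diagram: madv = 0 (WILLNEED), madv = 1 (DONTNEED). */
enum { MODEL_WILLNEED = 0, MODEL_DONTNEED = 1 };

/* Hypothetical model of one BO; not the real drm_gem_shmem_object. */
struct model_bo {
	int madv;     /* stands in for shmem->madv */
	bool on_list; /* stands in for membership in the shrinker list */
};

/* First half of the unpatched ioctl: madv update (pages lock in the kernel). */
static void set_madv(struct model_bo *bo, int madv)
{
	bo->madv = madv;
}

/* Second half: list update, done later under the shrinker lock. */
static void update_list(struct model_bo *bo, int madv)
{
	assert(madv == MODEL_WILLNEED || madv == MODEL_DONTNEED);
	/* DONTNEED -> list_add_tail(), WILLNEED -> list_del_init() */
	bo->on_list = (madv == MODEL_DONTNEED);
}

/* Invariant: on the shrinker list if and only if marked DONTNEED. */
static bool model_consistent(const struct model_bo *bo)
{
	return bo->on_list == (bo->madv == MODEL_DONTNEED);
}

/* Replays the T0/T1 steps from the diagram in that exact order. */
static struct model_bo replay_race(void)
{
	struct model_bo bo = { .madv = MODEL_WILLNEED, .on_list = false };

	set_madv(&bo, MODEL_DONTNEED);    /* T0: madv = 1 */
	set_madv(&bo, MODEL_WILLNEED);    /* T1: madv = 0 */
	update_list(&bo, MODEL_WILLNEED); /* T1: remove_from_list(bo) */
	update_list(&bo, MODEL_DONTNEED); /* T0: add_to_list(bo) */
	return bo;
}
```

With the patch, each thread's set_madv()/update_list() pair runs under the shrinker lock, so the pairs can only run back to back and the final state is consistent in either order.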

> If there is, we still check the madv value
> when purging, so it would be harmless even if the state is
> inconsistent.

Indeed. Note that you could also have this other situation where the BO
is marked purgeable but not present in the list. In that case it will
never be purged, but it's kind of user space's fault anyway. I agree, none
of these problems are critical, and I'm fine leaving it unfixed as long
as it's documented somewhere that the race exists and is harmless.
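Both situations can be sketched with a hypothetical scan loop that re-checks madv on every entry, assuming (as Rob describes) that the purge path looks at the madv value rather than trusting list membership: a stale WILLNEED entry on the list is skipped, and a DONTNEED BO that never made it onto the list is simply never visited. model_bo and model_shrinker_scan() are illustrative names, not the panfrost shrinker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MODEL_WILLNEED = 0, MODEL_DONTNEED = 1 };

/* Hypothetical BO model; not the panfrost or shmem structs. */
struct model_bo {
	int madv;
	bool purged;
};

/* Walks a stand-in for the shrinker list and re-checks madv on every
 * entry instead of trusting list membership. Returns the number purged. */
static size_t model_shrinker_scan(struct model_bo *const *list, size_t n)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		struct model_bo *bo = list[i];

		if (bo->madv != MODEL_DONTNEED)
			continue; /* stale entry left by the race: skip it */
		assert(!bo->purged);
		bo->purged = true;
		bo->madv = -1; /* no longer DONTNEED, so later scans skip it */
		freed++;
	}
	return freed;
}
```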

> 
> > > > The easiest solution to fix that is to protect the
> > > > drm_gem_shmem_madvise() call with the shrinker lock.
> > > >
> > > > Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support")
> > > > Cc: 
> > > > Signed-off-by: Boris Brezillon   
> > >
> > > Reviewed-by: Steven Price   
> >
> > Thanks.
> >  
> > >  
> > > > ---
> > > >  drivers/gpu/drm/panfrost/panfrost_drv.c | 9 ++++-----
> > > >  1 file changed, 4 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > > index f21bc8a7ee3a..efc0a24d1f4c 100644
> > > > --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > > +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > > @@ -347,20 +347,19 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
> > > >  		return -ENOENT;
> > > >  	}
> > > >  
> > > > +	mutex_lock(&pfdev->shrinker_lock);
> > > >  	args->retained = drm_gem_shmem_madvise(gem_obj, args->madv);
> 
> This means we now hold the shrinker_lock while we take the pages_lock.
> Is lockdep happy with this change? I suspect not given all the fun I
> had getting lockdep happy.

I have tested with lockdep enabled and it's all good from lockdep's PoV
because the locks are taken in the same order in the madvise() and
shrinker_scan() paths (first the shrinker lock, then the pages lock).
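The lock-ordering argument can be illustrated with a small pthread sketch, assuming two code paths that both nest the locks the same way (shrinker lock first, then pages lock). The globals below are hypothetical stand-ins for pfdev->shrinker_lock and the shmem pages lock; because neither path ever takes them in the opposite order, no ABBA cycle can form and both threads run to completion.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for pfdev->shrinker_lock and the shmem pages lock. */
static pthread_mutex_t shrinker_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pages_lock = PTHREAD_MUTEX_INITIALIZER;
static long critical_sections; /* only touched with both locks held */

/* Every path nests the locks in one global order: shrinker, then pages. */
static void *nested_lock_worker(void *arg)
{
	long iters = *(const long *)arg;

	for (long i = 0; i < iters; i++) {
		pthread_mutex_lock(&shrinker_lock);
		pthread_mutex_lock(&pages_lock);
		critical_sections++;
		pthread_mutex_unlock(&pages_lock);
		pthread_mutex_unlock(&shrinker_lock);
	}
	return NULL;
}

/* Runs two workers, one standing in for madvise() and one for the
 * shrinker scan; returns the total count, or -1 if thread creation fails. */
static long run_both_paths(long iters)
{
	pthread_t madvise_thr, scan_thr;

	assert(iters >= 0);
	critical_sections = 0;
	if (pthread_create(&madvise_thr, NULL, nested_lock_worker, &iters) != 0)
		return -1;
	if (pthread_create(&scan_thr, NULL, nested_lock_worker, &iters) != 0) {
		pthread_join(madvise_thr, NULL);
		return -1;
	}
	pthread_join(madvise_thr, NULL);
	pthread_join(scan_thr, NULL);
	return critical_sections;
}
```

Had one path taken the pages lock first and the shrinker lock second, the same program could hang, which is the kind of circular dependency lockdep reports.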

Note that patch 7 introduces a deadlock in the shrinker path, but this
is unrelated to this shrinker lock being taken earlier in madvise
(drm_gem_put_pages() is called while the pages lock is already held).

Re: [PATCH 2/8] drm/panfrost: Fix a race in panfrost_ioctl_madvise()

2019-12-05 Thread Rob Herring
On Fri, Nov 29, 2019 at 8:33 AM Boris Brezillon
 wrote:
>
> On Fri, 29 Nov 2019 14:24:48 +
> Steven Price  wrote:
>
> > On 29/11/2019 13:59, Boris Brezillon wrote:
> > > If 2 threads change the MADVISE property of the same BO in parallel we
> > > might end up with an shmem->madv value that's inconsistent with the
> > > presence of the BO in the shrinker list.
> >
> > I'm a bit worried from the point of view of user space sanity that you
> > observed this - but clearly the kernel should be robust!
>
> It's not something I observed; I just found the race by inspecting
> the code and thought it was worth fixing.

I'm not so sure there's a race. If there is, we still check the madv value
when purging, so it would be harmless even if the state is
inconsistent.

> > > The easiest solution to fix that is to protect the
> > > drm_gem_shmem_madvise() call with the shrinker lock.
> > >
> > > Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support")
> > > Cc: 
> > > Signed-off-by: Boris Brezillon 
> >
> > Reviewed-by: Steven Price 
>
> Thanks.
>
> >
> > > ---
> > >  drivers/gpu/drm/panfrost/panfrost_drv.c | 9 ++++-----
> > >  1 file changed, 4 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > index f21bc8a7ee3a..efc0a24d1f4c 100644
> > > --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > @@ -347,20 +347,19 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
> > >  		return -ENOENT;
> > >  	}
> > >  
> > > +	mutex_lock(&pfdev->shrinker_lock);
> > >  	args->retained = drm_gem_shmem_madvise(gem_obj, args->madv);

This means we now hold the shrinker_lock while we take the pages_lock.
Is lockdep happy with this change? I suspect not given all the fun I
had getting lockdep happy.

Rob

Re: [PATCH 2/8] drm/panfrost: Fix a race in panfrost_ioctl_madvise()

2019-11-29 Thread Boris Brezillon
On Fri, 29 Nov 2019 21:07:33 +0100
Daniel Vetter  wrote:

> On Fri, Nov 29, 2019 at 02:40:34PM +, Steven Price wrote:
> > On 29/11/2019 14:33, Boris Brezillon wrote:  
> > > On Fri, 29 Nov 2019 14:24:48 +
> > > Steven Price  wrote:
> > >   
> > >> On 29/11/2019 13:59, Boris Brezillon wrote:  
> > >>> If 2 threads change the MADVISE property of the same BO in parallel we
> > >>> might end up with an shmem->madv value that's inconsistent with the
> > >>> presence of the BO in the shrinker list.
> > >>
> > >> I'm a bit worried from the point of view of user space sanity that you
> > >> observed this - but clearly the kernel should be robust!  
> > > 
> > > It's not something I observed; I just found the race by inspecting
> > > the code and thought it was worth fixing.
> > 
> > Good! ;) Your cover letter referring to a "debug session" made me think
> > you'd actually hit all these.  
> 
> Time for some igt to make sure this is actually correct?

That's not something I can easily trigger without instrumenting the
code: I need 2 threads doing MADVISE with 2 different values,
and there has to be a context switch between the
drm_gem_shmem_madvise() call and the mutex_lock(shrinker_lock) one. And
last but not least, I'll need a way to report such inconsistencies to
userspace (trace?).
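Without an igt test, the locking scheme of the patch can at least be exercised in a userspace model. The sketch below is a stand-in under stated assumptions, not a test of the driver: two threads repeatedly apply different madv values to one hypothetical BO, each update of madv and list membership happens inside one critical section (as the patch does with the shrinker lock), and a hypothetical invariant check counts any disagreement instead of reporting it through a trace.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum { MODEL_WILLNEED = 0, MODEL_DONTNEED = 1 };

/* Hypothetical single-BO model; shrinker_lock plays the role of
 * pfdev->shrinker_lock in the patched ioctl. */
static pthread_mutex_t shrinker_lock = PTHREAD_MUTEX_INITIALIZER;
static int bo_madv = MODEL_WILLNEED;
static bool bo_on_list;
static long violations;

/* Patched ordering: madv and list membership change in one critical section. */
static void model_madvise(int value)
{
	pthread_mutex_lock(&shrinker_lock);
	bo_madv = value;
	bo_on_list = (value == MODEL_DONTNEED);
	pthread_mutex_unlock(&shrinker_lock);
}

/* Counts inconsistencies; a real debug hook might emit a tracepoint instead. */
static void check_invariant(void)
{
	pthread_mutex_lock(&shrinker_lock);
	if (bo_on_list != (bo_madv == MODEL_DONTNEED))
		violations++;
	pthread_mutex_unlock(&shrinker_lock);
}

static void *madvise_worker(void *arg)
{
	int value = *(const int *)arg;

	for (int i = 0; i < 100000; i++) {
		model_madvise(value);
		check_invariant();
	}
	return NULL;
}

/* Two threads doing MADVISE with two different values; returns the number
 * of violations seen, or -1 if thread creation fails. */
static long stress_madvise(void)
{
	static int values[2] = { MODEL_DONTNEED, MODEL_WILLNEED };
	pthread_t t0, t1;

	violations = 0;
	if (pthread_create(&t0, NULL, madvise_worker, &values[0]) != 0)
		return -1;
	if (pthread_create(&t1, NULL, madvise_worker, &values[1]) != 0) {
		pthread_join(t0, NULL);
		return -1;
	}
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	return violations;
}
```

Splitting model_madvise() into two separately locked halves reopens the window, but whether a run catches it then depends on scheduling, which matches the difficulty of triggering the race without instrumentation.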

Re: [PATCH 2/8] drm/panfrost: Fix a race in panfrost_ioctl_madvise()

2019-11-29 Thread Daniel Vetter
On Fri, Nov 29, 2019 at 02:40:34PM +, Steven Price wrote:
> On 29/11/2019 14:33, Boris Brezillon wrote:
> > On Fri, 29 Nov 2019 14:24:48 +
> > Steven Price  wrote:
> > 
> >> On 29/11/2019 13:59, Boris Brezillon wrote:
> >>> If 2 threads change the MADVISE property of the same BO in parallel we
> >>> might end up with an shmem->madv value that's inconsistent with the
> >>> presence of the BO in the shrinker list.  
> >>
> >> I'm a bit worried from the point of view of user space sanity that you
> >> observed this - but clearly the kernel should be robust!
> > 
> > It's not something I observed; I just found the race by inspecting
> > the code and thought it was worth fixing.
> 
> Good! ;) Your cover letter referring to a "debug session" made me think
> you'd actually hit all these.

Time for some igt to make sure this is actually correct?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [PATCH 2/8] drm/panfrost: Fix a race in panfrost_ioctl_madvise()

2019-11-29 Thread Steven Price
On 29/11/2019 14:33, Boris Brezillon wrote:
> On Fri, 29 Nov 2019 14:24:48 +
> Steven Price  wrote:
> 
>> On 29/11/2019 13:59, Boris Brezillon wrote:
>>> If 2 threads change the MADVISE property of the same BO in parallel we
>>> might end up with an shmem->madv value that's inconsistent with the
>>> presence of the BO in the shrinker list.  
>>
>> I'm a bit worried from the point of view of user space sanity that you
>> observed this - but clearly the kernel should be robust!
> 
> It's not something I observed; I just found the race by inspecting
> the code and thought it was worth fixing.

Good! ;) Your cover letter referring to a "debug session" made me think
you'd actually hit all these.

Steve

>>
>>>
>>> The easiest solution to fix that is to protect the
>>> drm_gem_shmem_madvise() call with the shrinker lock.
>>>
>>> Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support")
>>> Cc: 
>>> Signed-off-by: Boris Brezillon   
>>
>> Reviewed-by: Steven Price 
> 
> Thanks.
> 
>>
>>> ---
>>>  drivers/gpu/drm/panfrost/panfrost_drv.c | 9 ++++-----
>>>  1 file changed, 4 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>> index f21bc8a7ee3a..efc0a24d1f4c 100644
>>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>> @@ -347,20 +347,19 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
>>>  		return -ENOENT;
>>>  	}
>>>  
>>> +	mutex_lock(&pfdev->shrinker_lock);
>>>  	args->retained = drm_gem_shmem_madvise(gem_obj, args->madv);
>>>  
>>>  	if (args->retained) {
>>>  		struct panfrost_gem_object *bo = to_panfrost_bo(gem_obj);
>>>  
>>> -		mutex_lock(&pfdev->shrinker_lock);
>>> -
>>>  		if (args->madv == PANFROST_MADV_DONTNEED)
>>> -			list_add_tail(&bo->base.madv_list, &pfdev->shrinker_list);
>>> +			list_add_tail(&bo->base.madv_list,
>>> +				      &pfdev->shrinker_list);
>>>  		else if (args->madv == PANFROST_MADV_WILLNEED)
>>>  			list_del_init(&bo->base.madv_list);
>>> -
>>> -		mutex_unlock(&pfdev->shrinker_lock);
>>>  	}
>>> +	mutex_unlock(&pfdev->shrinker_lock);
>>>  
>>>  	drm_gem_object_put_unlocked(gem_obj);
>>>  	return 0;
>>>   
>>


Re: [PATCH 2/8] drm/panfrost: Fix a race in panfrost_ioctl_madvise()

2019-11-29 Thread Boris Brezillon
On Fri, 29 Nov 2019 14:24:48 +
Steven Price  wrote:

> On 29/11/2019 13:59, Boris Brezillon wrote:
> > If 2 threads change the MADVISE property of the same BO in parallel we
> > might end up with an shmem->madv value that's inconsistent with the
> > presence of the BO in the shrinker list.  
> 
> I'm a bit worried from the point of view of user space sanity that you
> observed this - but clearly the kernel should be robust!

It's not something I observed; I just found the race by inspecting the
code and thought it was worth fixing.

> 
> > 
> > The easiest solution to fix that is to protect the
> > drm_gem_shmem_madvise() call with the shrinker lock.
> > 
> > Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support")
> > Cc: 
> > Signed-off-by: Boris Brezillon   
> 
> Reviewed-by: Steven Price 

Thanks.

> 
> > ---
> >  drivers/gpu/drm/panfrost/panfrost_drv.c | 9 ++++-----
> >  1 file changed, 4 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > index f21bc8a7ee3a..efc0a24d1f4c 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > @@ -347,20 +347,19 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
> >  		return -ENOENT;
> >  	}
> >  
> > +	mutex_lock(&pfdev->shrinker_lock);
> >  	args->retained = drm_gem_shmem_madvise(gem_obj, args->madv);
> >  
> >  	if (args->retained) {
> >  		struct panfrost_gem_object *bo = to_panfrost_bo(gem_obj);
> >  
> > -		mutex_lock(&pfdev->shrinker_lock);
> > -
> >  		if (args->madv == PANFROST_MADV_DONTNEED)
> > -			list_add_tail(&bo->base.madv_list, &pfdev->shrinker_list);
> > +			list_add_tail(&bo->base.madv_list,
> > +				      &pfdev->shrinker_list);
> >  		else if (args->madv == PANFROST_MADV_WILLNEED)
> >  			list_del_init(&bo->base.madv_list);
> > -
> > -		mutex_unlock(&pfdev->shrinker_lock);
> >  	}
> > +	mutex_unlock(&pfdev->shrinker_lock);
> >  
> >  	drm_gem_object_put_unlocked(gem_obj);
> >  	return 0;
> >   
> 


Re: [PATCH 2/8] drm/panfrost: Fix a race in panfrost_ioctl_madvise()

2019-11-29 Thread Steven Price
On 29/11/2019 13:59, Boris Brezillon wrote:
> If 2 threads change the MADVISE property of the same BO in parallel we
> might end up with an shmem->madv value that's inconsistent with the
> presence of the BO in the shrinker list.

I'm a bit worried from the point of view of user space sanity that you
observed this - but clearly the kernel should be robust!

> 
> The easiest solution to fix that is to protect the
> drm_gem_shmem_madvise() call with the shrinker lock.
> 
> Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support")
> Cc: 
> Signed-off-by: Boris Brezillon 

Reviewed-by: Steven Price 

> ---
>  drivers/gpu/drm/panfrost/panfrost_drv.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index f21bc8a7ee3a..efc0a24d1f4c 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -347,20 +347,19 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
>  		return -ENOENT;
>  	}
>  
> +	mutex_lock(&pfdev->shrinker_lock);
>  	args->retained = drm_gem_shmem_madvise(gem_obj, args->madv);
>  
>  	if (args->retained) {
>  		struct panfrost_gem_object *bo = to_panfrost_bo(gem_obj);
>  
> -		mutex_lock(&pfdev->shrinker_lock);
> -
>  		if (args->madv == PANFROST_MADV_DONTNEED)
> -			list_add_tail(&bo->base.madv_list, &pfdev->shrinker_list);
> +			list_add_tail(&bo->base.madv_list,
> +				      &pfdev->shrinker_list);
>  		else if (args->madv == PANFROST_MADV_WILLNEED)
>  			list_del_init(&bo->base.madv_list);
> -
> -		mutex_unlock(&pfdev->shrinker_lock);
>  	}
> +	mutex_unlock(&pfdev->shrinker_lock);
>  
>  	drm_gem_object_put_unlocked(gem_obj);
>  	return 0;
> 
