Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance

2016-08-12 Thread Joonas Lahtinen
On ke, 2016-08-10 at 14:00 +0300, Joonas Lahtinen wrote:
> 
> Reviewed-by: Joonas Lahtinen 
> 
> I still think it's fragile, though. But let's see once the dust settles
> if we can make improvements.
> 

Daniel pointed out that the engine_id could still be wrong: if the
request is briefly reused, the middle section where the engine_id is
captured can observe a different engine than the one the final pointer
check validates.

So I'm backing off from the Reviewed-by; either we handle the possibly
wrong engine_id (no extra tests, so we might actually hit it in
testing) or we avoid it completely (with locking).
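Roughly the interleaving I understand Daniel to mean (loop shape lifted
from the patch at the end of this thread; the exact window is my
reading, not something that has been demonstrated):

        request = rcu_dereference(active->request);   /* observes request R */
        id = request->engine->exec_id;                 /* engine id from R */

        /* <-- window: R is retired, freed and immediately reused
         *     (SLAB_DESTROY_BY_RCU), and the recycled R is assigned to
         *     active->request again, now bound to a different engine */

        smp_rmb();
        if (request == rcu_access_pointer(active->request)) /* same pointer value... */
                return flag(id);   /* ...so we report an id the request no longer has */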

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance

2016-08-10 Thread Joonas Lahtinen
On ke, 2016-08-10 at 12:13 +0200, Daniel Vetter wrote:
> On Wed, Aug 10, 2016 at 12:12:37PM +0200, Daniel Vetter wrote:
> > 
> > On Tue, Aug 09, 2016 at 10:05:30AM +0100, Chris Wilson wrote:
> > > 
> > > On Tue, Aug 09, 2016 at 11:48:56AM +0300, Joonas Lahtinen wrote:
> > > > 
> > > > On ti, 2016-08-09 at 08:14 +0100, Chris Wilson wrote:
> > > > > 
> > > > > On Tue, Aug 09, 2016 at 09:36:48AM +0300, Joonas Lahtinen wrote:
> > > > > > 
> > > > > > 
> > > > > > On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > In the debate as to whether the second read of 
> > > > > > > > > > active->request is
> > > > > > > > > > ordered after the dependent reads of the first read of 
> > > > > > > > > > active->request,
> > > > > > > > > > just give in and throw a smp_rmb() in there so that 
> > > > > > > > > > ordering of loads is
> > > > > > > > > > assured.
> > > > > > > > > > 
> > > > > > > > > > v2: Explain the manual smp_rmb()
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Chris Wilson 
> > > > > > > > > > Cc: Daniel Vetter 
> > > > > > > > > > Reviewed-by: Daniel Vetter 
> > > > > > > > > r-b confirmed.
> > > > > > > > It's still fishy that we are implying an SMP effect where we 
> > > > > > > > need to
> > > > > > > > mandate the local processor order (that being the order 
> > > > > > > > evaluation of
> > > > > > > > request = *active; engine = *request; *active). The two *active 
> > > > > > > > are
> > > > > > > > already ordered across SMP, so we are only concerned about this cpu. :|
> > > > > > > More second thoughts. rcu_assign_pointer(NULL) is not visible to
> > > > > > > rcu_access_pointer on another CPU without the smp_rmb. 
> > > > > > Should not a RCU read side lock be involved?
> > > > > Yes, we use rcu read lock here. The question here is about visibility 
> > > > > of
> > > > > the other processor writes vs the local processor order. Before the
> > > > > other processor can overwrite the request during reallocation, it will
> > > > > have updated the active->request and gone through a wmb. During busy
> > > > > ioctl's read of the request, we want to make sure that the values we
> > > > > read (request->engine, request->seqno) have not been overwritten as we
> > > > > do so - and we do that by serialising the second pointer check with 
> > > > > the
> > > > > other cpus.
> > > > As discussed in IRC, some other mechanism than an improvised spinning
> > > > loop + some SMP barriers thrown around would be much preferred.
> > > > 
> > > > You suggested a seqlock, and it would likely be ok.
> > > I was comparing the read latching as they are identical. Using a
> > > read/write seqlock around the request modification does not prevent all
> > > dangers such as using kzalloc() and introduces a second sequence counter
> > > to the one we already have. And for good reason seqlock says to use RCU
> > > here. Which puts us in a bit of a catch-22 and having to guard against
> > > SLAB_DESTROY_BY_RCU.
> > Since I was part of that irc party too: I think this is perfectly fine
> > as-is. Essentially what we do here is plain rcu+kref_get_unless_zero to
> > protect against zombies. This is exactly the design fence_get_rcu is meant
> > to be used in. But for this fastpath we can avoid even the zombie
> > protection if we're careful enough. Trying to make this less scary by
> > using a seqlock instead of plain rcu would run counter to the overall
> > fence_get_rcu design. And since the goal is to share fences widely and
> > far, we don't want to build up a slightly different locking scheme here
> > where fence pointers are not really protected by rcu.
> > 
> > Given all that I think the open-coded scary dance (with lots of comments)
> > is acceptable, also since this clearly is a fastpath that userspace loves
> > to beat on.
> Still would like to see Joonas' r-b on the first few patches ofc.

With the extra comments:

Reviewed-by: Joonas Lahtinen 

I still think it's fragile, though. But let's see once the dust settles
if we can make improvements.

Regards, Joonas

> -Daniel
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance

2016-08-10 Thread Daniel Vetter
On Wed, Aug 10, 2016 at 12:12:37PM +0200, Daniel Vetter wrote:
> On Tue, Aug 09, 2016 at 10:05:30AM +0100, Chris Wilson wrote:
> > On Tue, Aug 09, 2016 at 11:48:56AM +0300, Joonas Lahtinen wrote:
> > > On ti, 2016-08-09 at 08:14 +0100, Chris Wilson wrote:
> > > > On Tue, Aug 09, 2016 at 09:36:48AM +0300, Joonas Lahtinen wrote:
> > > > > 
> > > > > On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> > > > > > 
> > > > > > On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > In the debate as to whether the second read of 
> > > > > > > > > active->request is
> > > > > > > > > ordered after the dependent reads of the first read of 
> > > > > > > > > active->request,
> > > > > > > > > just give in and throw a smp_rmb() in there so that ordering 
> > > > > > > > > of loads is
> > > > > > > > > assured.
> > > > > > > > > 
> > > > > > > > > v2: Explain the manual smp_rmb()
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Chris Wilson 
> > > > > > > > > Cc: Daniel Vetter 
> > > > > > > > > Reviewed-by: Daniel Vetter 
> > > > > > > > r-b confirmed.
> > > > > > > It's still fishy that we are implying an SMP effect where we need 
> > > > > > > to
> > > > > > > mandate the local processor order (that being the order 
> > > > > > > evaluation of
> > > > > > > request = *active; engine = *request; *active). The two *active 
> > > > > > > are
> > > > > > > already ordered across SMP, so we are only concerned about this cpu. :|
> > > > > > More second thoughts. rcu_assign_pointer(NULL) is not visible to
> > > > > > rcu_access_pointer on another CPU without the smp_rmb. 
> > > > > Should not a RCU read side lock be involved?
> > > > Yes, we use rcu read lock here. The question here is about visibility of
> > > > the other processor writes vs the local processor order. Before the
> > > > other processor can overwrite the request during reallocation, it will
> > > > have updated the active->request and gone through a wmb. During busy
> > > > ioctl's read of the request, we want to make sure that the values we
> > > > read (request->engine, request->seqno) have not been overwritten as we
> > > > do so - and we do that by serialising the second pointer check with the
> > > > other cpus.
> > > 
> > > As discussed in IRC, some other mechanism than an improvised spinning
> > > loop + some SMP barriers thrown around would be much preferred.
> > > 
> > > You suggested a seqlock, and it would likely be ok.
> > 
> > I was comparing the read latching as they are identical. Using a
> > read/write seqlock around the request modification does not prevent all
> > dangers such as using kzalloc() and introduces a second sequence counter
> > to the one we already have. And for good reason seqlock says to use RCU
> > here. Which puts us in a bit of a catch-22 and having to guard against
> > SLAB_DESTROY_BY_RCU.
> 
> Since I was part of that irc party too: I think this is perfectly fine
> as-is. Essentially what we do here is plain rcu+kref_get_unless_zero to
> protect against zombies. This is exactly the design fence_get_rcu is meant
> to be used in. But for this fastpath we can avoid even the zombie
> protection if we're careful enough. Trying to make this less scary by
> using a seqlock instead of plain rcu would run counter to the overall
> fence_get_rcu design. And since the goal is to share fences widely and
> far, we don't want to build up a slightly different locking scheme here
> where fence pointers are not really protected by rcu.
> 
> Given all that I think the open-coded scary dance (with lots of comments)
> is acceptable, also since this clearly is a fastpath that userspace loves
> to beat on.

Still would like to see Joonas' r-b on the first few patches ofc.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance

2016-08-10 Thread Daniel Vetter
On Tue, Aug 09, 2016 at 10:05:30AM +0100, Chris Wilson wrote:
> On Tue, Aug 09, 2016 at 11:48:56AM +0300, Joonas Lahtinen wrote:
> > On ti, 2016-08-09 at 08:14 +0100, Chris Wilson wrote:
> > > On Tue, Aug 09, 2016 at 09:36:48AM +0300, Joonas Lahtinen wrote:
> > > > 
> > > > On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> > > > > 
> > > > > On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > > > > > 
> > > > > > 
> > > > > > On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > In the debate as to whether the second read of active->request 
> > > > > > > > is
> > > > > > > > ordered after the dependent reads of the first read of 
> > > > > > > > active->request,
> > > > > > > > just give in and throw a smp_rmb() in there so that ordering of 
> > > > > > > > loads is
> > > > > > > > assured.
> > > > > > > > 
> > > > > > > > v2: Explain the manual smp_rmb()
> > > > > > > > 
> > > > > > > > Signed-off-by: Chris Wilson 
> > > > > > > > Cc: Daniel Vetter 
> > > > > > > > Reviewed-by: Daniel Vetter 
> > > > > > > r-b confirmed.
> > > > > > It's still fishy that we are implying an SMP effect where we need to
> > > > > > mandate the local processor order (that being the order evaluation 
> > > > > > of
> > > > > > request = *active; engine = *request; *active). The two *active are
> > > > > > already ordered across SMP, so we are only concerned about this cpu. :|
> > > > > More second thoughts. rcu_assign_pointer(NULL) is not visible to
> > > > > rcu_access_pointer on another CPU without the smp_rmb. 
> > > > Should not a RCU read side lock be involved?
> > > Yes, we use rcu read lock here. The question here is about visibility of
> > > the other processor writes vs the local processor order. Before the
> > > other processor can overwrite the request during reallocation, it will
> > > have updated the active->request and gone through a wmb. During busy
> > > ioctl's read of the request, we want to make sure that the values we
> > > read (request->engine, request->seqno) have not been overwritten as we
> > > do so - and we do that by serialising the second pointer check with the
> > > other cpus.
> > 
> > As discussed in IRC, some other mechanism than an improvised spinning
> > loop + some SMP barriers thrown around would be much preferred.
> > 
> > You suggested a seqlock, and it would likely be ok.
> 
> I was comparing the read latching as they are identical. Using a
> read/write seqlock around the request modification does not prevent all
> dangers such as using kzalloc() and introduces a second sequence counter
> to the one we already have. And for good reason seqlock says to use RCU
> here. Which puts us in a bit of a catch-22 and having to guard against
> SLAB_DESTROY_BY_RCU.

Since I was part of that irc party too: I think this is perfectly fine
as-is. Essentially what we do here is plain rcu+kref_get_unless_zero to
protect against zombies. This is exactly the design fence_get_rcu is meant
to be used in. But for this fastpath we can avoid even the zombie
protection if we're careful enough. Trying to make this less scary by
using a seqlock instead of plain rcu would run counter to the overall
fence_get_rcu design. And since the goal is to share fences widely and
far, we don't want to build up a slightly different locking scheme here
where fence pointers are not really protected by rcu.

Given all that I think the open-coded scary dance (with lots of comments)
is acceptable, also since this clearly is a fastpath that userspace loves
to beat on.
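
Spelled out, the zombie-protection pattern I mean is roughly this
(illustrative names only, not the actual fence/request code):

        /* 'slot' is a made-up RCU-protected pointer, 'struct obj' a
         * made-up refcounted object; the shape is what matters. */
        struct obj *obj;

        rcu_read_lock();
        obj = rcu_dereference(slot);
        if (obj && !kref_get_unless_zero(&obj->ref))
                obj = NULL;     /* zombie: refcount already hit zero,
                                 * the object is just waiting to be freed */
        rcu_read_unlock();
        /* on success we hold a real reference and may use obj outside
         * the read-side critical section; kref_put() when done */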
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance

2016-08-09 Thread Chris Wilson
On Tue, Aug 09, 2016 at 11:48:56AM +0300, Joonas Lahtinen wrote:
> On ti, 2016-08-09 at 08:14 +0100, Chris Wilson wrote:
> > On Tue, Aug 09, 2016 at 09:36:48AM +0300, Joonas Lahtinen wrote:
> > > 
> > > On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> > > > 
> > > > On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > > > > 
> > > > > 
> > > > > On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > > > > > 
> > > > > > 
> > > > > > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > In the debate as to whether the second read of active->request is
> > > > > > > ordered after the dependent reads of the first read of 
> > > > > > > active->request,
> > > > > > > just give in and throw a smp_rmb() in there so that ordering of 
> > > > > > > loads is
> > > > > > > assured.
> > > > > > > 
> > > > > > > v2: Explain the manual smp_rmb()
> > > > > > > 
> > > > > > > Signed-off-by: Chris Wilson 
> > > > > > > Cc: Daniel Vetter 
> > > > > > > Reviewed-by: Daniel Vetter 
> > > > > > r-b confirmed.
> > > > > It's still fishy that we are implying an SMP effect where we need to
> > > > > mandate the local processor order (that being the order evaluation of
> > > > > request = *active; engine = *request; *active). The two *active are
> > > > > already ordered across SMP, so we are only concerned about this cpu. :|
> > > > More second thoughts. rcu_assign_pointer(NULL) is not visible to
> > > > rcu_access_pointer on another CPU without the smp_rmb. 
> > > Should not a RCU read side lock be involved?
> > Yes, we use rcu read lock here. The question here is about visibility of
> > the other processor writes vs the local processor order. Before the
> > other processor can overwrite the request during reallocation, it will
> > have updated the active->request and gone through a wmb. During busy
> > ioctl's read of the request, we want to make sure that the values we
> > read (request->engine, request->seqno) have not been overwritten as we
> > do so - and we do that by serialising the second pointer check with the
> > other cpus.
> 
> As discussed in IRC, some other mechanism than an improvised spinning
> loop + some SMP barriers thrown around would be much preferred.
> 
> You suggested a seqlock, and it would likely be ok.

I was comparing the read latching as they are identical. Using a
read/write seqlock around the request modification does not prevent all
dangers such as using kzalloc() and introduces a second sequence counter
to the one we already have. And for good reason seqlock says to use RCU
here. Which puts us in a bit of a catch-22 and having to guard against
SLAB_DESTROY_BY_RCU.
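
For comparison, the read side with a seqlock would look roughly like
this (a sketch; 'active_seq' is a hypothetical seqlock_t that does not
exist in the driver):

        unsigned int seq;

        do {
                seq = read_seqbegin(&active_seq);
                request = rcu_dereference(active->request);
                id = request ? request->engine->exec_id : 0;
        } while (read_seqretry(&active_seq, seq));

        /* Same retry latch as the open-coded version, and dereferencing
         * request inside the loop is still only safe if its lifetime is
         * covered by RCU / SLAB_DESTROY_BY_RCU - the catch-22 above. */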
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance

2016-08-09 Thread Joonas Lahtinen
On ti, 2016-08-09 at 08:14 +0100, Chris Wilson wrote:
> On Tue, Aug 09, 2016 at 09:36:48AM +0300, Joonas Lahtinen wrote:
> > 
> > On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> > > 
> > > On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > > > 
> > > > 
> > > > On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > > > > 
> > > > > 
> > > > > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > > > > 
> > > > > > 
> > > > > > In the debate as to whether the second read of active->request is
> > > > > > ordered after the dependent reads of the first read of 
> > > > > > active->request,
> > > > > > just give in and throw a smp_rmb() in there so that ordering of 
> > > > > > loads is
> > > > > > assured.
> > > > > > 
> > > > > > v2: Explain the manual smp_rmb()
> > > > > > 
> > > > > > Signed-off-by: Chris Wilson 
> > > > > > Cc: Daniel Vetter 
> > > > > > Reviewed-by: Daniel Vetter 
> > > > > r-b confirmed.
> > > > It's still fishy that we are implying an SMP effect where we need to
> > > > mandate the local processor order (that being the order evaluation of
> > > > request = *active; engine = *request; *active). The two *active are
> > > > already ordered across SMP, so we are only concerned about this cpu. :|
> > > More second thoughts. rcu_assign_pointer(NULL) is not visible to
> > > rcu_access_pointer on another CPU without the smp_rmb. 
> > Should not a RCU read side lock be involved?
> Yes, we use rcu read lock here. The question here is about visibility of
> the other processor writes vs the local processor order. Before the
> other processor can overwrite the request during reallocation, it will
> have updated the active->request and gone through a wmb. During busy
> ioctl's read of the request, we want to make sure that the values we
> read (request->engine, request->seqno) have not been overwritten as we
> do so - and we do that by serialising the second pointer check with the
> other cpus.

As discussed in IRC, some other mechanism than an improvised spinning
loop + some SMP barriers thrown around would be much preferred.

You suggested a seqlock, and it would likely be ok.

Regards, Joonas

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance

2016-08-09 Thread Chris Wilson
On Tue, Aug 09, 2016 at 09:36:48AM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> > On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > > 
> > > On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > > > 
> > > > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > > > 
> > > > > In the debate as to whether the second read of active->request is
> > > > > ordered after the dependent reads of the first read of 
> > > > > active->request,
> > > > > just give in and throw a smp_rmb() in there so that ordering of loads 
> > > > > is
> > > > > assured.
> > > > > 
> > > > > v2: Explain the manual smp_rmb()
> > > > > 
> > > > > Signed-off-by: Chris Wilson 
> > > > > Cc: Daniel Vetter 
> > > > > Reviewed-by: Daniel Vetter 
> > > > r-b confirmed.
> > > It's still fishy that we are implying an SMP effect where we need to
> > > mandate the local processor order (that being the order evaluation of
> > > request = *active; engine = *request; *active). The two *active are
> > > already ordered across SMP, so we are only concerned about this cpu. :|
> > More second thoughts. rcu_assign_pointer(NULL) is not visible to
> > rcu_access_pointer on another CPU without the smp_rmb. 
> 
> Should not a RCU read side lock be involved?

Yes, we use rcu read lock here. The question here is about visibility of
the other processor writes vs the local processor order. Before the
other processor can overwrite the request during reallocation, it will
have updated the active->request and gone through a wmb. During busy
ioctl's read of the request, we want to make sure that the values we
read (request->engine, request->seqno) have not been overwritten as we
do so - and we do that by serialising the second pointer check with the
other cpus.
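
Condensed, that ordering argument looks like this (a sketch; the real
reader loop is in the patch itself):

        /* CPU A (retire/reallocate), per the above:
         *      rcu_assign_pointer(active->request, next);  <- write barrier here
         *      ... only after that may the old request be reused and its
         *          engine/seqno fields overwritten ...
         *
         * CPU B (busy ioctl):
         */
        request = rcu_dereference(active->request);     /* first read of the pointer */
        id = request->engine->exec_id;                   /* loads through that pointer */
        smp_rmb();                                       /* order the loads above before... */
        if (request == rcu_access_pointer(active->request))  /* ...the second pointer read */
                return flag(id); /* unchanged pointer => the values read pre-date any reuse */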
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance

2016-08-09 Thread Joonas Lahtinen
On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > 
> > On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > > 
> > > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > > 
> > > > In the debate as to whether the second read of active->request is
> > > > ordered after the dependent reads of the first read of active->request,
> > > > just give in and throw a smp_rmb() in there so that ordering of loads is
> > > > assured.
> > > > 
> > > > v2: Explain the manual smp_rmb()
> > > > 
> > > > Signed-off-by: Chris Wilson 
> > > > Cc: Daniel Vetter 
> > > > Reviewed-by: Daniel Vetter 
> > > r-b confirmed.
> > It's still fishy that we are implying an SMP effect where we need to
> > mandate the local processor order (that being the order evaluation of
> > request = *active; engine = *request; *active). The two *active are
> > already ordered across SMP, so we are only concerned about this cpu. :|
> More second thoughts. rcu_assign_pointer(NULL) is not visible to
> rcu_access_pointer on another CPU without the smp_rmb. 

Should not a RCU read side lock be involved?

Isn't it kind of the point that an rcu_assign_pointer() only becomes
visible everywhere once all previous read-side critical sections have
ended, i.e. after synchronize_rcu() returns? And that the pointer stays
valid for the duration of rcu_read_lock()?

If we do not use read-side critical sections, how do we expect RCU to
provide any synchronization for us?
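
For reference, the textbook shape I have in mind (generic RCU, not the
i915 code; 'gp' is a made-up RCU-protected pointer):

        /* reader */
        rcu_read_lock();
        p = rcu_dereference(gp);
        if (p)
                use(p);         /* cannot be freed from under us... */
        rcu_read_unlock();      /* ...until we leave the critical section */

        /* updater */
        old = gp;                       /* under the update-side lock */
        rcu_assign_pointer(gp, new);    /* publish the new pointer */
        synchronize_rcu();              /* wait for all pre-existing readers */
        kfree(old);                     /* only now safe to free */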

Regards, Joonas

> I think I am
> overestimating the barriers in place for RCU, and they are weaker than
> what I imagined for good reason.
> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance

2016-08-08 Thread Chris Wilson
On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > In the debate as to whether the second read of active->request is
> > > ordered after the dependent reads of the first read of active->request,
> > > just give in and throw a smp_rmb() in there so that ordering of loads is
> > > assured.
> > > 
> > > v2: Explain the manual smp_rmb()
> > > 
> > > Signed-off-by: Chris Wilson 
> > > Cc: Daniel Vetter 
> > > Reviewed-by: Daniel Vetter 
> > 
> > r-b confirmed.
> 
> It's still fishy that we are implying an SMP effect where we need to
> mandate the local processor order (that being the order evaluation of
> request = *active; engine = *request; *active). The two *active are
> already ordered across SMP, so we are only concerned about this cpu. :|

More second thoughts. rcu_assign_pointer(NULL) is not visible to
rcu_access_pointer on another CPU without the smp_rmb. I think I am
overestimating the barriers in place for RCU, and they are weaker than
what I imagined for good reason.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance

2016-08-08 Thread Chris Wilson
On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > In the debate as to whether the second read of active->request is
> > ordered after the dependent reads of the first read of active->request,
> > just give in and throw a smp_rmb() in there so that ordering of loads is
> > assured.
> > 
> > v2: Explain the manual smp_rmb()
> > 
> > Signed-off-by: Chris Wilson 
> > Cc: Daniel Vetter 
> > Reviewed-by: Daniel Vetter 
> 
> r-b confirmed.

It's still fishy that we are implying an SMP effect where we need to
mandate the local processor order (that being the order evaluation of
request = *active; engine = *request; *active). The two *active are
already ordered across SMP, so we are only concerned about this cpu. :|
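
Written out, the three loads in question (sketch):

        request = rcu_dereference(active->request);    /* load 1: *active */
        engine  = request->engine;                     /* load 2: *request, ordered after
                                                        * load 1 by the address dependency */
        smp_rmb();                                     /* without this, load 3 below has no
                                                        * dependency on load 2 and may be
                                                        * reordered before it */
        recheck = rcu_access_pointer(active->request); /* load 3: *active again */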
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance

2016-08-08 Thread Daniel Vetter
On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> In the debate as to whether the second read of active->request is
> ordered after the dependent reads of the first read of active->request,
> just give in and throw a smp_rmb() in there so that ordering of loads is
> assured.
> 
> v2: Explain the manual smp_rmb()
> 
> Signed-off-by: Chris Wilson 
> Cc: Daniel Vetter 
> Reviewed-by: Daniel Vetter 

r-b confirmed.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 25 -
>  drivers/gpu/drm/i915/i915_gem_request.h |  3 +++
>  2 files changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index f4f8eaa90f2a..654f0b015f97 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3735,7 +3735,7 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
>   i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
>  }
>  
> -static __always_inline unsigned __busy_read_flag(unsigned int id)
> +static __always_inline unsigned int __busy_read_flag(unsigned int id)
>  {
>   /* Note that we could alias engines in the execbuf API, but
>* that would be very unwise as it prevents userspace from
> @@ -3753,7 +3753,7 @@ static __always_inline unsigned int __busy_write_id(unsigned int id)
>   return id;
>  }
>  
> -static __always_inline unsigned
> +static __always_inline unsigned int
>  __busy_set_if_active(const struct i915_gem_active *active,
>unsigned int (*flag)(unsigned int id))
>  {
> @@ -3770,19 +3770,34 @@ __busy_set_if_active(const struct i915_gem_active *active,
>  
>   id = request->engine->exec_id;
>  
> - /* Check that the pointer wasn't reassigned and overwritten. */
> + /* Check that the pointer wasn't reassigned and overwritten.
> +  *
> +  * In __i915_gem_active_get_rcu(), we enforce ordering between
> +  * the first rcu pointer dereference (imposing a
> +  * read-dependency only on access through the pointer) and
> +  * the second lockless access through the memory barrier
> +  * following a successful atomic_inc_not_zero(). Here there
> +  * is no such barrier, and so we must manually insert an
> +  * explicit read barrier to ensure that the following
> +  * access occurs after all the loads through the first
> +  * pointer.
> +  *
> +  * The corresponding write barrier is part of
> +  * rcu_assign_pointer().
> +  */
> + smp_rmb();
>   if (request == rcu_access_pointer(active->request))
>   return flag(id);
>   } while (1);
>  }
>  
> -static inline unsigned
> +static __always_inline unsigned int
>  busy_check_reader(const struct i915_gem_active *active)
>  {
>   return __busy_set_if_active(active, __busy_read_flag);
>  }
>  
> -static inline unsigned
> +static __always_inline unsigned int
>  busy_check_writer(const struct i915_gem_active *active)
>  {
>   return __busy_set_if_active(active, __busy_write_id);
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index 3496e28785e7..b2456dede3ad 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -497,6 +497,9 @@ __i915_gem_active_get_rcu(const struct i915_gem_active *active)
>* incremented) then the following read for rcu_access_pointer()
>* must occur after the atomic operation and so confirm
>* that this request is the one currently being tracked.
> +  *
> +  * The corresponding write barrier is part of
> +  * rcu_assign_pointer().
>*/
>   if (!request || request == rcu_access_pointer(active->request))
>   return rcu_pointer_handoff(request);
> -- 
> 2.8.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance

2016-08-07 Thread Chris Wilson
In the debate as to whether the second read of active->request is
ordered after the dependent reads of the first read of active->request,
just give in and throw a smp_rmb() in there so that ordering of loads is
assured.

v2: Explain the manual smp_rmb()

Signed-off-by: Chris Wilson 
Cc: Daniel Vetter 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_gem.c | 25 -
 drivers/gpu/drm/i915/i915_gem_request.h |  3 +++
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f4f8eaa90f2a..654f0b015f97 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3735,7 +3735,7 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
 }
 
-static __always_inline unsigned __busy_read_flag(unsigned int id)
+static __always_inline unsigned int __busy_read_flag(unsigned int id)
 {
/* Note that we could alias engines in the execbuf API, but
 * that would be very unwise as it prevents userspace from
@@ -3753,7 +3753,7 @@ static __always_inline unsigned int __busy_write_id(unsigned int id)
return id;
 }
 
-static __always_inline unsigned
+static __always_inline unsigned int
 __busy_set_if_active(const struct i915_gem_active *active,
 unsigned int (*flag)(unsigned int id))
 {
@@ -3770,19 +3770,34 @@ __busy_set_if_active(const struct i915_gem_active *active,
 
id = request->engine->exec_id;
 
-   /* Check that the pointer wasn't reassigned and overwritten. */
+   /* Check that the pointer wasn't reassigned and overwritten.
+*
+* In __i915_gem_active_get_rcu(), we enforce ordering between
+* the first rcu pointer dereference (imposing a
+* read-dependency only on access through the pointer) and
+* the second lockless access through the memory barrier
+* following a successful atomic_inc_not_zero(). Here there
+* is no such barrier, and so we must manually insert an
+* explicit read barrier to ensure that the following
+* access occurs after all the loads through the first
+* pointer.
+*
+* The corresponding write barrier is part of
+* rcu_assign_pointer().
+*/
+   smp_rmb();
if (request == rcu_access_pointer(active->request))
return flag(id);
} while (1);
 }
 
-static inline unsigned
+static __always_inline unsigned int
 busy_check_reader(const struct i915_gem_active *active)
 {
return __busy_set_if_active(active, __busy_read_flag);
 }
 
-static inline unsigned
+static __always_inline unsigned int
 busy_check_writer(const struct i915_gem_active *active)
 {
return __busy_set_if_active(active, __busy_write_id);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 3496e28785e7..b2456dede3ad 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -497,6 +497,9 @@ __i915_gem_active_get_rcu(const struct i915_gem_active *active)
 * incremented) then the following read for rcu_access_pointer()
 * must occur after the atomic operation and so confirm
 * that this request is the one currently being tracked.
+*
+* The corresponding write barrier is part of
+* rcu_assign_pointer().
 */
if (!request || request == rcu_access_pointer(active->request))
return rcu_pointer_handoff(request);
-- 
2.8.1
