Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
On ke, 2016-08-10 at 14:00 +0300, Joonas Lahtinen wrote:
> Reviewed-by: Joonas Lahtinen
>
> I still think it's fragile, though. But let's see once the dust settles
> if we can make improvements.

Daniel pointed out that the engine_id could still be different during the
middle section, when the engine_id is captured, if the request is briefly
reused. So backing off with the Reviewed-by; either we handle the possibly
wrong engine_id (no extra tests, so we might actually hit it in testing)
or we avoid it completely (with locking).

Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
On ke, 2016-08-10 at 12:13 +0200, Daniel Vetter wrote:
> On Wed, Aug 10, 2016 at 12:12:37PM +0200, Daniel Vetter wrote:
> > [...]
> > Since I was part of that irc party too: I think this is perfectly fine
> > as-is. Essentially what we do here is plain rcu+kref_get_unless_zero to
> > protect against zombies. This is exactly the design fence_get_rcu is
> > meant to be used in. But for this fastpath we can avoid even the zombie
> > protection if we're careful enough. Trying to make this less scary by
> > using a seqlock instead of plain rcu would run counter to the overall
> > fence_get_rcu design. And since the goal is to share fences widely and
> > far, we don't want to build up a slightly different locking scheme here
> > where fence pointers are not really protected by rcu.
> >
> > Given all that I think the open-coded scary dance (with lots of
> > comments) is acceptable, also since this clearly is a fastpath that
> > userspace loves to beat on.
>
> Still would like to see Joonas' r-b on the first few patches ofc.

With the extra comments;

Reviewed-by: Joonas Lahtinen

I still think it's fragile, though. But let's see once the dust settles
if we can make improvements.

Regards, Joonas

> -Daniel
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
On Wed, Aug 10, 2016 at 12:12:37PM +0200, Daniel Vetter wrote:
> On Tue, Aug 09, 2016 at 10:05:30AM +0100, Chris Wilson wrote:
> > [...]
> > I was comparing the read latching as they are identical. Using a
> > read/write seqlock around the request modification does not prevent all
> > dangers such as using kzalloc() and introduces a second sequence counter
> > to the one we already have. And for good reason seqlock says to use RCU
> > here. Which puts us in a bit of a catch-22 and having to guard against
> > SLAB_DESTROY_BY_RCU.
>
> Since I was part of that irc party too: I think this is perfectly fine
> as-is. Essentially what we do here is plain rcu+kref_get_unless_zero to
> protect against zombies. This is exactly the design fence_get_rcu is
> meant to be used in. But for this fastpath we can avoid even the zombie
> protection if we're careful enough. Trying to make this less scary by
> using a seqlock instead of plain rcu would run counter to the overall
> fence_get_rcu design. And since the goal is to share fences widely and
> far, we don't want to build up a slightly different locking scheme here
> where fence pointers are not really protected by rcu.
>
> Given all that I think the open-coded scary dance (with lots of
> comments) is acceptable, also since this clearly is a fastpath that
> userspace loves to beat on.

Still would like to see Joonas' r-b on the first few patches ofc.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
On Tue, Aug 09, 2016 at 10:05:30AM +0100, Chris Wilson wrote:
> On Tue, Aug 09, 2016 at 11:48:56AM +0300, Joonas Lahtinen wrote:
> > [...]
> > As discussed in IRC, some other mechanism than an improvised spinning
> > loop + some SMP barriers thrown around would be much preferred.
> >
> > You suggested a seqlock, and it would likely be ok.
>
> I was comparing the read latching as they are identical. Using a
> read/write seqlock around the request modification does not prevent all
> dangers such as using kzalloc() and introduces a second sequence counter
> to the one we already have. And for good reason seqlock says to use RCU
> here. Which puts us in a bit of a catch-22 and having to guard against
> SLAB_DESTROY_BY_RCU.

Since I was part of that irc party too: I think this is perfectly fine
as-is. Essentially what we do here is plain rcu+kref_get_unless_zero to
protect against zombies. This is exactly the design fence_get_rcu is
meant to be used in. But for this fastpath we can avoid even the zombie
protection if we're careful enough. Trying to make this less scary by
using a seqlock instead of plain rcu would run counter to the overall
fence_get_rcu design. And since the goal is to share fences widely and
far, we don't want to build up a slightly different locking scheme here
where fence pointers are not really protected by rcu.

Given all that I think the open-coded scary dance (with lots of
comments) is acceptable, also since this clearly is a fastpath that
userspace loves to beat on.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
On Tue, Aug 09, 2016 at 11:48:56AM +0300, Joonas Lahtinen wrote:
> On ti, 2016-08-09 at 08:14 +0100, Chris Wilson wrote:
> > [...]
>
> As discussed in IRC, some other mechanism than an improvised spinning
> loop + some SMP barriers thrown around would be much preferred.
>
> You suggested a seqlock, and it would likely be ok.

I was comparing the read latching as they are identical. Using a
read/write seqlock around the request modification does not prevent all
dangers such as using kzalloc() and introduces a second sequence counter
to the one we already have. And for good reason seqlock says to use RCU
here. Which puts us in a bit of a catch-22 and having to guard against
SLAB_DESTROY_BY_RCU.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
On ti, 2016-08-09 at 08:14 +0100, Chris Wilson wrote:
> On Tue, Aug 09, 2016 at 09:36:48AM +0300, Joonas Lahtinen wrote:
> > [...]
> > Should not an RCU read side lock be involved?
>
> Yes, we use the rcu read lock here. The question here is about the
> visibility of the other processor's writes vs the local processor
> order. Before the other processor can overwrite the request during
> reallocation, it will have updated the active->request and gone through
> a wmb. During the busy ioctl's read of the request, we want to make
> sure that the values we read (request->engine, request->seqno) have not
> been overwritten as we do so - and we do that by serialising the second
> pointer check with the other cpus.

As discussed in IRC, some other mechanism than an improvised spinning
loop + some SMP barriers thrown around would be much preferred.

You suggested a seqlock, and it would likely be ok.

Regards, Joonas

> -Chris
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
On Tue, Aug 09, 2016 at 09:36:48AM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> > [...]
> > More second thoughts. rcu_assign_pointer(NULL) is not visible to
> > rcu_access_pointer on another CPU without the smp_rmb.
>
> Should not an RCU read side lock be involved?

Yes, we use the rcu read lock here. The question here is about the
visibility of the other processor's writes vs the local processor order.
Before the other processor can overwrite the request during
reallocation, it will have updated the active->request and gone through
a wmb. During the busy ioctl's read of the request, we want to make sure
that the values we read (request->engine, request->seqno) have not been
overwritten as we do so - and we do that by serialising the second
pointer check with the other cpus.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > [...]
> > It's still fishy that we are implying an SMP effect where we need to
> > mandate the local processor order (that being the order of evaluation
> > of request = *active; engine = *request; *active). The two *active
> > are already ordered across SMP, so we are only concerned about this
> > cpu. :|
>
> More second thoughts. rcu_assign_pointer(NULL) is not visible to
> rcu_access_pointer on another CPU without the smp_rmb.

Should not an RCU read side lock be involved?

Is it not kind of the point that rcu_assign_pointer() will only be
visible everywhere once all previous read side critical sections have
ended, after calling synchronize_rcu()? And it will be valid during
rcu_read_lock(). If we do not use read side critical sections, how do we
expect the synchronization to happen by RCU code?

Regards, Joonas

> I think I am overestimating the barriers in place for RCU, and they
> are weaker than what I imagined for good reason.
> -Chris
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > [...]
> > r-b confirmed.
>
> It's still fishy that we are implying an SMP effect where we need to
> mandate the local processor order (that being the order of evaluation
> of request = *active; engine = *request; *active). The two *active are
> already ordered across SMP, so we are only concerned about this cpu. :|

More second thoughts. rcu_assign_pointer(NULL) is not visible to
rcu_access_pointer on another CPU without the smp_rmb. I think I am
overestimating the barriers in place for RCU, and they are weaker than
what I imagined for good reason.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > In the debate as to whether the second read of active->request is
> > ordered after the dependent reads of the first read of
> > active->request, just give in and throw a smp_rmb() in there so that
> > ordering of loads is assured.
> >
> > v2: Explain the manual smp_rmb()
> >
> > Signed-off-by: Chris Wilson
> > Cc: Daniel Vetter
> > Reviewed-by: Daniel Vetter
>
> r-b confirmed.

It's still fishy that we are implying an SMP effect where we need to
mandate the local processor order (that being the order of evaluation of
request = *active; engine = *request; *active). The two *active are
already ordered across SMP, so we are only concerned about this cpu. :|
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
Re: [Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> In the debate as to whether the second read of active->request is
> ordered after the dependent reads of the first read of active->request,
> just give in and throw a smp_rmb() in there so that ordering of loads
> is assured.
>
> v2: Explain the manual smp_rmb()
>
> Signed-off-by: Chris Wilson
> Cc: Daniel Vetter
> Reviewed-by: Daniel Vetter

r-b confirmed.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c         | 25 +++++++++++++++++++-----
>  drivers/gpu/drm/i915/i915_gem_request.h |  3 +++
>  2 files changed, 23 insertions(+), 5 deletions(-)
> [...]
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
[Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
In the debate as to whether the second read of active->request is
ordered after the dependent reads of the first read of active->request,
just give in and throw a smp_rmb() in there so that ordering of loads is
assured.

v2: Explain the manual smp_rmb()

Signed-off-by: Chris Wilson
Cc: Daniel Vetter
Reviewed-by: Daniel Vetter
---
 drivers/gpu/drm/i915/i915_gem.c         | 25 +++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_request.h |  3 +++
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f4f8eaa90f2a..654f0b015f97 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3735,7 +3735,7 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
 	i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
 }
 
-static __always_inline unsigned __busy_read_flag(unsigned int id)
+static __always_inline unsigned int __busy_read_flag(unsigned int id)
 {
 	/* Note that we could alias engines in the execbuf API, but
 	 * that would be very unwise as it prevents userspace from
@@ -3753,7 +3753,7 @@ static __always_inline unsigned int __busy_write_id(unsigned int id)
 	return id;
 }
 
-static __always_inline unsigned
+static __always_inline unsigned int
 __busy_set_if_active(const struct i915_gem_active *active,
 		     unsigned int (*flag)(unsigned int id))
 {
@@ -3770,19 +3770,34 @@ __busy_set_if_active(const struct i915_gem_active *active,
 
 		id = request->engine->exec_id;
 
-		/* Check that the pointer wasn't reassigned and overwritten. */
+		/* Check that the pointer wasn't reassigned and overwritten.
+		 *
+		 * In __i915_gem_active_get_rcu(), we enforce ordering between
+		 * the first rcu pointer dereference (imposing a
+		 * read-dependency only on access through the pointer) and
+		 * the second lockless access through the memory barrier
+		 * following a successful atomic_inc_not_zero(). Here there
+		 * is no such barrier, and so we must manually insert an
+		 * explicit read barrier to ensure that the following
+		 * access occurs after all the loads through the first
+		 * pointer.
+		 *
+		 * The corresponding write barrier is part of
+		 * rcu_assign_pointer().
+		 */
+		smp_rmb();
 		if (request == rcu_access_pointer(active->request))
 			return flag(id);
 	} while (1);
 }
 
-static inline unsigned
+static __always_inline unsigned int
 busy_check_reader(const struct i915_gem_active *active)
 {
 	return __busy_set_if_active(active, __busy_read_flag);
 }
 
-static inline unsigned
+static __always_inline unsigned int
 busy_check_writer(const struct i915_gem_active *active)
 {
 	return __busy_set_if_active(active, __busy_write_id);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 3496e28785e7..b2456dede3ad 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -497,6 +497,9 @@ __i915_gem_active_get_rcu(const struct i915_gem_active *active)
 	 * incremented) then the following read for rcu_access_pointer()
 	 * must occur after the atomic operation and so confirm
 	 * that this request is the one currently being tracked.
+	 *
+	 * The corresponding write barrier is part of
+	 * rcu_assign_pointer().
 	 */
 	if (!request || request == rcu_access_pointer(active->request))
 		return rcu_pointer_handoff(request);
-- 
2.8.1