Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption

2019-09-20 Thread Bloomfield, Jon
> -----Original Message-----
> From: Chris Wilson 
> Sent: Friday, September 20, 2019 9:04 AM
> To: Bloomfield, Jon ; intel-g...@lists.freedesktop.org; Tvrtko Ursulin 
> Subject: RE: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from
> overtaking each other on preemption
> 
> Quoting Bloomfield, Jon (2019-09-20 16:50:57)
> > > -----Original Message-----
> > > From: Intel-gfx  On Behalf Of Tvrtko Ursulin
> > > Sent: Friday, September 20, 2019 8:12 AM
> > > To: Chris Wilson ; 
> > > intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from
> > > overtaking each other on preemption
> > >
> > >
> > > On 20/09/2019 15:57, Chris Wilson wrote:
> > > > Quoting Chris Wilson (2019-09-20 09:36:24)
> > > >> Force bonded requests to run on distinct engines so that they cannot be
> > > >> shuffled onto the same engine where timeslicing will reverse the order.
> > > >> A bonded request will often wait on a semaphore signaled by its master,
> > > >> creating an implicit dependency -- if we ignore that implicit 
> > > >> dependency
> > > >> and allow the bonded request to run on the same engine and before its
> > > >> master, we will cause a GPU hang.
> > > >
> > > > Thinking more, it should not directly cause a GPU hang, as the stuck
> > > > request should be timesliced away, and each preemption should be enough
> > > > to keep hangcheck at bay (though we have evidence it may not). So at
> > > > best it runs at half-speed, at worst a third (if my model is correct).
> > >
> > > But I think it is still correct to do since we don't have the coupling
> > > information on re-submit. Hm.. but don't we need to prevent slave from
> > > changing engines as well?
> >
> > Unless I'm missing something, the proposal here is to set the engines in
> > stone at first submission, and never change them?
> 
> For submission here, think execution (submission to actual HW). (We have
> 2 separate phases that all like to be called submit()!)
> 
> > If so, that does sound overly restrictive, and will prevent any kind of
> > rebalancing as workloads (of varying slave counts) come and go.
> 
> We are only restricting this request, not the contexts. We still have
> balancing overall, just not instantaneous balancing if we timeslice out
> of this request -- we put it back onto the "same" engine and not another.
> Which is in some ways less than ideal, although strictly we are only
> saying don't put it back onto an engine we have earmarked for our bonded
> request, and so we avoid contending with our parallel request reducing
> that to serial (and often bad) behaviour.
> 
> [So at the end of this statement, I'm more happy with the restriction ;]
> 
> > During the original design it was called out that the workloads should be
> > pre-empted atomically. That allows the entire bonding mask to be re-evaluated
> > at every context switch and so we can then rebalance. Still not easy to
> > achieve I agree :-(
> 
> The problem with that statement is that atomic implies a global
> scheduling decision. Blood, sweat and tears.

Agreed - It isn't fun. Perhaps it doesn't matter anyway. Once GuC is offloading 
the scheduling it should be able to do a little more wrt rebalancing. Let's 
make it a GuC headache instead.

> 
> Of course, with your endless scheme, scheduling is all in the purview of
> the user :)

Hey, don't tarnish me with that brush. I don't like it either.
Actually, it's your scheme technically. I just asked for a way to enable HPC 
workloads, and you enthusiastically offered heartbeats. So shall history be 
written :-)

> -Chris

Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption

2019-09-20 Thread Chris Wilson
Quoting Chris Wilson (2019-09-20 17:03:34)
> Quoting Bloomfield, Jon (2019-09-20 16:50:57)
> > > -----Original Message-----
> > > From: Intel-gfx  On Behalf Of 
> > > Tvrtko
> > > Ursulin
> > > Sent: Friday, September 20, 2019 8:12 AM
> > > To: Chris Wilson ; 
> > > intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from
> > > overtaking each other on preemption
> > > 
> > > 
> > > On 20/09/2019 15:57, Chris Wilson wrote:
> > > > Quoting Chris Wilson (2019-09-20 09:36:24)
> > > >> Force bonded requests to run on distinct engines so that they cannot be
> > > >> shuffled onto the same engine where timeslicing will reverse the order.
> > > >> A bonded request will often wait on a semaphore signaled by its master,
> > > >> creating an implicit dependency -- if we ignore that implicit 
> > > >> dependency
> > > >> and allow the bonded request to run on the same engine and before its
> > > >> master, we will cause a GPU hang.
> > > >
> > > > Thinking more, it should not directly cause a GPU hang, as the stuck 
> > > > request
> > > > should be timesliced away, and each preemption should be enough to keep
> > > > hangcheck at bay (though we have evidence it may not). So at best it 
> > > > runs
> > > > at half-speed, at worst a third (if my model is correct).
> > > 
> > > But I think it is still correct to do since we don't have the coupling
> > > information on re-submit. Hm.. but don't we need to prevent slave from
> > > changing engines as well?
> > 
> > Unless I'm missing something, the proposal here is to set the engines in 
> > stone at first submission, and never change them?
> 
> For submission here, think execution (submission to actual HW). (We have
> 2 separate phases that all like to be called submit()!)
s/2/3/
-Chris

Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption

2019-09-20 Thread Chris Wilson
Quoting Bloomfield, Jon (2019-09-20 16:50:57)
> > -----Original Message-----
> > From: Intel-gfx  On Behalf Of 
> > Tvrtko
> > Ursulin
> > Sent: Friday, September 20, 2019 8:12 AM
> > To: Chris Wilson ; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from
> > overtaking each other on preemption
> > 
> > 
> > On 20/09/2019 15:57, Chris Wilson wrote:
> > > Quoting Chris Wilson (2019-09-20 09:36:24)
> > >> Force bonded requests to run on distinct engines so that they cannot be
> > >> shuffled onto the same engine where timeslicing will reverse the order.
> > >> A bonded request will often wait on a semaphore signaled by its master,
> > >> creating an implicit dependency -- if we ignore that implicit dependency
> > >> and allow the bonded request to run on the same engine and before its
> > >> master, we will cause a GPU hang.
> > >
> > > Thinking more, it should not directly cause a GPU hang, as the stuck 
> > > request
> > > should be timesliced away, and each preemption should be enough to keep
> > > hangcheck at bay (though we have evidence it may not). So at best it runs
> > > at half-speed, at worst a third (if my model is correct).
> > 
> > But I think it is still correct to do since we don't have the coupling
> > information on re-submit. Hm.. but don't we need to prevent slave from
> > changing engines as well?
> 
> Unless I'm missing something, the proposal here is to set the engines in 
> stone at first submission, and never change them?

For submission here, think execution (submission to actual HW). (We have
2 separate phases that all like to be called submit()!)

> If so, that does sound overly restrictive, and will prevent any kind of 
> rebalancing as workloads (of varying slave counts) come and go.

We are only restricting this request, not the contexts. We still have
balancing overall, just not instantaneous balancing if we timeslice out
of this request -- we put it back onto the "same" engine and not another.
Which is in some ways less than ideal, although strictly we are only
saying don't put it back onto an engine we have earmarked for our bonded
request, and so we avoid contending with our parallel request reducing
that to serial (and often bad) behaviour.

[So at the end of this statement, I'm more happy with the restriction ;]
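
To make the earmarking concrete, here is a minimal sketch of the mask
arithmetic (the typedef and bit values are made up for illustration; this
is not the kernel code itself):

typedef unsigned int intel_engine_mask_t;

static void mask_example(void)
{
	const intel_engine_mask_t vcs0 = 1 << 0;
	const intel_engine_mask_t vcs1 = 1 << 1;
	intel_engine_mask_t sibling_mask = vcs0 | vcs1; /* bond->sibling_mask */
	intel_engine_mask_t master_engine = vcs0; /* where the master executed */

	/* The bonded request may run on any sibling except the master's engine */
	intel_engine_mask_t allowed = sibling_mask & ~master_engine; /* == vcs1 */

	/* The master is barred from the engines earmarked for its slave */
	intel_engine_mask_t master_mask = ~allowed; /* retains vcs0, loses vcs1 */

	(void)allowed;
	(void)master_mask;
}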

> During the original design it was called out that the workloads should be 
> pre-empted atomically. That allows the entire bonding mask to be re-evaluated 
> at every context switch and so we can then rebalance. Still not easy to 
> achieve I agree :-(

The problem with that statement is that atomic implies a global
scheduling decision. Blood, sweat and tears.

Of course, with your endless scheme, scheduling is all in the purview of
the user :)
-Chris

Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption

2019-09-20 Thread Chris Wilson
Quoting Tvrtko Ursulin (2019-09-20 16:12:23)
> 
> On 20/09/2019 15:57, Chris Wilson wrote:
> > Quoting Chris Wilson (2019-09-20 09:36:24)
> >> Force bonded requests to run on distinct engines so that they cannot be
> >> shuffled onto the same engine where timeslicing will reverse the order.
> >> A bonded request will often wait on a semaphore signaled by its master,
> >> creating an implicit dependency -- if we ignore that implicit dependency
> >> and allow the bonded request to run on the same engine and before its
> >> master, we will cause a GPU hang.
> > 
> > Thinking more, it should not directly cause a GPU hang, as the stuck request
> > should be timesliced away, and each preemption should be enough to keep
> > hangcheck at bay (though we have evidence it may not). So at best it runs
> > at half-speed, at worst a third (if my model is correct).
> 
> But I think it is still correct to do since we don't have the coupling 
> information on re-submit. Hm.. but don't we need to prevent slave from 
> changing engines as well?

Yes, it still looks like a sensible enough patch (even if I am biased
because I think it is cute), especially in light of how we only run the
bond_execute once and do not reconfigure the execution_mask on unsubmit.

Still working on the test cases to exercise timeslicing +
submit/bonding.
-Chris

Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption

2019-09-20 Thread Bloomfield, Jon
> -----Original Message-----
> From: Intel-gfx  On Behalf Of Tvrtko
> Ursulin
> Sent: Friday, September 20, 2019 8:12 AM
> To: Chris Wilson ; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from
> overtaking each other on preemption
> 
> 
> On 20/09/2019 15:57, Chris Wilson wrote:
> > Quoting Chris Wilson (2019-09-20 09:36:24)
> >> Force bonded requests to run on distinct engines so that they cannot be
> >> shuffled onto the same engine where timeslicing will reverse the order.
> >> A bonded request will often wait on a semaphore signaled by its master,
> >> creating an implicit dependency -- if we ignore that implicit dependency
> >> and allow the bonded request to run on the same engine and before its
> >> master, we will cause a GPU hang.
> >
> > Thinking more, it should not directly cause a GPU hang, as the stuck request
> > should be timesliced away, and each preemption should be enough to keep
> > hangcheck at bay (though we have evidence it may not). So at best it runs
> > at half-speed, at worst a third (if my model is correct).
> 
> But I think it is still correct to do since we don't have the coupling
> information on re-submit. Hm.. but don't we need to prevent slave from
> changing engines as well?

Unless I'm missing something, the proposal here is to set the engines in stone 
at first submission, and never change them?
If so, that does sound overly restrictive, and will prevent any kind of 
rebalancing as workloads (of varying slave counts) come and go.
During the original design it was called out that the workloads should be 
pre-empted atomically. That allows the entire bonding mask to be re-evaluated 
at every context switch and so we can then rebalance. Still not easy to achieve 
I agree :-(
 
> 
> Regards,
> 
> Tvrtko

Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption

2019-09-20 Thread Tvrtko Ursulin


On 20/09/2019 15:57, Chris Wilson wrote:

Quoting Chris Wilson (2019-09-20 09:36:24)

Force bonded requests to run on distinct engines so that they cannot be
shuffled onto the same engine where timeslicing will reverse the order.
A bonded request will often wait on a semaphore signaled by its master,
creating an implicit dependency -- if we ignore that implicit dependency
and allow the bonded request to run on the same engine and before its
master, we will cause a GPU hang.


Thinking more, it should not directly cause a GPU hang, as the stuck request
should be timesliced away, and each preemption should be enough to keep
hangcheck at bay (though we have evidence it may not). So at best it runs
at half-speed, at worst a third (if my model is correct).


But I think it is still correct to do since we don't have the coupling 
information on re-submit. Hm.. but don't we need to prevent slave from 
changing engines as well?


Regards,

Tvrtko

Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption

2019-09-20 Thread Chris Wilson
Quoting Tvrtko Ursulin (2019-09-20 15:51:35)
> 
> On 20/09/2019 13:42, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-09-20 13:24:47)
> >>
> >> On 20/09/2019 09:36, Chris Wilson wrote:
> >>> Force bonded requests to run on distinct engines so that they cannot be
> >>> shuffled onto the same engine where timeslicing will reverse the order.
> >>> A bonded request will often wait on a semaphore signaled by its master,
> >>> creating an implicit dependency -- if we ignore that implicit dependency
> >>> and allow the bonded request to run on the same engine and before its
> >>> master, we will cause a GPU hang.
> >>>
> >>> We can prevent this inversion by restricting which engines we allow
> >>> ourselves to jump to upon preemption, i.e. baking in the arrangement
> >>> established at first execution. (We should also consider capturing the
> >>> implicit dependency using i915_sched_add_dependency(), but first we need
> >>> to think about the constraints that requires on the execution/retirement
> >>> ordering.)
> >>>
> >>> Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
> >>> References: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding")
> >>> Signed-off-by: Chris Wilson 
> >>> Cc: Mika Kuoppala 
> >>> Cc: Tvrtko Ursulin 
> >>> ---
> >>>drivers/gpu/drm/i915/gt/intel_lrc.c | 19 +++++++++++--------
> >>>1 file changed, 11 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
> >>> b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> index a99166a2d2eb..7920649e4d87 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> @@ -3755,18 +3755,21 @@ static void
> >>>virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
> >>>{
> >>>struct virtual_engine *ve = to_virtual_engine(rq->engine);
> >>> + intel_engine_mask_t allowed, exec;
> >>>struct ve_bond *bond;
> >>>
> >>>bond = virtual_find_bond(ve, to_request(signal)->engine);
> >>> - if (bond) {
> >>> - intel_engine_mask_t old, new, cmp;
> >>> + if (!bond)
> >>> + return;
> >>>
> >>> - cmp = READ_ONCE(rq->execution_mask);
> >>> - do {
> >>> - old = cmp;
> >>> - new = cmp & bond->sibling_mask;
> >>> - } while ((cmp = cmpxchg(&rq->execution_mask, old, new)) != old);
> >>> - }
> >>> + /* Restrict the bonded request to run on only the slaved engines */
> >>> + allowed = bond->sibling_mask & ~to_request(signal)->engine->mask;
> >>
> >> Hmm.. isn't it a miss on the uapi level that we allow master to be
> >> mentioned in the list of bonds? That's the only scenario where this line
> >> does something I think. So should we just forbid this setup on the uapi
> >> level?
> > 
> > That's just a lot of digging!
> 
> It's simple, in set_engines__bond in the bonds loop we just do "if 
> (master == bond) oh_no_you_wont". Am I missing something?

There doesn't even need to be a bond, which is something I forgot
to handle. So I'm looking at something more like

static void
virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
{
	struct virtual_engine *ve = to_virtual_engine(rq->engine);
	intel_engine_mask_t allowed, exec;
	struct ve_bond *bond;

	allowed = ~to_request(signal)->engine->mask;

	bond = virtual_find_bond(ve, to_request(signal)->engine);
	if (bond)
		allowed &= bond->sibling_mask;

	/* Restrict the bonded request to run on only the available engines */
	exec = READ_ONCE(rq->execution_mask);
	while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
		;

	/* Prevent the master from being re-run on the bonded engines */
	to_request(signal)->execution_mask &= ~allowed;
}

(The joy of trying to write a test case.)
-Chris

Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption

2019-09-20 Thread Chris Wilson
Quoting Chris Wilson (2019-09-20 09:36:24)
> Force bonded requests to run on distinct engines so that they cannot be
> shuffled onto the same engine where timeslicing will reverse the order.
> A bonded request will often wait on a semaphore signaled by its master,
> creating an implicit dependency -- if we ignore that implicit dependency
> and allow the bonded request to run on the same engine and before its
> master, we will cause a GPU hang.

Thinking more, it should not directly cause a GPU hang, as the stuck request
should be timesliced away, and each preemption should be enough to keep
hangcheck at bay (though we have evidence it may not). So at best it runs
at half-speed, at worst a third (if my model is correct).
-Chris

Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption

2019-09-20 Thread Tvrtko Ursulin


On 20/09/2019 13:42, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2019-09-20 13:24:47)


On 20/09/2019 09:36, Chris Wilson wrote:

Force bonded requests to run on distinct engines so that they cannot be
shuffled onto the same engine where timeslicing will reverse the order.
A bonded request will often wait on a semaphore signaled by its master,
creating an implicit dependency -- if we ignore that implicit dependency
and allow the bonded request to run on the same engine and before its
master, we will cause a GPU hang.

We can prevent this inversion by restricting which engines we allow
ourselves to jump to upon preemption, i.e. baking in the arrangement
established at first execution. (We should also consider capturing the
implicit dependency using i915_sched_add_dependency(), but first we need
to think about the constraints that requires on the execution/retirement
ordering.)

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
References: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding")
Signed-off-by: Chris Wilson 
Cc: Mika Kuoppala 
Cc: Tvrtko Ursulin 
---
   drivers/gpu/drm/i915/gt/intel_lrc.c | 19 +++++++++++--------
   1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index a99166a2d2eb..7920649e4d87 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -3755,18 +3755,21 @@ static void
 virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
 {
 	struct virtual_engine *ve = to_virtual_engine(rq->engine);
+	intel_engine_mask_t allowed, exec;
 	struct ve_bond *bond;
 
 	bond = virtual_find_bond(ve, to_request(signal)->engine);
-	if (bond) {
-		intel_engine_mask_t old, new, cmp;
+	if (!bond)
+		return;
 
-		cmp = READ_ONCE(rq->execution_mask);
-		do {
-			old = cmp;
-			new = cmp & bond->sibling_mask;
-		} while ((cmp = cmpxchg(&rq->execution_mask, old, new)) != old);
-	}
+	/* Restrict the bonded request to run on only the slaved engines */
+	allowed = bond->sibling_mask & ~to_request(signal)->engine->mask;


Hmm.. isn't it a miss on the uapi level that we allow master to be
mentioned in the list of bonds? That's the only scenario where this line
does something I think. So should we just forbid this setup on the uapi
level?


That's just a lot of digging!


It's simple, in set_engines__bond in the bonds loop we just do "if 
(master == bond) oh_no_you_wont". Am I missing something?
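
Something along these lines would do it; a rough sketch against the
i915_context_engines_bond uapi layout, with the surrounding locals
illustrative rather than lifted from the real set_engines__bond():

	/* While parsing I915_CONTEXT_ENGINES_EXT_BOND, reject any bond
	 * whose engine list names the master itself, so that a slave can
	 * never be asked to run on the engine its master occupies.
	 */
	for (n = 0; n < ext->num_bonds; n++) {
		struct i915_engine_class_instance ci = ext->engines[n];

		if (ci.engine_class == ext->master.engine_class &&
		    ci.engine_instance == ext->master.engine_instance)
			return -EINVAL;
	}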



+	exec = READ_ONCE(rq->execution_mask);
+	while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
+		;
+
+	/* Prevent the master from being re-run on the slaved engines */
+	to_request(signal)->execution_mask &= ~allowed;


This sounds unfortunate for future scheduling. There shouldn't be a
fundamental reason why next execution for the master couldn't be on an
engine which can also be a slave. So if we have:


Note though that we do not reset the execution_mask at any point :)
That's actually harder to do than it sounds, as after the bonded
execution, they are no longer linked. :|


master
.veng=vcs0,vcs1
slave
.veng=vcs0,vcs1
.bond(master=vcs0, mask=vcs1)
.bond(master=vcs1, mask=vcs0)

This should be allowed setup but with this change it would fix the
master to only be one of the options.


It would fix it to the first one it selected and executed on. It can
still pick either vcs0 or vcs1 and the slave would then be on vcs1 or
vcs0 respectively.


Is the real problem that after preemption for timeslicing and subsequent
re-submit we miss some hooks to re-evaluate the bonded relationship?


That doesn't exist, yes. But it's more than that, as we don't have the
notion of global preemption -- we don't evaluate between engines whether
or not there are cross dependencies.
  

I guess looking would be hard to do any peeking from one submission
tasklet to another (different engines) to check if one of the pair is
already executing again and so to pick the other end correctly?


Hard indeed. I would throw a flag onto the request that says if you
preempt me, stop the world (intel_engine_mask_t perhaps). Even that
requires some tricks we don't yet have. But we can't touch the other
engines within the tasklet unless we can come up with a lockless
strategy (hence the strategy of punting to a supreme thread with
oversight of all engines, gah.)
  

I think in practical terms for media this works since they are not 
setting it up like my sketch shows. So it could be just fine in practice
for current users.


I think your example works better than you think -- we just end up
concreted into our first choice and can't jump around hogs in the
system. (For example, to prove the above, we can launch two such tasks,
with a spinner as both masters and the system should get stuck on both
spinners.)


Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption

2019-09-20 Thread Chris Wilson
Quoting Tvrtko Ursulin (2019-09-20 13:24:47)
> 
> On 20/09/2019 09:36, Chris Wilson wrote:
> > Force bonded requests to run on distinct engines so that they cannot be
> > shuffled onto the same engine where timeslicing will reverse the order.
> > A bonded request will often wait on a semaphore signaled by its master,
> > creating an implicit dependency -- if we ignore that implicit dependency
> > and allow the bonded request to run on the same engine and before its
> > master, we will cause a GPU hang.
> > 
> > We can prevent this inversion by restricting which engines we allow
> > ourselves to jump to upon preemption, i.e. baking in the arrangement
> > established at first execution. (We should also consider capturing the
> > implicit dependency using i915_sched_add_dependency(), but first we need
> > to think about the constraints that requires on the execution/retirement
> > ordering.)
> > 
> > Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
> > References: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding")
> > Signed-off-by: Chris Wilson 
> > Cc: Mika Kuoppala 
> > Cc: Tvrtko Ursulin 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_lrc.c | 19 +++++++++++--------
> >   1 file changed, 11 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
> > b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index a99166a2d2eb..7920649e4d87 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -3755,18 +3755,21 @@ static void
> >   virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
> >   {
> >   struct virtual_engine *ve = to_virtual_engine(rq->engine);
> > + intel_engine_mask_t allowed, exec;
> >   struct ve_bond *bond;
> >   
> >   bond = virtual_find_bond(ve, to_request(signal)->engine);
> > - if (bond) {
> > - intel_engine_mask_t old, new, cmp;
> > + if (!bond)
> > + return;
> >   
> > - cmp = READ_ONCE(rq->execution_mask);
> > - do {
> > - old = cmp;
> > - new = cmp & bond->sibling_mask;
> > - } while ((cmp = cmpxchg(&rq->execution_mask, old, new)) != old);
> > - }
> > + /* Restrict the bonded request to run on only the slaved engines */
> > + allowed = bond->sibling_mask & ~to_request(signal)->engine->mask;
> 
> Hmm.. isn't it a miss on the uapi level that we allow master to be 
> mentioned in the list of bonds? That's the only scenario where this line 
> does something I think. So should we just forbid this setup on the uapi 
> level?

That's just a lot of digging!

> > + exec = READ_ONCE(rq->execution_mask);
> > + while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
> > + ;
> > +
> > + /* Prevent the master from being re-run on the slaved engines */
> > + to_request(signal)->execution_mask &= ~allowed;
> 
> This sounds unfortunate for future scheduling. There shouldn't be a 
> fundamental reason why next execution for the master couldn't be on an 
> engine which can also be a slave. So if we have:

Note though that we do not reset the execution_mask at any point :)
That's actually harder to do than it sounds, as after the bonded
execution, they are no longer linked. :|

> master
>.veng=vcs0,vcs1
> slave
>.veng=vcs0,vcs1
>.bond(master=vcs0, mask=vcs1)
>.bond(master=vcs1, mask=vcs0)
> 
> This should be allowed setup but with this change it would fix the 
> master to only be one of the options.

It would fix it to the first one it selected and executed on. It can
still pick either vcs0 or vcs1 and the slave would then be on vcs1 or
vcs0 respectively.

> Is the real problem that after preemption for timeslicing and subsequent 
> re-submit we miss some hooks to re-evaluate the bonded relationship?

That doesn't exist, yes. But it's more than that, as we don't have the
notion of global preemption -- we don't evaluate between engines whether
or not there are cross dependencies.
 
> I guess looking would be hard to do any peeking from one submission 
> tasklet to another (different engines) to check if one of the pair is 
> already executing again and so to pick the other end correctly?

Hard indeed. I would throw a flag onto the request that says if you
preempt me, stop the world (intel_engine_mask_t perhaps). Even that
requires some tricks we don't yet have. But we can't touch the other
engines within the tasklet unless we can come up with a lockless
strategy (hence the strategy of punting to a supreme thread with
oversight of all engines, gah.)
 
> I think in practical terms for media this works since they are not 
> setting it up like my sketch shows. So it could be just fine in practice 
> for current users.

I think your example works better than you think -- we just end up
concreted into our first choice and can't jump around hogs in the
system. (For example, to prove the above, we can launch two such tasks,
with a spinner as both masters and the system should get stuck on both
spinners.)

Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption

2019-09-20 Thread Tvrtko Ursulin


On 20/09/2019 09:36, Chris Wilson wrote:

Force bonded requests to run on distinct engines so that they cannot be
shuffled onto the same engine where timeslicing will reverse the order.
A bonded request will often wait on a semaphore signaled by its master,
creating an implicit dependency -- if we ignore that implicit dependency
and allow the bonded request to run on the same engine and before its
master, we will cause a GPU hang.

We can prevent this inversion by restricting which engines we allow
ourselves to jump to upon preemption, i.e. baking in the arrangement
established at first execution. (We should also consider capturing the
implicit dependency using i915_sched_add_dependency(), but first we need
to think about the constraints that requires on the execution/retirement
ordering.)

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
References: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding")
Signed-off-by: Chris Wilson 
Cc: Mika Kuoppala 
Cc: Tvrtko Ursulin 
---
  drivers/gpu/drm/i915/gt/intel_lrc.c | 19 +++++++++++--------
  1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index a99166a2d2eb..7920649e4d87 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -3755,18 +3755,21 @@ static void
 virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
 {
 	struct virtual_engine *ve = to_virtual_engine(rq->engine);
+	intel_engine_mask_t allowed, exec;
 	struct ve_bond *bond;
 
 	bond = virtual_find_bond(ve, to_request(signal)->engine);
-	if (bond) {
-		intel_engine_mask_t old, new, cmp;
+	if (!bond)
+		return;
 
-		cmp = READ_ONCE(rq->execution_mask);
-		do {
-			old = cmp;
-			new = cmp & bond->sibling_mask;
-		} while ((cmp = cmpxchg(&rq->execution_mask, old, new)) != old);
-	}
+	/* Restrict the bonded request to run on only the slaved engines */
+	allowed = bond->sibling_mask & ~to_request(signal)->engine->mask;


Hmm.. isn't it a miss on the uapi level that we allow master to be 
mentioned in the list of bonds? That's the only scenario where this line 
does something I think. So should we just forbid this setup on the uapi 
level?



+	exec = READ_ONCE(rq->execution_mask);
+	while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
+		;
+
+	/* Prevent the master from being re-run on the slaved engines */
+	to_request(signal)->execution_mask &= ~allowed;


This sounds unfortunate for future scheduling. There shouldn't be a 
fundamental reason why next execution for the master couldn't be on an 
engine which can also be a slave. So if we have:


master
  .veng=vcs0,vcs1
slave
  .veng=vcs0,vcs1
  .bond(master=vcs0, mask=vcs1)
  .bond(master=vcs1, mask=vcs0)

This should be allowed setup but with this change it would fix the 
master to only be one of the options.
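
For reference, that sketch maps onto the engines uapi roughly as below; a
rough sketch assuming the i915_context_engines_bond /
i915_context_param_engines layout of this era, with the
EXT_LOAD_BALANCE setup that creates the two virtual engines and all
error handling elided:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Bond the slave's virtual engine (at hypothetical map slot "slave_idx")
 * once per possible master placement: master on vcs0 pairs the slave
 * with vcs1, and vice versa.
 */
static void set_symmetric_bonds(int drm_fd, uint32_t ctx_id, uint16_t slave_idx)
{
	const struct i915_engine_class_instance vcs0 = {
		.engine_class = I915_ENGINE_CLASS_VIDEO, .engine_instance = 0,
	};
	const struct i915_engine_class_instance vcs1 = {
		.engine_class = I915_ENGINE_CLASS_VIDEO, .engine_instance = 1,
	};
	I915_DEFINE_CONTEXT_ENGINES_BOND(bond0, 1) = {
		.base.name = I915_CONTEXT_ENGINES_EXT_BOND,
		.master = vcs0,
		.virtual_index = slave_idx,
		.num_bonds = 1,
		.engines = { vcs1 }, /* .bond(master=vcs0, mask=vcs1) */
	};
	I915_DEFINE_CONTEXT_ENGINES_BOND(bond1, 1) = {
		.base.next_extension = (uintptr_t)&bond0,
		.base.name = I915_CONTEXT_ENGINES_EXT_BOND,
		.master = vcs1,
		.virtual_index = slave_idx,
		.num_bonds = 1,
		.engines = { vcs0 }, /* .bond(master=vcs1, mask=vcs0) */
	};
	/* Slots 0/1 would hold the master's and slave's virtual engines,
	 * each created via I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE (elided).
	 */
	I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 2) = {
		.extensions = (uintptr_t)&bond1,
	};
	struct drm_i915_gem_context_param param = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_ENGINES,
		.size = sizeof(engines),
		.value = (uintptr_t)&engines,
	};

	ioctl(drm_fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &param);
}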


Is the real problem that after preemption for timeslicing and subsequent 
re-submit we miss some hooks to re-evaluate the bonded relationship?


I guess looking would be hard to do any peeking from one submission 
tasklet to another (different engines) to check if one of the pair is 
already executing again and so to pick the other end correctly?


I think in practical terms for media this works since they are not 
setting it up like my sketch shows. So it could be just fine in practice 
for current users.


Regards,

Tvrtko


  }
  
  struct intel_context *




[Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption

2019-09-20 Thread Chris Wilson
Force bonded requests to run on distinct engines so that they cannot be
shuffled onto the same engine where timeslicing will reverse the order.
A bonded request will often wait on a semaphore signaled by its master,
creating an implicit dependency -- if we ignore that implicit dependency
and allow the bonded request to run on the same engine and before its
master, we will cause a GPU hang.

We can prevent this inversion by restricting which engines we allow
ourselves to jump to upon preemption, i.e. baking in the arrangement
established at first execution. (We should also consider capturing the
implicit dependency using i915_sched_add_dependency(), but first we need
to think about the constraints that requires on the execution/retirement
ordering.)

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
References: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding")
Signed-off-by: Chris Wilson 
Cc: Mika Kuoppala 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index a99166a2d2eb..7920649e4d87 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -3755,18 +3755,21 @@ static void
 virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
 {
 	struct virtual_engine *ve = to_virtual_engine(rq->engine);
+	intel_engine_mask_t allowed, exec;
 	struct ve_bond *bond;
 
 	bond = virtual_find_bond(ve, to_request(signal)->engine);
-	if (bond) {
-		intel_engine_mask_t old, new, cmp;
+	if (!bond)
+		return;
 
-		cmp = READ_ONCE(rq->execution_mask);
-		do {
-			old = cmp;
-			new = cmp & bond->sibling_mask;
-		} while ((cmp = cmpxchg(&rq->execution_mask, old, new)) != old);
-	}
+	/* Restrict the bonded request to run on only the slaved engines */
+	allowed = bond->sibling_mask & ~to_request(signal)->engine->mask;
+	exec = READ_ONCE(rq->execution_mask);
+	while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
+		;
+
+	/* Prevent the master from being re-run on the slaved engines */
+	to_request(signal)->execution_mask &= ~allowed;
 }
 
 struct intel_context *
-- 
2.23.0
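
As an aside on the retry loop above: try_cmpxchg() narrows the mask
atomically in the face of racing writers. A userspace analogue in plain
C11 atomics behaves the same way (the typedef is a stand-in for the
kernel's, and the function name is ours):

#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t intel_engine_mask_t; /* stand-in for the kernel typedef */

/* Atomically clear the bits outside "allowed": on failure the
 * compare-exchange reloads "exec" with the current value, and the loop
 * retries with a freshly computed narrowed mask.
 */
static void narrow_execution_mask(_Atomic intel_engine_mask_t *mask,
				  intel_engine_mask_t allowed)
{
	intel_engine_mask_t exec = atomic_load(mask);

	while (!atomic_compare_exchange_weak(mask, &exec, exec & allowed))
		;
}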
