Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-08-18 Thread Daniel Vetter
On Wed, Aug 18, 2021 at 2:28 AM John Harrison  wrote:
> On 8/9/2021 23:36, Daniel Vetter wrote:
> > On Mon, Aug 09, 2021 at 04:12:52PM -0700, John Harrison wrote:
> >> On 8/6/2021 12:46, Daniel Vetter wrote:
> >>> Seen this fly by and figured I dropped a few thoughts in here. At the
> >>> likely cost of looking a bit out of whack :-)
> >>>
> >>> On Fri, Aug 6, 2021 at 8:01 PM John Harrison  
> >>> wrote:
>  On 8/2/2021 02:40, Tvrtko Ursulin wrote:
> > On 30/07/2021 19:13, John Harrison wrote:
> >> On 7/30/2021 02:49, Tvrtko Ursulin wrote:
> >>> On 30/07/2021 01:13, John Harrison wrote:
>  On 7/28/2021 17:34, Matthew Brost wrote:
> > If an engine associated with a context does not have a heartbeat,
> > ban it
> > immediately. This is needed for GuC submission as a idle pulse
> > doesn't
> > kick the context off the hardware where it then can check for a
> > heartbeat and ban the context.
> >>> Pulse, that is a request with I915_PRIORITY_BARRIER, does not
> >>> preempt a running normal priority context?
> >>>
> >>> Why does it matter then whether or not heartbeats are enabled - when
> >>> heartbeat just ends up sending the same engine pulse (eventually,
> >>> with raising priority)?
> >> The point is that the pulse is pointless. See the rest of my comments
> >> below, specifically "the context will get resubmitted to the hardware
> >> after the pulse completes". To re-iterate...
> >>
> >> Yes, it preempts the context. Yes, it does so whether heartbeats are
> >> enabled or not. But so what? Who cares? You have preempted a context.
> >> It is no longer running on the hardware. BUT IT IS STILL A VALID
> >> CONTEXT.
> > It is valid yes, and it even may be the current ABI so another
> > question is whether it is okay to change that.
> >
> >> The backend scheduler will just resubmit it to the hardware as soon
> >> as the pulse completes. The only reason this works at all is because
> >> of the horrid hack in the execlist scheduler's back end
> >> implementation (in __execlists_schedule_in):
> >>    if (unlikely(intel_context_is_closed(ce) &&
> >>                 !intel_engine_has_heartbeat(engine)))
> >>            intel_context_set_banned(ce);
> > Right, is the above code then needed with this patch - when ban is
> > immediately applied on the higher level?
> >
> >> The actual back end scheduler is saying "Is this a zombie context? Is
> >> the heartbeat disabled? Then ban it". No other scheduler backend is
> >> going to have knowledge of zombie context status or of the heartbeat
> >> status. Nor are they going to call back into the higher levels of the
> >> i915 driver to trigger a ban operation. Certainly a hardware
> >> implemented scheduler is not going to be looking at private i915
> >> driver information to decide whether to submit a context or whether
> >> to tell the OS to kill it off instead.
> >>
> >> For persistence to work with a hardware scheduler (or a non-Intel
> >> specific scheduler such as the DRM one), the handling of zombie
> >> contexts, banning, etc. *must* be done entirely in the front end. It
> >> cannot rely on any backend hacks. That means you can't rely on any
> >> fancy behaviour of pulses.
> >>
> >> If you want to ban a context then you must explicitly ban that
> >> context. If you want to ban it at some later point then you need to
> >> track it at the top level as a zombie and then explicitly ban that
> >> zombie at whatever later point.
> > I am still trying to understand it all. If I go by the commit message:
> >
> > """
> > This is needed for GuC submission as a idle pulse doesn't
> > kick the context off the hardware where it then can check for a
> > heartbeat and ban the context.
> > """
> >
> > That did not explain things for me. Sentence does not appear to make
> > sense. Now, it seems "kick off the hardware" is meant as revoke and
> > not just preempt. Which is fine, perhaps just needs to be written more
> > explicitly. But the part of checking for heartbeat after idle pulse
> > does not compute for me. It is the heartbeat which emits idle pulses,
> > not idle pulse emitting heartbeats.
>  I am in agreement that the commit message is confusing and does not
>  explain either the problem or the solution.
> 
> 
> > But anyway, I can buy the handling at the front end story completely.
> > It makes sense. We just need to agree that a) it is okay to change the
> > ABI and b) remove the backend check from execlists if it is not needed
> > any longer.
> >
> > And if ABI change is okay then commit message needs to talk about it
> > loudly and clearly.
>  I don't think we have a choice. The current ABI is 

Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-08-17 Thread John Harrison

On 8/9/2021 23:36, Daniel Vetter wrote:

On Mon, Aug 09, 2021 at 04:12:52PM -0700, John Harrison wrote:

On 8/6/2021 12:46, Daniel Vetter wrote:

Seen this fly by and figured I dropped a few thoughts in here. At the
likely cost of looking a bit out of whack :-)

On Fri, Aug 6, 2021 at 8:01 PM John Harrison  wrote:

On 8/2/2021 02:40, Tvrtko Ursulin wrote:

On 30/07/2021 19:13, John Harrison wrote:

On 7/30/2021 02:49, Tvrtko Ursulin wrote:

On 30/07/2021 01:13, John Harrison wrote:

On 7/28/2021 17:34, Matthew Brost wrote:

If an engine associated with a context does not have a heartbeat,
ban it
immediately. This is needed for GuC submission as a idle pulse
doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.

Pulse, that is a request with I915_PRIORITY_BARRIER, does not
preempt a running normal priority context?

Why does it matter then whether or not heartbeats are enabled - when
heartbeat just ends up sending the same engine pulse (eventually,
with raising priority)?

The point is that the pulse is pointless. See the rest of my comments
below, specifically "the context will get resubmitted to the hardware
after the pulse completes". To re-iterate...

Yes, it preempts the context. Yes, it does so whether heartbeats are
enabled or not. But so what? Who cares? You have preempted a context.
It is no longer running on the hardware. BUT IT IS STILL A VALID
CONTEXT.

It is valid yes, and it even may be the current ABI so another
question is whether it is okay to change that.


The backend scheduler will just resubmit it to the hardware as soon
as the pulse completes. The only reason this works at all is because
of the horrid hack in the execlist scheduler's back end
implementation (in __execlists_schedule_in):
	if (unlikely(intel_context_is_closed(ce) &&
		     !intel_engine_has_heartbeat(engine)))
		intel_context_set_banned(ce);

Right, is the above code then needed with this patch - when ban is
immediately applied on the higher level?


The actual back end scheduler is saying "Is this a zombie context? Is
the heartbeat disabled? Then ban it". No other scheduler backend is
going to have knowledge of zombie context status or of the heartbeat
status. Nor are they going to call back into the higher levels of the
i915 driver to trigger a ban operation. Certainly a hardware
implemented scheduler is not going to be looking at private i915
driver information to decide whether to submit a context or whether
to tell the OS to kill it off instead.

For persistence to work with a hardware scheduler (or a non-Intel
specific scheduler such as the DRM one), the handling of zombie
contexts, banning, etc. *must* be done entirely in the front end. It
cannot rely on any backend hacks. That means you can't rely on any
fancy behaviour of pulses.

If you want to ban a context then you must explicitly ban that
context. If you want to ban it at some later point then you need to
track it at the top level as a zombie and then explicitly ban that
zombie at whatever later point.

I am still trying to understand it all. If I go by the commit message:

"""
This is needed for GuC submission as a idle pulse doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.
"""

That did not explain things for me. Sentence does not appear to make
sense. Now, it seems "kick off the hardware" is meant as revoke and
not just preempt. Which is fine, perhaps just needs to be written more
explicitly. But the part of checking for heartbeat after idle pulse
does not compute for me. It is the heartbeat which emits idle pulses,
not idle pulse emitting heartbeats.

I am in agreement that the commit message is confusing and does not
explain either the problem or the solution.



But anyway, I can buy the handling at the front end story completely.
It makes sense. We just need to agree that a) it is okay to change the
ABI and b) remove the backend check from execlists if it is not needed
any longer.

And if ABI change is okay then commit message needs to talk about it
loudly and clearly.

I don't think we have a choice. The current ABI is not and cannot ever
be compatible with any scheduler external to i915. It cannot be
implemented with a hardware scheduler such as the GuC and it cannot be
implemented with an external software scheduler such as the DRM one.

So generally on linux we implement helper libraries, which means
massive flexibility everywhere.

https://blog.ffwll.ch/2016/12/midlayers-once-more-with-feeling.html

So it shouldn't be an insurmountable problem to make this happen even
with drm/scheduler, we can patch it up.

Whether that's justified is another question.

Helper libraries won't work with a hardware scheduler.

Hm I guess I misunderstood then what exactly the hold-up is. This entire
discussion feels at least a bit like "heartbeat is unchangeable and guc
must fit", which is pretty much the midlayer 

Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-08-10 Thread Daniel Vetter
On Mon, Aug 09, 2021 at 04:12:52PM -0700, John Harrison wrote:
> On 8/6/2021 12:46, Daniel Vetter wrote:
> > Seen this fly by and figured I dropped a few thoughts in here. At the
> > likely cost of looking a bit out of whack :-)
> > 
> > On Fri, Aug 6, 2021 at 8:01 PM John Harrison  
> > wrote:
> > > On 8/2/2021 02:40, Tvrtko Ursulin wrote:
> > > > On 30/07/2021 19:13, John Harrison wrote:
> > > > > On 7/30/2021 02:49, Tvrtko Ursulin wrote:
> > > > > > On 30/07/2021 01:13, John Harrison wrote:
> > > > > > > On 7/28/2021 17:34, Matthew Brost wrote:
> > > > > > > > If an engine associated with a context does not have a 
> > > > > > > > heartbeat,
> > > > > > > > ban it
> > > > > > > > immediately. This is needed for GuC submission as a idle pulse
> > > > > > > > doesn't
> > > > > > > > kick the context off the hardware where it then can check for a
> > > > > > > > heartbeat and ban the context.
> > > > > > Pulse, that is a request with I915_PRIORITY_BARRIER, does not
> > > > > > preempt a running normal priority context?
> > > > > > 
> > > > > > Why does it matter then whether or not heartbeats are enabled - when
> > > > > > heartbeat just ends up sending the same engine pulse (eventually,
> > > > > > with raising priority)?
> > > > > The point is that the pulse is pointless. See the rest of my comments
> > > > > below, specifically "the context will get resubmitted to the hardware
> > > > > after the pulse completes". To re-iterate...
> > > > > 
> > > > > Yes, it preempts the context. Yes, it does so whether heartbeats are
> > > > > enabled or not. But so what? Who cares? You have preempted a context.
> > > > > It is no longer running on the hardware. BUT IT IS STILL A VALID
> > > > > CONTEXT.
> > > > It is valid yes, and it even may be the current ABI so another
> > > > question is whether it is okay to change that.
> > > > 
> > > > > The backend scheduler will just resubmit it to the hardware as soon
> > > > > as the pulse completes. The only reason this works at all is because
> > > > > of the horrid hack in the execlist scheduler's back end
> > > > > implementation (in __execlists_schedule_in):
> > > > >   if (unlikely(intel_context_is_closed(ce) &&
> > > > >                !intel_engine_has_heartbeat(engine)))
> > > > >           intel_context_set_banned(ce);
> > > > Right, is the above code then needed with this patch - when ban is
> > > > immediately applied on the higher level?
> > > > 
> > > > > The actual back end scheduler is saying "Is this a zombie context? Is
> > > > > the heartbeat disabled? Then ban it". No other scheduler backend is
> > > > > going to have knowledge of zombie context status or of the heartbeat
> > > > > status. Nor are they going to call back into the higher levels of the
> > > > > i915 driver to trigger a ban operation. Certainly a hardware
> > > > > implemented scheduler is not going to be looking at private i915
> > > > > driver information to decide whether to submit a context or whether
> > > > > to tell the OS to kill it off instead.
> > > > > 
> > > > > For persistence to work with a hardware scheduler (or a non-Intel
> > > > > specific scheduler such as the DRM one), the handling of zombie
> > > > > contexts, banning, etc. *must* be done entirely in the front end. It
> > > > > cannot rely on any backend hacks. That means you can't rely on any
> > > > > fancy behaviour of pulses.
> > > > > 
> > > > > If you want to ban a context then you must explicitly ban that
> > > > > context. If you want to ban it at some later point then you need to
> > > > > track it at the top level as a zombie and then explicitly ban that
> > > > > zombie at whatever later point.
> > > > I am still trying to understand it all. If I go by the commit message:
> > > > 
> > > > """
> > > > This is needed for GuC submission as a idle pulse doesn't
> > > > kick the context off the hardware where it then can check for a
> > > > heartbeat and ban the context.
> > > > """
> > > > 
> > > > That did not explain things for me. Sentence does not appear to make
> > > > sense. Now, it seems "kick off the hardware" is meant as revoke and
> > > > not just preempt. Which is fine, perhaps just needs to be written more
> > > > explicitly. But the part of checking for heartbeat after idle pulse
> > > > does not compute for me. It is the heartbeat which emits idle pulses,
> > > > not idle pulse emitting heartbeats.
> > > I am in agreement that the commit message is confusing and does not
> > > explain either the problem or the solution.
> > > 
> > > 
> > > > 
> > > > But anyway, I can buy the handling at the front end story completely.
> > > > It makes sense. We just need to agree that a) it is okay to change the
> > > > ABI and b) remove the backend check from execlists if it is not needed
> > > > any longer.
> > > > 
> > > > And if ABI change is okay then commit message needs to talk about it
> > > > loudly and clearly.
> > > I don't think we have a choice. The current ABI is not and 

Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-08-09 Thread John Harrison

On 8/6/2021 12:46, Daniel Vetter wrote:

Seen this fly by and figured I dropped a few thoughts in here. At the
likely cost of looking a bit out of whack :-)

On Fri, Aug 6, 2021 at 8:01 PM John Harrison  wrote:

On 8/2/2021 02:40, Tvrtko Ursulin wrote:

On 30/07/2021 19:13, John Harrison wrote:

On 7/30/2021 02:49, Tvrtko Ursulin wrote:

On 30/07/2021 01:13, John Harrison wrote:

On 7/28/2021 17:34, Matthew Brost wrote:

If an engine associated with a context does not have a heartbeat,
ban it
immediately. This is needed for GuC submission as a idle pulse
doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.

Pulse, that is a request with I915_PRIORITY_BARRIER, does not
preempt a running normal priority context?

Why does it matter then whether or not heartbeats are enabled - when
heartbeat just ends up sending the same engine pulse (eventually,
with raising priority)?

The point is that the pulse is pointless. See the rest of my comments
below, specifically "the context will get resubmitted to the hardware
after the pulse completes". To re-iterate...

Yes, it preempts the context. Yes, it does so whether heartbeats are
enabled or not. But so what? Who cares? You have preempted a context.
It is no longer running on the hardware. BUT IT IS STILL A VALID
CONTEXT.

It is valid yes, and it even may be the current ABI so another
question is whether it is okay to change that.


The backend scheduler will just resubmit it to the hardware as soon
as the pulse completes. The only reason this works at all is because
of the horrid hack in the execlist scheduler's back end
implementation (in __execlists_schedule_in):
	if (unlikely(intel_context_is_closed(ce) &&
		     !intel_engine_has_heartbeat(engine)))
		intel_context_set_banned(ce);

Right, is the above code then needed with this patch - when ban is
immediately applied on the higher level?


The actual back end scheduler is saying "Is this a zombie context? Is
the heartbeat disabled? Then ban it". No other scheduler backend is
going to have knowledge of zombie context status or of the heartbeat
status. Nor are they going to call back into the higher levels of the
i915 driver to trigger a ban operation. Certainly a hardware
implemented scheduler is not going to be looking at private i915
driver information to decide whether to submit a context or whether
to tell the OS to kill it off instead.

For persistence to work with a hardware scheduler (or a non-Intel
specific scheduler such as the DRM one), the handling of zombie
contexts, banning, etc. *must* be done entirely in the front end. It
cannot rely on any backend hacks. That means you can't rely on any
fancy behaviour of pulses.

If you want to ban a context then you must explicitly ban that
context. If you want to ban it at some later point then you need to
track it at the top level as a zombie and then explicitly ban that
zombie at whatever later point.

I am still trying to understand it all. If I go by the commit message:

"""
This is needed for GuC submission as a idle pulse doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.
"""

That did not explain things for me. Sentence does not appear to make
sense. Now, it seems "kick off the hardware" is meant as revoke and
not just preempt. Which is fine, perhaps just needs to be written more
explicitly. But the part of checking for heartbeat after idle pulse
does not compute for me. It is the heartbeat which emits idle pulses,
not idle pulse emitting heartbeats.

I am in agreement that the commit message is confusing and does not
explain either the problem or the solution.




But anyway, I can buy the handling at the front end story completely.
It makes sense. We just need to agree that a) it is okay to change the
ABI and b) remove the backend check from execlists if it is not needed
any longer.

And if ABI change is okay then commit message needs to talk about it
loudly and clearly.

I don't think we have a choice. The current ABI is not and cannot ever
be compatible with any scheduler external to i915. It cannot be
implemented with a hardware scheduler such as the GuC and it cannot be
implemented with an external software scheduler such as the DRM one.

So generally on linux we implement helper libraries, which means
massive flexibility everywhere.

https://blog.ffwll.ch/2016/12/midlayers-once-more-with-feeling.html

So it shouldn't be an insurmountable problem to make this happen even
with drm/scheduler, we can patch it up.

Whether that's justified is another question.

Helper libraries won't work with a hardware scheduler.




My view is that any implementation involving knowledge of the heartbeat
is fundamentally broken.

According to Daniel Vetter, the DRM ABI on this subject is that an
actively executing context should persist until the DRM file handle is
closed. That seems like a much more plausible and simple ABI 

Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-08-06 Thread Daniel Vetter
Seen this fly by and figured I dropped a few thoughts in here. At the
likely cost of looking a bit out of whack :-)

On Fri, Aug 6, 2021 at 8:01 PM John Harrison  wrote:
> On 8/2/2021 02:40, Tvrtko Ursulin wrote:
> > On 30/07/2021 19:13, John Harrison wrote:
> >> On 7/30/2021 02:49, Tvrtko Ursulin wrote:
> >>> On 30/07/2021 01:13, John Harrison wrote:
>  On 7/28/2021 17:34, Matthew Brost wrote:
> > If an engine associated with a context does not have a heartbeat,
> > ban it
> > immediately. This is needed for GuC submission as a idle pulse
> > doesn't
> > kick the context off the hardware where it then can check for a
> > heartbeat and ban the context.
> >>>
> >>> Pulse, that is a request with I915_PRIORITY_BARRIER, does not
> >>> preempt a running normal priority context?
> >>>
> >>> Why does it matter then whether or not heartbeats are enabled - when
> >>> heartbeat just ends up sending the same engine pulse (eventually,
> >>> with raising priority)?
> >> The point is that the pulse is pointless. See the rest of my comments
> >> below, specifically "the context will get resubmitted to the hardware
> >> after the pulse completes". To re-iterate...
> >>
> >> Yes, it preempts the context. Yes, it does so whether heartbeats are
> >> enabled or not. But so what? Who cares? You have preempted a context.
> >> It is no longer running on the hardware. BUT IT IS STILL A VALID
> >> CONTEXT.
> >
> > It is valid yes, and it even may be the current ABI so another
> > question is whether it is okay to change that.
> >
> >> The backend scheduler will just resubmit it to the hardware as soon
> >> as the pulse completes. The only reason this works at all is because
> >> of the horrid hack in the execlist scheduler's back end
> >> implementation (in __execlists_schedule_in):
> >>    if (unlikely(intel_context_is_closed(ce) &&
> >>                 !intel_engine_has_heartbeat(engine)))
> >>            intel_context_set_banned(ce);
> >
> > Right, is the above code then needed with this patch - when ban is
> > immediately applied on the higher level?
> >
> >> The actual back end scheduler is saying "Is this a zombie context? Is
> >> the heartbeat disabled? Then ban it". No other scheduler backend is
> >> going to have knowledge of zombie context status or of the heartbeat
> >> status. Nor are they going to call back into the higher levels of the
> >> i915 driver to trigger a ban operation. Certainly a hardware
> >> implemented scheduler is not going to be looking at private i915
> >> driver information to decide whether to submit a context or whether
> >> to tell the OS to kill it off instead.
> >>
> >> For persistence to work with a hardware scheduler (or a non-Intel
> >> specific scheduler such as the DRM one), the handling of zombie
> >> contexts, banning, etc. *must* be done entirely in the front end. It
> >> cannot rely on any backend hacks. That means you can't rely on any
> >> fancy behaviour of pulses.
> >>
> >> If you want to ban a context then you must explicitly ban that
> >> context. If you want to ban it at some later point then you need to
> >> track it at the top level as a zombie and then explicitly ban that
> >> zombie at whatever later point.
> >
> > I am still trying to understand it all. If I go by the commit message:
> >
> > """
> > This is needed for GuC submission as a idle pulse doesn't
> > kick the context off the hardware where it then can check for a
> > heartbeat and ban the context.
> > """
> >
> > That did not explain things for me. Sentence does not appear to make
> > sense. Now, it seems "kick off the hardware" is meant as revoke and
> > not just preempt. Which is fine, perhaps just needs to be written more
> > explicitly. But the part of checking for heartbeat after idle pulse
> > does not compute for me. It is the heartbeat which emits idle pulses,
> > not idle pulse emitting heartbeats.
> I am in agreement that the commit message is confusing and does not
> explain either the problem or the solution.
>
>
> >
> >
> > But anyway, I can buy the handling at the front end story completely.
> > It makes sense. We just need to agree that a) it is okay to change the
> > ABI and b) remove the backend check from execlists if it is not needed
> > any longer.
> >
> > And if ABI change is okay then commit message needs to talk about it
> > loudly and clearly.
> I don't think we have a choice. The current ABI is not and cannot ever
> be compatible with any scheduler external to i915. It cannot be
> implemented with a hardware scheduler such as the GuC and it cannot be
> implemented with an external software scheduler such as the DRM one.

So generally on linux we implement helper libraries, which means
massive flexibility everywhere.

https://blog.ffwll.ch/2016/12/midlayers-once-more-with-feeling.html

So it shouldn't be an insurmountable problem to make this happen even
with drm/scheduler, we can patch it up.

Whether that's justified is another question.
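The helper-library approach Daniel is pointing at could be sketched roughly as follows. This is a hypothetical illustration only, not actual i915 code: the core treats backend hooks as optional, so a GuC-like backend can opt into banning on close while an execlists-like backend keeps its own logic, and no midlayer imposes one policy on everyone.

```c
/*
 * Hypothetical sketch of the helper-library pattern (illustrative
 * names, not actual i915 code): the core calls optional backend hooks
 * when present, so each backend opts into exactly the behaviour it
 * needs instead of a midlayer dictating one policy for all.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct context {
	bool closed;
	bool banned;
};

struct sched_backend_ops {
	/* Optional: a backend that cannot rely on pulses may ban on close. */
	void (*context_closed)(struct context *ctx);
};

static void core_close_context(struct context *ctx,
			       const struct sched_backend_ops *ops)
{
	ctx->closed = true;
	if (ops->context_closed)	/* hook is optional, not mandatory */
		ops->context_closed(ctx);
}

/* A GuC-like backend bans a closed context immediately. */
static void guc_context_closed(struct context *ctx)
{
	ctx->banned = true;
}

static const struct sched_backend_ops guc_ops = {
	.context_closed = guc_context_closed,
};

/* An execlists-like backend leaves the hook unset and keeps its own path. */
static const struct sched_backend_ops execlists_ops = { NULL };
```

With this shape the close-time policy lives with whichever backend registered the hook, which is the flexibility argument the midlayer blog post makes.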


Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-08-06 Thread John Harrison

On 8/2/2021 02:40, Tvrtko Ursulin wrote:

On 30/07/2021 19:13, John Harrison wrote:

On 7/30/2021 02:49, Tvrtko Ursulin wrote:

On 30/07/2021 01:13, John Harrison wrote:

On 7/28/2021 17:34, Matthew Brost wrote:
If an engine associated with a context does not have a heartbeat, 
ban it
immediately. This is needed for GuC submission as a idle pulse 
doesn't

kick the context off the hardware where it then can check for a
heartbeat and ban the context.


Pulse, that is a request with I915_PRIORITY_BARRIER, does not 
preempt a running normal priority context?


Why does it matter then whether or not heartbeats are enabled - when 
heartbeat just ends up sending the same engine pulse (eventually, 
with raising priority)?
The point is that the pulse is pointless. See the rest of my comments 
below, specifically "the context will get resubmitted to the hardware 
after the pulse completes". To re-iterate...


Yes, it preempts the context. Yes, it does so whether heartbeats are 
enabled or not. But so what? Who cares? You have preempted a context. 
It is no longer running on the hardware. BUT IT IS STILL A VALID 
CONTEXT. 


It is valid yes, and it even may be the current ABI so another 
question is whether it is okay to change that.


The backend scheduler will just resubmit it to the hardware as soon 
as the pulse completes. The only reason this works at all is because 
of the horrid hack in the execlist scheduler's back end 
implementation (in __execlists_schedule_in):

	if (unlikely(intel_context_is_closed(ce) &&
		     !intel_engine_has_heartbeat(engine)))
		intel_context_set_banned(ce);


Right, is the above code then needed with this patch - when ban is 
immediately applied on the higher level?


The actual back end scheduler is saying "Is this a zombie context? Is 
the heartbeat disabled? Then ban it". No other scheduler backend is 
going to have knowledge of zombie context status or of the heartbeat 
status. Nor are they going to call back into the higher levels of the 
i915 driver to trigger a ban operation. Certainly a hardware 
implemented scheduler is not going to be looking at private i915 
driver information to decide whether to submit a context or whether 
to tell the OS to kill it off instead.


For persistence to work with a hardware scheduler (or a non-Intel 
specific scheduler such as the DRM one), the handling of zombie 
contexts, banning, etc. *must* be done entirely in the front end. It 
cannot rely on any backend hacks. That means you can't rely on any 
fancy behaviour of pulses.


If you want to ban a context then you must explicitly ban that 
context. If you want to ban it at some later point then you need to 
track it at the top level as a zombie and then explicitly ban that 
zombie at whatever later point.


I am still trying to understand it all. If I go by the commit message:

"""
This is needed for GuC submission as a idle pulse doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.
"""

That did not explain things for me. Sentence does not appear to make 
sense. Now, it seems "kick off the hardware" is meant as revoke and 
not just preempt. Which is fine, perhaps just needs to be written more 
explicitly. But the part of checking for heartbeat after idle pulse 
does not compute for me. It is the heartbeat which emits idle pulses, 
not idle pulse emitting heartbeats.
I am in agreement that the commit message is confusing and does not 
explain either the problem or the solution.






But anyway, I can buy the handling at the front end story completely. 
It makes sense. We just need to agree that a) it is okay to change the 
ABI and b) remove the backend check from execlists if it is not needed 
any longer.


And if ABI change is okay then commit message needs to talk about it 
loudly and clearly.
I don't think we have a choice. The current ABI is not and cannot ever 
be compatible with any scheduler external to i915. It cannot be 
implemented with a hardware scheduler such as the GuC and it cannot be 
implemented with an external software scheduler such as the DRM one.


My view is that any implementation involving knowledge of the heartbeat 
is fundamentally broken.


According to Daniel Vetter, the DRM ABI on this subject is that an 
actively executing context should persist until the DRM file handle is 
closed. That seems like a much more plausible and simple ABI than one 
that says 'if the heartbeat is running then a context will persist 
forever, if the heartbeat is not running then it will be killed 
immediately, if the heartbeat was running but then stops then the 
context will be killed on the next context switch, ...'. And if I 
understand it correctly, the current ABI allows a badly written user app 
to cause a denial of service by leaving contexts permanently running an 
infinite loop on the hardware even after the app has been killed! How 
can that ever be considered a good idea?
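The two policies contrasted above can be reduced to a toy model (hypothetical, not driver code) that makes the difference concrete: under the file-handle rule persistence depends only on whether the DRM fd is still open, while under the heartbeat-coupled rule it depends on driver-internal heartbeat state the application never asked about.

```c
/* Toy model (not i915 code) of the two persistence ABIs contrasted above. */
#include <assert.h>
#include <stdbool.h>

/* File-handle rule: a context persists until the DRM fd is closed. */
static bool persists_file_handle_rule(bool fd_open)
{
	return fd_open;
}

/*
 * Heartbeat-coupled rule as described in the thread: with the heartbeat
 * running, a closed context persists indefinitely (the denial-of-service
 * case); without it, the context is killed immediately.
 */
static bool persists_heartbeat_rule(bool heartbeat_enabled)
{
	return heartbeat_enabled;
}
```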



Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-08-02 Thread Tvrtko Ursulin



On 30/07/2021 19:13, John Harrison wrote:

On 7/30/2021 02:49, Tvrtko Ursulin wrote:

On 30/07/2021 01:13, John Harrison wrote:

On 7/28/2021 17:34, Matthew Brost wrote:
If an engine associated with a context does not have a heartbeat, 
ban it

immediately. This is needed for GuC submission as a idle pulse doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.


Pulse, that is a request with I915_PRIORITY_BARRIER, does not preempt 
a running normal priority context?


Why does it matter then whether or not heartbeats are enabled - when 
heartbeat just ends up sending the same engine pulse (eventually, with 
raising priority)?
The point is that the pulse is pointless. See the rest of my comments 
below, specifically "the context will get resubmitted to the hardware 
after the pulse completes". To re-iterate...


Yes, it preempts the context. Yes, it does so whether heartbeats are 
enabled or not. But so what? Who cares? You have preempted a context. It 
is no longer running on the hardware. BUT IT IS STILL A VALID CONTEXT. 


It is valid yes, and it even may be the current ABI so another question 
is whether it is okay to change that.


The backend scheduler will just resubmit it to the hardware as soon as 
the pulse completes. The only reason this works at all is because of the 
horrid hack in the execlist scheduler's back end implementation (in 
__execlists_schedule_in):

	if (unlikely(intel_context_is_closed(ce) &&
		     !intel_engine_has_heartbeat(engine)))
		intel_context_set_banned(ce);


Right, is the above code then needed with this patch - when ban is 
immediately applied on the higher level?


The actual back end scheduler is saying "Is this a zombie context? Is 
the heartbeat disabled? Then ban it". No other scheduler backend is 
going to have knowledge of zombie context status or of the heartbeat 
status. Nor are they going to call back into the higher levels of the 
i915 driver to trigger a ban operation. Certainly a hardware implemented 
scheduler is not going to be looking at private i915 driver information 
to decide whether to submit a context or whether to tell the OS to kill 
it off instead.


For persistence to work with a hardware scheduler (or a non-Intel 
specific scheduler such as the DRM one), the handling of zombie 
contexts, banning, etc. *must* be done entirely in the front end. It 
cannot rely on any backend hacks. That means you can't rely on any fancy 
behaviour of pulses.


If you want to ban a context then you must explicitly ban that context. 
If you want to ban it at some later point then you need to track it at 
the top level as a zombie and then explicitly ban that zombie at 
whatever later point.


I am still trying to understand it all. If I go by the commit message:

"""
This is needed for GuC submission as a idle pulse doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.
"""

That did not explain things for me. The sentence does not appear to make 
sense. Now, it seems "kick off the hardware" is meant as revoke and not 
just preempt. Which is fine, perhaps it just needs to be written more 
explicitly. But the part about checking for a heartbeat after the idle 
pulse does not compute for me. It is the heartbeat which emits idle 
pulses, not the idle pulse emitting heartbeats.


But anyway, I can buy the handling at the front end story completely. It 
makes sense. We just need to agree that a) it is okay to change the ABI 
and b) remove the backend check from execlists if it is not needed any 
longer.


And if an ABI change is okay then the commit message needs to talk about 
it loudly and clearly.


Or perhaps there is no ABI change? I am not really clear on how setting 
the banned status propagates to the GuC backend. I mean, at which point 
does i915 end up passing that info to the firmware?


Regards,

Tvrtko






It's worse than this. If the engine in question is an individual 
physical engine then sending a pulse (with sufficiently high 
priority) will pre-empt the engine and kick the context off. However, 
the GuC 


Why it is different for physical vs virtual, aren't both just 
schedulable contexts with different engine masks for what GuC is 
concerned? Oh, is it a matter of needing to send pulses to all engines 
which comprise a virtual one?
It isn't different. It is totally broken for both. It is potentially 
more broken for virtual engines because of the question of which engine 
to pulse. But as stated above, the pulse is pointless anyway, so the 
which-engine question doesn't even matter.


John.




scheduler does not have hacks in it to check the state of the 
heartbeat or whether a context is actually a zombie or not. Thus, the 
context will get resubmitted to the hardware after the pulse 
completes and effectively nothing will have happened.


I would assume that the DRM scheduler which we are meant to be 
switching to for execlist as well as 

Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-07-30 Thread Matthew Brost
On Fri, Jul 30, 2021 at 10:49:01AM +0100, Tvrtko Ursulin wrote:
> 
> On 30/07/2021 01:13, John Harrison wrote:
> > On 7/28/2021 17:34, Matthew Brost wrote:
> > > If an engine associated with a context does not have a heartbeat, ban it
> > > immediately. This is needed for GuC submission as a idle pulse doesn't
> > > kick the context off the hardware where it then can check for a
> > > heartbeat and ban the context.
> 
> Pulse, that is a request with I915_PRIORITY_BARRIER, does not preempt a
> running normal priority context?
> 

Yes, in both execlists and GuC submission the context gets preempted.
With execlists the i915 sees the preempt CSB, while with GuC submission
the GuC sees it.

> Why does it matter then whether or not heartbeats are enabled - when
> heartbeat just ends up sending the same engine pulse (eventually, with
> raising priority)?
>

With execlists, when the request gets resubmitted, there is a check
whether the context is closed and the heartbeat is disabled. If this is
true, the context gets banned. See __execlists_schedule_in.

With GuC submission, since the GuC owns the CSB / resubmission, the
heartbeat / closed check doesn't exist to ban the context.

> > It's worse than this. If the engine in question is an individual
> > physical engine then sending a pulse (with sufficiently high priority)
> > will pre-empt the engine and kick the context off. However, the GuC
> 
> Why it is different for physical vs virtual, aren't both just schedulable
> contexts with different engine masks for what GuC is concerned? Oh, is it a
> matter of needing to send pulses to all engines which comprise a virtual
> one?

Yes. The whole idle pulse thing is kinda junk. It really makes an
assumption that the backend is execlists. We likely have a bit more work
here.
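
[Editorial note: to make the "pulses to all engines which comprise a 
virtual one" problem concrete, here is a rough userspace sketch. All 
types and names (mock_virtual_engine, pulse_all_siblings) are 
hypothetical illustrations, not actual i915 code: since a virtual 
engine's context may land on any physical sibling, a preemptive pulse 
would have to be sent to every one of them.]

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SIBLINGS 8

/* Mock of a virtual engine: a set of physical sibling engines, any of
 * which could be running the context we want to kick off. */
struct mock_virtual_engine {
	int num_siblings;
	bool pulsed[MAX_SIBLINGS]; /* stand-in for a barrier request sent */
};

/* Send a (mock) high-priority pulse to every physical sibling, since we
 * cannot know which one the context will be scheduled on. */
static int pulse_all_siblings(struct mock_virtual_engine *ve)
{
	int i, sent = 0;

	for (i = 0; i < ve->num_siblings; i++) {
		if (!ve->pulsed[i]) {
			ve->pulsed[i] = true;
			sent++;
		}
	}
	return sent;
}
```

This only illustrates the fan-out; it does not address the deeper point 
above that the pulse is pointless when the back end resubmits the 
context afterwards anyway.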

> 
> > scheduler does not have hacks in it to check the state of the heartbeat
> > or whether a context is actually a zombie or not. Thus, the context will
> > get resubmitted to the hardware after the pulse completes and
> > effectively nothing will have happened.
> > 
> > I would assume that the DRM scheduler which we are meant to be switching
> > to for execlist as well as GuC submission is also unlikely to have hacks
> > for zombie contexts and tests for whether the i915 specific heartbeat
> > has been disabled since the context became a zombie. So when that switch
> > happens, this test will also fail in execlist mode as well as GuC mode.
> > 
> > The choices I see here are to simply remove persistence completely (it
> > is a basically a bug that became UAPI because it wasn't caught soon
> > enough!) or to implement it in a way that does not require hacks in the
> > back end scheduler. Apparently, the DRM scheduler is expected to allow
> > zombie contexts to persist until the DRM file handle is closed. So
> > presumably we will have to go with option two.
> > 
> > That means flagging a context as being a zombie when it is closed but
> > still active. The driver would then add it to a zombie list owned by the
> > DRM client object. When that client object is closed, i915 would go
> > through the list and genuinely kill all the contexts. No back end
> > scheduler hacks required and no intimate knowledge of the i915 heartbeat
> > mechanism required either.
> > 
> > John.
> > 
> > 
> > > 
> > > This patch also updates intel_engine_has_heartbeat to be a vfunc as we
> > > now need to call this function on execlists virtual engines too.
> > > 
> > > Signed-off-by: Matthew Brost 
> > > ---
> > >   drivers/gpu/drm/i915/gem/i915_gem_context.c   |  5 +++--
> > >   drivers/gpu/drm/i915/gt/intel_context_types.h |  2 ++
> > >   drivers/gpu/drm/i915/gt/intel_engine.h    | 21 ++-
> > >   .../drm/i915/gt/intel_execlists_submission.c  | 14 +
> > >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  6 +-
> > >   .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 --
> > >   6 files changed, 26 insertions(+), 24 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > index 9c3672bac0e2..b8e01c5ba9e5 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > @@ -1090,8 +1090,9 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban)
> > >    */
> > >   for_each_gem_engine(ce, engines, it) {
> > >   struct intel_engine_cs *engine;
> > > +    bool local_ban = ban || !intel_engine_has_heartbeat(ce->engine);
> 
> In any case (pending me understanding what's really going on there), why
> would this check not be in kill_context, which currently does this:
> 
>   bool ban = (!i915_gem_context_is_persistent(ctx) ||
>   !ctx->i915->params.enable_hangcheck);
> ...

This is a gem_context level check, while the other check is per
intel_context. We don't have the intel_context here.

>   kill_engines(pos, ban);
> 
> So the whether-to-ban decision would be consolidated in one place.

Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-07-30 Thread John Harrison

On 7/30/2021 02:49, Tvrtko Ursulin wrote:

On 30/07/2021 01:13, John Harrison wrote:

On 7/28/2021 17:34, Matthew Brost wrote:
If an engine associated with a context does not have a heartbeat, 
ban it

immediately. This is needed for GuC submission as a idle pulse doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.


Pulse, that is a request with I915_PRIORITY_BARRIER, does not preempt 
a running normal priority context?


Why does it matter then whether or not heartbeats are enabled - when 
heartbeat just ends up sending the same engine pulse (eventually, with 
raising priority)?
The point is that the pulse is pointless. See the rest of my comments 
below, specifically "the context will get resubmitted to the hardware 
after the pulse completes". To re-iterate...


Yes, it preempts the context. Yes, it does so whether heartbeats are 
enabled or not. But so what? Who cares? You have preempted a context. It 
is no longer running on the hardware. BUT IT IS STILL A VALID CONTEXT. 
The backend scheduler will just resubmit it to the hardware as soon as 
the pulse completes. The only reason this works at all is because of the 
horrid hack in the execlist scheduler's back end implementation (in 
__execlists_schedule_in):

    if (unlikely(intel_context_is_closed(ce) &&
 !intel_engine_has_heartbeat(engine)))
    intel_context_set_banned(ce);

The actual back end scheduler is saying "Is this a zombie context? Is 
the heartbeat disabled? Then ban it". No other scheduler backend is 
going to have knowledge of zombie context status or of the heartbeat 
status. Nor are they going to call back into the higher levels of the 
i915 driver to trigger a ban operation. Certainly a hardware implemented 
scheduler is not going to be looking at private i915 driver information 
to decide whether to submit a context or whether to tell the OS to kill 
it off instead.


For persistence to work with a hardware scheduler (or a non-Intel 
specific scheduler such as the DRM one), the handling of zombie 
contexts, banning, etc. *must* be done entirely in the front end. It 
cannot rely on any backend hacks. That means you can't rely on any fancy 
behaviour of pulses.


If you want to ban a context then you must explicitly ban that context. 
If you want to ban it at some later point then you need to track it at 
the top level as a zombie and then explicitly ban that zombie at 
whatever later point.





It's worse than this. If the engine in question is an individual 
physical engine then sending a pulse (with sufficiently high 
priority) will pre-empt the engine and kick the context off. However, 
the GuC 


Why it is different for physical vs virtual, aren't both just 
schedulable contexts with different engine masks for what GuC is 
concerned? Oh, is it a matter of needing to send pulses to all engines 
which comprise a virtual one?
It isn't different. It is totally broken for both. It is potentially 
more broken for virtual engines because of the question of which engine 
to pulse. But as stated above, the pulse is pointless anyway, so the 
which-engine question doesn't even matter.


John.




scheduler does not have hacks in it to check the state of the 
heartbeat or whether a context is actually a zombie or not. Thus, the 
context will get resubmitted to the hardware after the pulse 
completes and effectively nothing will have happened.


I would assume that the DRM scheduler which we are meant to be 
switching to for execlist as well as GuC submission is also unlikely 
to have hacks for zombie contexts and tests for whether the i915 
specific heartbeat has been disabled since the context became a 
zombie. So when that switch happens, this test will also fail in 
execlist mode as well as GuC mode.


The choices I see here are to simply remove persistence completely 
(it is a basically a bug that became UAPI because it wasn't caught 
soon enough!) or to implement it in a way that does not require hacks 
in the back end scheduler. Apparently, the DRM scheduler is expected 
to allow zombie contexts to persist until the DRM file handle is 
closed. So presumably we will have to go with option two.


That means flagging a context as being a zombie when it is closed but 
still active. The driver would then add it to a zombie list owned by 
the DRM client object. When that client object is closed, i915 would 
go through the list and genuinely kill all the contexts. No back end 
scheduler hacks required and no intimate knowledge of the i915 
heartbeat mechanism required either.


John.




This patch also updates intel_engine_has_heartbeat to be a vfunc as we
now need to call this function on execlists virtual engines too.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  5 +++--
  drivers/gpu/drm/i915/gt/intel_context_types.h |  2 ++
  drivers/gpu/drm/i915/gt/intel_engine.h    | 21 ++-

  

Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-07-30 Thread Tvrtko Ursulin



On 30/07/2021 01:13, John Harrison wrote:

On 7/28/2021 17:34, Matthew Brost wrote:

If an engine associated with a context does not have a heartbeat, ban it
immediately. This is needed for GuC submission as a idle pulse doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.


Pulse, that is a request with I915_PRIORITY_BARRIER, does not preempt a 
running normal priority context?


Why does it matter then whether or not heartbeats are enabled - when 
heartbeat just ends up sending the same engine pulse (eventually, with 
raising priority)?


It's worse than this. If the engine in question is an individual 
physical engine then sending a pulse (with sufficiently high priority) 
will pre-empt the engine and kick the context off. However, the GuC 


Why is it different for physical vs virtual? Aren't both just 
schedulable contexts with different engine masks as far as the GuC is 
concerned? Oh, is it a matter of needing to send pulses to all engines 
which comprise a virtual one?


scheduler does not have hacks in it to check the state of the heartbeat 
or whether a context is actually a zombie or not. Thus, the context will 
get resubmitted to the hardware after the pulse completes and 
effectively nothing will have happened.


I would assume that the DRM scheduler which we are meant to be switching 
to for execlist as well as GuC submission is also unlikely to have hacks 
for zombie contexts and tests for whether the i915 specific heartbeat 
has been disabled since the context became a zombie. So when that switch 
happens, this test will also fail in execlist mode as well as GuC mode.


The choices I see here are to simply remove persistence completely (it 
is a basically a bug that became UAPI because it wasn't caught soon 
enough!) or to implement it in a way that does not require hacks in the 
back end scheduler. Apparently, the DRM scheduler is expected to allow 
zombie contexts to persist until the DRM file handle is closed. So 
presumably we will have to go with option two.


That means flagging a context as being a zombie when it is closed but 
still active. The driver would then add it to a zombie list owned by the 
DRM client object. When that client object is closed, i915 would go 
through the list and genuinely kill all the contexts. No back end 
scheduler hacks required and no intimate knowledge of the i915 heartbeat 
mechanism required either.


John.




This patch also updates intel_engine_has_heartbeat to be a vfunc as we
now need to call this function on execlists virtual engines too.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  5 +++--
  drivers/gpu/drm/i915/gt/intel_context_types.h |  2 ++
  drivers/gpu/drm/i915/gt/intel_engine.h    | 21 ++-
  .../drm/i915/gt/intel_execlists_submission.c  | 14 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  6 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 --
  6 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 9c3672bac0e2..b8e01c5ba9e5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1090,8 +1090,9 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban)
 	 */
 	for_each_gem_engine(ce, engines, it) {
 		struct intel_engine_cs *engine;
+		bool local_ban = ban || !intel_engine_has_heartbeat(ce->engine);


In any case (pending me understanding what's really going on there), why 
would this check not be in kill_context, which currently does this:


bool ban = (!i915_gem_context_is_persistent(ctx) ||
!ctx->i915->params.enable_hangcheck);
...
kill_engines(pos, ban);

So the whether-to-ban decision would be consolidated in one place.

In fact, the decision on whether to allow persistence is tied to 
enable_hangcheck, which also drives heartbeat emission. So perhaps one 
part of the correct fix is to extend the above (kill_context) ban 
criteria to include the heartbeat state anyway. Otherwise isn't it a 
simple miss that this check fails to account for heartbeat disablement 
via sysfs?
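
[Editorial note: the consolidated ban criteria suggested above could be 
sketched as below. This is a standalone mock with plain booleans 
standing in for the real i915 state (ctx persistence flag, 
params.enable_hangcheck, engine heartbeat status); should_ban is a 
hypothetical name, not actual driver code.]

```c
#include <assert.h>
#include <stdbool.h>

/* One place to decide whether a context must be banned when it is
 * closed: ban unless every condition for persistence holds. The three
 * inputs mimic i915_gem_context_is_persistent(), enable_hangcheck and
 * intel_engine_has_heartbeat() respectively. */
static bool should_ban(bool persistent, bool enable_hangcheck,
		       bool engine_has_heartbeat)
{
	return !persistent || !enable_hangcheck || !engine_has_heartbeat;
}
```

With a helper like this at the kill_context level, the per-engine check 
added by the patch (and the back-end hack in __execlists_schedule_in) 
would no longer need to know about the heartbeat at all.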


Regards,

Tvrtko


-		if (ban && intel_context_ban(ce, NULL))
+		if (local_ban && intel_context_ban(ce, NULL))
 			continue;
 
 		/*
@@ -1104,7 +1105,7 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban)
 		engine = active_engine(ce);
 
 		/* First attempt to gracefully cancel the context */
-		if (engine && !__cancel_engine(engine) && ban)
+		if (engine && !__cancel_engine(engine) && local_ban)
 			/*
 			 * If we are unable to send a preemptive pulse to bump
 			 * the context from the GPU, we have to resort to a full
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 

Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context

2021-07-29 Thread John Harrison

On 7/28/2021 17:34, Matthew Brost wrote:

If an engine associated with a context does not have a heartbeat, ban it
immediately. This is needed for GuC submission as a idle pulse doesn't
kick the context off the hardware where it then can check for a
heartbeat and ban the context.
It's worse than this. If the engine in question is an individual 
physical engine then sending a pulse (with sufficiently high priority) 
will pre-empt the engine and kick the context off. However, the GuC 
scheduler does not have hacks in it to check the state of the heartbeat 
or whether a context is actually a zombie or not. Thus, the context will 
get resubmitted to the hardware after the pulse completes and 
effectively nothing will have happened.


I would assume that the DRM scheduler which we are meant to be switching 
to for execlist as well as GuC submission is also unlikely to have hacks 
for zombie contexts and tests for whether the i915 specific heartbeat 
has been disabled since the context became a zombie. So when that switch 
happens, this test will also fail in execlist mode as well as GuC mode.


The choices I see here are to simply remove persistence completely (it 
is a basically a bug that became UAPI because it wasn't caught soon 
enough!) or to implement it in a way that does not require hacks in the 
back end scheduler. Apparently, the DRM scheduler is expected to allow 
zombie contexts to persist until the DRM file handle is closed. So 
presumably we will have to go with option two.


That means flagging a context as being a zombie when it is closed but 
still active. The driver would then add it to a zombie list owned by the 
DRM client object. When that client object is closed, i915 would go 
through the list and genuinely kill all the contexts. No back end 
scheduler hacks required and no intimate knowledge of the i915 heartbeat 
mechanism required either.
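
[Editorial note: a rough userspace sketch of the zombie-list scheme 
described above. Every name here (mock_context, mock_client, 
context_close, client_close) is hypothetical and only illustrates the 
front-end-only lifecycle: park still-active contexts on a per-client 
list at context close, and genuinely kill them when the DRM file handle 
closes, with no back-end or heartbeat knowledge involved.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock context: just enough state to track the zombie lifecycle. */
struct mock_context {
	bool active;               /* still has work on the hardware */
	bool banned;
	struct mock_context *next; /* link in the client's zombie list */
};

/* Mock DRM client: owns the list of contexts closed while still active. */
struct mock_client {
	struct mock_context *zombies;
};

/* Context close: a still-active context becomes a zombie and persists;
 * an idle one has nothing to persist and needs no tracking. */
static void context_close(struct mock_client *client, struct mock_context *ce)
{
	if (ce->active) {
		ce->next = client->zombies;
		client->zombies = ce;
	}
}

/* DRM file handle close: genuinely kill every surviving zombie. */
static void client_close(struct mock_client *client)
{
	struct mock_context *ce;

	for (ce = client->zombies; ce; ce = ce->next)
		ce->banned = true;
	client->zombies = NULL;
}
```

The point of the sketch is that the ban happens explicitly at the top 
level, at a well-defined time, rather than as a side effect of a pulse 
being observed by one particular scheduler back end.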


John.




This patch also updates intel_engine_has_heartbeat to be a vfunc as we
now need to call this function on execlists virtual engines too.

Signed-off-by: Matthew Brost 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  5 +++--
  drivers/gpu/drm/i915/gt/intel_context_types.h |  2 ++
  drivers/gpu/drm/i915/gt/intel_engine.h| 21 ++-
  .../drm/i915/gt/intel_execlists_submission.c  | 14 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  6 +-
  .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 --
  6 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 9c3672bac0e2..b8e01c5ba9e5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1090,8 +1090,9 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban)
 	 */
 	for_each_gem_engine(ce, engines, it) {
 		struct intel_engine_cs *engine;
+		bool local_ban = ban || !intel_engine_has_heartbeat(ce->engine);
 
-		if (ban && intel_context_ban(ce, NULL))
+		if (local_ban && intel_context_ban(ce, NULL))
 			continue;
 
 		/*
@@ -1104,7 +1105,7 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban)
 		engine = active_engine(ce);
 
 		/* First attempt to gracefully cancel the context */
-		if (engine && !__cancel_engine(engine) && ban)
+		if (engine && !__cancel_engine(engine) && local_ban)
 			/*
 			 * If we are unable to send a preemptive pulse to bump
 			 * the context from the GPU, we have to resort to a full
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index e54351a170e2..65f2eb2a78e4 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -55,6 +55,8 @@ struct intel_context_ops {
void (*reset)(struct intel_context *ce);
void (*destroy)(struct kref *kref);
  
+	bool (*has_heartbeat)(const struct intel_engine_cs *engine);

+
/* virtual engine/context interface */
struct intel_context *(*create_virtual)(struct intel_engine_cs **engine,
unsigned int count);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index c2a5640ae055..1b11a808acc4 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -283,28 +283,11 @@ struct intel_context *
  intel_engine_create_virtual(struct intel_engine_cs **siblings,
unsigned int count);
  
-static inline bool
-intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine)
-{
-   /*
-* For non-GuC submission we expect the back-end to look at the
-* heartbeat status of the actual physical engine that the work
-