Re: [Intel-gfx] [PATCH 1/3] i915/perf: Store a mask of valid OA formats for a platform

2021-02-02 Thread Chris Wilson
Quoting Umesh Nerlige Ramappa (2021-02-02 20:10:44)
> On Tue, Feb 02, 2021 at 08:24:15AM +0000, Chris Wilson wrote:
> >Ok, this looks as compact and readable as writing it as a bunch of
> >tables. I presume there's a reason you didn't just use generation rather
> >than platform.
> >
> >switch (gen) {
> >case 7:
> >   haswell();
> >   break;
> >case 8 ... 11:
> >   broadwell();
> >   break;
> >case 12:
> >   tigerlake();
> >   break;
> >}
> >if you wanted to stick with a switch rather than an if-else tree for the
> >ranges.
> 
> Only Haswell is supported on gen7, and gen12 may define new formats that
> are platform specific.
> 
> How about a mix? -
> 
> if (gen == 7 && haswell)
>         haswell();
> else if (gen >= 8 && gen <= 11)
>         broadwell();
> else
>         gen12_formats();
> 
> gen12_formats can choose to use the switch if formats vary between 
> platforms.

I didn't mind the platform switch too much, so no need to change at the
moment. I just worry that it's more typing to maintain :)

What I thought you were going to do (from the subject) was tables with
a platform_mask for applicability, but I feel that would be just as
much typing, now and in the future.


I thought support started at Haswell, so the other gen7 platforms were not
a concern? But yes, if we look at how we end up doing it elsewhere, it's a
mix of gen and platform:

if (gen >= 12)
        gen12_formats();
else if (gen >= 8)
        gen8_formats();
else if (IS_HSW)
        hsw_formats();
else
        MISSING_CASE(gen);

At the end of the day, you're the person who is typing this, so it's up
to you how much effort you want to spend now to save later. :)
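
For illustration, the mix could end up looking something like this (just
a sketch; the *_oa_formats() helpers are hypothetical placeholders for
whatever fills in the format mask):

	static void oa_init_supported_formats(struct i915_perf *perf)
	{
		struct drm_i915_private *i915 = perf->i915;

		if (INTEL_GEN(i915) >= 12)
			gen12_oa_formats(perf); /* may switch on platform internally */
		else if (INTEL_GEN(i915) >= 8)
			gen8_oa_formats(perf);
		else if (IS_HASWELL(i915))
			hsw_oa_formats(perf);
		else
			MISSING_CASE(INTEL_GEN(i915));
	}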
-Chris


Re: [Intel-gfx] [CI 08/14] drm/i915/selftests: Force a rewind if at first we don't succeed

2021-02-02 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-02 16:52:18)
> 
> On 02/02/2021 15:14, Chris Wilson wrote:
> > live_timeslice_rewind assumes a particular traversal and reordering
> > after the first timeslice yield. However, the outcome can be either
> > (A1, A2, B1) or (A1, B2, A2) depending on the path taken through the
> > dependency graph. So if we do not get the outcome we need at first, give
> > it a priority kick to force a rewind.
> > 
> > Signed-off-by: Chris Wilson 
> > ---
> >   drivers/gpu/drm/i915/gt/selftest_execlists.c | 21 +++-
> >   1 file changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c 
> > b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > index 951e2bf867e1..68e1398704a4 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > @@ -1107,6 +1107,7 @@ static int live_timeslice_rewind(void *arg)
> >   struct i915_request *rq[3] = {};
> >   struct intel_context *ce;
> >   unsigned long timeslice;
> > + unsigned long timeout;
> >   int i, err = 0;
> >   u32 *slot;
> >   
> > @@ -1173,11 +1174,29 @@ static int live_timeslice_rewind(void *arg)
> >   
> >   /* ELSP[] = { { A:rq1, A:rq2 }, { B:rq1 } } */
> >   ENGINE_TRACE(engine, "forcing tasklet for rewind\n");
> > - while (i915_request_is_active(rq[A2])) { /* semaphore yield! */
> > + i = 0;
> > + timeout = jiffies + HZ;
> > + while (i915_request_is_active(rq[A2]) &&
> > +time_before(jiffies, timeout)) { /* semaphore yield! */
> >   /* Wait for the timeslice to kick in */
> >   del_timer(&engine->execlists.timer);
> >   tasklet_hi_schedule(&engine->execlists.tasklet);
> >   intel_engine_flush_submission(engine);
> > +
> > + /*
> > +  * Unfortunately this assumes that during the
> > +  * search of the wait tree it sees the requests
> > +  * in a particular order. That order is not
> > +  * strictly determined and it may pick either
> > +  * A2 or B1 to immediately follow A1.
> > +  *
> > +  * Break the tie with a set-priority. This defeats
> > +  * the goal of trying to cause a rewind with a
> > +  * timeslice, but alas, a rewind is better than
> > +  * none.
> > +  */
> > + if (i++)
> > + i915_request_set_priority(rq[B1], 1);
> >   }
> >   /* -> ELSP[] = { { A:rq1 }, { B:rq1 } } */
> >   GEM_BUG_ON(!i915_request_is_active(rq[A1]));
> > 
> 
> Didn't fully get the intricacies of the test, but, how about not messing 
> with priorities but just kicking it for longer until it eventually 
> re-orders to the desired sequence? Surely if it keeps insisting on the
> same order which is making no progress there is a flaw in timeslicing
> anyway? Or if it fails skip the test.

Ah. The test is trying to prove that the internals of the ELSP[] behave
in a certain manner, without forcing them to. However, there's no
requirement for it to do anything of the sort.

[What is the test trying to prove? That on timeslice we are capable of
removing a request from an earlier context to allow early switching to a
second context. This requires us to force the context switch to prevent
the currently executing context from keeping its RING_TAIL (which points
at A2), and instead resample it so that it ends at A1. We attempt to prove
that, with independent spinners, if we don't reset A2 then it will remain
executing instead of switching to B2 as we expect.]

So what happens is that we queue

[{A1, A2}, {B1}]

trigger a timeslice [by forcing the timer expiry]

and expect us to rearrange the ELSP as [{A1}, {B2}]

because B2 depends on A1, on every timeslice that pair must be in that
order.

And we are looking for A2 to be back in the queue.

Since A2 has no dependency on B2, and vice versa, that is a free
variable. Every time we walk the graph, we start by deferring A1, then
A2, then B2. We look at the graph in the same order every time, and end
up packing {A1, A2} together into the same context submission.

You are right that if we allowed A1 to finish, then the timeslicing would
reverse A2, B2. However, we don't let spinner A1 finish

Re: [Intel-gfx] [CI 07/14] drm/i915/selftests: Exercise priority inheritance around an engine loop

2021-02-02 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-02 16:44:26)
> 
> On 02/02/2021 15:14, Chris Wilson wrote:
> > + err = 0;
> > + count = 0;
> > + for_each_uabi_engine(engine, i915) {
> > + if (!intel_engine_has_scheduler(engine))
> > + continue;
> > +
> > + rq = __write_timestamp(engine, obj, count, rq);
> > + if (IS_ERR(rq)) {
> > + err = PTR_ERR(rq);
> > + break;
> > + }
> > +
> > + count++;
> > + }
> 
> A->Z; A->Z - two of the same by copy error or couldn't be bothered
> with outer loop?

It was just my thought process at the time: I wanted the
A->Z; A->Z pair so that it was clear that it was cyclic, and just didn't
think of putting it inside another loop.
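
For reference, folding that pair into an outer loop would be something
like this (a sketch; the pass count of 2 is just to keep the A->Z; A->Z
cycle visible):

	int pass, count = 0, err = 0;

	for (pass = 0; pass < 2 && !err; pass++) { /* A->Z; A->Z */
		for_each_uabi_engine(engine, i915) {
			if (!intel_engine_has_scheduler(engine))
				continue;

			rq = __write_timestamp(engine, obj, count, rq);
			if (IS_ERR(rq)) {
				err = PTR_ERR(rq);
				break;
			}

			count++;
		}
	}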
-Chris


[Intel-gfx] [CI] Revert "ALSA: jack: implement software jack injection via debugfs"

2021-02-02 Thread Chris Wilson
This reverts commit 2d670ea2bd53a9792f453bb5b97cb8ef695988ff.
---
 Documentation/sound/designs/index.rst |   1 -
 .../sound/designs/jack-injection.rst  | 166 --
 include/sound/core.h  |   6 -
 include/sound/jack.h  |   1 -
 sound/core/Kconfig|   9 -
 sound/core/init.c |  16 -
 sound/core/jack.c | 304 +-
 sound/core/sound.c|  13 -
 8 files changed, 4 insertions(+), 512 deletions(-)
 delete mode 100644 Documentation/sound/designs/jack-injection.rst

diff --git a/Documentation/sound/designs/index.rst 
b/Documentation/sound/designs/index.rst
index 1eb08e7bae52..f0749943ccb2 100644
--- a/Documentation/sound/designs/index.rst
+++ b/Documentation/sound/designs/index.rst
@@ -14,4 +14,3 @@ Designs and Implementations
powersave
oss-emulation
seq-oss
-   jack-injection
diff --git a/Documentation/sound/designs/jack-injection.rst 
b/Documentation/sound/designs/jack-injection.rst
deleted file mode 100644
index f9790521523e..
--- a/Documentation/sound/designs/jack-injection.rst
+++ /dev/null
@@ -1,166 +0,0 @@
-============================
-ALSA Jack Software Injection
-============================
-
-Simple Introduction On Jack Injection
-=====================================
-
-Here jack injection means users could inject plugin or plugout events
-to the audio jacks through debugfs interface, it is helpful to
-validate ALSA userspace changes. For example, we change the audio
-profile switching code in the pulseaudio, and we want to verify if the
-change works as expected and if the change introduce the regression,
-in this case, we could inject plugin or plugout events to an audio
-jack or to some audio jacks, we don't need to physically access the
-machine and plug/unplug physical devices to the audio jack.
-
-In this design, an audio jack doesn't equal to a physical audio jack.
-Sometimes a physical audio jack contains multi functions, and the
-ALSA driver creates multi ``jack_kctl`` for a ``snd_jack``, here the
-``snd_jack`` represents a physical audio jack and the ``jack_kctl``
-represents a function, for example a physical jack has two functions:
-headphone and mic_in, the ALSA ASoC driver will build 2 ``jack_kctl``
-for this jack. The jack injection is implemented based on the
-``jack_kctl`` instead of ``snd_jack``.
-
-To inject events to audio jacks, we need to enable the jack injection
-via ``sw_inject_enable`` first, once it is enabled, this jack will not
-change the state by hardware events anymore, we could inject plugin or
-plugout events via ``jackin_inject`` and check the jack state via
-``status``, after we finish our test, we need to disable the jack
-injection via ``sw_inject_enable`` too, once it is disabled, the jack
-state will be restored according to the last reported hardware events
-and will change by future hardware events.
-
-The Layout of Jack Injection Interface
-======================================
-
-If users enable the SND_JACK_INJECTION_DEBUG in the kernel, the audio
-jack injection interface will be created as below:
-::
-
-   $debugfs_mount_dir/sound
-   |-- card0
-   |-- |-- HDMI_DP_pcm_10_Jack
-   |-- |-- |-- jackin_inject
-   |-- |-- |-- kctl_id
-   |-- |-- |-- mask_bits
-   |-- |-- |-- status
-   |-- |-- |-- sw_inject_enable
-   |-- |-- |-- type
-   ...
-   |-- |-- HDMI_DP_pcm_9_Jack
-   |-- |-- jackin_inject
-   |-- |-- kctl_id
-   |-- |-- mask_bits
-   |-- |-- status
-   |-- |-- sw_inject_enable
-   |-- |-- type
-   |-- card1
-   |-- HDMI_DP_pcm_5_Jack
-   |-- |-- jackin_inject
-   |-- |-- kctl_id
-   |-- |-- mask_bits
-   |-- |-- status
-   |-- |-- sw_inject_enable
-   |-- |-- type
-   ...
-   |-- Headphone_Jack
-   |-- |-- jackin_inject
-   |-- |-- kctl_id
-   |-- |-- mask_bits
-   |-- |-- status
-   |-- |-- sw_inject_enable
-   |-- |-- type
-   |-- Headset_Mic_Jack
-   |-- jackin_inject
-   |-- kctl_id
-   |-- mask_bits
-   |-- status
-   |-- sw_inject_enable
-   |-- type
-
-The Explanation Of The Nodes
-==
-
-kctl_id
-  read-only, get jack_kctl->kctl's id
-  ::
-
- sound/card1/Headphone_Jack# cat kctl_id
- Headphone Jack
-
-mask_bits
-  read-only, get jack_kctl's supported events mask_bits
-  ::
-
- sound/card1/Headphone_Jack# cat mask_bits
- 0x0001 HEADPHONE(0x0001)
-
-status
-  read-only, get jack_kctl's current status
-
-- headphone unplugged:
-
-  ::
-
- sound/card1/Headphone_Jack# cat status
- Unplugged
-
-- headphone plugged:
-
-  ::
-
- sound/card1/Headphone_Jack# cat status
- Plugged
-
-type
-  read-only, get snd_jack's supported events from type (all supported events on the physical audio jack)
-  ::
-
- sound/card1/Headphone_Jack# cat type

[Intel-gfx] [PATCH v2] drm/i915/gt: Move CS interrupt handler to the backend

2021-02-02 Thread Chris Wilson
The different submission backends each have their own preferred
behaviour and interrupt setup. Let each handle their own interrupts.

This becomes more useful later as we extract the use of auxiliary
state in the interrupt handler that is backend specific.

v2: An overabundance of caution is always justified; put a barrier on
updating the irq handler so that we know that the next interrupt will
be redirected towards ourselves.
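
Roughly, the idea is a small setter with a store barrier, plus the GT
irq code bouncing through the callback (a sketch of the intent, not
necessarily the final helper names):

	static inline void
	intel_engine_set_irq_handler(struct intel_engine_cs *engine,
				     void (*fn)(struct intel_engine_cs *engine,
						u32 iir))
	{
		/*
		 * The interrupt may already be live when we swap handlers,
		 * so order the store such that the next interrupt is
		 * guaranteed to see the new callback.
		 */
		smp_store_mb(engine->irq_handler, fn);
	}

	static inline void
	intel_engine_cs_irq(struct intel_engine_cs *engine, u32 iir)
	{
		if (iir)
			engine->irq_handler(engine, iir);
	}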

Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  7 ++
 drivers/gpu/drm/i915/gt/intel_engine_types.h  | 14 +---
 .../drm/i915/gt/intel_execlists_submission.c  | 40 +
 drivers/gpu/drm/i915/gt/intel_gt_irq.c| 82 ++-
 drivers/gpu/drm/i915/gt/intel_gt_irq.h| 22 +
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  7 ++
 drivers/gpu/drm/i915/gt/intel_rps.c   |  2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 10 ++-
 drivers/gpu/drm/i915/i915_irq.c   |  8 +-
 9 files changed, 118 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index c7d17f8767a1..e06ae4ae1710 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -255,6 +255,11 @@ static void intel_engine_sanitize_mmio(struct 
intel_engine_cs *engine)
intel_engine_set_hwsp_writemask(engine, ~0u);
 }
 
+static void nop_irq_handler(struct intel_engine_cs *engine, u32 iir)
+{
+   GEM_DEBUG_WARN_ON(iir);
+}
+
 static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 {
const struct engine_info *info = &intel_engines[id];
@@ -292,6 +297,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum 
intel_engine_id id)
engine->hw_id = info->hw_id;
engine->guc_id = MAKE_GUC_ID(info->class, info->instance);
 
+   engine->irq_handler = nop_irq_handler;
+
engine->class = info->class;
engine->instance = info->instance;
__sprint_engine_name(engine);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 9d59de5c559a..7fd035d45263 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -402,6 +402,7 @@ struct intel_engine_cs {
u32 irq_enable_mask; /* bitmask to enable ring interrupt */
void(*irq_enable)(struct intel_engine_cs *engine);
void(*irq_disable)(struct intel_engine_cs *engine);
+   void(*irq_handler)(struct intel_engine_cs *engine, u32 iir);
 
void(*sanitize)(struct intel_engine_cs *engine);
int (*resume)(struct intel_engine_cs *engine);
@@ -481,10 +482,9 @@ struct intel_engine_cs {
 #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
 #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
 #define I915_ENGINE_HAS_TIMESLICES   BIT(4)
-#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(5)
-#define I915_ENGINE_IS_VIRTUAL   BIT(6)
-#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(7)
-#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(8)
+#define I915_ENGINE_IS_VIRTUAL   BIT(5)
+#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
+#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
unsigned int flags;
 
/*
@@ -588,12 +588,6 @@ intel_engine_has_timeslices(const struct intel_engine_cs 
*engine)
return engine->flags & I915_ENGINE_HAS_TIMESLICES;
 }
 
-static inline bool
-intel_engine_needs_breadcrumb_tasklet(const struct intel_engine_cs *engine)
-{
-   return engine->flags & I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
-}
-
 static inline bool
 intel_engine_is_virtual(const struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 4ddd2099a931..05846f97f1af 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2394,6 +2394,45 @@ static void execlists_submission_tasklet(struct 
tasklet_struct *t)
rcu_read_unlock();
 }
 
+static void execlists_irq_handler(struct intel_engine_cs *engine, u32 iir)
+{
+   bool tasklet = false;
+
+   if (unlikely(iir & GT_CS_MASTER_ERROR_INTERRUPT)) {
+   u32 eir;
+
+   /* Upper 16b are the enabling mask, rsvd for internal errors */
+   eir = ENGINE_READ(engine, RING_EIR) & GENMASK(15, 0);
+   ENGINE_TRACE(engine, "CS error: %x\n", eir);
+
+   /* Disable the error interrupt until after the reset */
+   if (likely(eir)) {
+   ENGINE_WRITE(engine, RING_EMR, ~0u);
+   ENGINE_WRITE(engine, RING_EIR, eir);
+   WRITE_ONCE(engine->execlists.error_interrupt, eir);
+   

Re: [Intel-gfx] [CI 03/14] drm/i915/gt: Move CS interrupt handler to the backend

2021-02-02 Thread Chris Wilson
Quoting Chris Wilson (2021-02-02 15:53:41)
> Quoting Tvrtko Ursulin (2021-02-02 15:49:59)
> > 
> > On 02/02/2021 15:14, Chris Wilson wrote:
> > > The different submission backends each have their own preferred
> > > behaviour and interrupt setup. Let each handle their own interrupts.
> > > 
> > > This becomes more useful later as we extract the use of auxiliary
> > > state in the interrupt handler that is backend specific.
> > > 
> > > Signed-off-by: Chris Wilson 
> > > ---
> > >   drivers/gpu/drm/i915/gt/intel_engine_cs.c |  7 ++
> > >   drivers/gpu/drm/i915/gt/intel_engine_types.h  | 14 +---
> > >   .../drm/i915/gt/intel_execlists_submission.c  | 40 +
> > >   drivers/gpu/drm/i915/gt/intel_gt_irq.c| 82 ++-
> > >   drivers/gpu/drm/i915/gt/intel_gt_irq.h|  7 ++
> > >   .../gpu/drm/i915/gt/intel_ring_submission.c   |  7 ++
> > >   drivers/gpu/drm/i915/gt/intel_rps.c   |  2 +-
> > >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 10 ++-
> > >   drivers/gpu/drm/i915/i915_irq.c   |  8 +-
> > >   9 files changed, 103 insertions(+), 74 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
> > > b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > index dab8d734e272..2a453ba5f25a 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > @@ -255,6 +255,11 @@ static void intel_engine_sanitize_mmio(struct 
> > > intel_engine_cs *engine)
> > >   intel_engine_set_hwsp_writemask(engine, ~0u);
> > >   }
> > >   
> > > +static void nop_irq_handler(struct intel_engine_cs *engine, u32 iir)
> > > +{
> > > + GEM_DEBUG_WARN_ON(iir);
> > > +}
> > > +
> > >   static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id 
> > > id)
> > >   {
> > >   const struct engine_info *info = &intel_engines[id];
> > > @@ -292,6 +297,8 @@ static int intel_engine_setup(struct intel_gt *gt, 
> > > enum intel_engine_id id)
> > >   engine->hw_id = info->hw_id;
> > >   engine->guc_id = MAKE_GUC_ID(info->class, info->instance);
> > >   
> > > + engine->irq_handler = nop_irq_handler;
> > > +
> > >   engine->class = info->class;
> > >   engine->instance = info->instance;
> > >   __sprint_engine_name(engine);
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
> > > b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > index 9d59de5c559a..7fd035d45263 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > @@ -402,6 +402,7 @@ struct intel_engine_cs {
> > >   u32 irq_enable_mask; /* bitmask to enable ring 
> > > interrupt */
> > >   void(*irq_enable)(struct intel_engine_cs *engine);
> > >   void(*irq_disable)(struct intel_engine_cs *engine);
> > > + void(*irq_handler)(struct intel_engine_cs *engine, u32 
> > > iir);
> > >   
> > >   void(*sanitize)(struct intel_engine_cs *engine);
> > >   int (*resume)(struct intel_engine_cs *engine);
> > > @@ -481,10 +482,9 @@ struct intel_engine_cs {
> > >   #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
> > >   #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
> > >   #define I915_ENGINE_HAS_TIMESLICES   BIT(4)
> > > -#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(5)
> > > -#define I915_ENGINE_IS_VIRTUAL   BIT(6)
> > > -#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(7)
> > > -#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(8)
> > > +#define I915_ENGINE_IS_VIRTUAL   BIT(5)
> > > +#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
> > > +#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
> > >   unsigned int flags;
> > >   
> > >   /*
> > > @@ -588,12 +588,6 @@ intel_engine_has_timeslices(const struct 
> > > intel_engine_cs *engine)
> > >   return engine->flags & I915_ENGINE_HAS_TIMESLICES;
> > >   }
> > >   
> > > -static inline bool
> > > -intel_engine_needs_breadcrumb_tasklet(const struct intel_engine_cs 
> > > *engine)
> > > -{
> > > - return engine->flags & I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
> > > -}

Re: [Intel-gfx] [CI 03/14] drm/i915/gt: Move CS interrupt handler to the backend

2021-02-02 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-02 15:49:59)
> 
> On 02/02/2021 15:14, Chris Wilson wrote:
> > The different submission backends each have their own preferred
> > behaviour and interrupt setup. Let each handle their own interrupts.
> > 
> > This becomes more useful later as we extract the use of auxiliary
> > state in the interrupt handler that is backend specific.
> > 
> > Signed-off-by: Chris Wilson 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_engine_cs.c |  7 ++
> >   drivers/gpu/drm/i915/gt/intel_engine_types.h  | 14 +---
> >   .../drm/i915/gt/intel_execlists_submission.c  | 40 +
> >   drivers/gpu/drm/i915/gt/intel_gt_irq.c| 82 ++-
> >   drivers/gpu/drm/i915/gt/intel_gt_irq.h|  7 ++
> >   .../gpu/drm/i915/gt/intel_ring_submission.c   |  7 ++
> >   drivers/gpu/drm/i915/gt/intel_rps.c   |  2 +-
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 10 ++-
> >   drivers/gpu/drm/i915/i915_irq.c   |  8 +-
> >   9 files changed, 103 insertions(+), 74 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
> > b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index dab8d734e272..2a453ba5f25a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -255,6 +255,11 @@ static void intel_engine_sanitize_mmio(struct 
> > intel_engine_cs *engine)
> >   intel_engine_set_hwsp_writemask(engine, ~0u);
> >   }
> >   
> > +static void nop_irq_handler(struct intel_engine_cs *engine, u32 iir)
> > +{
> > + GEM_DEBUG_WARN_ON(iir);
> > +}
> > +
> >   static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id 
> > id)
> >   {
> >   const struct engine_info *info = &intel_engines[id];
> > @@ -292,6 +297,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum 
> > intel_engine_id id)
> >   engine->hw_id = info->hw_id;
> >   engine->guc_id = MAKE_GUC_ID(info->class, info->instance);
> >   
> > + engine->irq_handler = nop_irq_handler;
> > +
> >   engine->class = info->class;
> >   engine->instance = info->instance;
> >   __sprint_engine_name(engine);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
> > b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > index 9d59de5c559a..7fd035d45263 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > @@ -402,6 +402,7 @@ struct intel_engine_cs {
> >   u32 irq_enable_mask; /* bitmask to enable ring interrupt 
> > */
> >   void(*irq_enable)(struct intel_engine_cs *engine);
> >   void(*irq_disable)(struct intel_engine_cs *engine);
> > + void(*irq_handler)(struct intel_engine_cs *engine, u32 
> > iir);
> >   
> >   void(*sanitize)(struct intel_engine_cs *engine);
> >   int (*resume)(struct intel_engine_cs *engine);
> > @@ -481,10 +482,9 @@ struct intel_engine_cs {
> >   #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
> >   #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
> >   #define I915_ENGINE_HAS_TIMESLICES   BIT(4)
> > -#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(5)
> > -#define I915_ENGINE_IS_VIRTUAL   BIT(6)
> > -#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(7)
> > -#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(8)
> > +#define I915_ENGINE_IS_VIRTUAL   BIT(5)
> > +#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
> > +#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
> >   unsigned int flags;
> >   
> >   /*
> > @@ -588,12 +588,6 @@ intel_engine_has_timeslices(const struct 
> > intel_engine_cs *engine)
> >   return engine->flags & I915_ENGINE_HAS_TIMESLICES;
> >   }
> >   
> > -static inline bool
> > -intel_engine_needs_breadcrumb_tasklet(const struct intel_engine_cs *engine)
> > -{
> > - return engine->flags & I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
> > -}
> > -
> >   static inline bool
> >   intel_engine_is_virtual(const struct intel_engine_cs *engine)
> >   {
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
> > b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index 4ddd2099a931..ed62e4b549d2 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -2394,6 +2394,45 @@ st

[Intel-gfx] [PATCH 1/2] drm/i915: Remove notion of GEM from i915_gem_shrinker_taints_mutex

2021-02-02 Thread Chris Wilson
Since we dropped the use of dev->struct_mutex from inside the shrinker,
we no longer include that as part of our fs_reclaim tainting. We can
drop the i915 argument and rebrand it as a generic fs_reclaim tainter.
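
As a reminder of what the taint buys us: the dummy acquire records a
fs_reclaim -> mutex dependency, so lockdep will then complain about any
GFP_KERNEL allocation made while holding the tainted mutex, as that
closes the cycle. An illustrative (hypothetical) caller:

	fs_reclaim_taints_mutex(&vm->mutex);

	mutex_lock(&vm->mutex);
	/*
	 * kmalloc(GFP_KERNEL) may recurse into fs_reclaim, completing
	 * the fs_reclaim -> vm->mutex -> fs_reclaim cycle that lockdep
	 * now knows to report as a potential deadlock.
	 */
	ptr = kmalloc(size, GFP_KERNEL);
	mutex_unlock(&vm->mutex);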

Signed-off-by: Chris Wilson 
Cc: Thomas Hellström 
Reviewed-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c   |  3 +--
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 14 --
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.h |  2 --
 drivers/gpu/drm/i915/gt/intel_gtt.c  |  2 +-
 drivers/gpu/drm/i915/gt/intel_reset.c|  2 +-
 drivers/gpu/drm/i915/i915_utils.c| 13 +
 drivers/gpu/drm/i915/i915_utils.h|  2 ++
 7 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 70f798405f7f..6cdff5fc5882 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -86,8 +86,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
mutex_init(&obj->mm.get_dma_page.lock);
 
if (IS_ENABLED(CONFIG_LOCKDEP) && i915_gem_object_is_shrinkable(obj))
-   i915_gem_shrinker_taints_mutex(to_i915(obj->base.dev),
-  &obj->mm.lock);
+   fs_reclaim_taints_mutex(&obj->mm.lock);
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c 
b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index c2dba1cd9532..b64a0788381f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
@@ -415,20 +415,6 @@ void i915_gem_driver_unregister__shrinker(struct 
drm_i915_private *i915)
unregister_shrinker(&i915->mm.shrinker);
 }
 
-void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915,
-   struct mutex *mutex)
-{
-   if (!IS_ENABLED(CONFIG_LOCKDEP))
-   return;
-
-   fs_reclaim_acquire(GFP_KERNEL);
-
-   mutex_acquire(&mutex->dep_map, 0, 0, _RET_IP_);
-   mutex_release(&mutex->dep_map, _RET_IP_);
-
-   fs_reclaim_release(GFP_KERNEL);
-}
-
 #define obj_to_i915(obj__) to_i915((obj__)->base.dev)
 
 void i915_gem_object_make_unshrinkable(struct drm_i915_gem_object *obj)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h 
b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h
index b397d7785789..a25754a51ac3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h
@@ -25,7 +25,5 @@ unsigned long i915_gem_shrink(struct drm_i915_private *i915,
 unsigned long i915_gem_shrink_all(struct drm_i915_private *i915);
 void i915_gem_driver_register__shrinker(struct drm_i915_private *i915);
 void i915_gem_driver_unregister__shrinker(struct drm_i915_private *i915);
-void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915,
-   struct mutex *mutex);
 
 #endif /* __I915_GEM_SHRINKER_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 04aa6601e984..d34770ae4c9a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -97,7 +97,7 @@ void i915_address_space_init(struct i915_address_space *vm, 
int subclass)
 */
mutex_init(&vm->mutex);
lockdep_set_subclass(&vm->mutex, subclass);
-   i915_gem_shrinker_taints_mutex(vm->i915, &vm->mutex);
+   fs_reclaim_taints_mutex(&vm->mutex);
 
GEM_BUG_ON(!vm->total);
drm_mm_init(>mm, 0, vm->total);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c 
b/drivers/gpu/drm/i915/gt/intel_reset.c
index c8cf3981ad7f..7638fb2a45f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1401,7 +1401,7 @@ void intel_gt_init_reset(struct intel_gt *gt)
 * within the shrinker, we forbid ourselves from performing any
 * fs-reclaim or taking related locks during reset.
 */
-   i915_gem_shrinker_taints_mutex(gt->i915, &gt->reset.mutex);
+   fs_reclaim_taints_mutex(&gt->reset.mutex);
 
/* no GPU until we are ready! */
__set_bit(I915_WEDGED, &gt->reset.flags);
diff --git a/drivers/gpu/drm/i915/i915_utils.c 
b/drivers/gpu/drm/i915/i915_utils.c
index f9e780dee9de..90c7f0c4838c 100644
--- a/drivers/gpu/drm/i915/i915_utils.c
+++ b/drivers/gpu/drm/i915/i915_utils.c
@@ -114,3 +114,16 @@ void set_timer_ms(struct timer_list *t, unsigned long 
timeout)
/* Keep t->expires = 0 reserved to indicate a canceled timer. */
mod_timer(t, jiffies + timeout ?: 1);
 }
+
+void fs_reclaim_taints_mutex(struct mutex *mutex)
+{
+   if (!IS_ENABLED(CONFIG_LOCKDEP))
+   return;
+
+   fs_reclaim_acquire(GFP_KERNEL);
+
+   mutex_acquire(&mutex->dep_map, 0, 0, _RET_IP_);
+   mutex_release(&mutex->dep_map, _RET_IP_);
+
+   fs_reclaim_release(GFP_KERNEL);
+}
diff --gi

[Intel-gfx] [PATCH 2/2] drm/i915: Lift marking a lock as used to utils

2021-02-02 Thread Chris Wilson
After calling lockdep_set_subclass() the lock _must_ be used, or else
lockdep's internal nr_unused_locks becomes unbalanced. Extract the little
utility function to i915_utils.c.
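
Usage then collapses to a one-liner after each lockdep_set_subclass(),
mirroring the call site in the hunk below:

	spin_lock_init(&engine->active.lock);
	lockdep_set_subclass(&engine->active.lock, subclass);
	mark_lock_used_irq(&engine->active.lock);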

Signed-off-by: Chris Wilson 
Cc: Thomas Hellström 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 13 +
 drivers/gpu/drm/i915/i915_utils.c | 15 +++
 drivers/gpu/drm/i915/i915_utils.h |  7 +++
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 56fb9cece71b..f11ea72645ac 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -769,18 +769,7 @@ intel_engine_init_active(struct intel_engine_cs *engine, 
unsigned int subclass)
 
spin_lock_init(&engine->active.lock);
lockdep_set_subclass(&engine->active.lock, subclass);
-
-   /*
-* Due to an interesting quirk in lockdep's internal debug tracking,
-* after setting a subclass we must ensure the lock is used. Otherwise,
-* nr_unused_locks is incremented once too often.
-*/
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-   local_irq_disable();
-   lock_map_acquire(&engine->active.lock.dep_map);
-   lock_map_release(&engine->active.lock.dep_map);
-   local_irq_enable();
-#endif
+   mark_lock_used_irq(&engine->active.lock);
 }
 
 static struct intel_context *
diff --git a/drivers/gpu/drm/i915/i915_utils.c 
b/drivers/gpu/drm/i915/i915_utils.c
index 90c7f0c4838c..894de60833ec 100644
--- a/drivers/gpu/drm/i915/i915_utils.c
+++ b/drivers/gpu/drm/i915/i915_utils.c
@@ -127,3 +127,18 @@ void fs_reclaim_taints_mutex(struct mutex *mutex)
 
fs_reclaim_release(GFP_KERNEL);
 }
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+void __mark_lock_used_irq(struct lockdep_map *lock)
+{
+   /*
+* Due to an interesting quirk in lockdep's internal debug tracking,
+* after setting a subclass we must ensure the lock is used. Otherwise,
+* nr_unused_locks is incremented once too often.
+*/
+   local_irq_disable();
+   lock_map_acquire(lock);
+   lock_map_release(lock);
+   local_irq_enable();
+}
+#endif
diff --git a/drivers/gpu/drm/i915/i915_utils.h 
b/drivers/gpu/drm/i915/i915_utils.h
index 3f616d00de42..610616d6bf29 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -450,6 +450,13 @@ static inline bool timer_expired(const struct timer_list 
*t)
return timer_active(t) && !timer_pending(t);
 }
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+void __mark_lock_used_irq(struct lockdep_map *lock);
+#define mark_lock_used_irq(lock) __mark_lock_used_irq(&(lock)->dep_map)
+#else
+#define mark_lock_used_irq(lock)
+#endif
+
 /*
  * This is a lookalike for IS_ENABLED() that takes a kconfig value,
  * e.g. CONFIG_DRM_I915_SPIN_REQUEST, and evaluates whether it is non-zero
-- 
2.20.1



[Intel-gfx] [PATCH] drm/i915: Remove notion of GEM from i915_gem_shrinker_taints_mutex

2021-02-02 Thread Chris Wilson
Since we dropped the use of dev->struct_mutex from inside the shrinker,
we no longer include that as part of our fs_reclaim tainting. We can
drop the i915 argument and rebrand it as a generic fs_reclaim tainter.

Signed-off-by: Chris Wilson 
Cc: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c   |  3 +--
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 14 --
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.h |  2 --
 drivers/gpu/drm/i915/gt/intel_gtt.c  |  2 +-
 drivers/gpu/drm/i915/gt/intel_reset.c|  2 +-
 drivers/gpu/drm/i915/i915_utils.c| 13 +
 drivers/gpu/drm/i915/i915_utils.h|  2 ++
 7 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 70f798405f7f..6cdff5fc5882 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -86,8 +86,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
mutex_init(&obj->mm.get_dma_page.lock);
 
if (IS_ENABLED(CONFIG_LOCKDEP) && i915_gem_object_is_shrinkable(obj))
-   i915_gem_shrinker_taints_mutex(to_i915(obj->base.dev),
-  &obj->mm.lock);
+   fs_reclaim_taints_mutex(&obj->mm.lock);
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c 
b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index c2dba1cd9532..b64a0788381f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
@@ -415,20 +415,6 @@ void i915_gem_driver_unregister__shrinker(struct 
drm_i915_private *i915)
unregister_shrinker(&i915->mm.shrinker);
 }
 
-void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915,
-   struct mutex *mutex)
-{
-   if (!IS_ENABLED(CONFIG_LOCKDEP))
-   return;
-
-   fs_reclaim_acquire(GFP_KERNEL);
-
-   mutex_acquire(&mutex->dep_map, 0, 0, _RET_IP_);
-   mutex_release(&mutex->dep_map, _RET_IP_);
-
-   fs_reclaim_release(GFP_KERNEL);
-}
-
 #define obj_to_i915(obj__) to_i915((obj__)->base.dev)
 
 void i915_gem_object_make_unshrinkable(struct drm_i915_gem_object *obj)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h 
b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h
index b397d7785789..a25754a51ac3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h
@@ -25,7 +25,5 @@ unsigned long i915_gem_shrink(struct drm_i915_private *i915,
 unsigned long i915_gem_shrink_all(struct drm_i915_private *i915);
 void i915_gem_driver_register__shrinker(struct drm_i915_private *i915);
 void i915_gem_driver_unregister__shrinker(struct drm_i915_private *i915);
-void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915,
-   struct mutex *mutex);
 
 #endif /* __I915_GEM_SHRINKER_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 04aa6601e984..d34770ae4c9a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -97,7 +97,7 @@ void i915_address_space_init(struct i915_address_space *vm, 
int subclass)
 */
mutex_init(&vm->mutex);
lockdep_set_subclass(&vm->mutex, subclass);
-   i915_gem_shrinker_taints_mutex(vm->i915, &vm->mutex);
+   fs_reclaim_taints_mutex(&vm->mutex);
 
GEM_BUG_ON(!vm->total);
drm_mm_init(>mm, 0, vm->total);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c 
b/drivers/gpu/drm/i915/gt/intel_reset.c
index c8cf3981ad7f..7638fb2a45f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1401,7 +1401,7 @@ void intel_gt_init_reset(struct intel_gt *gt)
 * within the shrinker, we forbid ourselves from performing any
 * fs-reclaim or taking related locks during reset.
 */
-   i915_gem_shrinker_taints_mutex(gt->i915, &gt->reset.mutex);
+   fs_reclaim_taints_mutex(&gt->reset.mutex);
 
/* no GPU until we are ready! */
__set_bit(I915_WEDGED, &gt->reset.flags);
diff --git a/drivers/gpu/drm/i915/i915_utils.c 
b/drivers/gpu/drm/i915/i915_utils.c
index f9e780dee9de..90c7f0c4838c 100644
--- a/drivers/gpu/drm/i915/i915_utils.c
+++ b/drivers/gpu/drm/i915/i915_utils.c
@@ -114,3 +114,16 @@ void set_timer_ms(struct timer_list *t, unsigned long 
timeout)
/* Keep t->expires = 0 reserved to indicate a canceled timer. */
mod_timer(t, jiffies + timeout ?: 1);
 }
+
+void fs_reclaim_taints_mutex(struct mutex *mutex)
+{
+   if (!IS_ENABLED(CONFIG_LOCKDEP))
+   return;
+
+   fs_reclaim_acquire(GFP_KERNEL);
+
+   mutex_acquire(&mutex->dep_map, 0, 0, _RET_IP_);
+   mutex_release(&mutex->dep_map, _RET_IP_);
+
+   fs_reclaim_release(GFP_KERNEL);
+}
diff --git a/drivers/gpu/drm/i915/i915_util

[Intel-gfx] [CI 02/14] drm/i915/gt: Move submission_method into intel_gt

2021-02-02 Thread Chris Wilson
Since we set up the submission method for the engines once, it is easy to
assign an enum and use that instead of probing into the backends.
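
The enum itself lives in intel_gt_types.h; inferring from the usage
below, it is ordered so that a simple comparison picks out the GuC:

	enum intel_submission_method {
		INTEL_SUBMISSION_RING,
		INTEL_SUBMISSION_ELSP,
		INTEL_SUBMISSION_GUC,
	};

which lets intel_engine_uses_guc() test
submission_method >= INTEL_SUBMISSION_GUC.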

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/gt/intel_engine.h   |  8 +++-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c| 12 
 drivers/gpu/drm/i915/gt/intel_execlists_submission.c |  8 
 drivers/gpu/drm/i915/gt/intel_execlists_submission.h |  3 ---
 drivers/gpu/drm/i915/gt/intel_gt_types.h |  7 +++
 drivers/gpu/drm/i915/gt/intel_reset.c|  7 +++
 drivers/gpu/drm/i915/gt/selftest_execlists.c |  2 +-
 drivers/gpu/drm/i915/gt/selftest_ring_submission.c   |  2 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c|  5 -
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h|  1 -
 drivers/gpu/drm/i915/i915_perf.c | 10 +-
 11 files changed, 32 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index 47ee8578e511..8d9184920c51 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -13,8 +13,9 @@
 #include "i915_reg.h"
 #include "i915_request.h"
 #include "i915_selftest.h"
-#include "gt/intel_timeline.h"
 #include "intel_engine_types.h"
+#include "intel_gt_types.h"
+#include "intel_timeline.h"
 #include "intel_workarounds.h"
 
 struct drm_printer;
@@ -262,6 +263,11 @@ void intel_engine_init_active(struct intel_engine_cs 
*engine,
 #define ENGINE_MOCK1
 #define ENGINE_VIRTUAL 2
 
+static inline bool intel_engine_uses_guc(const struct intel_engine_cs *engine)
+{
+   return engine->gt->submission_method >= INTEL_SUBMISSION_GUC;
+}
+
 static inline bool
 intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 56fb9cece71b..dab8d734e272 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -891,12 +891,16 @@ int intel_engines_init(struct intel_gt *gt)
enum intel_engine_id id;
int err;
 
-   if (intel_uc_uses_guc_submission(&gt->uc))
+   if (intel_uc_uses_guc_submission(&gt->uc)) {
+   gt->submission_method = INTEL_SUBMISSION_GUC;
setup = intel_guc_submission_setup;
-   else if (HAS_EXECLISTS(gt->i915))
+   } else if (HAS_EXECLISTS(gt->i915)) {
+   gt->submission_method = INTEL_SUBMISSION_ELSP;
setup = intel_execlists_submission_setup;
-   else
+   } else {
+   gt->submission_method = INTEL_SUBMISSION_RING;
setup = intel_ring_submission_setup;
+   }
 
for_each_engine(engine, gt, id) {
err = engine_setup_common(engine);
@@ -1467,7 +1471,7 @@ static void intel_engine_print_registers(struct 
intel_engine_cs *engine,
drm_printf(m, "\tIPEHR: 0x%08x\n", ENGINE_READ(engine, IPEHR));
}
 
-   if (intel_engine_in_guc_submission_mode(engine)) {
+   if (intel_engine_uses_guc(engine)) {
/* nothing to print yet */
} else if (HAS_EXECLISTS(dev_priv)) {
struct i915_request * const *port, *rq;
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 5d824e1cfcba..4ddd2099a931 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1757,7 +1757,6 @@ process_csb(struct intel_engine_cs *engine, struct 
i915_request **inactive)
 */
GEM_BUG_ON(!tasklet_is_locked(&execlists->tasklet) &&
   !reset_in_progress(execlists));
-   GEM_BUG_ON(!intel_engine_in_execlists_submission_mode(engine));
 
/*
 * Note that csb_write, csb_status may be either in HWSP or mmio.
@@ -3897,13 +3896,6 @@ void intel_execlists_show_requests(struct 
intel_engine_cs *engine,
spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
-bool
-intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine)
-{
-   return engine->set_default_submission ==
-  execlists_set_default_submission;
-}
-
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_execlists.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
index a8fd7adefd82..f7bd3fccfee8 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
@@ -41,7 +41,4 @@ int intel_virtual_engine_attach_bond(struct intel_engine_cs 
*engine,
 const struct intel_engine_cs *m

[Intel-gfx] [CI 06/14] drm/i915/selftests: Measure set-priority duration

2021-02-02 Thread Chris Wilson
As a topological sort, we expect it to run in linear graph time,
O(V+E). In removing the recursion, it is no longer a DFS but rather a
BFS, and performs as O(VE): for a single chain of N requests (V = N,
E = N - 1), a priority bump costs O(N^2) rather than O(N). Let's
demonstrate how bad this is with a few examples, and build a few test
cases to verify a potential fix.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_scheduler.c |   4 +
 .../drm/i915/selftests/i915_live_selftests.h  |   1 +
 .../drm/i915/selftests/i915_perf_selftests.h  |   1 +
 .../gpu/drm/i915/selftests/i915_scheduler.c   | 672 ++
 4 files changed, 678 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/selftests/i915_scheduler.c

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 035e4be5d573..27bda7617b29 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -609,6 +609,10 @@ void i915_request_show_with_schedule(struct drm_printer *m,
rcu_read_unlock();
 }
 
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+#include "selftests/i915_scheduler.c"
+#endif
+
 static void i915_global_scheduler_shrink(void)
 {
kmem_cache_shrink(global.slab_dependencies);
diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h 
b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
index a92c0e9b7e6b..2200a5baa68e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
@@ -26,6 +26,7 @@ selftest(gt_mocs, intel_mocs_live_selftests)
 selftest(gt_pm, intel_gt_pm_live_selftests)
 selftest(gt_heartbeat, intel_heartbeat_live_selftests)
 selftest(requests, i915_request_live_selftests)
+selftest(scheduler, i915_scheduler_live_selftests)
 selftest(active, i915_active_live_selftests)
 selftest(objects, i915_gem_object_live_selftests)
 selftest(mman, i915_gem_mman_live_selftests)
diff --git a/drivers/gpu/drm/i915/selftests/i915_perf_selftests.h 
b/drivers/gpu/drm/i915/selftests/i915_perf_selftests.h
index c2389f8a257d..137e35283fee 100644
--- a/drivers/gpu/drm/i915/selftests/i915_perf_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_perf_selftests.h
@@ -17,5 +17,6 @@
  */
 selftest(engine_cs, intel_engine_cs_perf_selftests)
 selftest(request, i915_request_perf_selftests)
+selftest(scheduler, i915_scheduler_perf_selftests)
 selftest(blt, i915_gem_object_blt_perf_selftests)
 selftest(region, intel_memory_region_perf_selftests)
diff --git a/drivers/gpu/drm/i915/selftests/i915_scheduler.c 
b/drivers/gpu/drm/i915/selftests/i915_scheduler.c
new file mode 100644
index ..d095fab2ccec
--- /dev/null
+++ b/drivers/gpu/drm/i915/selftests/i915_scheduler.c
@@ -0,0 +1,672 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include "i915_selftest.h"
+
+#include "gt/intel_context.h"
+#include "gt/intel_gpu_commands.h"
+#include "gt/selftest_engine_heartbeat.h"
+#include "selftests/igt_spinner.h"
+#include "selftests/i915_random.h"
+
+static void scheduling_disable(struct intel_engine_cs *engine)
+{
+   engine->props.preempt_timeout_ms = 0;
+   engine->props.timeslice_duration_ms = 0;
+
+   st_engine_heartbeat_disable(engine);
+}
+
+static void scheduling_enable(struct intel_engine_cs *engine)
+{
+   st_engine_heartbeat_enable(engine);
+
+   engine->props.preempt_timeout_ms =
+   engine->defaults.preempt_timeout_ms;
+   engine->props.timeslice_duration_ms =
+   engine->defaults.timeslice_duration_ms;
+}
+
+static int first_engine(struct drm_i915_private *i915,
+   int (*chain)(struct intel_engine_cs *engine,
+unsigned long param,
+bool (*fn)(struct i915_request *rq,
+   unsigned long v,
+   unsigned long e)),
+   unsigned long param,
+   bool (*fn)(struct i915_request *rq,
+  unsigned long v, unsigned long e))
+{
+   struct intel_engine_cs *engine;
+
+   for_each_uabi_engine(engine, i915) {
+   if (!intel_engine_has_scheduler(engine))
+   continue;
+
+   return chain(engine, param, fn);
+   }
+
+   return 0;
+}
+
+static int all_engines(struct drm_i915_private *i915,
+  int (*chain)(struct intel_engine_cs *engine,
+   unsigned long param,
+   bool (*fn)(struct i915_request *rq,
+  unsigned long v,
+  unsigned long e)),
+  unsigned long param,
+  bool (*fn)(struct i915_request *rq,
+ unsigned long v, unsigned long e))
+{
+   stru

[Intel-gfx] [CI 03/14] drm/i915/gt: Move CS interrupt handler to the backend

2021-02-02 Thread Chris Wilson
The different submission backends each have their own preferred
behaviour and interrupt setup. Let each handle their own interrupts.

This becomes more useful later as we extract the use of auxiliary
state in the interrupt handler that is backend specific.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  7 ++
 drivers/gpu/drm/i915/gt/intel_engine_types.h  | 14 +---
 .../drm/i915/gt/intel_execlists_submission.c  | 40 +
 drivers/gpu/drm/i915/gt/intel_gt_irq.c| 82 ++-
 drivers/gpu/drm/i915/gt/intel_gt_irq.h|  7 ++
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  7 ++
 drivers/gpu/drm/i915/gt/intel_rps.c   |  2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 10 ++-
 drivers/gpu/drm/i915/i915_irq.c   |  8 +-
 9 files changed, 103 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index dab8d734e272..2a453ba5f25a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -255,6 +255,11 @@ static void intel_engine_sanitize_mmio(struct 
intel_engine_cs *engine)
intel_engine_set_hwsp_writemask(engine, ~0u);
 }
 
+static void nop_irq_handler(struct intel_engine_cs *engine, u32 iir)
+{
+   GEM_DEBUG_WARN_ON(iir);
+}
+
 static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 {
const struct engine_info *info = &intel_engines[id];
@@ -292,6 +297,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum 
intel_engine_id id)
engine->hw_id = info->hw_id;
engine->guc_id = MAKE_GUC_ID(info->class, info->instance);
 
+   engine->irq_handler = nop_irq_handler;
+
engine->class = info->class;
engine->instance = info->instance;
__sprint_engine_name(engine);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 9d59de5c559a..7fd035d45263 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -402,6 +402,7 @@ struct intel_engine_cs {
u32 irq_enable_mask; /* bitmask to enable ring interrupt */
void(*irq_enable)(struct intel_engine_cs *engine);
void(*irq_disable)(struct intel_engine_cs *engine);
+   void(*irq_handler)(struct intel_engine_cs *engine, u32 iir);
 
void(*sanitize)(struct intel_engine_cs *engine);
int (*resume)(struct intel_engine_cs *engine);
@@ -481,10 +482,9 @@ struct intel_engine_cs {
 #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
 #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
 #define I915_ENGINE_HAS_TIMESLICES   BIT(4)
-#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(5)
-#define I915_ENGINE_IS_VIRTUAL   BIT(6)
-#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(7)
-#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(8)
+#define I915_ENGINE_IS_VIRTUAL   BIT(5)
+#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
+#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
unsigned int flags;
 
/*
@@ -588,12 +588,6 @@ intel_engine_has_timeslices(const struct intel_engine_cs 
*engine)
return engine->flags & I915_ENGINE_HAS_TIMESLICES;
 }
 
-static inline bool
-intel_engine_needs_breadcrumb_tasklet(const struct intel_engine_cs *engine)
-{
-   return engine->flags & I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
-}
-
 static inline bool
 intel_engine_is_virtual(const struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 4ddd2099a931..ed62e4b549d2 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2394,6 +2394,45 @@ static void execlists_submission_tasklet(struct 
tasklet_struct *t)
rcu_read_unlock();
 }
 
+static void execlists_irq_handler(struct intel_engine_cs *engine, u32 iir)
+{
+   bool tasklet = false;
+
+   if (unlikely(iir & GT_CS_MASTER_ERROR_INTERRUPT)) {
+   u32 eir;
+
+   /* Upper 16b are the enabling mask, rsvd for internal errors */
+   eir = ENGINE_READ(engine, RING_EIR) & GENMASK(15, 0);
+   ENGINE_TRACE(engine, "CS error: %x\n", eir);
+
+   /* Disable the error interrupt until after the reset */
+   if (likely(eir)) {
+   ENGINE_WRITE(engine, RING_EMR, ~0u);
+   ENGINE_WRITE(engine, RING_EIR, eir);
+   WRITE_ONCE(engine->execlists.error_interrupt, eir);
+   tasklet = true;
+   }
+   }
+
+   if (iir & GT_WAIT_SEMAPHORE_INTERRUPT) {
+   WRITE_ONCE(engine->execlists.yield,
+  ENGINE_READ_FW(engine, RING_EXE

[Intel-gfx] [CI 04/14] drm/i915: Replace engine->schedule() with a known request operation

2021-02-02 Thread Chris Wilson
Looking to the future, we want to set the scheduling attributes
explicitly and so replace the generic engine->schedule() with the more
direct i915_request_set_priority().

What it loses in removing the 'schedule' name from the function, it
gains in having an explicit entry point with a stated goal.

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/display/intel_display.c  |  5 ++-
 drivers/gpu/drm/i915/gem/i915_gem_object.h|  5 ++-
 drivers/gpu/drm/i915/gem/i915_gem_wait.c  | 29 +---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  3 --
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |  4 +--
 drivers/gpu/drm/i915/gt/intel_engine_types.h  | 27 ---
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |  2 +-
 .../drm/i915/gt/intel_execlists_submission.c  |  3 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 33 +--
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 11 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 +-
 drivers/gpu/drm/i915/i915_request.c   | 10 +++---
 drivers/gpu/drm/i915/i915_request.h   |  5 +++
 drivers/gpu/drm/i915/i915_scheduler.c | 15 +
 drivers/gpu/drm/i915/i915_scheduler.h |  3 +-
 15 files changed, 64 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
b/drivers/gpu/drm/i915/display/intel_display.c
index d8f10589e09e..aca964f7ba72 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -13662,7 +13662,6 @@ int
 intel_prepare_plane_fb(struct drm_plane *_plane,
   struct drm_plane_state *_new_plane_state)
 {
-   struct i915_sched_attr attr = { .priority = I915_PRIORITY_DISPLAY };
struct intel_plane *plane = to_intel_plane(_plane);
struct intel_plane_state *new_plane_state =
to_intel_plane_state(_new_plane_state);
@@ -13703,7 +13702,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
 
if (new_plane_state->uapi.fence) { /* explicit fencing */
i915_gem_fence_wait_priority(new_plane_state->uapi.fence,
-&attr);
+I915_PRIORITY_DISPLAY);
ret = i915_sw_fence_await_dma_fence(&state->commit_ready,
new_plane_state->uapi.fence,

i915_fence_timeout(dev_priv),
@@ -13725,7 +13724,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
if (ret)
return ret;
 
-   i915_gem_object_wait_priority(obj, 0, &attr);
+   i915_gem_object_wait_priority(obj, 0, I915_PRIORITY_DISPLAY);
i915_gem_object_flush_frontbuffer(obj, ORIGIN_DIRTYFB);
 
if (!new_plane_state->uapi.fence) { /* implicit fencing */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 3411ad197fa6..325766abca21 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -549,15 +549,14 @@ static inline void __start_cpu_write(struct 
drm_i915_gem_object *obj)
obj->cache_dirty = true;
 }
 
-void i915_gem_fence_wait_priority(struct dma_fence *fence,
- const struct i915_sched_attr *attr);
+void i915_gem_fence_wait_priority(struct dma_fence *fence, int prio);
 
 int i915_gem_object_wait(struct drm_i915_gem_object *obj,
 unsigned int flags,
 long timeout);
 int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
  unsigned int flags,
- const struct i915_sched_attr *attr);
+ int prio);
 
 void __i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj,
 enum fb_op_origin origin);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c 
b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index 4b9856d5ba14..d79bf16083bd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -91,22 +91,12 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
return timeout;
 }
 
-static void fence_set_priority(struct dma_fence *fence,
-  const struct i915_sched_attr *attr)
+static void fence_set_priority(struct dma_fence *fence, int prio)
 {
-   struct i915_request *rq;
-   struct intel_engine_cs *engine;
-
if (dma_fence_is_signaled(fence) || !dma_fence_is_i915(fence))
return;
 
-   rq = to_request(fence);
-   engine = rq->engine;
-
-   rcu_read_lock(); /* RCU serialisation for set-wedged protection */
-   if (engine->schedule)
-   engine->schedule(rq, attr);
-   rcu_read_unlock();
+   i915_request_set_pr

[Intel-gfx] [CI 07/14] drm/i915/selftests: Exercise priority inheritance around an engine loop

2021-02-02 Thread Chris Wilson
Exercise rescheduling priority inheritance around a sequence of requests
that wrap around all the engines.

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/selftests/i915_scheduler.c   | 225 ++
 1 file changed, 225 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/i915_scheduler.c 
b/drivers/gpu/drm/i915/selftests/i915_scheduler.c
index d095fab2ccec..acc666f755d7 100644
--- a/drivers/gpu/drm/i915/selftests/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/selftests/i915_scheduler.c
@@ -7,6 +7,7 @@
 
 #include "gt/intel_context.h"
 #include "gt/intel_gpu_commands.h"
+#include "gt/intel_ring.h"
 #include "gt/selftest_engine_heartbeat.h"
 #include "selftests/igt_spinner.h"
 #include "selftests/i915_random.h"
@@ -504,10 +505,234 @@ static int igt_priority_chains(void *arg)
return igt_schedule_chains(arg, igt_priority);
 }
 
+static struct i915_request *
+__write_timestamp(struct intel_engine_cs *engine,
+ struct drm_i915_gem_object *obj,
+ int slot,
+ struct i915_request *prev)
+{
+   struct i915_request *rq = ERR_PTR(-EINVAL);
+   bool use_64b = INTEL_GEN(engine->i915) >= 8;
+   struct intel_context *ce;
+   struct i915_vma *vma;
+   int err = 0;
+   u32 *cs;
+
+   ce = intel_context_create(engine);
+   if (IS_ERR(ce))
+   return ERR_CAST(ce);
+
+   vma = i915_vma_instance(obj, ce->vm, NULL);
+   if (IS_ERR(vma)) {
+   err = PTR_ERR(vma);
+   goto out_ce;
+   }
+
+   err = i915_vma_pin(vma, 0, 0, PIN_USER);
+   if (err)
+   goto out_ce;
+
+   rq = intel_context_create_request(ce);
+   if (IS_ERR(rq)) {
+   err = PTR_ERR(rq);
+   goto out_unpin;
+   }
+
+   i915_vma_lock(vma);
+   err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE);
+   i915_vma_unlock(vma);
+   if (err)
+   goto out_request;
+
+   if (prev) {
+   err = i915_request_await_dma_fence(rq, &prev->fence);
+   if (err)
+   goto out_request;
+   }
+
+   if (engine->emit_init_breadcrumb) {
+   err = engine->emit_init_breadcrumb(rq);
+   if (err)
+   goto out_request;
+   }
+
+   cs = intel_ring_begin(rq, 4);
+   if (IS_ERR(cs)) {
+   err = PTR_ERR(cs);
+   goto out_request;
+   }
+
+   *cs++ = MI_STORE_REGISTER_MEM + use_64b;
+   *cs++ = i915_mmio_reg_offset(RING_TIMESTAMP(engine->mmio_base));
+   *cs++ = lower_32_bits(vma->node.start) + sizeof(u32) * slot;
+   *cs++ = upper_32_bits(vma->node.start);
+   intel_ring_advance(rq, cs);
+
+   i915_request_get(rq);
+out_request:
+   i915_request_add(rq);
+out_unpin:
+   i915_vma_unpin(vma);
+out_ce:
+   intel_context_put(ce);
+   i915_request_put(prev);
+   return err ? ERR_PTR(err) : rq;
+}
+
+static struct i915_request *create_spinner(struct drm_i915_private *i915,
+  struct igt_spinner *spin)
+{
+   struct intel_engine_cs *engine;
+
+   for_each_uabi_engine(engine, i915) {
+   struct intel_context *ce;
+   struct i915_request *rq;
+
+   if (igt_spinner_init(spin, engine->gt))
+   return ERR_PTR(-ENOMEM);
+
+   ce = intel_context_create(engine);
+   if (IS_ERR(ce))
+   return ERR_CAST(ce);
+
+   rq = igt_spinner_create_request(spin, ce, MI_NOOP);
+   intel_context_put(ce);
+   if (rq == ERR_PTR(-ENODEV))
+   continue;
+   if (IS_ERR(rq))
+   return rq;
+
+   i915_request_get(rq);
+   i915_request_add(rq);
+   return rq;
+   }
+
+   return ERR_PTR(-ENODEV);
+}
+
+static bool has_timestamp(const struct drm_i915_private *i915)
+{
+   return INTEL_GEN(i915) >= 7;
+}
+
+static int __igt_schedule_cycle(struct drm_i915_private *i915,
+   bool (*fn)(struct i915_request *rq,
+  unsigned long v, unsigned long e))
+{
+   struct intel_engine_cs *engine;
+   struct drm_i915_gem_object *obj;
+   struct igt_spinner spin;
+   struct i915_request *rq;
+   unsigned long count, n;
+   u32 *time, last;
+   int err;
+
+   /*
+* Queue a bunch of ordered requests (each waiting on the previous)
+* around the engines a couple of times. Each request will write
+* the timestamp it executes at into the scratch, with the expectation
+* that the timestamp will be in our desired execution order.
+*/
+
+   if (!i915->caps.scheduler || !has_timestamp(i915))
+   return 0;
+
+   obj = i915_gem_object_create_internal(i915, PAGE_SIZE);

[Intel-gfx] [CI 12/14] drm/i915: Extract request suspension from the execlists

2021-02-02 Thread Chris Wilson
Make the ability to suspend and resume a request and its dependents
generic.

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 .../drm/i915/gt/intel_execlists_submission.c  | 167 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |   8 +-
 drivers/gpu/drm/i915/i915_scheduler.c | 153 
 drivers/gpu/drm/i915/i915_scheduler.h |  10 ++
 4 files changed, 169 insertions(+), 169 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 4add205ec30e..a971b3bee532 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1921,169 +1921,6 @@ static void post_process_csb(struct i915_request **port,
execlists_schedule_out(*port++);
 }
 
-static void __execlists_hold(struct i915_request *rq)
-{
-   LIST_HEAD(list);
-
-   do {
-   struct i915_dependency *p;
-
-   if (i915_request_is_active(rq))
-   __i915_request_unsubmit(rq);
-
-   clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-   list_move_tail(&rq->sched.link, &rq->engine->active.hold);
-   i915_request_set_hold(rq);
-   RQ_TRACE(rq, "on hold\n");
-
-   for_each_waiter(p, rq) {
-   struct i915_request *w =
-   container_of(p->waiter, typeof(*w), sched);
-
-   if (p->flags & I915_DEPENDENCY_WEAK)
-   continue;
-
-   /* Leave semaphores spinning on the other engines */
-   if (w->engine != rq->engine)
-   continue;
-
-   if (!i915_request_is_ready(w))
-   continue;
-
-   if (__i915_request_is_complete(w))
-   continue;
-
-   if (i915_request_on_hold(w))
-   continue;
-
-   list_move_tail(&w->sched.link, &list);
-   }
-
-   rq = list_first_entry_or_null(&list, typeof(*rq), sched.link);
-   } while (rq);
-}
-
-static bool execlists_hold(struct intel_engine_cs *engine,
-  struct i915_request *rq)
-{
-   if (i915_request_on_hold(rq))
-   return false;
-
-   spin_lock_irq(&engine->active.lock);
-
-   if (__i915_request_is_complete(rq)) { /* too late! */
-   rq = NULL;
-   goto unlock;
-   }
-
-   /*
-* Transfer this request onto the hold queue to prevent it
-* being resubmitted to HW (and potentially completed) before we have
-* released it. Since we may have already submitted following
-* requests, we need to remove those as well.
-*/
-   GEM_BUG_ON(i915_request_on_hold(rq));
-   GEM_BUG_ON(rq->engine != engine);
-   __execlists_hold(rq);
-   GEM_BUG_ON(list_empty(&engine->active.hold));
-
-unlock:
-   spin_unlock_irq(&engine->active.lock);
-   return rq;
-}
-
-static bool hold_request(const struct i915_request *rq)
-{
-   struct i915_dependency *p;
-   bool result = false;
-
-   /*
-* If one of our ancestors is on hold, we must also be on hold,
-* otherwise we will bypass it and execute before it.
-*/
-   rcu_read_lock();
-   for_each_signaler(p, rq) {
-   const struct i915_request *s =
-   container_of(p->signaler, typeof(*s), sched);
-
-   if (s->engine != rq->engine)
-   continue;
-
-   result = i915_request_on_hold(s);
-   if (result)
-   break;
-   }
-   rcu_read_unlock();
-
-   return result;
-}
-
-static void __execlists_unhold(struct i915_request *rq)
-{
-   LIST_HEAD(list);
-
-   do {
-   struct i915_dependency *p;
-
-   RQ_TRACE(rq, "hold release\n");
-
-   GEM_BUG_ON(!i915_request_on_hold(rq));
-   GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));
-
-   i915_request_clear_hold(rq);
-   list_move_tail(&rq->sched.link,
-  i915_sched_lookup_priolist(rq->engine,
- rq_prio(rq)));
-   set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-
-   /* Also release any children on this engine that are ready */
-   for_each_waiter(p, rq) {
-   struct i915_request *w =
-   container_of(p->waiter, typeof(*w), sched);
-
-   if (p->flags & I915_DEPENDENCY_WEAK)
-   continue;
-
-   /* Propagate any change in error status */
-   

[Intel-gfx] [CI 09/14] drm/i915: Improve DFS for priority inheritance

2021-02-02 Thread Chris Wilson
The core of the scheduling algorithm is that we compute the topological
order of the fence DAG. Knowing that we have a DAG, we should be able to
use a DFS to compute the topological sort in linear time. However,
during the conversion of the recursive algorithm into an iterative one,
the memoization of how far we had progressed down a branch was
forgotten. The result was that instead of running in linear time, it was
running in geometric time and could easily run for a few hundred
milliseconds given a wide enough graph, not the microseconds as required.
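
As an illustration of the memoization, here is a minimal userspace sketch
(invented graph/node names, not the kernel code) of an iterative DFS that
remembers how far it has progressed along each node's edge list, so every
edge is walked exactly once and the topological sort stays linear:

#include <stdio.h>

struct node {
	struct node *edges[8];	/* NULL-terminated list of dependencies */
	struct node *prev;	/* stack of nodes along the current branch */
	int pos;		/* memoized resume index into edges[] */
	int visited;
	const char *name;
};

static void topo_dfs(struct node *n)
{
	int pos = 0;

	n->prev = NULL;
	do {
		while (n->edges[pos]) {
			struct node *s = n->edges[pos++];

			if (s->visited)
				continue;

			n->pos = pos;	/* remember where to resume */
			s->prev = n;
			n = s;
			pos = 0;
		}

		n->visited = 1;
		printf("%s\n", n->name); /* dependencies always print first */

		pos = n->prev ? n->prev->pos : 0;
	} while ((n = n->prev));
}

int main(void)
{
	struct node d = { .name = "D" };
	struct node b = { .edges = { &d }, .name = "B" };
	struct node c = { .edges = { &d }, .name = "C" };
	struct node a = { .edges = { &b, &c }, .name = "A" };

	topo_dfs(&a);	/* prints D B C A; the second edge to D is skipped */
	return 0;
}

Without the ->pos memo, returning to a node would rescan its edge list from
the start, which is where the geometric blow-up on wide graphs comes from.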

Signed-off-by: Chris Wilson 
Reviewed-by: Andi Shyti 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_scheduler.c   | 58 -
 drivers/gpu/drm/i915/i915_scheduler_types.h |  6 ++-
 2 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 27bda7617b29..9e88417bf451 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -242,6 +242,26 @@ void __i915_priolist_free(struct i915_priolist *p)
kmem_cache_free(global.slab_priorities, p);
 }
 
+static struct i915_request *
+stack_push(struct i915_request *rq,
+  struct i915_request *prev,
+  struct list_head *pos)
+{
+   prev->sched.dfs.pos = pos;
+   rq->sched.dfs.prev = prev;
+   return rq;
+}
+
+static struct i915_request *
+stack_pop(struct i915_request *rq,
+ struct list_head **pos)
+{
+   rq = rq->sched.dfs.prev;
+   if (rq)
+   *pos = rq->sched.dfs.pos;
+   return rq;
+}
+
 static inline bool need_preempt(int prio, int active)
 {
/*
@@ -306,11 +326,10 @@ static void ipi_priority(struct i915_request *rq, int 
prio)
 static void __i915_request_set_priority(struct i915_request *rq, int prio)
 {
struct intel_engine_cs *engine = rq->engine;
-   struct i915_request *rn;
+   struct list_head *pos = &rq->sched.signalers_list;
struct list_head *plist;
-   LIST_HEAD(dfs);
 
-   list_add(&rq->sched.dfs, &dfs);
+   plist = i915_sched_lookup_priolist(engine, prio);
 
/*
 * Recursively bump all dependent priorities to match the new request.
@@ -330,40 +349,31 @@ static void __i915_request_set_priority(struct 
i915_request *rq, int prio)
 * end result is a topological list of requests in reverse order, the
 * last element in the list is the request we must execute first.
 */
-   list_for_each_entry(rq, &dfs, sched.dfs) {
-   struct i915_dependency *p;
-
-   /* Also release any children on this engine that are ready */
-   GEM_BUG_ON(rq->engine != engine);
-
-   for_each_signaler(p, rq) {
+   rq->sched.dfs.prev = NULL;
+   do {
+   list_for_each_continue(pos, &rq->sched.signalers_list) {
+   struct i915_dependency *p =
+   list_entry(pos, typeof(*p), signal_link);
struct i915_request *s =
container_of(p->signaler, typeof(*s), sched);
 
-   GEM_BUG_ON(s == rq);
-
if (rq_prio(s) >= prio)
continue;
 
if (__i915_request_is_complete(s))
continue;
 
-   if (s->engine != rq->engine) {
+   if (s->engine != engine) {
ipi_priority(s, prio);
continue;
}
 
-   list_move_tail(&s->sched.dfs, &dfs);
+   /* Remember our position along this branch */
+   rq = stack_push(s, rq, pos);
+   pos = &s->sched.signalers_list;
}
-   }
 
-   plist = i915_sched_lookup_priolist(engine, prio);
-
-   /* Fifo and depth-first replacement ensure our deps execute first */
-   list_for_each_entry_safe_reverse(rq, rn, &dfs, sched.dfs) {
-   GEM_BUG_ON(rq->engine != engine);
-
-   INIT_LIST_HEAD(&rq->sched.dfs);
+   RQ_TRACE(rq, "set-priority:%d\n", prio);
WRITE_ONCE(rq->sched.attr.priority, prio);
 
/*
@@ -377,12 +387,13 @@ static void __i915_request_set_priority(struct 
i915_request *rq, int prio)
if (!i915_request_is_ready(rq))
continue;
 
+   GEM_BUG_ON(rq->engine != engine);
if (i915_request_in_priority_queue(rq))
list_move_tail(&rq->sched.link, plist);
 
/* Defer (tasklet) submission until after all updates. */
kick_submission(engine, rq, prio);
-   }
+   } while ((rq = stack_pop(rq, &pos)));
 }
 
 #define all_signalers_checked(p, rq) \
@@ -456,7 +467,6 @@ void i915_sched_no

[Intel-gfx] [CI 08/14] drm/i915/selftests: Force a rewind if at first we don't succeed

2021-02-02 Thread Chris Wilson
live_timeslice_rewind assumes a particular traversal and reordering
after the first timeslice yield. However, the outcome can be either
(A1, A2, B1) or (A1, B2, A2) depending on the path taken through the
dependency graph. So if we do not get the outcome we need at first, give
it a priority kick to force a rewind.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/selftest_execlists.c | 21 +++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c 
b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index 951e2bf867e1..68e1398704a4 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -1107,6 +1107,7 @@ static int live_timeslice_rewind(void *arg)
struct i915_request *rq[3] = {};
struct intel_context *ce;
unsigned long timeslice;
+   unsigned long timeout;
int i, err = 0;
u32 *slot;
 
@@ -1173,11 +1174,29 @@ static int live_timeslice_rewind(void *arg)
 
/* ELSP[] = { { A:rq1, A:rq2 }, { B:rq1 } } */
ENGINE_TRACE(engine, "forcing tasklet for rewind\n");
-   while (i915_request_is_active(rq[A2])) { /* semaphore yield! */
+   i = 0;
+   timeout = jiffies + HZ;
+   while (i915_request_is_active(rq[A2]) &&
+  time_before(jiffies, timeout)) { /* semaphore yield! */
/* Wait for the timeslice to kick in */
del_timer(&engine->execlists.timer);
tasklet_hi_schedule(&engine->execlists.tasklet);
intel_engine_flush_submission(engine);
+
+   /*
+* Unfortunately this assumes that during the
+* search of the wait tree it sees the requests
+* in a particular order. That order is not
+* strictly determined and it may pick either
+* A2 or B1 to immediately follow A1.
+*
+* Break the tie with a set-priority. This defeats
+* the goal of trying to cause a rewind with a
+* timeslice, but alas, a rewind is better than
+* none.
+*/
+   if (i++)
+   i915_request_set_priority(rq[B1], 1);
}
/* -> ELSP[] = { { A:rq1 }, { B:rq1 } } */
GEM_BUG_ON(!i915_request_is_active(rq[A1]));
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [CI 10/14] drm/i915: Extract request submission from execlists

2021-02-02 Thread Chris Wilson
In the process of preparing to reuse the request submission logic for
other backends, lift it out of the execlists backend. It already
operates on the common structs, so just a matter of moving and renaming.

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 .../drm/i915/gt/intel_execlists_submission.c  | 55 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 30 +--
 drivers/gpu/drm/i915/i915_scheduler.c | 82 +++
 drivers/gpu/drm/i915/i915_scheduler.h |  2 +
 4 files changed, 86 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 6b8984c64b60..62e83acc7221 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2452,59 +2452,6 @@ static void execlists_preempt(struct timer_list *timer)
execlists_kick(timer, preempt);
 }
 
-static void queue_request(struct intel_engine_cs *engine,
- struct i915_request *rq)
-{
-   GEM_BUG_ON(!list_empty(&rq->sched.link));
-   list_add_tail(&rq->sched.link,
- i915_sched_lookup_priolist(engine, rq_prio(rq)));
-   set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-}
-
-static bool submit_queue(struct intel_engine_cs *engine,
-const struct i915_request *rq)
-{
-   struct intel_engine_execlists *execlists = &engine->execlists;
-
-   if (rq_prio(rq) <= execlists->queue_priority_hint)
-   return false;
-
-   execlists->queue_priority_hint = rq_prio(rq);
-   return true;
-}
-
-static bool ancestor_on_hold(const struct intel_engine_cs *engine,
-const struct i915_request *rq)
-{
-   GEM_BUG_ON(i915_request_on_hold(rq));
-   return !list_empty(&engine->active.hold) && hold_request(rq);
-}
-
-static void execlists_submit_request(struct i915_request *request)
-{
-   struct intel_engine_cs *engine = request->engine;
-   unsigned long flags;
-
-   /* Will be called from irq-context when using foreign fences. */
-   spin_lock_irqsave(&engine->active.lock, flags);
-
-   if (unlikely(ancestor_on_hold(engine, request))) {
-   RQ_TRACE(request, "ancestor on hold\n");
-   list_add_tail(&request->sched.link, &engine->active.hold);
-   i915_request_set_hold(request);
-   } else {
-   queue_request(engine, request);
-
-   GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
-   GEM_BUG_ON(list_empty(&request->sched.link));
-
-   if (submit_queue(engine, request))
-   __execlists_kick(&engine->execlists);
-   }
-
-   spin_unlock_irqrestore(&engine->active.lock, flags);
-}
-
 static int execlists_context_pre_pin(struct intel_context *ce,
 struct i915_gem_ww_ctx *ww,
 void **vaddr)
@@ -3124,7 +3071,7 @@ static bool can_preempt(struct intel_engine_cs *engine)
 
 static void execlists_set_default_submission(struct intel_engine_cs *engine)
 {
-   engine->submit_request = execlists_submit_request;
+   engine->submit_request = i915_request_enqueue;
engine->execlists.tasklet.callback = execlists_submission_tasklet;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 7db2c9decf21..f5b8f89d30bc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -519,34 +519,6 @@ static int guc_request_alloc(struct i915_request *request)
return 0;
 }
 
-static inline void queue_request(struct intel_engine_cs *engine,
-struct i915_request *rq,
-int prio)
-{
-   GEM_BUG_ON(!list_empty(&rq->sched.link));
-   list_add_tail(&rq->sched.link,
- i915_sched_lookup_priolist(engine, prio));
-   set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-}
-
-static void guc_submit_request(struct i915_request *rq)
-{
-   struct intel_engine_cs *engine = rq->engine;
-   unsigned long flags;
-
-   /* Will be called from irq-context when using foreign fences. */
-   spin_lock_irqsave(&engine->active.lock, flags);
-
-   queue_request(engine, rq, rq_prio(rq));
-
-   GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
-   GEM_BUG_ON(list_empty(&rq->sched.link));
-
-   tasklet_hi_schedule(&engine->execlists.tasklet);
-
-   spin_unlock_irqrestore(&engine->active.lock, flags);
-}
-
 static void sanitize_hwsp(struct intel_engine_cs *engine)
 {
struct intel_timeline *tl;
@@ -615,7 +587,7 @@ static int guc_resume(struct intel_engine_cs *engine)
 
 static void guc_set_default_submission(struct intel_engine_cs *engine)
 {
-   engine->submit_request = guc_submit_request;
+   engine->submit_request = i915_request_enqueue;

[Intel-gfx] [CI 11/14] drm/i915: Extract request rewinding from execlists

2021-02-02 Thread Chris Wilson
In the process of preparing to reuse the request submission logic for
other backends, lift it out of the execlists backend.

While this operates on the common structs, we do have a bit of backend
knowledge, which is harmless for !lrc but still unsightly.

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/gt/intel_engine.h|  3 -
 .../drm/i915/gt/intel_execlists_submission.c  | 58 ++-
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h   |  3 +
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 +-
 drivers/gpu/drm/i915/i915_scheduler.c | 44 ++
 drivers/gpu/drm/i915/i915_scheduler.h |  3 +
 7 files changed, 56 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index 8d9184920c51..cc2df80eb449 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -137,9 +137,6 @@ execlists_active_unlock_bh(struct intel_engine_execlists 
*execlists)
local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
 }
 
-struct i915_request *
-execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists);
-
 static inline u32
 intel_read_status_page(const struct intel_engine_cs *engine, int reg)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 62e83acc7221..4add205ec30e 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -359,56 +359,6 @@ assert_priority_queue(const struct i915_request *prev,
return rq_prio(prev) >= rq_prio(next);
 }
 
-static struct i915_request *
-__unwind_incomplete_requests(struct intel_engine_cs *engine)
-{
-   struct i915_request *rq, *rn, *active = NULL;
-   struct list_head *pl;
-   int prio = I915_PRIORITY_INVALID;
-
-   lockdep_assert_held(&engine->active.lock);
-
-   list_for_each_entry_safe_reverse(rq, rn,
-&engine->active.requests,
-sched.link) {
-   if (__i915_request_is_complete(rq)) {
-   list_del_init(>sched.link);
-   continue;
-   }
-
-   __i915_request_unsubmit(rq);
-
-   GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
-   if (rq_prio(rq) != prio) {
-   prio = rq_prio(rq);
-   pl = i915_sched_lookup_priolist(engine, prio);
-   }
-   GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
-
-   list_move(&rq->sched.link, pl);
-   set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-
-   /* Check in case we rollback so far we wrap [size/2] */
-   if (intel_ring_direction(rq->ring,
-rq->tail,
-rq->ring->tail + 8) > 0)
-   rq->context->lrc.desc |= CTX_DESC_FORCE_RESTORE;
-
-   active = rq;
-   }
-
-   return active;
-}
-
-struct i915_request *
-execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
-{
-   struct intel_engine_cs *engine =
-   container_of(execlists, typeof(*engine), execlists);
-
-   return __unwind_incomplete_requests(engine);
-}
-
 static void
 execlists_context_status_change(struct i915_request *rq, unsigned long status)
 {
@@ -1080,7 +1030,7 @@ static void defer_active(struct intel_engine_cs *engine)
 {
struct i915_request *rq;
 
-   rq = __unwind_incomplete_requests(engine);
+   rq = __i915_sched_rewind_requests(engine);
if (!rq)
return;
 
@@ -1292,7 +1242,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * the preemption, some of the unwound requests may
 * complete!
 */
-   __unwind_incomplete_requests(engine);
+   __i915_sched_rewind_requests(engine);
 
last = NULL;
} else if (timeslice_expired(engine, last)) {
@@ -2287,7 +2237,7 @@ static void execlists_capture(struct intel_engine_cs 
*engine)
 * which we return it to the queue for signaling.
 *
 * By removing them from the execlists queue, we also remove the
-* requests from being processed by __unwind_incomplete_requests()
+* requests from being processed by __intel_engine_rewind_requests()
 * during the intel_engine_reset(), and so they will *not* be replayed
 * afterwards.
 *
@@ -2917,7 +2867,7 @@ static void execlists_reset_rewind(struct intel_engine_cs 
*engine, bool stalled)
/* Push back any incomplete requests for replay after the reset */

[Intel-gfx] [CI 14/14] drm/i915: Fix the iterative dfs for deferring requests

2021-02-02 Thread Chris Wilson
The current implementation of walking the children of a deferred
request lacks the backtracking required to reduce the dfs to linear.
Having pulled it from execlists into the common layer, we can reuse the
dfs code for priority inheritance.

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_scheduler.c | 56 +++
 1 file changed, 40 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 641141f3ce10..8dd999f09412 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -466,8 +466,10 @@ void i915_request_set_priority(struct i915_request *rq, 
int prio)
 void __i915_sched_defer_request(struct intel_engine_cs *engine,
struct i915_request *rq)
 {
-   struct list_head *pl;
-   LIST_HEAD(list);
+   struct list_head *pos = &rq->sched.waiters_list;
+   const int prio = rq_prio(rq);
+   struct i915_request *rn;
+   LIST_HEAD(dfs);
 
lockdep_assert_held(&engine->active.lock);
GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags));
@@ -477,14 +479,11 @@ void __i915_sched_defer_request(struct intel_engine_cs 
*engine,
 * to those that are waiting upon it. So we traverse its chain of
 * waiters and move any that are earlier than the request to after it.
 */
-   pl = lookup_priolist(engine, rq_prio(rq));
+   rq->sched.dfs.prev = NULL;
do {
-   struct i915_dependency *p;
-
-   GEM_BUG_ON(i915_request_is_active(rq));
-   list_move_tail(&rq->sched.link, pl);
-
-   for_each_waiter(p, rq) {
+   list_for_each_continue(pos, &rq->sched.waiters_list) {
+   struct i915_dependency *p =
+   list_entry(pos, typeof(*p), wait_link);
struct i915_request *w =
container_of(p->waiter, typeof(*w), sched);
 
@@ -500,19 +499,44 @@ void __i915_sched_defer_request(struct intel_engine_cs 
*engine,
   __i915_request_has_started(w) &&
   !__i915_request_is_complete(rq));
 
-   if (!i915_request_is_ready(w))
+   if (!i915_request_in_priority_queue(w))
continue;
 
-   if (rq_prio(w) < rq_prio(rq))
+   /*
+* We also need to reorder within the same priority.
+*
+* This is unlike priority-inheritance, where if the
+* signaler already has a higher priority [earlier
+* deadline] than us, we can ignore as it will be
+* scheduled first. If a waiter already has the
+* same priority, we still have to push it to the end
+* of the list. This unfortunately means we cannot
+* use the rq_deadline() itself as a 'visited' bit.
+*/
+   if (rq_prio(w) < prio)
continue;
 
-   GEM_BUG_ON(rq_prio(w) > rq_prio(rq));
-   GEM_BUG_ON(i915_request_is_active(w));
-   list_move_tail(&w->sched.link, &list);
+   GEM_BUG_ON(rq_prio(w) != prio);
+
+   /* Remember our position along this branch */
+   rq = stack_push(w, rq, pos);
+   pos = &w->sched.waiters_list;
}
 
-   rq = list_first_entry_or_null(&list, typeof(*rq), sched.link);
-   } while (rq);
+   /* Note list is reversed for waiters wrt signal hierarchy */
+   GEM_BUG_ON(rq->engine != engine);
+   GEM_BUG_ON(!i915_request_in_priority_queue(rq));
+   list_move(&rq->sched.link, &dfs);
+
+   /* Track our visit, and prevent duplicate processing */
+   clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+   } while ((rq = stack_pop(rq, &pos)));
+
+   pos = lookup_priolist(engine, prio);
+   list_for_each_entry_safe(rq, rn, &dfs, sched.link) {
+   set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+   list_add_tail(&rq->sched.link, pos);
+   }
 }
 
 static void queue_request(struct intel_engine_cs *engine,
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [CI 05/14] drm/i915: Restructure priority inheritance

2021-02-02 Thread Chris Wilson
In anticipation of wanting to be able to call pi from underneath an
engine's active.lock, rework the priority inheritance to primarily work
along an engine's priority queue, delegating any other engine that the
chain may traverse to a worker. This reduces the global spinlock from
governing the entire multi-engine priority inheritance depth-first search,
to a smaller lock on each engine around a single list on that engine.
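
The cross-engine delegation reduces to "publish the highest pending bump
with an atomic, and let the engine that owns the request apply it from a
worker". A minimal userspace analogue of that idea (C11 atomics standing
in for xchg/cmpxchg, a direct call standing in for the worker, all names
invented):

#include <limits.h>
#include <stdatomic.h>
#include <stdio.h>

#define PRIO_INVALID INT_MIN

struct request {
	atomic_int ipi_priority; /* highest pending cross-engine bump */
	int priority;		 /* only touched by the owning engine */
};

/* Called from any other engine: record the bump, take no locks. */
static void ipi_priority(struct request *rq, int prio)
{
	int old = atomic_load(&rq->ipi_priority);

	while (old < prio &&
	       !atomic_compare_exchange_weak(&rq->ipi_priority, &old, prio))
		; /* raced with another bump; old now holds the latest value */
}

/* Called from the worker on the engine that owns the request. */
static void ipi_apply(struct request *rq)
{
	int prio = atomic_exchange(&rq->ipi_priority, PRIO_INVALID);

	if (prio != PRIO_INVALID && prio > rq->priority)
		rq->priority = prio;
}

int main(void)
{
	struct request rq = { .ipi_priority = PRIO_INVALID, .priority = 0 };

	ipi_priority(&rq, 2);	/* remote engine leaves a bump behind */
	ipi_apply(&rq);		/* owning engine applies it later */
	printf("priority=%d\n", rq.priority);
	return 0;
}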

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |   2 +
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   3 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   3 +
 drivers/gpu/drm/i915/i915_scheduler.c | 356 +++---
 drivers/gpu/drm/i915/i915_scheduler.h |   3 +
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  23 +-
 6 files changed, 249 insertions(+), 141 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 92a3c8a43e14..36c6b8d7287d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -582,6 +582,8 @@ void intel_engine_init_execlists(struct intel_engine_cs 
*engine)
 
execlists->queue_priority_hint = INT_MIN;
execlists->queue = RB_ROOT_CACHED;
+
+   i915_sched_init_ipi(&execlists->ipi);
 }
 
 static void cleanup_status_page(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index 0b026cde9f09..48a91c0dbad6 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -114,8 +114,7 @@ static void heartbeat(struct work_struct *wrk)
 * but all other contexts, including the kernel
 * context are stuck waiting for the signal.
 */
-   } else if (intel_engine_has_scheduler(engine) &&
-  rq->sched.attr.priority < I915_PRIORITY_BARRIER) {
+   } else if (rq->sched.attr.priority < I915_PRIORITY_BARRIER) {
/*
 * Gradually raise the priority of the heartbeat to
 * give high priority work [which presumably desires
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index cb81f0d93189..1b404fef40a6 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -20,6 +20,7 @@
 #include "i915_gem.h"
 #include "i915_pmu.h"
 #include "i915_priolist_types.h"
+#include "i915_scheduler_types.h"
 #include "i915_selftest.h"
 #include "intel_breadcrumbs_types.h"
 #include "intel_sseu.h"
@@ -257,6 +258,8 @@ struct intel_engine_execlists {
struct rb_root_cached queue;
struct rb_root_cached virtual;
 
+   struct i915_sched_ipi ipi;
+
/**
 * @csb_write: control register for Context Switch buffer
 *
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 84a55df88687..035e4be5d573 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -17,7 +17,25 @@ static struct i915_global_scheduler {
struct kmem_cache *slab_priorities;
 } global;
 
-static DEFINE_SPINLOCK(schedule_lock);
+/*
+ * Virtual engines complicate acquiring the engine timeline lock,
+ * as their rq->engine pointer is not stable until under that
+ * engine lock. The simple ploy we use is to take the lock then
+ * check that the rq still belongs to the newly locked engine.
+ */
+#define lock_engine_irqsave(rq, flags) ({ \
+   struct i915_request * const rq__ = (rq); \
+   struct intel_engine_cs *engine__ = READ_ONCE(rq__->engine); \
+\
+   spin_lock_irqsave(&engine__->active.lock, (flags)); \
+   while (engine__ != READ_ONCE((rq__)->engine)) { \
+   spin_unlock(&engine__->active.lock); \
+   engine__ = READ_ONCE(rq__->engine); \
+   spin_lock(&engine__->active.lock); \
+   } \
+\
+   engine__; \
+})
 
 static struct i915_sched_node *node_get(struct i915_sched_node *node)
 {
@@ -30,17 +48,104 @@ static void node_put(struct i915_sched_node *node)
i915_request_put(container_of(node, struct i915_request, sched));
 }
 
+static inline int rq_prio(const struct i915_request *rq)
+{
+   return READ_ONCE(rq->sched.attr.priority);
+}
+
+static int ipi_get_prio(struct i915_request *rq)
+{
+   if (READ_ONCE(rq->sched.ipi_priority) == I915_PRIORITY_INVALID)
+   return I915_PRIORITY_INVALID;
+
+   return xchg(&rq->sched.ipi_priority, I915_PRIORITY_INVALID);
+}
+
+static void ipi_schedule(struct work_struct *wrk)
+{
+   struct i915_sched_ipi *ipi = container_of(wrk, typeof(*ipi), work);
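
The lock_engine_irqsave() ploy above is easier to see stripped down; a
hedged userspace analogue (pthread mutexes standing in for the engine
lock, invented names; build with -pthread) of "lock the engine you saw,
then re-check that you locked the right one":

#include <pthread.h>
#include <stdatomic.h>

struct engine {
	pthread_mutex_t lock;
};

struct request {
	_Atomic(struct engine *) engine; /* may be switched by the backend */
};

static struct engine *lock_engine(struct request *rq)
{
	struct engine *engine = atomic_load(&rq->engine);

	pthread_mutex_lock(&engine->lock);
	while (engine != atomic_load(&rq->engine)) {
		/* the request moved while we waited; chase it */
		pthread_mutex_unlock(&engine->lock);
		engine = atomic_load(&rq->engine);
		pthread_mutex_lock(&engine->lock);
	}

	return engine; /* rq->engine is now stable under this lock */
}

int main(void)
{
	struct engine e = { PTHREAD_MUTEX_INITIALIZER };
	struct request rq = { &e };

	pthread_mutex_unlock(&lock_engine(&rq)->lock);
	return 0;
}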

[Intel-gfx] [CI 13/14] drm/i915: Extract the ability to defer and rerun a request later

2021-02-02 Thread Chris Wilson
Lift the ability to defer a request until later from execlists into the
common layer.

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 .../drm/i915/gt/intel_execlists_submission.c  | 57 +++--
 drivers/gpu/drm/i915/i915_scheduler.c | 63 +--
 drivers/gpu/drm/i915/i915_scheduler.h |  5 +-
 3 files changed, 67 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index a971b3bee532..b1761d937a5f 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -978,54 +978,6 @@ static void virtual_xfer_context(struct virtual_engine *ve,
}
 }
 
-static void defer_request(struct i915_request *rq, struct list_head * const pl)
-{
-   LIST_HEAD(list);
-
-   /*
-* We want to move the interrupted request to the back of
-* the round-robin list (i.e. its priority level), but
-* in doing so, we must then move all requests that were in
-* flight and were waiting for the interrupted request to
-* be run after it again.
-*/
-   do {
-   struct i915_dependency *p;
-
-   GEM_BUG_ON(i915_request_is_active(rq));
-   list_move_tail(&rq->sched.link, pl);
-
-   for_each_waiter(p, rq) {
-   struct i915_request *w =
-   container_of(p->waiter, typeof(*w), sched);
-
-   if (p->flags & I915_DEPENDENCY_WEAK)
-   continue;
-
-   /* Leave semaphores spinning on the other engines */
-   if (w->engine != rq->engine)
-   continue;
-
-   /* No waiter should start before its signaler */
-   GEM_BUG_ON(i915_request_has_initial_breadcrumb(w) &&
-  __i915_request_has_started(w) &&
-  !__i915_request_is_complete(rq));
-
-   if (!i915_request_is_ready(w))
-   continue;
-
-   if (rq_prio(w) < rq_prio(rq))
-   continue;
-
-   GEM_BUG_ON(rq_prio(w) > rq_prio(rq));
-   GEM_BUG_ON(i915_request_is_active(w));
-   list_move_tail(&w->sched.link, &list);
-   }
-
-   rq = list_first_entry_or_null(&list, typeof(*rq), sched.link);
-   } while (rq);
-}
-
 static void defer_active(struct intel_engine_cs *engine)
 {
struct i915_request *rq;
@@ -1034,7 +986,14 @@ static void defer_active(struct intel_engine_cs *engine)
if (!rq)
return;
 
-   defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq)));
+   /*
+* We want to move the interrupted request to the back of
+* the round-robin list (i.e. its priority level), but
+* in doing so, we must then move all requests that were in
+* flight and were waiting for the interrupted request to
+* be run after it again.
+*/
+   __i915_sched_defer_request(engine, rq);
 }
 
 static bool
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index a5df27061c3c..641141f3ce10 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -179,8 +179,8 @@ static void assert_priolists(struct intel_engine_execlists 
* const execlists)
}
 }
 
-struct list_head *
-i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
+static struct list_head *
+lookup_priolist(struct intel_engine_cs *engine, int prio)
 {
struct intel_engine_execlists * const execlists = &engine->execlists;
struct i915_priolist *p;
@@ -332,7 +332,7 @@ static void __i915_request_set_priority(struct i915_request 
*rq, int prio)
struct list_head *pos = &rq->sched.signalers_list;
struct list_head *plist;
 
-   plist = i915_sched_lookup_priolist(engine, prio);
+   plist = lookup_priolist(engine, prio);
 
/*
 * Recursively bump all dependent priorities to match the new request.
@@ -463,12 +463,63 @@ void i915_request_set_priority(struct i915_request *rq, 
int prio)
spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
+void __i915_sched_defer_request(struct intel_engine_cs *engine,
+   struct i915_request *rq)
+{
+   struct list_head *pl;
+   LIST_HEAD(list);
+
+   lockdep_assert_held(&engine->active.lock);
+   GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags));
+
+   /*
+* When we defer a request, we must maintain its order with respect
+* to those that are waiting upon it. So we traverse its chain of
+* waiters and move any that are earlier than the request to after it.

[Intel-gfx] [CI 01/14] drm/i915/gt: Move engine setup out of set_default_submission

2021-02-02 Thread Chris Wilson
Now that we no longer switch back and forth between guc and execlists,
we no longer need to restore the backend's vfuncs and can leave them set
after initialisation. The only catch is that we lose the submission on
wedging and still need to reset the submit_request vfunc on unwedging.
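
To make the "only catch" concrete, a toy model (invented names, not the
driver code) of the one vfunc that still has to be flipped on wedging and
restored on unwedging:

#include <stdio.h>

struct engine {
	void (*submit_request)(const char *rq);
};

static void real_submit(const char *rq) { printf("queued %s\n", rq); }
static void nop_submit(const char *rq) { printf("dropped %s (wedged)\n", rq); }

static void set_wedged(struct engine *e) { e->submit_request = nop_submit; }

/* what set_default_submission amounts to after this patch */
static void unwedge(struct engine *e) { e->submit_request = real_submit; }

int main(void)
{
	struct engine e = { real_submit };

	set_wedged(&e);
	e.submit_request("rq1");
	unwedge(&e);
	e.submit_request("rq2");
	return 0;
}

All the other vfuncs (reset, park, engine flags, emit_bb_start) never
change once initialised, which is why the diff below can move them into
logical_ring_default_vfuncs().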

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 .../drm/i915/gt/intel_execlists_submission.c  | 46 -
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  4 --
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 50 ---
 3 files changed, 44 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 45a8ac152b88..5d824e1cfcba 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3089,29 +3089,6 @@ static void execlists_set_default_submission(struct 
intel_engine_cs *engine)
engine->submit_request = execlists_submit_request;
engine->schedule = i915_schedule;
engine->execlists.tasklet.callback = execlists_submission_tasklet;
-
-   engine->reset.prepare = execlists_reset_prepare;
-   engine->reset.rewind = execlists_reset_rewind;
-   engine->reset.cancel = execlists_reset_cancel;
-   engine->reset.finish = execlists_reset_finish;
-
-   engine->park = execlists_park;
-   engine->unpark = NULL;
-
-   engine->flags |= I915_ENGINE_SUPPORTS_STATS;
-   if (!intel_vgpu_active(engine->i915)) {
-   engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
-   if (can_preempt(engine)) {
-   engine->flags |= I915_ENGINE_HAS_PREEMPTION;
-   if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
-   engine->flags |= I915_ENGINE_HAS_TIMESLICES;
-   }
-   }
-
-   if (intel_engine_has_preemption(engine))
-   engine->emit_bb_start = gen8_emit_bb_start;
-   else
-   engine->emit_bb_start = gen8_emit_bb_start_noarb;
 }
 
 static void execlists_shutdown(struct intel_engine_cs *engine)
@@ -3142,6 +3119,14 @@ logical_ring_default_vfuncs(struct intel_engine_cs 
*engine)
engine->cops = &execlists_context_ops;
engine->request_alloc = execlists_request_alloc;
 
+   engine->reset.prepare = execlists_reset_prepare;
+   engine->reset.rewind = execlists_reset_rewind;
+   engine->reset.cancel = execlists_reset_cancel;
+   engine->reset.finish = execlists_reset_finish;
+
+   engine->park = execlists_park;
+   engine->unpark = NULL;
+
engine->emit_flush = gen8_emit_flush_xcs;
engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_xcs;
@@ -3162,6 +3147,21 @@ logical_ring_default_vfuncs(struct intel_engine_cs 
*engine)
 * until a more refined solution exists.
 */
}
+
+   engine->flags |= I915_ENGINE_SUPPORTS_STATS;
+   if (!intel_vgpu_active(engine->i915)) {
+   engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
+   if (can_preempt(engine)) {
+   engine->flags |= I915_ENGINE_HAS_PREEMPTION;
+   if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
+   engine->flags |= I915_ENGINE_HAS_TIMESLICES;
+   }
+   }
+
+   if (intel_engine_has_preemption(engine))
+   engine->emit_bb_start = gen8_emit_bb_start;
+   else
+   engine->emit_bb_start = gen8_emit_bb_start_noarb;
 }
 
 static void logical_ring_default_irqs(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c 
b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 9c2c605d7a92..3cb2ce503544 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -969,14 +969,10 @@ static void gen6_bsd_submit_request(struct i915_request 
*request)
 static void i9xx_set_default_submission(struct intel_engine_cs *engine)
 {
engine->submit_request = i9xx_submit_request;
-
-   engine->park = NULL;
-   engine->unpark = NULL;
 }
 
 static void gen6_bsd_set_default_submission(struct intel_engine_cs *engine)
 {
-   i9xx_set_default_submission(engine);
engine->submit_request = gen6_bsd_submit_request;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 92688a9b6717..f72faa0b8339 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -608,35 +608,6 @@ static int guc_resume(struct intel_engine_cs *engine)
 static void guc_set_default_submission(struct intel_engine_cs *engine

Re: [Intel-gfx] [PATCH 17/57] drm/i915: Extract request suspension from the execlists

2021-02-02 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-02 13:15:52)
> 
> On 01/02/2021 08:56, Chris Wilson wrote:
> > Make the ability to suspend and resume a request and its dependents
> > generic.

> > +bool __i915_sched_suspend_request(struct intel_engine_cs *engine,
> > +   struct i915_request *rq)
> > +{
> > + LIST_HEAD(list);
> > +
> > + lockdep_assert_held(&engine->active.lock);
> > + GEM_BUG_ON(rq->engine != engine);
> > +
> > + if (__i915_request_is_complete(rq)) /* too late! */
> > + return false;
> > +
> > + if (i915_request_on_hold(rq))
> > + return false;
> 
> This was a GEM_BUG_ON before so not pure extraction / code movement.

It was part of making it generic to allow other callers.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 17/57] drm/i915: Extract request suspension from the execlists

2021-02-02 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-02 13:15:52)
> 
> On 01/02/2021 08:56, Chris Wilson wrote:
> > +void __i915_sched_resume_request(struct intel_engine_cs *engine,
> > +  struct i915_request *rq)
> > +{
> > + LIST_HEAD(list);
> > +
> > + lockdep_assert_held(&engine->active.lock);
> > +
> > + if (rq_prio(rq) > engine->execlists.queue_priority_hint) {
> > + engine->execlists.queue_priority_hint = rq_prio(rq);
> > + tasklet_hi_schedule(&engine->execlists.tasklet);
> > + }
> > +
> > + if (!i915_request_on_hold(rq))
> > + return;
> > +
> > + ENGINE_TRACE(engine, "resuming request %llx:%lld\n",
> > +  rq->fence.context, rq->fence.seqno);
> > +
> > + /*
> > +  * Move this request back to the priority queue, and all of its
> > +  * children and grandchildren that were suspended along with it.
> > +  */
> > + do {
> > + struct i915_dependency *p;
> > +
> > + RQ_TRACE(rq, "hold release\n");
> > +
> > + GEM_BUG_ON(!i915_request_on_hold(rq));
> > + GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));
> > +
> > + i915_request_clear_hold(rq);
> > + list_del_init(&rq->sched.link);
> > +
> > + queue_request(engine, rq);
> > +
> > + /* Also release any children on this engine that are ready */
> > + for_each_waiter(p, rq) {
> > + struct i915_request *w =
> > + container_of(p->waiter, typeof(*w), sched);
> > +
> > + if (p->flags & I915_DEPENDENCY_WEAK)
> > + continue;
> > +
> > + /* Propagate any change in error status */
> > + if (rq->fence.error)
> > + i915_request_set_error_once(w, 
> > rq->fence.error);
> > +
> > + if (w->engine != engine)
> > + continue;
> > +
> > + /* We also treat the on-hold status as a visited bit 
> > */
> > + if (!i915_request_on_hold(w))
> > + continue;
> > +
> > + /* Check that no other parents are also on hold [BFS] 
> > */
> > + if (hold_request(w))
> > + continue;
> 
> hold_request() appears deleted in the patch so possible rebase error.

The secret is we get to de-duplicate after having duplicated
hold_request() in i915_scheduler in an earlier patch,
  drm/i915: Extract request submission from execlists
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 08/57] drm/i915/gt: Move submission_method into intel_gt

2021-02-02 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-02 12:03:02)
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c 
> > b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > index 5d7fac383add..9304a35384aa 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > @@ -4715,7 +4715,7 @@ int intel_execlists_live_selftests(struct 
> > drm_i915_private *i915)
> >   SUBTEST(live_virtual_reset),
> >   };
> >   
> > - if (!HAS_EXECLISTS(i915))
> > + if (i915->gt.submission_method != INTEL_SUBMISSION_ELSP)
> >   return 0;
> >   
> >   if (intel_gt_is_wedged(>gt))
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c 
> > b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
> > index 3350e7c995bc..6cd9f6bc240c 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_ring_submission.c
> > @@ -291,7 +291,7 @@ int intel_ring_submission_live_selftests(struct 
> > drm_i915_private *i915)
> >   SUBTEST(live_ctx_switch_wa),
> >   };
> >   
> > - if (HAS_EXECLISTS(i915))
> > + if (i915->gt.submission_method > INTEL_SUBMISSION_RING)
> 
> Not sure the above two hunks in selftests are an improvement, not seeing 
> how using enum ordering is better than a feature check.

Wait 40 patches.

> Mechanics looks fine. I'd prefer the selftests to remain as is but not 
> mandatory.

The execlists tests are not suitable as-is for the guc. And they are in
the habit of breaking the test to hide impedance mismatches with the
guc.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] Fixes that failed to apply to v5.11-rc4

2021-02-02 Thread Chris Wilson
Quoting Jani Nikula (2021-02-02 07:15:18)
> On Mon, 18 Jan 2021, Jani Nikula  wrote:
> > The following commits have been marked as Cc: stable or fixing something
> > in v5.11-rc4 or earlier, but failed to cherry-pick to
> > drm-intel-fixes. Please see if they are worth backporting, and please do
> > so if they are.
> >
> > Conflicts:
> > dbe13ae1d6ab ("drm/i915/pmu: Don't grab wakeref when enabling events")
> > 9bb36cf66091 ("drm/i915: Check for rq->hwsp validity after acquiring RCU 
> > lock")
> > 5b4dc95cf7f5 ("drm/i915/gt: Prevent use of engine->wa_ctx after error")
> > 6a3daee1b38e ("drm/i915/selftests: Fix some error codes")
> > 67fba3f1c73b ("drm/i915/dp: Fix LTTPR vswing/pre-emp setting in 
> > non-transparent mode")
> >
> > Fails to build:
> > 3170a21f7059 ("drm/i915: Only enable DFP 4:4:4->4:2:0 conversion when 
> > outputting YCbCr 4:4:4")
> >
> > BR,
> > Jani.
> 
> Update.
> 
> Conflicts:
> 5b4dc95cf7f5 ("drm/i915/gt: Prevent use of engine->wa_ctx after error")

Already in 488751a0ef9b ("drm/i915/gt: Prevent use of engine->wa_ctx after 
error")

> 6a3daee1b38e ("drm/i915/selftests: Fix some error codes")

No user or even likely CI impact, not worth backporting [unless it turns
up later as a prerequisite].

> 67fba3f1c73b ("drm/i915/dp: Fix LTTPR vswing/pre-emp setting in 
> non-transparent mode")
> 699390f7f026 ("drm/i915: Fix the PHY compliance test vs. hotplug mishap")
> e7004ea4f5f5 ("drm/i915/gt: Close race between enable_breadcrumbs and 
> cancel_breadcrumbs")

Required at least one other friend.

There's another patch that we need in fixes for v5.10, so I'll include
that: drm/i915/gem: Drop lru bumping on display unpinning

I've put the 3 patches on fdo,
https://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=dif

Hopefully they are a happy bunch.

p.s. 5.11-rc6 kills CI.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/3] i915/perf: Store a mask of valid OA formats for a platform

2021-02-02 Thread Chris Wilson
Quoting Umesh Nerlige Ramappa (2021-02-02 07:54:15)
> Validity of an OA format is checked by using a sparse array of formats
> per gen. Instead maintain a mask of supported formats for a platform in
> the perf object.
> 
> Signed-off-by: Umesh Nerlige Ramappa 
> ---
>  drivers/gpu/drm/i915/i915_perf.c   | 64 +-
>  drivers/gpu/drm/i915/i915_perf_types.h | 16 +++
>  2 files changed, 79 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> b/drivers/gpu/drm/i915/i915_perf.c
> index 112ba5f2ce90..973577fcad58 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -3524,6 +3524,19 @@ static u64 oa_exponent_to_ns(struct i915_perf *perf, 
> int exponent)
>  2ULL << exponent);
>  }
>  
> +static __always_inline bool
> +oa_format_valid(struct i915_perf *perf, enum drm_i915_oa_format format)
> +{
> +   return !!(perf->format_mask[__format_index(format)] &
> + __format_bit(format));

!! is already provided by the implicit cast to (bool)

> +}
> +
> +static __always_inline void
> +oa_format_add(struct i915_perf *perf, enum drm_i915_oa_format format)
> +{
> +   perf->format_mask[__format_index(format)] |= __format_bit(format);
> +}
> +
>  /**
>   * read_properties_unlocked - validate + copy userspace stream open 
> properties
>   * @perf: i915 perf instance
> @@ -3615,7 +3628,7 @@ static int read_properties_unlocked(struct i915_perf 
> *perf,
>   value);
> return -EINVAL;
> }
> -   if (!perf->oa_formats[value].size) {
> +   if (!oa_format_valid(perf, value)) {
> DRM_DEBUG("Unsupported OA report format 
> %llu\n",
>   value);
> return -EINVAL;
> @@ -4259,6 +4272,53 @@ static struct ctl_table dev_root[] = {
> {}
>  };
>  
> +static void oa_init_supported_formats(struct i915_perf *perf)
> +{
> +   struct drm_i915_private *i915 = perf->i915;
> +   enum intel_platform platform = INTEL_INFO(i915)->platform;
> +
> +   switch (platform) {
> +   case INTEL_HASWELL:
> +   oa_format_add(perf, I915_OA_FORMAT_A13);
> +   oa_format_add(perf, I915_OA_FORMAT_A13);
> +   oa_format_add(perf, I915_OA_FORMAT_A29);
> +   oa_format_add(perf, I915_OA_FORMAT_A13_B8_C8);
> +   oa_format_add(perf, I915_OA_FORMAT_B4_C8);
> +   oa_format_add(perf, I915_OA_FORMAT_A45_B8_C8);
> +   oa_format_add(perf, I915_OA_FORMAT_B4_C8_A16);
> +   oa_format_add(perf, I915_OA_FORMAT_C4_B8);
> +   break;
> +
> +   case INTEL_BROADWELL:
> +   case INTEL_CHERRYVIEW:
> +   case INTEL_SKYLAKE:
> +   case INTEL_BROXTON:
> +   case INTEL_KABYLAKE:
> +   case INTEL_GEMINILAKE:
> +   case INTEL_COFFEELAKE:
> +   case INTEL_COMETLAKE:
> +   case INTEL_CANNONLAKE:
> +   case INTEL_ICELAKE:
> +   case INTEL_ELKHARTLAKE:
> +   case INTEL_JASPERLAKE:
> +   oa_format_add(perf, I915_OA_FORMAT_A12);
> +   oa_format_add(perf, I915_OA_FORMAT_A12_B8_C8);
> +   oa_format_add(perf, I915_OA_FORMAT_A32u40_A4u32_B8_C8);
> +   oa_format_add(perf, I915_OA_FORMAT_C4_B8);
> +   break;

Ok, this looks as compact and readable as writing it as a bunch of
tables. I presume there's a reason you didn't just use generation rather
than platform.

switch (gen) {
case 7:
haswell();
break;
case 8 .. 11:
broadwell();
break;
case 12:
tigerlake();
break;
}
if you wanted to stick with a switch rather than an if-else tree for the
ranges.

Note you could equally do 
case INTEL_BROADWELL .. INTEL_JASPERLAKE:
but I expect that to cause confusion for the reader.
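
For reference, GCC's case-range extension is spelled with three dots, and
the dots want spaces around them so integer constants lex unambiguously; a
compilable sketch with hypothetical helper names:

static void haswell_formats(void) {}
static void broadwell_formats(void) {}
static void tigerlake_formats(void) {}

static void init_formats(int gen)
{
	switch (gen) {
	case 7:
		haswell_formats();
		break;
	case 8 ... 11: /* GNU C case range */
		broadwell_formats();
		break;
	default:
		tigerlake_formats();
		break;
	}
}

int main(void)
{
	init_formats(9); /* hits the 8 ... 11 range */
	return 0;
}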

> +   /**
> +* Use a format mask to store the supported formats
> +* for a platform.
> +*/
> +#define __fbits (BITS_PER_TYPE(u32))
> +#define __format_bit(__f) \
> +   BIT((__f) & (__fbits - 1))
> +
> +#define __format_index_shift (5)
> +#define __format_index(__f) \
> +   (((__f) & ~(__fbits - 1)) >> __format_index_shift)
> +
> +#define FORMAT_MASK_SIZE (((I915_OA_FORMAT_MAX - 1) / __fbits) + 1)
> +   u32 format_mask[FORMAT_MASK_SIZE];

This is just open-coding set_bit/test_bit

#define FORMAT_MASK_SIZE DIV_ROUND_UP(I915_OA_FORMAT_MAX - 1, BITS_PER_LONG)
unsigned long format_mask[FORMAT_MASK_SIZE];

static __always_inline bool
oa_format_valid(struct i915_perf *perf, enum drm_i915_oa_format format)
{
return test_bit(format, perf->format_mask);
}

static __always_inline void
oa_format_add(struct i915_perf *perf, enum drm_i915_oa_format format)
{
__set_bit(format, perf->format_mask);
}
-Chris
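
For reference, the suggested helpers reduce to this minimal userspace
model (plain C with invented names; the kernel's test_bit()/__set_bit()
index an unsigned long array the same way, set_bit() additionally being
atomic):

#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define FORMAT_MAX 64

static unsigned long format_mask[DIV_ROUND_UP(FORMAT_MAX, BITS_PER_LONG)];

static void format_add(unsigned int f)
{
	format_mask[f / BITS_PER_LONG] |= 1UL << (f % BITS_PER_LONG);
}

static bool format_valid(unsigned int f)
{
	return format_mask[f / BITS_PER_LONG] & (1UL << (f % BITS_PER_LONG));
}

int main(void)
{
	format_add(3);
	printf("%d %d\n", format_valid(3), format_valid(4)); /* 1 0 */
	return 0;
}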

Re: [Intel-gfx] [PATCH] drm/i915/dg1: Add GuC and HuC support

2021-02-01 Thread Chris Wilson
Quoting Srivatsa, Anusha (2021-02-01 23:19:40)
> 
> 
> > -Original Message-
> > From: Chris Wilson 
> > Sent: Monday, February 1, 2021 3:05 PM
> > To: Srivatsa, Anusha ; intel-
> > g...@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH] drm/i915/dg1: Add GuC and HuC support
> > 
> > Quoting Anusha Srivatsa (2021-02-01 23:01:33)
> > > Add support to load GuC and HuC firmware for Dg1.
> > 
> > Do you have the corresponding link for the linux-firmware.git? That is
> > useful for cross referencing that the target version does exist in the 
> > public
> > repository.
> 
> I am waiting for CI runs before I can propagate it to linux-firmware.git. 

From upstream CI? We don't have guc loading enabled for dg1, or much of
dg1 for that matter. Best we can do is compile check :(
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915/dg1: Add GuC and HuC support

2021-02-01 Thread Chris Wilson
Quoting Anusha Srivatsa (2021-02-01 23:01:33)
> Add support to load GuC and HuC firmware for Dg1.

Do you have the corresponding link for the linux-firmware.git? That is
useful for cross referencing that the target version does exist in the
public repository.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915/gt: Retire unexpected starting state error dumping

2021-02-01 Thread Chris Wilson
We have not seen an occurrence of the false restart state recently, and
if we did see such an event from inside engine-reset, it would deadlock
on trying to suspend the tasklet to read the register state. Instead, we
inspect the context state before submission which will alert us to any
issues prior to execution.

Signed-off-by: Chris Wilson 
Cc: Mika Kuoppala 
Cc: Tvrtko Ursulin 
Cc: Andi Shyti 
---
 .../drm/i915/gt/intel_execlists_submission.c  | 20 ---
 1 file changed, 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index e7593df6777d..30c7631323e5 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2742,31 +2742,11 @@ static void enable_execlists(struct intel_engine_cs 
*engine)
enable_error_interrupt(engine);
 }
 
-static bool unexpected_starting_state(struct intel_engine_cs *engine)
-{
-   bool unexpected = false;
-
-   if (ENGINE_READ_FW(engine, RING_MI_MODE) & STOP_RING) {
-   drm_dbg(&engine->i915->drm,
-   "STOP_RING still set in RING_MI_MODE\n");
-   unexpected = true;
-   }
-
-   return unexpected;
-}
-
 static int execlists_resume(struct intel_engine_cs *engine)
 {
intel_mocs_init_engine(engine);
-
intel_breadcrumbs_reset(engine->breadcrumbs);
 
-   if (GEM_SHOW_DEBUG() && unexpected_starting_state(engine)) {
-   struct drm_printer p = drm_debug_printer(__func__);
-
-   intel_engine_dump(engine, &p, NULL);
-   }
-
enable_execlists(engine);
 
return 0;
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Make lrc_init_wa_ctx compatible with ww locking, v3.

2021-02-01 Thread Chris Wilson
Quoting Maarten Lankhorst (2021-02-01 12:50:37)
> Make creation separate from pinning, in order to take the lock only
> once, and pin the mapping with the lock held.
> 
> Changes since v1:
> - Rebase on top of upstream changes.
> Changes since v2:
> - Fully clear wa_ctx on error.
> 
> Signed-off-by: Maarten Lankhorst 
> Reviewed-by: Thomas Hellström 
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 49 ++---
>  1 file changed, 38 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
> b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 8508b8d701c1..a2b916d27a39 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1421,7 +1421,7 @@ gen10_init_indirectctx_bb(struct intel_engine_cs 
> *engine, u32 *batch)
>  
>  #define CTX_WA_BB_SIZE (PAGE_SIZE)
>  
> -static int lrc_setup_wa_ctx(struct intel_engine_cs *engine)
> +static int lrc_create_wa_ctx(struct intel_engine_cs *engine)
>  {
> struct drm_i915_gem_object *obj;
> struct i915_vma *vma;
> @@ -1437,10 +1437,6 @@ static int lrc_setup_wa_ctx(struct intel_engine_cs 
> *engine)
> goto err;
> }
>  
> -   err = i915_ggtt_pin(vma, NULL, 0, PIN_HIGH);
> -   if (err)
> -   goto err;
> -
> engine->wa_ctx.vma = vma;
> return 0;
>  
> @@ -1452,9 +1448,6 @@ static int lrc_setup_wa_ctx(struct intel_engine_cs 
> *engine)
>  void lrc_fini_wa_ctx(struct intel_engine_cs *engine)
>  {
> i915_vma_unpin_and_release(&engine->wa_ctx.vma, 0);
> -
> -   /* Called on error unwind, clear all flags to prevent further use */
> -   memset(&engine->wa_ctx, 0, sizeof(engine->wa_ctx));
>  }
>  
>  typedef u32 *(*wa_bb_func_t)(struct intel_engine_cs *engine, u32 *batch);
> @@ -1466,6 +1459,7 @@ void lrc_init_wa_ctx(struct intel_engine_cs *engine)
> &wa_ctx->indirect_ctx, &wa_ctx->per_ctx
> };
> wa_bb_func_t wa_bb_fn[ARRAY_SIZE(wa_bb)];
> +   struct i915_gem_ww_ctx ww;
> void *batch, *batch_ptr;
> unsigned int i;
> int err;
> @@ -1494,7 +1488,7 @@ void lrc_init_wa_ctx(struct intel_engine_cs *engine)
> return;
> }
>  
> -   err = lrc_setup_wa_ctx(engine);
> +   err = lrc_create_wa_ctx(engine);
> if (err) {
> /*
>  * We continue even if we fail to initialize WA batch
> @@ -1507,7 +1501,22 @@ void lrc_init_wa_ctx(struct intel_engine_cs *engine)
> return;
> }
>  
> +   if (!engine->wa_ctx.vma)
> +   return;
> +
> +   i915_gem_ww_ctx_init(&ww, true);
> +retry:
> +   err = i915_gem_object_lock(wa_ctx->vma->obj, &ww);
> +   if (!err)
> +   err = i915_ggtt_pin(wa_ctx->vma, &ww, 0, PIN_HIGH);
> +   if (err)
> +   goto err;
> +
> batch = i915_gem_object_pin_map(wa_ctx->vma->obj, I915_MAP_WB);

Given that the pages are already pinned and must remain pinned for the
lifetime of the engine, and the ww is not used here to setup the CPU
page tables, fix the lack of primitives.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] intel_gpu_top: Wrap interactive header

2021-02-01 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-01 11:57:56)
> From: Tvrtko Ursulin 
> 
> Slight improvement with regards to wrapping header components to fit
> console width. If a single element is wider than max it can still
> overflow but it should now work better for practical console widths.

<
intel-gpu-top: Intel Kabylake (Gen9) @ /dev/dri/card0
 900/ 949 MHz;   0% RC6;  6.97/18.42 W;2 irqs/s
  IMC reads:6 MiB/s
 IMC writes:0 MiB/s

>
intel-gpu-top: Intel Kabylake (Gen9) @ /dev/dri/card0 -  903/ 954 MHz;   0% RC6
7.16/18.40 W;   14 irqs/s

  IMC reads:   80 MiB/s
 IMC writes:0 MiB/s

I thought it looked reasonably tidy, without adding any lines to the
header.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t 2/3] intel_gpu_top: Add option to hide inactive clients

2021-02-01 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-01 10:45:23)
> From: Tvrtko Ursulin 
> 
> Allow hiding inactive clients (used no GPU time ever) in interactive mode
> by pressing 'i'.
> 
> Signed-off-by: Tvrtko Ursulin 

Ok, that's where you meant. Coffee not winning the battle today.

Reviewed-by: Chris Wilson 
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t 1/3] intel_gpu_top: Update manual page for recent additions

2021-02-01 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-01 10:45:22)
> From: Tvrtko Ursulin 
> 
> Document numeric busyness overlay and sort selection.

I looked for a 'h' or '?' screen.

Reviewed-by: Chris Wilson 
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 3/3] intel_gpu_top: Fix interactive header

2021-02-01 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-01 10:45:24)
> From: Tvrtko Ursulin 
> 
> Client stats refactoring broke the header layout with an extra newline.

Argh; keep the newline, or at least check the terminal width and make
the newline conditional.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915/selftests: Use a single copy of the mocs table

2021-02-01 Thread Chris Wilson
Instead of copying the whole table to each category (mocs, l3cc), use a
single table with a pointer to it if the category is enabled.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/selftest_mocs.c | 32 +
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c 
b/drivers/gpu/drm/i915/gt/selftest_mocs.c
index e6f6807487d4..44609d1c7780 100644
--- a/drivers/gpu/drm/i915/gt/selftest_mocs.c
+++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c
@@ -12,8 +12,9 @@
 #include "selftests/igt_spinner.h"
 
 struct live_mocs {
-   struct drm_i915_mocs_table mocs;
-   struct drm_i915_mocs_table l3cc;
+   struct drm_i915_mocs_table table;
+   struct drm_i915_mocs_table *mocs;
+   struct drm_i915_mocs_table *l3cc;
struct i915_vma *scratch;
void *vaddr;
 };
@@ -58,21 +59,20 @@ static int request_add_spin(struct i915_request *rq, struct 
igt_spinner *spin)
 
 static int live_mocs_init(struct live_mocs *arg, struct intel_gt *gt)
 {
-   struct drm_i915_mocs_table table;
unsigned int flags;
int err;
 
memset(arg, 0, sizeof(*arg));
 
-   flags = get_mocs_settings(gt->i915, &table);
+   flags = get_mocs_settings(gt->i915, &arg->table);
if (!flags)
return -EINVAL;
 
if (flags & HAS_RENDER_L3CC)
-   arg->l3cc = table;
+   arg->l3cc = &arg->table;
 
if (flags & (HAS_GLOBAL_MOCS | HAS_ENGINE_MOCS))
-   arg->mocs = table;
+   arg->mocs = &arg->table;
 
arg->scratch = __vm_create_scratch_for_read(&gt->ggtt->vm, PAGE_SIZE);
if (IS_ERR(arg->scratch))
@@ -130,6 +130,9 @@ static int read_mocs_table(struct i915_request *rq,
 {
u32 addr;
 
+   if (!table)
+   return 0;
+
if (HAS_GLOBAL_MOCS_REGISTERS(rq->engine->i915))
addr = global_mocs_offset();
else
@@ -144,6 +147,9 @@ static int read_l3cc_table(struct i915_request *rq,
 {
u32 addr = i915_mmio_reg_offset(GEN9_LNCFCMOCS(0));
 
+   if (!table)
+   return 0;
+
return read_regs(rq, addr, (table->n_entries + 1) / 2, offset);
 }
 
@@ -154,6 +160,9 @@ static int check_mocs_table(struct intel_engine_cs *engine,
unsigned int i;
u32 expect;
 
+   if (!table)
+   return 0;
+
for_each_mocs(expect, table, i) {
if (**vaddr != expect) {
pr_err("%s: Invalid MOCS[%d] entry, found %08x, 
expected %08x\n",
@@ -185,6 +194,9 @@ static int check_l3cc_table(struct intel_engine_cs *engine,
unsigned int i;
u32 expect;
 
+   if (!table)
+   return 0;
+
for_each_l3cc(expect, table, i) {
if (!mcr_range(engine->i915, reg) && **vaddr != expect) {
pr_err("%s: Invalid L3CC[%d] entry, found %08x, 
expected %08x\n",
@@ -222,9 +234,9 @@ static int check_mocs_engine(struct live_mocs *arg,
/* Read the mocs tables back using SRM */
offset = i915_ggtt_offset(vma);
if (!err)
-   err = read_mocs_table(rq, &arg->mocs, &offset);
+   err = read_mocs_table(rq, arg->mocs, &offset);
if (!err && ce->engine->class == RENDER_CLASS)
-   err = read_l3cc_table(rq, &arg->l3cc, &offset);
+   err = read_l3cc_table(rq, arg->l3cc, &offset);
offset -= i915_ggtt_offset(vma);
GEM_BUG_ON(offset > PAGE_SIZE);
 
@@ -235,9 +247,9 @@ static int check_mocs_engine(struct live_mocs *arg,
/* Compare the results against the expected tables */
vaddr = arg->vaddr;
if (!err)
-   err = check_mocs_table(ce->engine, &arg->mocs, &vaddr);
+   err = check_mocs_table(ce->engine, arg->mocs, &vaddr);
if (!err && ce->engine->class == RENDER_CLASS)
-   err = check_l3cc_table(ce->engine, &arg->l3cc, &vaddr);
+   err = check_l3cc_table(ce->engine, arg->l3cc, &vaddr);
if (err)
return err;
 
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [igt-dev] [PATCH i-g-t v2] intel_gpu_top: Hide unused clients

2021-02-01 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-01 09:53:20)
> 
> On 01/02/2021 09:31, Chris Wilson wrote:
> > Hide inactive clients by pressing 'i' (toggle in interactive mode).
> > 
> > v2: Fix location of filter_idle.
> > 
> > Signed-off-by: Chris Wilson 
> > Cc: Tvrtko Ursulin 
> > Reviewed-by: Tvrtko Ursulin 
> > ---
> >   tools/intel_gpu_top.c | 9 +
> >   1 file changed, 9 insertions(+)
> > 
> > diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
> > index 60ff62d28..d88b6cc61 100644
> > --- a/tools/intel_gpu_top.c
> > +++ b/tools/intel_gpu_top.c
> > @@ -1595,6 +1595,7 @@ print_imc(struct engines *engines, double t, int 
> > lines, int con_w, int con_h)
> >   }
> >   
> >   static bool class_view;
> > +static bool filter_idle;
> >   
> >   static int
> >   print_engines_header(struct engines *engines, double t,
> > @@ -2115,6 +2116,9 @@ static void process_stdin(unsigned int timeout_us)
> >   case 'q':
> >   stop_top = true;
> >   break;
> > + case 'i':
> > + filter_idle ^= true;
> > + break;
> >   case '1':
> >   class_view ^= true;
> >   break;
> > @@ -2323,9 +2327,14 @@ int main(int argc, char **argv)
> >   
> >   for_each_client(clients, c, j) {
> >   assert(c->status != PROBE);
> > +
> >   if (c->status != ALIVE)
> >   break; /* Active clients are 
> > first in the array. */
> >   
> > + /* Active clients before idle */
> > + if (filter_idle && !c->total_runtime)
> > + break;
> > +
> 
> Break won't be correct for id sort. I don't see what did not work with 
> v1? It should be effectively the same apart from the break.

We didn't have the client to peek into.

Maybe you want to do v3 :)
-Chris
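
The objection above is about sort order: sorted by client id rather
than by runtime, idle clients are interleaved with busy ones, so a v3
would presumably skip them instead of stopping (a sketch reusing the
helpers visible in the patch):

	for_each_client(clients, c, j) {
		assert(c->status != PROBE);

		if (c->status != ALIVE)
			break; /* Active clients are first in the array. */

		/* Idle clients may be interleaved when sorted by id */
		if (filter_idle && !c->total_runtime)
			continue;

		if (lines >= con_h)
			break;
		...
	}
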
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH i-g-t v2] intel_gpu_top: Hide unused clients

2021-02-01 Thread Chris Wilson
Hide inactive clients by pressing 'i' (toggle in interactive mode).

v2: Fix location of filter_idle.

Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
Reviewed-by: Tvrtko Ursulin 
---
 tools/intel_gpu_top.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
index 60ff62d28..d88b6cc61 100644
--- a/tools/intel_gpu_top.c
+++ b/tools/intel_gpu_top.c
@@ -1595,6 +1595,7 @@ print_imc(struct engines *engines, double t, int lines, 
int con_w, int con_h)
 }
 
 static bool class_view;
+static bool filter_idle;
 
 static int
 print_engines_header(struct engines *engines, double t,
@@ -2115,6 +2116,9 @@ static void process_stdin(unsigned int timeout_us)
case 'q':
stop_top = true;
break;
+   case 'i':
+   filter_idle ^= true;
+   break;
case '1':
class_view ^= true;
break;
@@ -2323,9 +2327,14 @@ int main(int argc, char **argv)
 
for_each_client(clients, c, j) {
assert(c->status != PROBE);
+
if (c->status != ALIVE)
break; /* Active clients are 
first in the array. */
 
+   /* Active clients before idle */
+   if (filter_idle && !c->total_runtime)
+   break;
+
if (lines >= con_h)
break;
 
-- 
2.30.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 08/57] drm/i915/gt: Move submission_method into intel_gt

2021-02-01 Thread Chris Wilson
Since we set up the submission method for the engines once, it is easy to
assign an enum and use that instead of probing into the backends.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_engine.h   |  8 +++-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c| 12 
 drivers/gpu/drm/i915/gt/intel_execlists_submission.c |  8 
 drivers/gpu/drm/i915/gt/intel_execlists_submission.h |  3 ---
 drivers/gpu/drm/i915/gt/intel_gt_types.h |  7 +++
 drivers/gpu/drm/i915/gt/intel_reset.c|  7 +++
 drivers/gpu/drm/i915/gt/selftest_execlists.c |  2 +-
 drivers/gpu/drm/i915/gt/selftest_ring_submission.c   |  2 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c|  5 -
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h|  1 -
 drivers/gpu/drm/i915/i915_perf.c | 10 +-
 11 files changed, 32 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index 47ee8578e511..8d9184920c51 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -13,8 +13,9 @@
 #include "i915_reg.h"
 #include "i915_request.h"
 #include "i915_selftest.h"
-#include "gt/intel_timeline.h"
 #include "intel_engine_types.h"
+#include "intel_gt_types.h"
+#include "intel_timeline.h"
 #include "intel_workarounds.h"
 
 struct drm_printer;
@@ -262,6 +263,11 @@ void intel_engine_init_active(struct intel_engine_cs 
*engine,
 #define ENGINE_MOCK 1
 #define ENGINE_VIRTUAL 2
 
+static inline bool intel_engine_uses_guc(const struct intel_engine_cs *engine)
+{
+   return engine->gt->submission_method >= INTEL_SUBMISSION_GUC;
+}
+
 static inline bool
 intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 727128c0166a..3d1bf6b3c3bf 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -891,12 +891,16 @@ int intel_engines_init(struct intel_gt *gt)
enum intel_engine_id id;
int err;
 
-   if (intel_uc_uses_guc_submission(&gt->uc))
+   if (intel_uc_uses_guc_submission(&gt->uc)) {
+   gt->submission_method = INTEL_SUBMISSION_GUC;
setup = intel_guc_submission_setup;
-   else if (HAS_EXECLISTS(gt->i915))
+   } else if (HAS_EXECLISTS(gt->i915)) {
+   gt->submission_method = INTEL_SUBMISSION_ELSP;
setup = intel_execlists_submission_setup;
-   else
+   } else {
+   gt->submission_method = INTEL_SUBMISSION_RING;
setup = intel_ring_submission_setup;
+   }
 
for_each_engine(engine, gt, id) {
err = engine_setup_common(engine);
@@ -1461,7 +1465,7 @@ static void intel_engine_print_registers(struct 
intel_engine_cs *engine,
drm_printf(m, "\tIPEHR: 0x%08x\n", ENGINE_READ(engine, IPEHR));
}
 
-   if (intel_engine_in_guc_submission_mode(engine)) {
+   if (intel_engine_uses_guc(engine)) {
/* nothing to print yet */
} else if (HAS_EXECLISTS(dev_priv)) {
struct i915_request * const *port, *rq;
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 5d824e1cfcba..4ddd2099a931 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1757,7 +1757,6 @@ process_csb(struct intel_engine_cs *engine, struct 
i915_request **inactive)
 */
GEM_BUG_ON(!tasklet_is_locked(&execlists->tasklet) &&
   !reset_in_progress(execlists));
-   GEM_BUG_ON(!intel_engine_in_execlists_submission_mode(engine));
 
/*
 * Note that csb_write, csb_status may be either in HWSP or mmio.
@@ -3897,13 +3896,6 @@ void intel_execlists_show_requests(struct 
intel_engine_cs *engine,
spin_unlock_irqrestore(>active.lock, flags);
 }
 
-bool
-intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine)
-{
-   return engine->set_default_submission ==
-  execlists_set_default_submission;
-}
-
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_execlists.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
index a8fd7adefd82..f7bd3fccfee8 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
@@ -41,7 +41,4 @@ int intel_virtual_engine_attach_bond(struct intel_engine_cs 
*engine,
 const struct intel_engine_cs *master,
  

[Intel-gfx] [PATCH 06/57] drm/i915/gt: Always flush the submission queue on checking for idle

2021-02-01 Thread Chris Wilson
We check for idle during debug prints and other debugging actions.
Simplify the flow by not touching execlists state.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 0aaf0626425a..727128c0166a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1248,14 +1248,8 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
return true;
 
/* Waiting to drain ELSP? */
-   if (execlists_active(&engine->execlists)) {
-   synchronize_hardirq(engine->i915->drm.pdev->irq);
-
-   intel_engine_flush_submission(engine);
-
-   if (execlists_active(&engine->execlists))
-   return false;
-   }
+   synchronize_hardirq(engine->i915->drm.pdev->irq);
+   intel_engine_flush_submission(engine);
 
/* ELSP is empty, but there are ready requests? E.g. after reset */
if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root))
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 21/57] drm/i915: Move common active lists from engine to i915_scheduler

2021-02-01 Thread Chris Wilson
Extract the scheduler lists into a related structure, stop sprawling
over struct intel_engine_cs. Also transfer the responsibility of tracing
the scheduler events from ENGINE_TRACE() to SCHED_TRACE().

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  8 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 33 ++--
 drivers/gpu/drm/i915/gt/intel_engine_types.h  | 10 +--
 .../drm/i915/gt/intel_execlists_submission.c  | 27 ---
 drivers/gpu/drm/i915/gt/mock_engine.c |  7 +-
 drivers/gpu/drm/i915/i915_request.c   |  8 +-
 drivers/gpu/drm/i915/i915_request.h   |  8 +-
 drivers/gpu/drm/i915/i915_scheduler.c | 78 ++-
 drivers/gpu/drm/i915/i915_scheduler.h | 13 +++-
 drivers/gpu/drm/i915/i915_scheduler_types.h   | 31 +++-
 .../gpu/drm/i915/selftests/i915_scheduler.c   |  1 +
 11 files changed, 143 insertions(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index ecacfae8412d..ca37d93ef5e7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -422,11 +422,11 @@ __active_engine(struct i915_request *rq, struct 
intel_engine_cs **active)
 * check that we have acquired the lock on the final engine.
 */
locked = READ_ONCE(rq->engine);
-   spin_lock_irq(&locked->active.lock);
+   spin_lock_irq(&locked->sched.lock);
while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
-   spin_unlock(&locked->active.lock);
+   spin_unlock(&locked->sched.lock);
locked = engine;
-   spin_lock(&locked->active.lock);
+   spin_lock(&locked->sched.lock);
}
 
if (i915_request_is_active(rq)) {
@@ -435,7 +435,7 @@ __active_engine(struct i915_request *rq, struct 
intel_engine_cs **active)
ret = true;
}
 
-   spin_unlock_irq(&locked->active.lock);
+   spin_unlock_irq(&locked->sched.lock);
 
return ret;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index a2916c7fcc48..d7ff84d92936 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -575,8 +575,6 @@ void intel_engine_init_execlists(struct intel_engine_cs 
*engine)
 
execlists->queue_priority_hint = INT_MIN;
execlists->queue = RB_ROOT_CACHED;
-
-   i915_sched_init_ipi(&execlists->ipi);
 }
 
 static void cleanup_status_page(struct intel_engine_cs *engine)
@@ -692,7 +690,12 @@ static int engine_setup_common(struct intel_engine_cs 
*engine)
goto err_status;
}
 
-   intel_engine_init_active(engine, ENGINE_PHYSICAL);
+   i915_sched_init(&engine->sched,
+   engine->i915->drm.dev,
+   engine->name,
+   engine->mask,
+   ENGINE_PHYSICAL);
+
intel_engine_init_execlists(engine);
intel_engine_init_cmd_parser(engine);
intel_engine_init__pm(engine);
@@ -761,28 +764,6 @@ static int measure_breadcrumb_dw(struct intel_context *ce)
return dw;
 }
 
-void
-intel_engine_init_active(struct intel_engine_cs *engine, unsigned int subclass)
-{
-   INIT_LIST_HEAD(&engine->active.requests);
-   INIT_LIST_HEAD(&engine->active.hold);
-
-   spin_lock_init(&engine->active.lock);
-   lockdep_set_subclass(&engine->active.lock, subclass);
-
-   /*
-* Due to an interesting quirk in lockdep's internal debug tracking,
-* after setting a subclass we must ensure the lock is used. Otherwise,
-* nr_unused_locks is incremented once too often.
-*/
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-   local_irq_disable();
-   lock_map_acquire(&engine->active.lock.dep_map);
-   lock_map_release(&engine->active.lock.dep_map);
-   local_irq_enable();
-#endif
-}
-
 static struct intel_context *
 create_pinned_context(struct intel_engine_cs *engine,
  unsigned int hwsp,
@@ -930,7 +911,7 @@ int intel_engines_init(struct intel_gt *gt)
  */
 void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 {
-   GEM_BUG_ON(!list_empty(&engine->active.requests));
+   GEM_BUG_ON(!list_empty(&engine->sched.requests));
tasklet_kill(&engine->execlists.tasklet); /* flush the callback */
 
intel_breadcrumbs_free(engine->breadcrumbs);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index e5637e831d28..0936b0699cbb 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -258,8 +258,6 @@ struct intel_engine_execlists {
struct rb_root_cached queue;
struct rb_root_cached virtual;
 
-   struct i915_sched_ipi ipi;
-
/**
 * @csb_write: control register for Context Switch buffer
 *
@@ -329,11 +327,7 @@ st
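
The diff is truncated here by the archive. Pieced together from the
visible call sites (sched.lock, sched.requests, i915_sched_init()), the
structure being extracted looks roughly like this (a sketch, field set
assumed):

struct i915_sched {
	spinlock_t lock;		/* protects the scheduling lists */
	struct list_head requests;	/* was engine->active.requests */
	struct list_head hold;		/* was engine->active.hold */
	struct i915_sched_ipi ipi;	/* moved from intel_engine_execlists */
};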

[Intel-gfx] [PATCH 48/57] drm/i915/selftests: Exercise relative timeline modes

2021-02-01 Thread Chris Wilson
A quick test to verify that the backend accepts each type of timeline
and can use them to track and control request emission.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/selftest_timeline.c | 105 
 1 file changed, 105 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c 
b/drivers/gpu/drm/i915/gt/selftest_timeline.c
index 6b412228a6fd..dcc03522b277 100644
--- a/drivers/gpu/drm/i915/gt/selftest_timeline.c
+++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c
@@ -1364,9 +1364,114 @@ static int live_hwsp_recycle(void *arg)
return err;
 }
 
+static int live_hwsp_relative(void *arg)
+{
+   struct intel_gt *gt = arg;
+   struct intel_engine_cs *engine;
+   enum intel_engine_id id;
+
+   /*
+* Check backend support for different timeline modes.
+*/
+
+   for_each_engine(engine, gt, id) {
+   enum intel_timeline_mode mode;
+
+   if (!intel_engine_has_scheduler(engine))
+   continue;
+
+   for (mode = INTEL_TIMELINE_ABSOLUTE;
+mode <= INTEL_TIMELINE_RELATIVE_ENGINE;
+mode++) {
+   struct intel_timeline *tl;
+   struct i915_request *rq;
+   struct intel_context *ce;
+   const char *msg;
+   int err;
+
+   if (mode == INTEL_TIMELINE_RELATIVE_CONTEXT &&
+   !HAS_EXECLISTS(gt->i915))
+   continue;
+
+   ce = intel_context_create(engine);
+   if (IS_ERR(ce))
+   return PTR_ERR(ce);
+
+   err = intel_context_alloc_state(ce);
+   if (err) {
+   intel_context_put(ce);
+   return err;
+   }
+
+   switch (mode) {
+   case INTEL_TIMELINE_ABSOLUTE:
+   tl = intel_timeline_create(gt);
+   msg = "local";
+   break;
+
+   case INTEL_TIMELINE_RELATIVE_CONTEXT:
+   tl = __intel_timeline_create(gt,
+ce->state,
+
INTEL_TIMELINE_RELATIVE_CONTEXT |
+0x400);
+   msg = "ppHWSP";
+   break;
+
+   case INTEL_TIMELINE_RELATIVE_ENGINE:
+   tl = __intel_timeline_create(gt,
+
engine->status_page.vma,
+0x400);
+   msg = "HWSP";
+   break;
+   default:
+   continue;
+   }
+   if (IS_ERR(tl)) {
+   intel_context_put(ce);
+   return PTR_ERR(tl);
+   }
+
+   pr_info("Testing %s timeline on %s\n",
+   msg, engine->name);
+
+   intel_timeline_put(ce->timeline);
+   ce->timeline = tl;
+
+   err = intel_timeline_pin(tl, NULL);
+   if (err) {
+   intel_context_put(ce);
+   return err;
+   }
+   tl->seqno = 0xc000;
+   WRITE_ONCE(*(u32 *)tl->hwsp_seqno, tl->seqno);
+   intel_timeline_unpin(tl);
+
+   rq = intel_context_create_request(ce);
+   intel_context_put(ce);
+   if (IS_ERR(rq))
+   return PTR_ERR(rq);
+
+   GEM_BUG_ON(rcu_access_pointer(rq->timeline) != tl);
+
+   i915_request_get(rq);
+   i915_request_add(rq);
+
+   if (i915_request_wait(rq, 0, HZ / 5) < 0) {
+   i915_request_put(rq);
+   return -EIO;
+   }
+
+   i915_request_put(rq);
+   }
+   }
+
+   return 0;
+}
+
 int intel_timeline_live_selftests(struct drm_i915_private *i915)
 {
static const struct i915_subtest tests[] = {
+   SUBTEST(live_hwsp_relative),
SUBTEST(live_hwsp_recycle),
SUBTEST(live_hwsp_engine),
SUBTEST(live_hwsp_alternate),
-- 
2.20.1


[Intel-gfx] [PATCH 04/57] drm/i915: Protect against request freeing during cancellation on wedging

2021-02-01 Thread Chris Wilson
As soon as we mark a request as completed, it may be retired. So when
cancelling a request and marking it complete, make sure we first keep a
reference to the request.

Signed-off-by: Chris Wilson 
---
 .../drm/i915/gt/intel_execlists_submission.c  | 19 +++
 drivers/gpu/drm/i915/gt/intel_reset.c | 15 ++-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  2 +-
 drivers/gpu/drm/i915/gt/mock_engine.c |  8 +---
 drivers/gpu/drm/i915/i915_request.c   |  9 +++--
 drivers/gpu/drm/i915/i915_request.h   |  2 +-
 6 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index e7593df6777d..45a8ac152b88 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2976,7 +2976,7 @@ static void execlists_reset_cancel(struct intel_engine_cs 
*engine)
 
/* Mark all executing requests as skipped. */
list_for_each_entry(rq, &engine->active.requests, sched.link)
-   i915_request_mark_eio(rq);
+   i915_request_put(i915_request_mark_eio(rq));
intel_engine_signal_breadcrumbs(engine);
 
/* Flush the queued requests to the timeline list (for retiring). */
@@ -2984,8 +2984,10 @@ static void execlists_reset_cancel(struct 
intel_engine_cs *engine)
struct i915_priolist *p = to_priolist(rb);
 
priolist_for_each_request_consume(rq, rn, p) {
-   i915_request_mark_eio(rq);
-   __i915_request_submit(rq);
+   if (i915_request_mark_eio(rq)) {
+   __i915_request_submit(rq);
+   i915_request_put(rq);
+   }
}
 
rb_erase_cached(&p->node, &execlists->queue);
@@ -2994,7 +2996,7 @@ static void execlists_reset_cancel(struct intel_engine_cs 
*engine)
 
/* On-hold requests will be flushed to timeline upon their release */
list_for_each_entry(rq, &engine->active.hold, sched.link)
-   i915_request_mark_eio(rq);
+   i915_request_put(i915_request_mark_eio(rq));
 
/* Cancel all attached virtual engines */
while ((rb = rb_first_cached(&execlists->virtual))) {
@@ -3007,10 +3009,11 @@ static void execlists_reset_cancel(struct 
intel_engine_cs *engine)
spin_lock(&ve->base.active.lock);
rq = fetch_and_zero(&ve->request);
if (rq) {
-   i915_request_mark_eio(rq);
-
-   rq->engine = engine;
-   __i915_request_submit(rq);
+   if (i915_request_mark_eio(rq)) {
+   rq->engine = engine;
+   __i915_request_submit(rq);
+   i915_request_put(rq);
+   }
i915_request_put(rq);
 
ve->base.execlists.queue_priority_hint = INT_MIN;
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c 
b/drivers/gpu/drm/i915/gt/intel_reset.c
index 107430e1e864..a82c4d7b23bc 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -786,18 +786,15 @@ static void reset_finish(struct intel_gt *gt, 
intel_engine_mask_t awake)
 
 static void nop_submit_request(struct i915_request *request)
 {
-   struct intel_engine_cs *engine = request->engine;
-   unsigned long flags;
-
RQ_TRACE(request, "-EIO\n");
-   i915_request_set_error_once(request, -EIO);
 
-   spin_lock_irqsave(&engine->active.lock, flags);
-   __i915_request_submit(request);
-   i915_request_mark_complete(request);
-   spin_unlock_irqrestore(&engine->active.lock, flags);
+   request = i915_request_mark_eio(request);
+   if (request) {
+   i915_request_submit(request);
+   intel_engine_signal_breadcrumbs(request->engine);
 
-   intel_engine_signal_breadcrumbs(engine);
+   i915_request_put(request);
+   }
 }
 
 static void __intel_gt_set_wedged(struct intel_gt *gt)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c 
b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 8b7cc637c432..9c2c605d7a92 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -400,7 +400,7 @@ static void reset_cancel(struct intel_engine_cs *engine)
 
/* Mark all submitted requests as skipped. */
list_for_each_entry(request, &engine->active.requests, sched.link)
-   i915_request_mark_eio(request);
+   i915_request_put(i915_request_mark_eio(request));
intel_engine_signal_breadcrumbs(engine);
 
/* Remaining _unready_ requests will be nop'ed when submitted */
diff --git a/drivers/gpu/drm/i915/gt/mock_
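
The i915_request.c hunk falls past the truncation point; from the
callers above, the reworked i915_request_mark_eio() presumably returns a
fresh reference on success (a sketch of the assumed shape):

struct i915_request *i915_request_mark_eio(struct i915_request *rq)
{
	if (__i915_request_is_complete(rq))
		return NULL;

	GEM_BUG_ON(i915_request_signaled(rq));

	/* As soon as the request is completed, it may be retired */
	rq = i915_request_get(rq);

	i915_request_set_error_once(rq, -EIO);
	i915_request_mark_complete(rq);

	return rq;
}

The bare i915_request_put(i915_request_mark_eio(rq)) callers then rely
on i915_request_put() tolerating the NULL result, which holds as the
fence sits at the start of the request.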

[Intel-gfx] [PATCH 17/57] drm/i915: Extract request suspension from the execlists

2021-02-01 Thread Chris Wilson
Make the ability to suspend and resume a request and its dependents
generic.

Signed-off-by: Chris Wilson 
---
 .../drm/i915/gt/intel_execlists_submission.c  | 167 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |   8 +-
 drivers/gpu/drm/i915/i915_scheduler.c | 153 
 drivers/gpu/drm/i915/i915_scheduler.h |  10 ++
 4 files changed, 169 insertions(+), 169 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index b6dea80da533..853021314786 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1921,169 +1921,6 @@ static void post_process_csb(struct i915_request **port,
execlists_schedule_out(*port++);
 }
 
-static void __execlists_hold(struct i915_request *rq)
-{
-   LIST_HEAD(list);
-
-   do {
-   struct i915_dependency *p;
-
-   if (i915_request_is_active(rq))
-   __i915_request_unsubmit(rq);
-
-   clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-   list_move_tail(&rq->sched.link, &rq->engine->active.hold);
-   i915_request_set_hold(rq);
-   RQ_TRACE(rq, "on hold\n");
-
-   for_each_waiter(p, rq) {
-   struct i915_request *w =
-   container_of(p->waiter, typeof(*w), sched);
-
-   if (p->flags & I915_DEPENDENCY_WEAK)
-   continue;
-
-   /* Leave semaphores spinning on the other engines */
-   if (w->engine != rq->engine)
-   continue;
-
-   if (!i915_request_is_ready(w))
-   continue;
-
-   if (__i915_request_is_complete(w))
-   continue;
-
-   if (i915_request_on_hold(w))
-   continue;
-
-   list_move_tail(&w->sched.link, &list);
-   }
-
-   rq = list_first_entry_or_null(&list, typeof(*rq), sched.link);
-   } while (rq);
-}
-
-static bool execlists_hold(struct intel_engine_cs *engine,
-  struct i915_request *rq)
-{
-   if (i915_request_on_hold(rq))
-   return false;
-
-   spin_lock_irq(&engine->active.lock);
-
-   if (__i915_request_is_complete(rq)) { /* too late! */
-   rq = NULL;
-   goto unlock;
-   }
-
-   /*
-* Transfer this request onto the hold queue to prevent it
-* being resumbitted to HW (and potentially completed) before we have
-* released it. Since we may have already submitted following
-* requests, we need to remove those as well.
-*/
-   GEM_BUG_ON(i915_request_on_hold(rq));
-   GEM_BUG_ON(rq->engine != engine);
-   __execlists_hold(rq);
-   GEM_BUG_ON(list_empty(&engine->active.hold));
-
-unlock:
-   spin_unlock_irq(&engine->active.lock);
-   return rq;
-}
-
-static bool hold_request(const struct i915_request *rq)
-{
-   struct i915_dependency *p;
-   bool result = false;
-
-   /*
-* If one of our ancestors is on hold, we must also be on hold,
-* otherwise we will bypass it and execute before it.
-*/
-   rcu_read_lock();
-   for_each_signaler(p, rq) {
-   const struct i915_request *s =
-   container_of(p->signaler, typeof(*s), sched);
-
-   if (s->engine != rq->engine)
-   continue;
-
-   result = i915_request_on_hold(s);
-   if (result)
-   break;
-   }
-   rcu_read_unlock();
-
-   return result;
-}
-
-static void __execlists_unhold(struct i915_request *rq)
-{
-   LIST_HEAD(list);
-
-   do {
-   struct i915_dependency *p;
-
-   RQ_TRACE(rq, "hold release\n");
-
-   GEM_BUG_ON(!i915_request_on_hold(rq));
-   GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));
-
-   i915_request_clear_hold(rq);
-   list_move_tail(&rq->sched.link,
-  i915_sched_lookup_priolist(rq->engine,
- rq_prio(rq)));
-   set_bit(I915_FENCE_FLAG_PQUEUE, >fence.flags);
-
-   /* Also release any children on this engine that are ready */
-   for_each_waiter(p, rq) {
-   struct i915_request *w =
-   container_of(p->waiter, typeof(*w), sched);
-
-   if (p->flags & I915_DEPENDENCY_WEAK)
-   continue;
-
-   /* Propagate any change in error status */
-   if (r

[Intel-gfx] [PATCH 29/57] drm/i915: Move scheduler flags

2021-02-01 Thread Chris Wilson
Start extracting the scheduling flags from the engine. We begin with
the scheduler's own existence.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_engine.h|  6 ++
 drivers/gpu/drm/i915/gt/intel_engine_types.h  | 21 +++
 .../drm/i915/gt/intel_execlists_submission.c  |  6 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  2 +-
 drivers/gpu/drm/i915/i915_request.h   |  2 +-
 drivers/gpu/drm/i915/i915_scheduler.c |  2 +-
 drivers/gpu/drm/i915/i915_scheduler_types.h   | 10 +
 7 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index c530839627bb..4f0163457aed 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -261,6 +261,12 @@ intel_engine_has_heartbeat(const struct intel_engine_cs 
*engine)
return READ_ONCE(engine->props.heartbeat_interval_ms);
 }
 
+static inline bool
+intel_engine_has_scheduler(struct intel_engine_cs *engine)
+{
+   return i915_sched_is_active(intel_engine_get_scheduler(engine));
+}
+
 static inline void
 intel_engine_kick_scheduler(struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 6b0bde292916..a3024a0de1de 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -440,14 +440,13 @@ struct intel_engine_cs {
 
 #define I915_ENGINE_USING_CMD_PARSER BIT(0)
 #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
-#define I915_ENGINE_HAS_SCHEDULERBIT(2)
-#define I915_ENGINE_HAS_PREEMPTION   BIT(3)
-#define I915_ENGINE_HAS_SEMAPHORES   BIT(4)
-#define I915_ENGINE_HAS_TIMESLICES   BIT(5)
-#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(6)
-#define I915_ENGINE_IS_VIRTUAL   BIT(7)
-#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(8)
-#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(9)
+#define I915_ENGINE_HAS_PREEMPTION   BIT(2)
+#define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
+#define I915_ENGINE_HAS_TIMESLICES   BIT(4)
+#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(5)
+#define I915_ENGINE_IS_VIRTUAL   BIT(6)
+#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(7)
+#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(8)
unsigned int flags;
 
/*
@@ -530,12 +529,6 @@ intel_engine_supports_stats(const struct intel_engine_cs 
*engine)
return engine->flags & I915_ENGINE_SUPPORTS_STATS;
 }
 
-static inline bool
-intel_engine_has_scheduler(const struct intel_engine_cs *engine)
-{
-   return engine->flags & I915_ENGINE_HAS_SCHEDULER;
-}
-
 static inline bool
 intel_engine_has_preemption(const struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index b1007e560527..3217cb4369ad 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2913,7 +2913,6 @@ logical_ring_default_vfuncs(struct intel_engine_cs 
*engine)
 */
}
 
-   engine->flags |= I915_ENGINE_HAS_SCHEDULER;
engine->flags |= I915_ENGINE_SUPPORTS_STATS;
if (!intel_vgpu_active(engine->i915)) {
engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
@@ -2981,6 +2980,7 @@ int intel_execlists_submission_setup(struct 
intel_engine_cs *engine)
engine->sched.is_executing = execlists_is_executing;
engine->sched.show = execlists_show;
tasklet_setup(&engine->sched.tasklet, execlists_submission_tasklet);
+   __set_bit(I915_SCHED_ACTIVE_BIT, &engine->sched.flags);
timer_setup(&engine->execlists.timer, execlists_timeslice, 0);
timer_setup(&engine->execlists.preempt, execlists_preempt, 0);
 
@@ -3386,6 +3386,7 @@ intel_execlists_create_virtual(struct intel_engine_cs 
**siblings,
   unsigned int count)
 {
struct virtual_engine *ve;
+   unsigned long sched;
unsigned int n;
int err;
 
@@ -3444,6 +3445,7 @@ intel_execlists_create_virtual(struct intel_engine_cs 
**siblings,
goto err_put;
}
 
+   sched = ~0U;
for (n = 0; n < count; n++) {
struct intel_engine_cs *sibling = siblings[n];
 
@@ -3473,6 +3475,7 @@ intel_execlists_create_virtual(struct intel_engine_cs 
**siblings,
 
ve->siblings[ve->num_siblings++] = sibling;
ve->base.mask |= sibling->mask;
+   sched &= sibling->sched.flags;
 
/*
 * All physical engines must be compatible for their emission
@@ -3514,6 +3517,7 @@ intel_execlists_create_virtual(struct intel_engine_cs 
**siblings,
ve->base.name,
ve->base.mask,
ENGINE_VIRTUAL);
+   ve->base.sched.flags = sched;
 
   

[Intel-gfx] [PATCH 05/57] drm/i915: Take rcu_read_lock for querying fence's driver/timeline names

2021-02-01 Thread Chris Wilson
The name may often be freed independently of the fence, with the
only protection being RCU. To be safe as we read the names, hold RCU.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_sw_fence.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
b/drivers/gpu/drm/i915/i915_sw_fence.c
index 2744558f3050..dfabf291e5cd 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -430,11 +430,13 @@ static void timer_i915_sw_fence_wake(struct timer_list *t)
if (!fence)
return;
 
+   rcu_read_lock();
pr_notice("Asynchronous wait on fence %s:%s:%llx timed out 
(hint:%ps)\n",
  cb->dma->ops->get_driver_name(cb->dma),
  cb->dma->ops->get_timeline_name(cb->dma),
  cb->dma->seqno,
  i915_sw_fence_debug_hint(fence));
+   rcu_read_unlock();
 
i915_sw_fence_set_error_once(fence, -ETIMEDOUT);
i915_sw_fence_complete(fence);
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 13/57] drm/i915/selftests: Force a rewind if at first we don't succeed

2021-02-01 Thread Chris Wilson
live_timeslice_rewind assumes a particular traversal and reordering
after the first timeslice yield. However, the outcome can be either
(A1, A2, B1) or (A1, B2, A2) depending on the path taken through the
dependency graph. So if we do not get the outcome we need at first, give
it a priority kick to force a rewind.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/selftest_execlists.c | 21 +++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c 
b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index 951e2bf867e1..68e1398704a4 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -1107,6 +1107,7 @@ static int live_timeslice_rewind(void *arg)
struct i915_request *rq[3] = {};
struct intel_context *ce;
unsigned long timeslice;
+   unsigned long timeout;
int i, err = 0;
u32 *slot;
 
@@ -1173,11 +1174,29 @@ static int live_timeslice_rewind(void *arg)
 
/* ELSP[] = { { A:rq1, A:rq2 }, { B:rq1 } } */
ENGINE_TRACE(engine, "forcing tasklet for rewind\n");
-   while (i915_request_is_active(rq[A2])) { /* semaphore yield! */
+   i = 0;
+   timeout = jiffies + HZ;
+   while (i915_request_is_active(rq[A2]) &&
+  time_before(jiffies, timeout)) { /* semaphore yield! */
/* Wait for the timeslice to kick in */
del_timer(&engine->execlists.timer);
tasklet_hi_schedule(&engine->execlists.tasklet);
intel_engine_flush_submission(engine);
+
+   /*
+* Unfortunately this assumes that during the
+* search of the wait tree it sees the requests
+* in a particular order. That order is not
+* strictly determined and it may pick either
+* A2 or B1 to immediately follow A1.
+*
+* Break the tie with a set-priority. This defeats
+* the goal of trying to cause a rewind with a
+* timeslice, but alas, a rewind is better than
+* none.
+*/
+   if (i++)
+   i915_request_set_priority(rq[B1], 1);
}
/* -> ELSP[] = { { A:rq1 }, { B:rq1 } } */
GEM_BUG_ON(!i915_request_is_active(rq[A1]));
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 22/57] drm/i915: Move scheduler queue

2021-02-01 Thread Chris Wilson
Extract the scheduling queue from "execlists" into the per-engine
scheduling structs, for reuse by other backends.

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_context_types.h |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_wait.c  |  1 +
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  7 ++-
 drivers/gpu/drm/i915/gt/intel_engine_pm.c |  3 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  | 14 -
 .../drm/i915/gt/intel_execlists_submission.c  | 29 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 ++--
 drivers/gpu/drm/i915/i915_drv.h   |  1 -
 drivers/gpu/drm/i915/i915_request.h   |  2 +-
 drivers/gpu/drm/i915/i915_scheduler.c | 57 ---
 drivers/gpu/drm/i915/i915_scheduler.h | 15 +
 drivers/gpu/drm/i915/i915_scheduler_types.h   | 14 +
 .../gpu/drm/i915/selftests/i915_scheduler.c   | 13 ++---
 13 files changed, 100 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 085f6a3735e8..d5bc75508048 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -19,7 +19,7 @@
 
 #include "gt/intel_context_types.h"
 
-#include "i915_scheduler.h"
+#include "i915_scheduler_types.h"
 #include "i915_sw_fence.h"
 
 struct pid;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c 
b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index d79bf16083bd..4d1897c347b9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -13,6 +13,7 @@
 #include "dma_resv_utils.h"
 #include "i915_gem_ioctls.h"
 #include "i915_gem_object.h"
+#include "i915_scheduler.h"
 
 static long
 i915_gem_object_wait_fence(struct dma_fence *fence,
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index d7ff84d92936..4c07c6f61924 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -574,7 +574,6 @@ void intel_engine_init_execlists(struct intel_engine_cs 
*engine)
memset(execlists->inflight, 0, sizeof(execlists->inflight));
 
execlists->queue_priority_hint = INT_MIN;
-   execlists->queue = RB_ROOT_CACHED;
 }
 
 static void cleanup_status_page(struct intel_engine_cs *engine)
@@ -911,7 +910,7 @@ int intel_engines_init(struct intel_gt *gt)
  */
 void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 {
-   GEM_BUG_ON(!list_empty(&engine->sched.requests));
+   i915_sched_fini(intel_engine_get_scheduler(engine));
tasklet_kill(&engine->execlists.tasklet); /* flush the callback */
 
intel_breadcrumbs_free(engine->breadcrumbs);
@@ -1225,6 +1224,8 @@ void __intel_engine_flush_submission(struct 
intel_engine_cs *engine, bool sync)
  */
 bool intel_engine_is_idle(struct intel_engine_cs *engine)
 {
+   struct i915_sched *se = intel_engine_get_scheduler(engine);
+
/* More white lies, if wedged, hw state is inconsistent */
if (intel_gt_is_wedged(engine->gt))
return true;
@@ -1237,7 +1238,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
intel_engine_flush_submission(engine);
 
/* ELSP is empty, but there are ready requests? E.g. after reset */
-   if (!i915_sched_is_idle(se))
+   if (!i915_sched_is_idle(se))
return false;
 
/* Ring stopped? */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 6372d7826bc9..3510c9236334 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -4,6 +4,7 @@
  */
 
 #include "i915_drv.h"
+#include "i915_scheduler.h"
 
 #include "intel_breadcrumbs.h"
 #include "intel_context.h"
@@ -276,7 +277,7 @@ static int __engine_park(struct intel_wakeref *wf)
if (engine->park)
engine->park(engine);
 
-   engine->execlists.no_priolist = false;
+   i915_sched_park(intel_engine_get_scheduler(engine));
 
/* While gt calls i915_vma_parked(), we have to break the lock cycle */
intel_gt_pm_put_async(engine->gt);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 0936b0699cbb..c36bdd957f8f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -153,11 +153,6 @@ struct intel_engine_execlists {
 */
struct timer_list preempt;
 
-   /**
-* @default_priolist: priority list for I915_PRIORITY_NORMAL
-*/
-   struct i915_priolist default_priolist;
-
/**
 * @ccid: identifier for contexts submitted to this e
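
The rest of the diff is cut off; from the intel_engine_is_idle()
conversion above, the new helper presumably wraps the queue emptiness
check that used to poke at execlists directly (a sketch, queue placement
assumed from the patch title):

static inline bool i915_sched_is_idle(const struct i915_sched *se)
{
	return RB_EMPTY_ROOT(&se->queue.rb_root);
}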

[Intel-gfx] [PATCH 28/57] drm/i915: Wrap i915_request_use_semaphores()

2021-02-01 Thread Chris Wilson
Wrap the query of whether the backend engine supports emitting
semaphores to coordinate multiple requests.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_request.c | 2 +-
 drivers/gpu/drm/i915/i915_request.h | 5 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 459f727b03cd..e7b4c4bc41a6 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1141,7 +1141,7 @@ __i915_request_await_execution(struct i915_request *to,
 * immediate execution, and so we must wait until it reaches the
 * active slot.
 */
-   if (intel_engine_has_semaphores(to->engine) &&
+   if (i915_request_use_semaphores(to) &&
!i915_request_has_initial_breadcrumb(to)) {
err = __emit_semaphore_wait(to, from, from->fence.seqno - 1);
if (err < 0)
diff --git a/drivers/gpu/drm/i915/i915_request.h 
b/drivers/gpu/drm/i915/i915_request.h
index 8322f308b906..8d9e59e3cdcb 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -637,4 +637,9 @@ static inline bool i915_request_is_executing(const struct 
i915_request *rq)
return i915_request_get_scheduler(rq)->is_executing(rq);
 }
 
+static inline bool i915_request_use_semaphores(const struct i915_request *rq)
+{
+   return intel_engine_has_semaphores(rq->engine);
+}
+
 #endif /* I915_REQUEST_H */
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 11/57] drm/i915/selftests: Measure set-priority duration

2021-02-01 Thread Chris Wilson
As a topological sort, we expect it to run in linear graph time,
O(V+E). In removing the recursion, it is no longer a DFS but rather a
BFS, and performs as O(VE). Let's demonstrate how bad this is with a few
examples, and build a few test cases to verify a potential fix.
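
To make the complexity claim concrete, a generic illustration (editor's
sketch using the kernel list API, not code from the patch): the visited
test below is what keeps a priority bump at O(V+E); without it a node is
re-queued once per inbound edge and its whole subtree re-walked each
time, degrading to O(VE).

struct node { struct list_head link, waiters; int prio; };
struct edge { struct list_head link; struct node *waiter; };

static void bump_priority(struct node *root, int prio)
{
	LIST_HEAD(queue);

	/* assumes every node->link was initialised with INIT_LIST_HEAD */
	list_move(&root->link, &queue);
	while (!list_empty(&queue)) {
		struct node *n = list_first_entry(&queue, struct node, link);
		struct edge *e;

		list_del_init(&n->link);
		if (n->prio >= prio)	/* the 'visited' test */
			continue;
		n->prio = prio;

		list_for_each_entry(e, &n->waiters, link)
			list_move_tail(&e->waiter->link, &queue);
	}
}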

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_scheduler.c |   4 +
 .../drm/i915/selftests/i915_live_selftests.h  |   1 +
 .../drm/i915/selftests/i915_perf_selftests.h  |   1 +
 .../gpu/drm/i915/selftests/i915_scheduler.c   | 672 ++
 4 files changed, 678 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/selftests/i915_scheduler.c

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index ec9da9109dc3..a56a812cbf29 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -609,6 +609,10 @@ void i915_request_show_with_schedule(struct drm_printer *m,
rcu_read_unlock();
 }
 
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+#include "selftests/i915_scheduler.c"
+#endif
+
 static void i915_global_scheduler_shrink(void)
 {
kmem_cache_shrink(global.slab_dependencies);
diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h 
b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
index a92c0e9b7e6b..2200a5baa68e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
@@ -26,6 +26,7 @@ selftest(gt_mocs, intel_mocs_live_selftests)
 selftest(gt_pm, intel_gt_pm_live_selftests)
 selftest(gt_heartbeat, intel_heartbeat_live_selftests)
 selftest(requests, i915_request_live_selftests)
+selftest(scheduler, i915_scheduler_live_selftests)
 selftest(active, i915_active_live_selftests)
 selftest(objects, i915_gem_object_live_selftests)
 selftest(mman, i915_gem_mman_live_selftests)
diff --git a/drivers/gpu/drm/i915/selftests/i915_perf_selftests.h 
b/drivers/gpu/drm/i915/selftests/i915_perf_selftests.h
index c2389f8a257d..137e35283fee 100644
--- a/drivers/gpu/drm/i915/selftests/i915_perf_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_perf_selftests.h
@@ -17,5 +17,6 @@
  */
 selftest(engine_cs, intel_engine_cs_perf_selftests)
 selftest(request, i915_request_perf_selftests)
+selftest(scheduler, i915_scheduler_perf_selftests)
 selftest(blt, i915_gem_object_blt_perf_selftests)
 selftest(region, intel_memory_region_perf_selftests)
diff --git a/drivers/gpu/drm/i915/selftests/i915_scheduler.c 
b/drivers/gpu/drm/i915/selftests/i915_scheduler.c
new file mode 100644
index ..d095fab2ccec
--- /dev/null
+++ b/drivers/gpu/drm/i915/selftests/i915_scheduler.c
@@ -0,0 +1,672 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include "i915_selftest.h"
+
+#include "gt/intel_context.h"
+#include "gt/intel_gpu_commands.h"
+#include "gt/selftest_engine_heartbeat.h"
+#include "selftests/igt_spinner.h"
+#include "selftests/i915_random.h"
+
+static void scheduling_disable(struct intel_engine_cs *engine)
+{
+   engine->props.preempt_timeout_ms = 0;
+   engine->props.timeslice_duration_ms = 0;
+
+   st_engine_heartbeat_disable(engine);
+}
+
+static void scheduling_enable(struct intel_engine_cs *engine)
+{
+   st_engine_heartbeat_enable(engine);
+
+   engine->props.preempt_timeout_ms =
+   engine->defaults.preempt_timeout_ms;
+   engine->props.timeslice_duration_ms =
+   engine->defaults.timeslice_duration_ms;
+}
+
+static int first_engine(struct drm_i915_private *i915,
+   int (*chain)(struct intel_engine_cs *engine,
+unsigned long param,
+bool (*fn)(struct i915_request *rq,
+   unsigned long v,
+   unsigned long e)),
+   unsigned long param,
+   bool (*fn)(struct i915_request *rq,
+  unsigned long v, unsigned long e))
+{
+   struct intel_engine_cs *engine;
+
+   for_each_uabi_engine(engine, i915) {
+   if (!intel_engine_has_scheduler(engine))
+   continue;
+
+   return chain(engine, param, fn);
+   }
+
+   return 0;
+}
+
+static int all_engines(struct drm_i915_private *i915,
+  int (*chain)(struct intel_engine_cs *engine,
+   unsigned long param,
+   bool (*fn)(struct i915_request *rq,
+  unsigned long v,
+  unsigned long e)),
+  unsigned long param,
+  bool (*fn)(struct i915_request *rq,
+ unsigned long v, unsigned long e))
+{
+   stru

[Intel-gfx] [PATCH 19/57] drm/i915: Fix the iterative dfs for defering requests

2021-02-01 Thread Chris Wilson
The current implementation of walking the children of a deferred
request lacks the backtracking required to reduce the dfs to linear.
Having pulled it from execlists into the common layer, we can reuse the
dfs code for priority inheritance.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_scheduler.c | 56 +++
 1 file changed, 40 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index bfd37ee801fd..694ca3a3b563 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -466,8 +466,10 @@ void i915_request_set_priority(struct i915_request *rq, 
int prio)
 void __i915_sched_defer_request(struct intel_engine_cs *engine,
struct i915_request *rq)
 {
-   struct list_head *pl;
-   LIST_HEAD(list);
struct list_head *pos = &rq->sched.waiters_list;
+   const int prio = rq_prio(rq);
+   struct i915_request *rn;
+   LIST_HEAD(dfs);
 
lockdep_assert_held(&engine->active.lock);
GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags));
@@ -477,14 +479,11 @@ void __i915_sched_defer_request(struct intel_engine_cs 
*engine,
 * to those that are waiting upon it. So we traverse its chain of
 * waiters and move any that are earlier than the request to after it.
 */
-   pl = lookup_priolist(engine, rq_prio(rq));
+   rq->sched.dfs.prev = NULL;
do {
-   struct i915_dependency *p;
-
-   GEM_BUG_ON(i915_request_is_active(rq));
-   list_move_tail(&rq->sched.link, pl);
-
-   for_each_waiter(p, rq) {
+   list_for_each_continue(pos, &rq->sched.waiters_list) {
+   struct i915_dependency *p =
+   list_entry(pos, typeof(*p), wait_link);
struct i915_request *w =
container_of(p->waiter, typeof(*w), sched);
 
@@ -500,19 +499,44 @@ void __i915_sched_defer_request(struct intel_engine_cs 
*engine,
   __i915_request_has_started(w) &&
   !__i915_request_is_complete(rq));
 
-   if (!i915_request_is_ready(w))
+   if (!i915_request_in_priority_queue(w))
continue;
 
-   if (rq_prio(w) < rq_prio(rq))
+   /*
+* We also need to reorder within the same priority.
+*
+* This is unlike priority-inheritance, where if the
+* signaler already has a higher priority [earlier
+* deadline] than us, we can ignore as it will be
+* scheduled first. If a waiter already has the
+* same priority, we still have to push it to the end
+* of the list. This unfortunately means we cannot
+* use the rq_deadline() itself as a 'visited' bit.
+*/
+   if (rq_prio(w) < prio)
continue;
 
-   GEM_BUG_ON(rq_prio(w) > rq_prio(rq));
-   GEM_BUG_ON(i915_request_is_active(w));
-   list_move_tail(&w->sched.link, &list);
+   GEM_BUG_ON(rq_prio(w) != prio);
+
+   /* Remember our position along this branch */
+   rq = stack_push(w, rq, pos);
+   pos = &rq->sched.waiters_list;
}
 
-   rq = list_first_entry_or_null(&list, typeof(*rq), sched.link);
-   } while (rq);
+   /* Note list is reversed for waiters wrt signal hierarchy */
+   GEM_BUG_ON(rq->engine != engine);
+   GEM_BUG_ON(!i915_request_in_priority_queue(rq));
+   list_move(&rq->sched.link, &dfs);
+
+   /* Track our visit, and prevent duplicate processing */
+   clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+   } while ((rq = stack_pop(rq, &pos)));
+
+   pos = lookup_priolist(engine, prio);
+   list_for_each_entry_safe(rq, rn, &dfs, sched.link) {
+   set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+   list_add_tail(&rq->sched.link, pos);
+   }
 }
 
 static void queue_request(struct intel_engine_cs *engine,
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 02/57] drm/i915/selftests: Exercise relative mmio paths to non-privileged registers

2021-02-01 Thread Chris Wilson
Verify that context isolation is also preserved when accessing
context-local registers with relative-mmio commands.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/selftest_lrc.c | 88 --
 1 file changed, 67 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c 
b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 7bf34c439876..0524232378e4 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -910,7 +910,9 @@ create_user_vma(struct i915_address_space *vm, unsigned 
long size)
 }
 
 static struct i915_vma *
-store_context(struct intel_context *ce, struct i915_vma *scratch)
+store_context(struct intel_context *ce,
+ struct i915_vma *scratch,
+ bool relative)
 {
struct i915_vma *batch;
u32 dw, x, *cs, *hw;
@@ -939,6 +941,9 @@ store_context(struct intel_context *ce, struct i915_vma 
*scratch)
hw += LRC_STATE_OFFSET / sizeof(*hw);
do {
u32 len = hw[dw] & 0x7f;
+   u32 cmd = MI_STORE_REGISTER_MEM_GEN8;
+   u32 offset = 0;
+   u32 mask = ~0;
 
if (hw[dw] == 0) {
dw++;
@@ -950,11 +955,19 @@ store_context(struct intel_context *ce, struct i915_vma 
*scratch)
continue;
}
 
+   if (hw[dw] & MI_LRI_LRM_CS_MMIO) {
+   mask = 0xfff;
+   if (relative)
+   cmd |= MI_LRI_LRM_CS_MMIO;
+   else
+   offset = ce->engine->mmio_base;
+   }
+
dw++;
len = (len + 1) / 2;
while (len--) {
-   *cs++ = MI_STORE_REGISTER_MEM_GEN8;
-   *cs++ = hw[dw];
+   *cs++ = cmd;
+   *cs++ = (hw[dw] & mask) + offset;
*cs++ = lower_32_bits(scratch->node.start + x);
*cs++ = upper_32_bits(scratch->node.start + x);
 
@@ -993,6 +1006,7 @@ static struct i915_request *
 record_registers(struct intel_context *ce,
 struct i915_vma *before,
 struct i915_vma *after,
+bool relative,
 u32 *sema)
 {
struct i915_vma *b_before, *b_after;
@@ -1000,11 +1014,11 @@ record_registers(struct intel_context *ce,
u32 *cs;
int err;
 
-   b_before = store_context(ce, before);
+   b_before = store_context(ce, before, relative);
if (IS_ERR(b_before))
return ERR_CAST(b_before);
 
-   b_after = store_context(ce, after);
+   b_after = store_context(ce, after, relative);
if (IS_ERR(b_after)) {
rq = ERR_CAST(b_after);
goto err_before;
@@ -1074,7 +1088,8 @@ record_registers(struct intel_context *ce,
goto err_after;
 }
 
-static struct i915_vma *load_context(struct intel_context *ce, u32 poison)
+static struct i915_vma *
+load_context(struct intel_context *ce, u32 poison, bool relative)
 {
struct i915_vma *batch;
u32 dw, *cs, *hw;
@@ -1101,7 +1116,10 @@ static struct i915_vma *load_context(struct 
intel_context *ce, u32 poison)
hw = defaults;
hw += LRC_STATE_OFFSET / sizeof(*hw);
do {
+   u32 cmd = MI_INSTR(0x22, 0);
u32 len = hw[dw] & 0x7f;
+   u32 offset = 0;
+   u32 mask = ~0;
 
if (hw[dw] == 0) {
dw++;
@@ -1113,11 +1131,19 @@ static struct i915_vma *load_context(struct 
intel_context *ce, u32 poison)
continue;
}
 
+   if (hw[dw] & MI_LRI_LRM_CS_MMIO) {
+   mask = 0xfff;
+   if (relative)
+   cmd |= MI_LRI_LRM_CS_MMIO;
+   else
+   offset = ce->engine->mmio_base;
+   }
+
dw++;
+   *cs++ = cmd | len;
len = (len + 1) / 2;
-   *cs++ = MI_LOAD_REGISTER_IMM(len);
while (len--) {
-   *cs++ = hw[dw];
+   *cs++ = (hw[dw] & mask) + offset;
*cs++ = poison;
dw += 2;
}
@@ -1134,14 +1160,18 @@ static struct i915_vma *load_context(struct 
intel_context *ce, u32 poison)
return batch;
 }
 
-static int poison_registers(struct intel_context *ce, u32 poison, u32 *sema)
+static int
+poison_registers(struct intel_context *ce,
+u32 poison,
+bool relative,
+u32 *sema)
 {
struct i915_request *rq;
struct i915_vma *batch;
u32 *cs;
int err;
 
-   batch = load_context(ce, poison);
+   batch = load_context(ce, p
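
To spell out the two encodings the test toggles between (editor's
worked example; the register offset is illustrative): for a context
register at mmio_base + 0x3a8 on an engine with mmio_base == 0x2000,

	/* absolute: mask the saved entry, add the engine's mmio base */
	*cs++ = MI_LOAD_REGISTER_IMM(1);
	*cs++ = (0x3a8 & 0xfff) + 0x2000;	/* 0x23a8 */
	*cs++ = poison;

	/* relative: keep only the low bits, the CS adds its own base */
	*cs++ = MI_LOAD_REGISTER_IMM(1) | MI_LRI_LRM_CS_MMIO;
	*cs++ = 0x3a8 & 0xfff;			/* 0x3a8 */
	*cs++ = poison;

Both forms must land in the same per-engine register for the isolation
check to pass.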

[Intel-gfx] [PATCH 30/57] drm/i915: Move timeslicing flag to scheduler

2021-02-01 Thread Chris Wilson
Whether a scheduler chooses to implement timeslicing is up to it, and
not an underlying property of the HW engine. The scheduler does depend
on the HW supporting preemption.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_engine.h |  6 ++
 drivers/gpu/drm/i915/gt/intel_engine_types.h   | 18 --
 .../drm/i915/gt/intel_execlists_submission.c   |  9 ++---
 drivers/gpu/drm/i915/gt/selftest_execlists.c   |  2 +-
 drivers/gpu/drm/i915/i915_scheduler_types.h| 10 ++
 5 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index 4f0163457aed..ca3a9cb06328 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -279,4 +279,10 @@ intel_engine_flush_scheduler(struct intel_engine_cs 
*engine)
i915_sched_flush(intel_engine_get_scheduler(engine));
 }
 
+static inline bool
+intel_engine_has_timeslices(struct intel_engine_cs *engine)
+{
+   return i915_sched_has_timeslices(intel_engine_get_scheduler(engine));
+}
+
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index a3024a0de1de..96a0aec29672 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -442,11 +442,10 @@ struct intel_engine_cs {
 #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
 #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
 #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
-#define I915_ENGINE_HAS_TIMESLICES   BIT(4)
-#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(5)
-#define I915_ENGINE_IS_VIRTUAL   BIT(6)
-#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(7)
-#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(8)
+#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(4)
+#define I915_ENGINE_IS_VIRTUAL   BIT(5)
+#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
+#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
unsigned int flags;
 
/*
@@ -541,15 +540,6 @@ intel_engine_has_semaphores(const struct intel_engine_cs 
*engine)
return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
 }
 
-static inline bool
-intel_engine_has_timeslices(const struct intel_engine_cs *engine)
-{
-   if (!IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
-   return false;
-
-   return engine->flags & I915_ENGINE_HAS_TIMESLICES;
-}
-
 static inline bool
 intel_engine_needs_breadcrumb_tasklet(const struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 3217cb4369ad..d4b6d262265a 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1023,7 +1023,7 @@ static bool needs_timeslice(const struct intel_engine_cs 
*engine,
 {
const struct i915_sched *se = &engine->sched;
 
-   if (!intel_engine_has_timeslices(engine))
+   if (!i915_sched_has_timeslices(se))
return false;
 
/* If not currently active, or about to switch, wait for next event */
@@ -2918,8 +2918,6 @@ logical_ring_default_vfuncs(struct intel_engine_cs 
*engine)
engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
if (can_preempt(engine)) {
engine->flags |= I915_ENGINE_HAS_PREEMPTION;
-   if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
-   engine->flags |= I915_ENGINE_HAS_TIMESLICES;
}
}
 
@@ -2927,6 +2925,11 @@ logical_ring_default_vfuncs(struct intel_engine_cs 
*engine)
engine->emit_bb_start = gen8_emit_bb_start;
else
engine->emit_bb_start = gen8_emit_bb_start_noarb;
+
+   if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION) &&
+   intel_engine_has_preemption(engine))
+   __set_bit(I915_SCHED_HAS_TIMESLICES_BIT,
+ &engine->sched.flags);
 }
 
 static void logical_ring_default_irqs(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c 
b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index cfc0f4b9fbc5..147cbfd6dec0 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -3825,7 +3825,7 @@ static unsigned int
 __select_siblings(struct intel_gt *gt,
  unsigned int class,
  struct intel_engine_cs **siblings,
- bool (*filter)(const struct intel_engine_cs *))
+ bool (*filter)(struct intel_engine_cs *))
 {
unsigned int n = 0;
unsigned int inst;
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h 
b/drivers/gpu/drm/i915/i915_scheduler_types.h
index cb1eddb7edc8..dfb29b8c2bee 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i

[Intel-gfx] [PATCH 26/57] drm/i915: Move finding the current active request to the scheduler

2021-02-01 Thread Chris Wilson
Since finding the currently active request starts by walking the
scheduler lists under the scheduler lock, move the routine to the
scheduler.
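
For reference, the caller pattern this enables, condensed from the
intel_engine_dump() hunk below (active_request is the new vfunc on the
scheduler):

    rcu_read_lock();
    spin_lock_irqsave(&se->lock, flags);
    rq = se->active_request(se);    /* backend walks its own lists */
    spin_unlock_irqrestore(&se->lock, flags);
    rcu_read_unlock();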

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_engine.h|  3 -
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 71 ++--
 .../drm/i915/gt/intel_execlists_submission.c  | 83 ++-
 drivers/gpu/drm/i915/i915_gpu_error.c | 18 ++--
 drivers/gpu/drm/i915/i915_gpu_error.h |  4 +-
 drivers/gpu/drm/i915/i915_request.c   | 71 +---
 drivers/gpu/drm/i915/i915_request.h   |  8 ++
 drivers/gpu/drm/i915/i915_scheduler.c | 50 +++
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  4 +
 9 files changed, 162 insertions(+), 150 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index 52bba16c62e8..c530839627bb 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -230,9 +230,6 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine,
   ktime_t *now);
 
-struct i915_request *
-intel_engine_find_active_request(struct intel_engine_cs *engine);
-
 u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
 
 void intel_engine_init_active(struct intel_engine_cs *engine,
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index b5b957283f2c..5751a529b2df 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1277,7 +1277,7 @@ bool intel_engine_can_store_dword(struct intel_engine_cs 
*engine)
}
 }
 
-static struct intel_timeline *get_timeline(struct i915_request *rq)
+static struct intel_timeline *get_timeline(const struct i915_request *rq)
 {
struct intel_timeline *tl;
 
@@ -1505,7 +1505,8 @@ static void intel_engine_print_registers(struct 
intel_engine_cs *engine,
}
 }
 
-static void print_request_ring(struct drm_printer *m, struct i915_request *rq)
+static void
+print_request_ring(struct drm_printer *m, const struct i915_request *rq)
 {
void *ring;
int size;
@@ -1590,7 +1591,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 {
struct i915_gpu_error * const error = &engine->i915->gpu_error;
struct i915_sched *se = intel_engine_get_scheduler(engine);
-   struct i915_request *rq;
+   const struct i915_request *rq;
intel_wakeref_t wakeref;
unsigned long flags;
ktime_t dummy;
@@ -1631,8 +1632,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 
drm_printf(m, "\tRequests:\n");
 
+   rcu_read_lock();
spin_lock_irqsave(&se->lock, flags);
-   rq = intel_engine_find_active_request(engine);
+   rq = se->active_request(se);
if (rq) {
struct intel_timeline *tl = get_timeline(rq);
 
@@ -1664,6 +1666,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
}
drm_printf(m, "\tOn hold?: %lu\n", list_count(&se->hold));
spin_unlock_irqrestore(&se->lock, flags);
+   rcu_read_unlock();
 
drm_printf(m, "\tMMIO base:  0x%08x\n", engine->mmio_base);
wakeref = intel_runtime_pm_get_if_in_use(engine->uncore->rpm);
@@ -1712,66 +1715,6 @@ ktime_t intel_engine_get_busy_time(struct 
intel_engine_cs *engine, ktime_t *now)
return ktime_add(total, start);
 }
 
-static bool match_ring(struct i915_request *rq)
-{
-   u32 ring = ENGINE_READ(rq->engine, RING_START);
-
-   return ring == i915_ggtt_offset(rq->ring->vma);
-}
-
-struct i915_request *
-intel_engine_find_active_request(struct intel_engine_cs *engine)
-{
-   struct i915_sched *se = intel_engine_get_scheduler(engine);
-   struct i915_request *request, *active = NULL;
-
-   /*
-* We are called by the error capture, reset and to dump engine
-* state at random points in time. In particular, note that neither is
-* crucially ordered with an interrupt. After a hang, the GPU is dead
-* and we assume that no more writes can happen (we waited long enough
-* for all writes that were in transaction to be flushed) - adding an
-* extra delay for a recent interrupt is pointless. Hence, we do
-* not need an engine->irq_seqno_barrier() before the seqno reads.
-* At all other times, we must assume the GPU is still running, but
-* we only care about the snapshot of this moment.
-*/
-   lockdep_assert_held(&se->lock);
-
-   rcu_read_lock();
-   request = execlists_active(&engine->execlists);
-   if (request) {
-   struct intel_timeline *tl = request->context->timeline;
-
-   list_for_each_entry_from_reverse(request, &tl->requests, link) {
- 

[Intel-gfx] [PATCH 12/57] drm/i915/selftests: Exercise priority inheritance around an engine loop

2021-02-01 Thread Chris Wilson
Exercise rescheduling priority inheritance around a sequence of requests
that wrap around all the engines.
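
The pass/fail criterion is simply that the per-request timestamps land in
the expected order. A sketch of the check, assuming the time[] scratch
filled in by __write_timestamp() below:

    /* Sketch: timestamps must be monotonic in the expected order */
    for (n = 1; n < count; n++) {
            if (time[n] < time[n - 1]) {
                    pr_err("Timestamp[%lu] %u before timestamp[%lu] %u\n",
                           n, time[n], n - 1, time[n - 1]);
                    err = -EINVAL;
                    break;
            }
    }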

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/selftests/i915_scheduler.c   | 225 ++
 1 file changed, 225 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/i915_scheduler.c 
b/drivers/gpu/drm/i915/selftests/i915_scheduler.c
index d095fab2ccec..acc666f755d7 100644
--- a/drivers/gpu/drm/i915/selftests/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/selftests/i915_scheduler.c
@@ -7,6 +7,7 @@
 
 #include "gt/intel_context.h"
 #include "gt/intel_gpu_commands.h"
+#include "gt/intel_ring.h"
 #include "gt/selftest_engine_heartbeat.h"
 #include "selftests/igt_spinner.h"
 #include "selftests/i915_random.h"
@@ -504,10 +505,234 @@ static int igt_priority_chains(void *arg)
return igt_schedule_chains(arg, igt_priority);
 }
 
+static struct i915_request *
+__write_timestamp(struct intel_engine_cs *engine,
+ struct drm_i915_gem_object *obj,
+ int slot,
+ struct i915_request *prev)
+{
+   struct i915_request *rq = ERR_PTR(-EINVAL);
+   bool use_64b = INTEL_GEN(engine->i915) >= 8;
+   struct intel_context *ce;
+   struct i915_vma *vma;
+   int err = 0;
+   u32 *cs;
+
+   ce = intel_context_create(engine);
+   if (IS_ERR(ce))
+   return ERR_CAST(ce);
+
+   vma = i915_vma_instance(obj, ce->vm, NULL);
+   if (IS_ERR(vma)) {
+   err = PTR_ERR(vma);
+   goto out_ce;
+   }
+
+   err = i915_vma_pin(vma, 0, 0, PIN_USER);
+   if (err)
+   goto out_ce;
+
+   rq = intel_context_create_request(ce);
+   if (IS_ERR(rq)) {
+   err = PTR_ERR(rq);
+   goto out_unpin;
+   }
+
+   i915_vma_lock(vma);
+   err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE);
+   i915_vma_unlock(vma);
+   if (err)
+   goto out_request;
+
+   if (prev) {
+   err = i915_request_await_dma_fence(rq, &prev->fence);
+   if (err)
+   goto out_request;
+   }
+
+   if (engine->emit_init_breadcrumb) {
+   err = engine->emit_init_breadcrumb(rq);
+   if (err)
+   goto out_request;
+   }
+
+   cs = intel_ring_begin(rq, 4);
+   if (IS_ERR(cs)) {
+   err = PTR_ERR(cs);
+   goto out_request;
+   }
+
+   *cs++ = MI_STORE_REGISTER_MEM + use_64b;
+   *cs++ = i915_mmio_reg_offset(RING_TIMESTAMP(engine->mmio_base));
+   *cs++ = lower_32_bits(vma->node.start) + sizeof(u32) * slot;
+   *cs++ = upper_32_bits(vma->node.start);
+   intel_ring_advance(rq, cs);
+
+   i915_request_get(rq);
+out_request:
+   i915_request_add(rq);
+out_unpin:
+   i915_vma_unpin(vma);
+out_ce:
+   intel_context_put(ce);
+   i915_request_put(prev);
+   return err ? ERR_PTR(err) : rq;
+}
+
+static struct i915_request *create_spinner(struct drm_i915_private *i915,
+  struct igt_spinner *spin)
+{
+   struct intel_engine_cs *engine;
+
+   for_each_uabi_engine(engine, i915) {
+   struct intel_context *ce;
+   struct i915_request *rq;
+
+   if (igt_spinner_init(spin, engine->gt))
+   return ERR_PTR(-ENOMEM);
+
+   ce = intel_context_create(engine);
+   if (IS_ERR(ce))
+   return ERR_CAST(ce);
+
+   rq = igt_spinner_create_request(spin, ce, MI_NOOP);
+   intel_context_put(ce);
+   if (rq == ERR_PTR(-ENODEV))
+   continue;
+   if (IS_ERR(rq))
+   return rq;
+
+   i915_request_get(rq);
+   i915_request_add(rq);
+   return rq;
+   }
+
+   return ERR_PTR(-ENODEV);
+}
+
+static bool has_timestamp(const struct drm_i915_private *i915)
+{
+   return INTEL_GEN(i915) >= 7;
+}
+
+static int __igt_schedule_cycle(struct drm_i915_private *i915,
+   bool (*fn)(struct i915_request *rq,
+  unsigned long v, unsigned long e))
+{
+   struct intel_engine_cs *engine;
+   struct drm_i915_gem_object *obj;
+   struct igt_spinner spin;
+   struct i915_request *rq;
+   unsigned long count, n;
+   u32 *time, last;
+   int err;
+
+   /*
+* Queue a bunch of ordered requests (each waiting on the previous)
+* around the engines a couple of times. Each request will write
+* the timestamp it executes at into the scratch, with the expectation
+* that the timestamp will be in our desired execution order.
+*/
+
+   if (!i915->caps.scheduler || !has_timestamp(i915))
+   return 0;
+
+   obj = i915_g

[Intel-gfx] [PATCH 18/57] drm/i915: Extract the ability to defer and rerun a request later

2021-02-01 Thread Chris Wilson
Lift the ability to defer a request until later from execlists into the
common layer.
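
The traversal itself is unchanged by the move; in condensed form (from the
removed defer_request() visible below):

    do {
            struct i915_dependency *p;

            list_move_tail(&rq->sched.link, pl);
            for_each_waiter(p, rq) {
                    struct i915_request *w =
                            container_of(p->waiter, typeof(*w), sched);

                    /* Drag ready same-engine, same-priority waiters */
                    if (w->engine == rq->engine &&
                        i915_request_is_ready(w) &&
                        rq_prio(w) == rq_prio(rq))
                            list_move_tail(&w->sched.link, &list);
            }
            rq = list_first_entry_or_null(&list, typeof(*rq), sched.link);
    } while (rq);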

Signed-off-by: Chris Wilson 
---
 .../drm/i915/gt/intel_execlists_submission.c  | 57 +++--
 drivers/gpu/drm/i915/i915_scheduler.c | 63 +--
 drivers/gpu/drm/i915/i915_scheduler.h |  5 +-
 3 files changed, 67 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 853021314786..b56e321ef003 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -978,54 +978,6 @@ static void virtual_xfer_context(struct virtual_engine *ve,
}
 }
 
-static void defer_request(struct i915_request *rq, struct list_head * const pl)
-{
-   LIST_HEAD(list);
-
-   /*
-* We want to move the interrupted request to the back of
-* the round-robin list (i.e. its priority level), but
-* in doing so, we must then move all requests that were in
-* flight and were waiting for the interrupted request to
-* be run after it again.
-*/
-   do {
-   struct i915_dependency *p;
-
-   GEM_BUG_ON(i915_request_is_active(rq));
-   list_move_tail(&rq->sched.link, pl);
-
-   for_each_waiter(p, rq) {
-   struct i915_request *w =
-   container_of(p->waiter, typeof(*w), sched);
-
-   if (p->flags & I915_DEPENDENCY_WEAK)
-   continue;
-
-   /* Leave semaphores spinning on the other engines */
-   if (w->engine != rq->engine)
-   continue;
-
-   /* No waiter should start before its signaler */
-   GEM_BUG_ON(i915_request_has_initial_breadcrumb(w) &&
-  __i915_request_has_started(w) &&
-  !__i915_request_is_complete(rq));
-
-   if (!i915_request_is_ready(w))
-   continue;
-
-   if (rq_prio(w) < rq_prio(rq))
-   continue;
-
-   GEM_BUG_ON(rq_prio(w) > rq_prio(rq));
-   GEM_BUG_ON(i915_request_is_active(w));
-   list_move_tail(&w->sched.link, &list);
-   }
-
-   rq = list_first_entry_or_null(&list, typeof(*rq), sched.link);
-   } while (rq);
-}
-
 static void defer_active(struct intel_engine_cs *engine)
 {
struct i915_request *rq;
@@ -1034,7 +986,14 @@ static void defer_active(struct intel_engine_cs *engine)
if (!rq)
return;
 
-   defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq)));
+   /*
+* We want to move the interrupted request to the back of
+* the round-robin list (i.e. its priority level), but
+* in doing so, we must then move all requests that were in
+* flight and were waiting for the interrupted request to
+* be run after it again.
+*/
+   __i915_sched_defer_request(engine, rq);
 }
 
 static bool
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 351c0c0055b5..bfd37ee801fd 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -179,8 +179,8 @@ static void assert_priolists(struct intel_engine_execlists 
* const execlists)
}
 }
 
-struct list_head *
-i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
+static struct list_head *
+lookup_priolist(struct intel_engine_cs *engine, int prio)
 {
struct intel_engine_execlists * const execlists = &engine->execlists;
struct i915_priolist *p;
@@ -332,7 +332,7 @@ static void __i915_request_set_priority(struct i915_request 
*rq, int prio)
struct list_head *pos = &rq->sched.signalers_list;
struct list_head *plist;
 
-   plist = i915_sched_lookup_priolist(engine, prio);
+   plist = lookup_priolist(engine, prio);
 
/*
 * Recursively bump all dependent priorities to match the new request.
@@ -463,12 +463,63 @@ void i915_request_set_priority(struct i915_request *rq, 
int prio)
spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
+void __i915_sched_defer_request(struct intel_engine_cs *engine,
+   struct i915_request *rq)
+{
+   struct list_head *pl;
+   LIST_HEAD(list);
+
+   lockdep_assert_held(&engine->active.lock);
+   GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags));
+
+   /*
+* When we defer a request, we must maintain its order with respect
+* to those that are waiting upon it. So we traverse its chain of
+* waiters and move any that are earlier than the request to after it.

[Intel-gfx] [PATCH 20/57] drm/i915: Wrap access to intel_engine.active

2021-02-01 Thread Chris Wilson
As we are about to shuffle the lists around to consolidate new control
objects, reduce the code movement by wrapping access to the scheduler
lists ahead of time.
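
The wrapper itself is deliberately trivial. A sketch of the accessor the
rest of the series leans on (struct i915_sched is embedded in
intel_engine_cs, as the intel_engine_types.h hunk below shows):

    static inline struct i915_sched *
    intel_engine_get_scheduler(struct intel_engine_cs *engine)
    {
            return &engine->sched;
    }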

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 17 +++---
 drivers/gpu/drm/i915/gt/intel_engine_types.h  | 11 +++-
 .../drm/i915/gt/intel_execlists_submission.c  | 58 +++
 .../gpu/drm/i915/gt/intel_ring_submission.c   | 14 +++--
 drivers/gpu/drm/i915/gt/mock_engine.c |  7 ++-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  6 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 20 ---
 drivers/gpu/drm/i915/i915_gpu_error.c |  5 +-
 drivers/gpu/drm/i915/i915_request.c   | 23 +++-
 drivers/gpu/drm/i915/i915_request.h   |  8 ++-
 drivers/gpu/drm/i915/i915_scheduler.c | 47 ---
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  4 +-
 .../gpu/drm/i915/selftests/i915_scheduler.c   | 19 +++---
 13 files changed, 141 insertions(+), 98 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index e55e57b6edf6..a2916c7fcc48 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -725,6 +725,7 @@ struct measure_breadcrumb {
 static int measure_breadcrumb_dw(struct intel_context *ce)
 {
struct intel_engine_cs *engine = ce->engine;
+   struct i915_sched *se = intel_engine_get_scheduler(engine);
struct measure_breadcrumb *frame;
int dw;
 
@@ -747,11 +748,11 @@ static int measure_breadcrumb_dw(struct intel_context *ce)
frame->rq.ring = &frame->ring;

mutex_lock(&ce->timeline->mutex);
-   spin_lock_irq(&engine->active.lock);
+   spin_lock_irq(&se->lock);

dw = engine->emit_fini_breadcrumb(&frame->rq, frame->cs) - frame->cs;

-   spin_unlock_irq(&engine->active.lock);
+   spin_unlock_irq(&se->lock);
mutex_unlock(&ce->timeline->mutex);
 
GEM_BUG_ON(dw & 1); /* RING_TAIL must be qword aligned */
@@ -1627,6 +1628,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
   const char *header, ...)
 {
struct i915_gpu_error * const error = &engine->i915->gpu_error;
+   struct i915_sched *se = intel_engine_get_scheduler(engine);
struct i915_request *rq;
intel_wakeref_t wakeref;
unsigned long flags;
@@ -1668,7 +1670,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 
drm_printf(m, "\tRequests:\n");
 
-   spin_lock_irqsave(&engine->active.lock, flags);
+   spin_lock_irqsave(&se->lock, flags);
rq = intel_engine_find_active_request(engine);
if (rq) {
struct intel_timeline *tl = get_timeline(rq);
@@ -1699,8 +1701,8 @@ void intel_engine_dump(struct intel_engine_cs *engine,
hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE);
}
}
-   drm_printf(m, "\tOn hold?: %lu\n", list_count(&engine->active.hold));
-   spin_unlock_irqrestore(&engine->active.lock, flags);
+   drm_printf(m, "\tOn hold?: %lu\n", list_count(&se->hold));
+   spin_unlock_irqrestore(&se->lock, flags);
 
drm_printf(m, "\tMMIO base:  0x%08x\n", engine->mmio_base);
wakeref = intel_runtime_pm_get_if_in_use(engine->uncore->rpm);
@@ -1759,6 +1761,7 @@ static bool match_ring(struct i915_request *rq)
 struct i915_request *
 intel_engine_find_active_request(struct intel_engine_cs *engine)
 {
+   struct i915_sched *se = intel_engine_get_scheduler(engine);
struct i915_request *request, *active = NULL;
 
/*
@@ -1772,7 +1775,7 @@ intel_engine_find_active_request(struct intel_engine_cs 
*engine)
 * At all other times, we must assume the GPU is still running, but
 * we only care about the snapshot of this moment.
 */
-   lockdep_assert_held(&engine->active.lock);
+   lockdep_assert_held(&se->lock);
 
rcu_read_lock();
request = execlists_active(&engine->execlists);
@@ -1790,7 +1793,7 @@ intel_engine_find_active_request(struct intel_engine_cs 
*engine)
if (active)
return active;
 
-   list_for_each_entry(request, &engine->active.requests, sched.link) {
+   list_for_each_entry(request, &se->requests, sched.link) {
if (__i915_request_is_complete(request))
continue;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 71ceaa5dcf40..e5637e831d28 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -329,7 +329,7 @@ struct intel_engine_cs {
 
struct intel_sseu sseu;
 
-   struct {
+   struct i915_sched {
spinlock_t lock;
struct list_head requests;
struct list_head hold; /* ready requests, but on hold 

[Intel-gfx] [PATCH 25/57] drm/i915: Move submit_request to i915_sched_engine

2021-02-01 Thread Chris Wilson
Claim the submit_request vfunc as the entry point into the scheduler
backend for ready requests.
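
Schematically, submission of a ready request now dispatches through the
scheduler rather than the engine. A sketch of the call site (the exact
location is in i915_request.c):

    struct i915_sched *se = intel_engine_get_scheduler(rq->engine);

    se->submit_request(rq);     /* execlists, GuC or ring backend */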

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  8 
 drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 11 ++-
 drivers/gpu/drm/i915/gt/intel_reset.c|  2 +-
 drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  4 ++--
 drivers/gpu/drm/i915/gt/mock_engine.c|  4 +++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c|  2 +-
 drivers/gpu/drm/i915/i915_request.c  |  2 +-
 drivers/gpu/drm/i915/i915_scheduler.c|  2 ++
 drivers/gpu/drm/i915/i915_scheduler_types.h  |  9 +
 drivers/gpu/drm/i915/selftests/i915_request.c|  3 +--
 10 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 97fe5e395a85..6b0bde292916 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -416,14 +416,6 @@ struct intel_engine_cs {
 u32 *cs);
unsigned intemit_fini_breadcrumb_dw;
 
-   /* Pass the request to the hardware queue (e.g. directly into
-* the legacy ringbuffer or to the end of an execlist).
-*
-* This is called from an atomic context with irqs disabled; must
-* be irq safe.
-*/
-   void(*submit_request)(struct i915_request *rq);
-
/*
 * Called on signaling of a SUBMIT_FENCE, passing along the signaling
 * request down to the bonded pairs.
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index be79a352e512..33c1a833df20 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -483,7 +483,7 @@ resubmit_virtual_request(struct i915_request *rq, struct 
virtual_engine *ve)
 
clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
WRITE_ONCE(rq->engine, &ve->base);
-   ve->base.submit_request(rq);
+   ve->base.sched.submit_request(rq);
 
spin_unlock_irq(>lock);
 }
@@ -2766,7 +2766,7 @@ static bool can_preempt(struct intel_engine_cs *engine)
 
 static void execlists_set_default_submission(struct intel_engine_cs *engine)
 {
-   engine->submit_request = i915_request_enqueue;
+   engine->sched.submit_request = i915_request_enqueue;
engine->sched.tasklet.callback = execlists_submission_tasklet;
 }
 
@@ -3227,7 +3227,7 @@ static void virtual_submit_request(struct i915_request 
*rq)
 rq->fence.context,
 rq->fence.seqno);
 
-   GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);
+   GEM_BUG_ON(ve->base.sched.submit_request != virtual_submit_request);
 
spin_lock_irqsave(&ve->base.sched.lock, flags);
 
@@ -3341,12 +3341,10 @@ intel_execlists_create_virtual(struct intel_engine_cs 
**siblings,
ve->base.cops = &virtual_context_ops;
ve->base.request_alloc = execlists_request_alloc;
 
-   ve->base.submit_request = virtual_submit_request;
ve->base.bond_execute = virtual_bond_execute;
 
INIT_LIST_HEAD(virtual_queue(ve));
ve->base.execlists.queue_priority_hint = INT_MIN;
-   tasklet_setup(&ve->base.sched.tasklet, virtual_submission_tasklet);
 
intel_context_init(>context, >base);
 
@@ -3427,6 +3425,9 @@ intel_execlists_create_virtual(struct intel_engine_cs 
**siblings,
ve->base.mask,
ENGINE_VIRTUAL);
 
+   ve->base.sched.submit_request = virtual_submit_request;
+   tasklet_setup(&ve->base.sched.tasklet, virtual_submission_tasklet);
+
virtual_engine_initial_hint(ve);
return &ve->context;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c 
b/drivers/gpu/drm/i915/gt/intel_reset.c
index 4a8f982a1a4f..e5cb92c7d0f8 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -820,7 +820,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
__intel_gt_reset(gt, ALL_ENGINES);
 
for_each_engine(engine, gt, id)
-   engine->submit_request = nop_submit_request;
+   engine->sched.submit_request = nop_submit_request;
 
/*
 * Make sure no request can slip through without getting completed by
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c 
b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 8af1bc77e15e..a7d49ea71900 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -970,12 +970,12 @@ static void gen6_bsd_submit_request(struct i915_request 
*request)
 
 static void i9xx_set_default_submission(str

[Intel-gfx] [PATCH 15/57] drm/i915: Extract request submission from execlists

2021-02-01 Thread Chris Wilson
In the process of preparing to reuse the request submission logic for
other backends, lift it out of the execlists backend. It already
operates on the common structs, so it is just a matter of moving and renaming.
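
The lifted entry point keeps the shape of the removed
execlists_submit_request() below. A sketch of i915_request_enqueue(),
assuming it mirrors the code it replaces:

    void i915_request_enqueue(struct i915_request *rq)
    {
            struct intel_engine_cs *engine = rq->engine;
            unsigned long flags;
            bool kick = false;

            /* Will be called from irq-context when using foreign fences. */
            spin_lock_irqsave(&engine->active.lock, flags);

            if (unlikely(ancestor_on_hold(engine, rq))) {
                    list_add_tail(&rq->sched.link, &engine->active.hold);
                    i915_request_set_hold(rq);
            } else {
                    queue_request(engine, rq);
                    kick = submit_queue(engine, rq);
            }

            spin_unlock_irqrestore(&engine->active.lock, flags);
            if (kick)
                    tasklet_hi_schedule(&engine->execlists.tasklet);
    }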

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 .../drm/i915/gt/intel_execlists_submission.c  | 55 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 30 +--
 drivers/gpu/drm/i915/i915_scheduler.c | 82 +++
 drivers/gpu/drm/i915/i915_scheduler.h |  2 +
 4 files changed, 86 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index ea449fee8148..51044387a8a2 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2413,59 +2413,6 @@ static void execlists_preempt(struct timer_list *timer)
execlists_kick(timer, preempt);
 }
 
-static void queue_request(struct intel_engine_cs *engine,
- struct i915_request *rq)
-{
-   GEM_BUG_ON(!list_empty(&rq->sched.link));
-   list_add_tail(&rq->sched.link,
- i915_sched_lookup_priolist(engine, rq_prio(rq)));
-   set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-}
-
-static bool submit_queue(struct intel_engine_cs *engine,
-const struct i915_request *rq)
-{
-   struct intel_engine_execlists *execlists = &engine->execlists;
-
-   if (rq_prio(rq) <= execlists->queue_priority_hint)
-   return false;
-
-   execlists->queue_priority_hint = rq_prio(rq);
-   return true;
-}
-
-static bool ancestor_on_hold(const struct intel_engine_cs *engine,
-const struct i915_request *rq)
-{
-   GEM_BUG_ON(i915_request_on_hold(rq));
-   return !list_empty(&engine->active.hold) && hold_request(rq);
-}
-
-static void execlists_submit_request(struct i915_request *request)
-{
-   struct intel_engine_cs *engine = request->engine;
-   unsigned long flags;
-
-   /* Will be called from irq-context when using foreign fences. */
-   spin_lock_irqsave(&engine->active.lock, flags);
-
-   if (unlikely(ancestor_on_hold(engine, request))) {
-   RQ_TRACE(request, "ancestor on hold\n");
-   list_add_tail(&request->sched.link, &engine->active.hold);
-   i915_request_set_hold(request);
-   } else {
-   queue_request(engine, request);
-
-   GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
-   GEM_BUG_ON(list_empty(&request->sched.link));
-
-   if (submit_queue(engine, request))
-   __execlists_kick(&engine->execlists);
-   }
-
-   spin_unlock_irqrestore(&engine->active.lock, flags);
-}
-
 static int execlists_context_pre_pin(struct intel_context *ce,
 struct i915_gem_ww_ctx *ww,
 void **vaddr)
@@ -3085,7 +3032,7 @@ static bool can_preempt(struct intel_engine_cs *engine)
 
 static void execlists_set_default_submission(struct intel_engine_cs *engine)
 {
-   engine->submit_request = execlists_submit_request;
+   engine->submit_request = i915_request_enqueue;
engine->execlists.tasklet.callback = execlists_submission_tasklet;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 06fe95250ba2..dc33e5751776 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -511,34 +511,6 @@ static int guc_request_alloc(struct i915_request *request)
return 0;
 }
 
-static inline void queue_request(struct intel_engine_cs *engine,
-struct i915_request *rq,
-int prio)
-{
-   GEM_BUG_ON(!list_empty(&rq->sched.link));
-   list_add_tail(&rq->sched.link,
- i915_sched_lookup_priolist(engine, prio));
-   set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-}
-
-static void guc_submit_request(struct i915_request *rq)
-{
-   struct intel_engine_cs *engine = rq->engine;
-   unsigned long flags;
-
-   /* Will be called from irq-context when using foreign fences. */
-   spin_lock_irqsave(&engine->active.lock, flags);
-
-   queue_request(engine, rq, rq_prio(rq));
-
-   GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
-   GEM_BUG_ON(list_empty(&rq->sched.link));
-
-   tasklet_hi_schedule(&engine->execlists.tasklet);
-
-   spin_unlock_irqrestore(&engine->active.lock, flags);
-}
-
 static void sanitize_hwsp(struct intel_engine_cs *engine)
 {
struct intel_timeline *tl;
@@ -607,7 +579,7 @@ static int guc_resume(struct intel_engine_cs *engine)
 
 static void guc_set_default_submission(struct intel_engine_cs *engine)
 {
-   engine->submit_request = guc_submit_request;
+ 

[Intel-gfx] [PATCH 03/57] drm/i915/selftests: Exercise cross-process context isolation

2021-02-01 Thread Chris Wilson
Verify that one context running on engine A cannot manipulate another
client's context concurrently running on engine B using unprivileged
access.
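
In outline, the test is a three-step handshake over a dword in engine A's
HWSP (the emission side is in record_registers() below):

    /*
     * 1. Context A stores its register state, signals the semaphore
     *    and parks on MI_SEMAPHORE_WAIT until it turns non-zero.
     * 2. poison_registers() runs context B on another engine, scribbles
     *    poison over the registers, then releases the semaphore.
     * 3. A stores its register state again; before/after must match
     *    for the isolation to hold.
     */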

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/selftest_lrc.c | 275 +
 1 file changed, 238 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c 
b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 0524232378e4..e97adf1b7729 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -911,6 +911,7 @@ create_user_vma(struct i915_address_space *vm, unsigned 
long size)
 
 static struct i915_vma *
 store_context(struct intel_context *ce,
+ struct intel_engine_cs *engine,
  struct i915_vma *scratch,
  bool relative)
 {
@@ -928,7 +929,7 @@ store_context(struct intel_context *ce,
return ERR_CAST(cs);
}
 
-   defaults = shmem_pin_map(ce->engine->default_state);
+   defaults = shmem_pin_map(engine->default_state);
if (!defaults) {
i915_gem_object_unpin_map(batch->obj);
i915_vma_put(batch);
@@ -960,7 +961,7 @@ store_context(struct intel_context *ce,
if (relative)
cmd |= MI_LRI_LRM_CS_MMIO;
else
-   offset = ce->engine->mmio_base;
+   offset = engine->mmio_base;
}
 
dw++;
@@ -979,7 +980,7 @@ store_context(struct intel_context *ce,
 
*cs++ = MI_BATCH_BUFFER_END;
 
-   shmem_unpin_map(ce->engine->default_state, defaults);
+   shmem_unpin_map(engine->default_state, defaults);
 
i915_gem_object_flush_map(batch->obj);
i915_gem_object_unpin_map(batch->obj);
@@ -1002,23 +1003,48 @@ static int move_to_active(struct i915_request *rq,
return err;
 }
 
+struct hwsp_semaphore {
+   u32 ggtt;
+   u32 *va;
+};
+
+static struct hwsp_semaphore hwsp_semaphore(struct intel_engine_cs *engine)
+{
+   struct hwsp_semaphore s;
+
+   s.va = memset32(engine->status_page.addr + 1000, 0, 1);
+   s.ggtt = (i915_ggtt_offset(engine->status_page.vma) +
+ offset_in_page(s.va));
+
+   return s;
+}
+
+static u32 *emit_noops(u32 *cs, int count)
+{
+   while (count--)
+   *cs++ = MI_NOOP;
+
+   return cs;
+}
+
 static struct i915_request *
 record_registers(struct intel_context *ce,
+struct intel_engine_cs *engine,
 struct i915_vma *before,
 struct i915_vma *after,
 bool relative,
-u32 *sema)
+const struct hwsp_semaphore *sema)
 {
struct i915_vma *b_before, *b_after;
struct i915_request *rq;
u32 *cs;
int err;
 
-   b_before = store_context(ce, before, relative);
+   b_before = store_context(ce, engine, before, relative);
if (IS_ERR(b_before))
return ERR_CAST(b_before);
 
-   b_after = store_context(ce, after, relative);
+   b_after = store_context(ce, engine, after, relative);
if (IS_ERR(b_after)) {
rq = ERR_CAST(b_after);
goto err_before;
@@ -1044,7 +1070,7 @@ record_registers(struct intel_context *ce,
if (err)
goto err_rq;
 
-   cs = intel_ring_begin(rq, 14);
+   cs = intel_ring_begin(rq, 18);
if (IS_ERR(cs)) {
err = PTR_ERR(cs);
goto err_rq;
@@ -1055,16 +1081,28 @@ record_registers(struct intel_context *ce,
*cs++ = lower_32_bits(b_before->node.start);
*cs++ = upper_32_bits(b_before->node.start);
 
-   *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
-   *cs++ = MI_SEMAPHORE_WAIT |
-   MI_SEMAPHORE_GLOBAL_GTT |
-   MI_SEMAPHORE_POLL |
-   MI_SEMAPHORE_SAD_NEQ_SDD;
-   *cs++ = 0;
-   *cs++ = i915_ggtt_offset(ce->engine->status_page.vma) +
-   offset_in_page(sema);
-   *cs++ = 0;
-   *cs++ = MI_NOOP;
+   if (sema) {
+   WRITE_ONCE(*sema->va, -1);
+
+   /* Signal the poisoner */
+   *cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+   *cs++ = sema->ggtt;
+   *cs++ = 0;
+   *cs++ = 0;
+
+   /* Then wait for the poison to settle */
+   *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+   *cs++ = MI_SEMAPHORE_WAIT |
+   MI_SEMAPHORE_GLOBAL_GTT |
+   MI_SEMAPHORE_POLL |
+   MI_SEMAPHORE_SAD_NEQ_SDD;
+   *cs++ = 0;
+   *cs++ = sema->ggtt;
+   *cs++ = 0;
+   *cs++ = MI_NOOP;
+   } else {
+   cs = emit_noops(cs, 10);
+   }
 
*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
*cs++ = MI_BATCH_BUFFER_STA

[Intel-gfx] [PATCH 27/57] drm/i915: Show execlists queues when dumping state

2021-02-01 Thread Chris Wilson
Move the scheduler pretty printer out of the execlists register state
and push it to the scheduler.

v2: It's not common to all, so shove it out of intel_engine_cs and
split it between the scheduler front/back ends.
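
A sketch of the resulting split; the signature here is illustrative, not
authoritative: a common front end that walks the scheduler lists, plus a
per-backend callback for anything execlists-specific:

    /* Hypothetical shape of the common pretty printer */
    void i915_sched_show(struct drm_printer *m,
                         struct i915_sched *se,
                         void (*show_request)(struct drm_printer *m,
                                              const struct i915_request *rq,
                                              const char *prefix, int indent),
                         unsigned int max);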

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 233 +-
 .../drm/i915/gt/intel_execlists_submission.c  | 174 -
 drivers/gpu/drm/i915/i915_request.c   |   6 +
 drivers/gpu/drm/i915/i915_scheduler.c | 180 ++
 drivers/gpu/drm/i915/i915_scheduler.h |   8 +
 drivers/gpu/drm/i915/i915_scheduler_types.h   |   9 +
 6 files changed, 331 insertions(+), 279 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 5751a529b2df..9ff597ef5aca 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1277,49 +1277,6 @@ bool intel_engine_can_store_dword(struct intel_engine_cs 
*engine)
}
 }
 
-static struct intel_timeline *get_timeline(const struct i915_request *rq)
-{
-   struct intel_timeline *tl;
-
-   /*
-* Even though we are holding the engine->active.lock here, there
-* is no control over the submission queue per-se and we are
-* inspecting the active state at a random point in time, with an
-* unknown queue. Play safe and make sure the timeline remains valid.
-* (Only being used for pretty printing, one extra kref shouldn't
-* cause a camel stampede!)
-*/
-   rcu_read_lock();
-   tl = rcu_dereference(rq->timeline);
-   if (!kref_get_unless_zero(&tl->kref))
-   tl = NULL;
-   rcu_read_unlock();
-
-   return tl;
-}
-
-static int print_ring(char *buf, int sz, struct i915_request *rq)
-{
-   int len = 0;
-
-   if (!i915_request_signaled(rq)) {
-   struct intel_timeline *tl = get_timeline(rq);
-
-   len = scnprintf(buf, sz,
-   "ring:{start:%08x, hwsp:%08x, seqno:%08x, 
runtime:%llums}, ",
-   i915_ggtt_offset(rq->ring->vma),
-   tl ? tl->hwsp_offset : 0,
-   hwsp_seqno(rq),
-   
DIV_ROUND_CLOSEST_ULL(intel_context_get_total_runtime_ns(rq->context),
- 1000 * 1000));
-
-   if (tl)
-   intel_timeline_put(tl);
-   }
-
-   return len;
-}
-
 static void hexdump(struct drm_printer *m, const void *buf, size_t len)
 {
const size_t rowsize = 8 * sizeof(u32);
@@ -1349,27 +1306,15 @@ static void hexdump(struct drm_printer *m, const void 
*buf, size_t len)
}
 }
 
-static const char *repr_timer(const struct timer_list *t)
-{
-   if (!READ_ONCE(t->expires))
-   return "inactive";
-
-   if (timer_pending(t))
-   return "active";
-
-   return "expired";
-}
-
 static void intel_engine_print_registers(struct intel_engine_cs *engine,
 struct drm_printer *m)
 {
-   struct drm_i915_private *dev_priv = engine->i915;
-   struct intel_engine_execlists * const execlists = &engine->execlists;
+   struct drm_i915_private *i915 = engine->i915;
u64 addr;
 
-   if (engine->id == RENDER_CLASS && IS_GEN_RANGE(dev_priv, 4, 7))
+   if (engine->id == RENDER_CLASS && IS_GEN_RANGE(i915, 4, 7))
drm_printf(m, "\tCCID: 0x%08x\n", ENGINE_READ(engine, CCID));
-   if (HAS_EXECLISTS(dev_priv)) {
+   if (HAS_EXECLISTS(i915)) {
drm_printf(m, "\tEL_STAT_HI: 0x%08x\n",
   ENGINE_READ(engine, RING_EXECLIST_STATUS_HI));
drm_printf(m, "\tEL_STAT_LO: 0x%08x\n",
@@ -1390,7 +1335,7 @@ static void intel_engine_print_registers(struct 
intel_engine_cs *engine,
   ENGINE_READ(engine, RING_MI_MODE) & (MODE_IDLE) ? " 
[idle]" : "");
}
 
-   if (INTEL_GEN(dev_priv) >= 6) {
+   if (INTEL_GEN(i915) >= 6) {
drm_printf(m, "\tRING_IMR:   0x%08x\n",
   ENGINE_READ(engine, RING_IMR));
drm_printf(m, "\tRING_ESR:   0x%08x\n",
@@ -1407,15 +1352,15 @@ static void intel_engine_print_registers(struct 
intel_engine_cs *engine,
addr = intel_engine_get_last_batch_head(engine);
drm_printf(m, "\tBBADDR: 0x%08x_%08x\n",
   upper_32_bits(addr), lower_32_bits(addr));
-   if (INTEL_GEN(dev_priv) >= 8)
+   if (INTEL_GEN(i915) >= 8)
addr = ENGINE_READ64(engine, RING_DMA_FADD, RING_DMA_FADD_UDW);
-   else if (INTEL_GEN(dev_priv) >= 4)
+   else if (INTEL_GEN(i915) >= 4)
ad

[Intel-gfx] [PATCH 41/57] drm/i915: Move saturated workload detection back to the context

2021-02-01 Thread Chris Wilson
When we introduced the saturated workload detection to tell us to back
off from semaphore usage [semaphores have a noticeable impact on
contended bus cycles with the CPU for some heavy workloads], we first
introduced it as a per-context tracker. This allows individual contexts
to try and optimise their own usage, but we found that with the local
tracking and the no-semaphore boosting, the first context to disable
semaphores got a massive priority boost and so would starve the rest and
all new contexts (as they started with semaphores enabled and lower
priority). Hence we moved the saturated workload detection to the
engine, and as a consequence had to disable semaphores on virtual engines.

Now that we do not have semaphore priority boosting, and try to fairly
schedule irrespective of semaphore usage, we can move the tracking back
to the context, and virtual engines can now utilise the faster inter-engine
synchronisation. If we see that any context fails to use the semaphore,
because the system is oversubscribed and was busy doing something else
instead of spinning on the semaphore, we disable further usage of
semaphores with that context until it idles again. This should restrict
the semaphores to lightly utilised systems where the latency between
requests is more noticeable, and curtail the bus-contention from checking
for signaled semaphores.
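
Concretely, the tracking is an engine mask on the context. A sketch from
the surviving uses (ce->saturated is added below; the fallback named here
is illustrative):

    /* Producer: the semaphore fired too late on this engine */
    ce->saturated |= rq->engine->mask;

    /* Consumer: while saturated, fall back to a plain fence await */
    if (READ_ONCE(ce->saturated) & from->engine->mask)
            return i915_request_await_dma_fence(to, &from->fence);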

References: 44d89409a12e ("drm/i915: Make the semaphore saturation mask global")
Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_context.c   |  3 +++
 drivers/gpu/drm/i915/gt/intel_context_types.h |  2 ++
 drivers/gpu/drm/i915/gt/intel_engine_pm.c |  2 --
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 --
 .../gpu/drm/i915/gt/intel_execlists_submission.c  | 15 ---
 drivers/gpu/drm/i915/i915_request.c   |  6 +++---
 6 files changed, 8 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index daf537d1e415..57b6bde2b736 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -344,6 +344,9 @@ static int __intel_context_active(struct i915_active 
*active)
 {
struct intel_context *ce = container_of(active, typeof(*ce), active);
 
+   CE_TRACE(ce, "active\n");
+   ce->saturated = 0;
+
intel_context_get(ce);
 
/* everything should already be activated by intel_context_pre_pin() */
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 0ea18c9e2aca..d1a35c3055a7 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -109,6 +109,8 @@ struct intel_context {
} lrc;
u32 tag; /* cookie passed to HW to track this context on submission */
 
+   intel_engine_mask_t saturated; /* submitting semaphores too late? */
+
/** stats: Context GPU engine busyness tracking. */
struct intel_context_stats {
u64 active;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index ef5064ea54e5..44948abe4bf8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -253,8 +253,6 @@ static int __engine_park(struct intel_wakeref *wf)
struct intel_engine_cs *engine =
container_of(wf, typeof(*engine), wakeref);
 
-   engine->saturated = 0;
-
/*
 * If one and only one request is completed between pm events,
 * we know that we are inside the kernel context and it is
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index dc12cbdfda46..e94c99dee5cb 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -303,8 +303,6 @@ struct intel_engine_cs {
 
struct intel_context *kernel_context; /* pinned */
 
-   intel_engine_mask_t saturated; /* submitting semaphores too late? */
-
struct {
struct delayed_work work;
struct i915_request *systole;
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 9c929688a955..e8f192984e88 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3413,21 +3413,6 @@ intel_execlists_create_virtual(struct intel_engine_cs 
**siblings,
ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
 
-   /*
-* The decision on whether to submit a request using semaphores
-* depends on the saturated state of the engine. We only compute
-* this during HW submission of the request, and we need for this
-* state to be global

[Intel-gfx] [PATCH 23/57] drm/i915: Move tasklet from execlists to sched

2021-02-01 Thread Chris Wilson
Move the scheduling tasklet out of the execlists backend into the
per-engine scheduling bookkeeping.
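
The new helpers reduce to thin wrappers over the scheduler's tasklet,
along the lines of (a sketch; only the engine-facing wrappers appear in
this diff):

    static inline void i915_sched_kick(struct i915_sched *se)
    {
            /* Kick the tasklet so it re-evaluates the request queues */
            tasklet_hi_schedule(&se->tasklet);
    }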

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_engine.h| 33 +++-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 33 ++--
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |  2 +-
 drivers/gpu/drm/i915/gt/intel_engine_pm.c |  2 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  5 --
 .../drm/i915/gt/intel_execlists_submission.c  | 82 +++
 drivers/gpu/drm/i915/gt/intel_gt_irq.c|  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   |  2 +-
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c  |  2 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 49 +--
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  3 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c| 13 +--
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  3 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 23 ++
 drivers/gpu/drm/i915/i915_request.c   |  2 +-
 drivers/gpu/drm/i915/i915_scheduler.c | 45 +-
 drivers/gpu/drm/i915/i915_scheduler.h | 34 
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  9 ++
 drivers/gpu/drm/i915/selftests/i915_request.c | 10 +--
 .../gpu/drm/i915/selftests/i915_scheduler.c   | 24 +++---
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |  2 +-
 21 files changed, 200 insertions(+), 180 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index cc2df80eb449..52bba16c62e8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -12,6 +12,7 @@
 #include "i915_pmu.h"
 #include "i915_reg.h"
 #include "i915_request.h"
+#include "i915_scheduler.h"
 #include "i915_selftest.h"
 #include "intel_engine_types.h"
 #include "intel_gt_types.h"
@@ -123,20 +124,6 @@ execlists_active(const struct intel_engine_execlists 
*execlists)
return active;
 }
 
-static inline void
-execlists_active_lock_bh(struct intel_engine_execlists *execlists)
-{
-   local_bh_disable(); /* prevent local softirq and lock recursion */
-   tasklet_lock(&execlists->tasklet);
-}
-
-static inline void
-execlists_active_unlock_bh(struct intel_engine_execlists *execlists)
-{
-   tasklet_unlock(&execlists->tasklet);
-   local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
-}
-
 static inline u32
 intel_read_status_page(const struct intel_engine_cs *engine, int reg)
 {
@@ -231,12 +218,6 @@ static inline void __intel_engine_reset(struct 
intel_engine_cs *engine,
 bool intel_engines_are_idle(struct intel_gt *gt);
 bool intel_engine_is_idle(struct intel_engine_cs *engine);
 
-void __intel_engine_flush_submission(struct intel_engine_cs *engine, bool 
sync);
-static inline void intel_engine_flush_submission(struct intel_engine_cs 
*engine)
-{
-   __intel_engine_flush_submission(engine, true);
-}
-
 void intel_engines_reset_default_submission(struct intel_gt *gt);
 
 bool intel_engine_can_store_dword(struct intel_engine_cs *engine);
@@ -283,4 +264,16 @@ intel_engine_has_heartbeat(const struct intel_engine_cs 
*engine)
return READ_ONCE(engine->props.heartbeat_interval_ms);
 }
 
+static inline void
+intel_engine_kick_scheduler(struct intel_engine_cs *engine)
+{
+   i915_sched_kick(intel_engine_get_scheduler(engine));
+}
+
+static inline void
+intel_engine_flush_scheduler(struct intel_engine_cs *engine)
+{
+   i915_sched_flush(intel_engine_get_scheduler(engine));
+}
+
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 4c07c6f61924..b5b957283f2c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -911,7 +911,6 @@ int intel_engines_init(struct intel_gt *gt)
 void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 {
i915_sched_fini(intel_engine_get_scheduler(engine));
-   tasklet_kill(&engine->execlists.tasklet); /* flush the callback */
 
intel_breadcrumbs_free(engine->breadcrumbs);
 
@@ -1194,27 +1193,6 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
return idle;
 }
 
-void __intel_engine_flush_submission(struct intel_engine_cs *engine, bool sync)
-{
-   struct tasklet_struct *t = &engine->execlists.tasklet;
-
-   if (!t->callback)
-   return;
-
-   local_bh_disable();
-   if (tasklet_trylock(t)) {
-   /* Must wait for any GPU reset in progress. */
-   if (__tasklet_is_enabled(t))
-   t->callback(t);
-   tasklet_unlock(t);
-   }
-   local_bh_enable();
-
-   /* Synchronise and wait for the tasklet on another CPU */
-   if (sync)
-   tasklet_unlock_wait(t);
-}
-
 /**
  * intel_engine_is_idle() - Report if the engine has finished process all work
  * @engine: the 

[Intel-gfx] [PATCH 01/57] drm/i915/gt: Restrict the GT clock override to just Icelake

2021-02-01 Thread Chris Wilson
It appears that Elkhart Lake uses the same clock for CTX_TIMESTAMP as
CS_TIMESTAMP, leaving Icelake as the odd one out.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3024
Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c 
b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
index f8c79efb1a87..09b290fe0867 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
@@ -160,7 +160,7 @@ void intel_gt_init_clock_frequency(struct intel_gt *gt)
gt->clock_period_ns = intel_gt_clock_interval_to_ns(gt, 1);
 
/* Icelake appears to use another fixed frequency for CTX_TIMESTAMP */
-   if (IS_GEN(gt->i915, 11))
+   if (IS_ICELAKE(gt->i915))
gt->clock_period_ns = NSEC_PER_SEC / 1375;
 
GT_TRACE(gt,
-- 
2.20.1



[Intel-gfx] [PATCH 24/57] drm/i915/gt: Only kick the scheduler on timeslice/preemption change

2021-02-01 Thread Chris Wilson
Kick the scheduler to allow it to see the timeslice duration change,
don't peek into execlists.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/sysfs_engines.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c 
b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index 57ef5383dd4e..526f8402cfb7 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -9,6 +9,7 @@
 #include "i915_drv.h"
 #include "intel_engine.h"
 #include "intel_engine_heartbeat.h"
+#include "intel_engine_pm.h"
 #include "sysfs_engines.h"
 
 struct kobj_engine {
@@ -222,9 +223,8 @@ timeslice_store(struct kobject *kobj, struct kobj_attribute 
*attr,
return -EINVAL;
 
WRITE_ONCE(engine->props.timeslice_duration_ms, duration);
-
-   if (execlists_active(>execlists))
-   set_timer_ms(>execlists.timer, duration);
+   if (intel_engine_pm_is_awake(engine))
+   intel_engine_kick_scheduler(engine);
 
return count;
 }
@@ -326,9 +326,8 @@ preempt_timeout_store(struct kobject *kobj, struct 
kobj_attribute *attr,
return -EINVAL;
 
WRITE_ONCE(engine->props.preempt_timeout_ms, timeout);
-
-   if (READ_ONCE(engine->execlists.pending[0]))
-   set_timer_ms(&engine->execlists.preempt, timeout);
+   if (intel_engine_pm_is_awake(engine))
+   intel_engine_kick_scheduler(engine);
 
return count;
 }
-- 
2.20.1



[Intel-gfx] [PATCH 42/57] drm/i915: Bump default timeslicing quantum to 5ms

2021-02-01 Thread Chris Wilson
Primarily to smooth over differences with the guc backend that struggles
with smaller quanta, bump the default timeslice quantum from 1ms to 5ms.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/Kconfig.profile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.profile 
b/drivers/gpu/drm/i915/Kconfig.profile
index 35bbe2b80596..3eacea42b19f 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -90,7 +90,7 @@ config DRM_I915_STOP_TIMEOUT
 
 config DRM_I915_TIMESLICE_DURATION
int "Scheduling quantum for userspace batches (ms, jiffy granularity)"
-   default 1 # milliseconds
+   default 5 # milliseconds
help
  When two user batches of equal priority are executing, we will
  alternate execution of each batch to ensure forward progress of
-- 
2.20.1



[Intel-gfx] [PATCH 57/57] drm/i915: Support secure dispatch on gen6/gen7

2021-02-01 Thread Chris Wilson
Re-enable secure dispatch for gen6/gen7, primarily to work around the
command parser and its overly zealous command validation on Haswell. For
example, that validation prevents making accurate measurements using a
journal to store results from the GPU without CPU intervention.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_drv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0e4d7998be53..54063d65d330 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1666,7 +1666,7 @@ tgl_stepping_get(struct drm_i915_private *dev_priv)
 #define HAS_LLC(dev_priv)  (INTEL_INFO(dev_priv)->has_llc)
 #define HAS_SNOOP(dev_priv)(INTEL_INFO(dev_priv)->has_snoop)
 #define HAS_EDRAM(dev_priv)((dev_priv)->edram_size_mb)
-#define HAS_SECURE_BATCHES(dev_priv) (INTEL_GEN(dev_priv) < 6)
+#define HAS_SECURE_BATCHES(dev_priv) (INTEL_GEN(dev_priv) < 8)
 #define HAS_WT(dev_priv)   HAS_EDRAM(dev_priv)
 
 #define HWS_NEEDS_PHYSICAL(dev_priv)   
(INTEL_INFO(dev_priv)->hws_needs_physical)
-- 
2.20.1



[Intel-gfx] [PATCH 50/57] Restore "drm/i915: drop engine_pin/unpin_breadcrumbs_irq"

2021-02-01 Thread Chris Wilson
This was removed in commit 478ffad6d690 ("drm/i915: drop
engine_pin/unpin_breadcrumbs_irq") as the last user had been removed,
but now there is a promise of a new user in the next patch.

Signed-off-by: Chris Wilson 
Reviewed-by: Mika Kuoppala 
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 24 +
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.h |  3 +++
 2 files changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 38cc42783dfb..9e67810c7767 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -310,6 +310,30 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
spin_unlock_irqrestore(&b->irq_lock, flags);
 }
 
+void intel_breadcrumbs_pin_irq(struct intel_breadcrumbs *b)
+{
+   if (GEM_DEBUG_WARN_ON(!b->irq_engine))
+   return;
+
+   spin_lock_irq(&b->irq_lock);
+   if (!b->irq_enabled++)
+   irq_enable(b->irq_engine);
+   GEM_BUG_ON(!b->irq_enabled); /* no overflow! */
+   spin_unlock_irq(&b->irq_lock);
+}
+
+void intel_breadcrumbs_unpin_irq(struct intel_breadcrumbs *b)
+{
+   if (GEM_DEBUG_WARN_ON(!b->irq_engine))
+   return;
+
+   spin_lock_irq(&b->irq_lock);
+   GEM_BUG_ON(!b->irq_enabled); /* no underflow! */
+   if (!--b->irq_enabled)
+   irq_disable(b->irq_engine);
+   spin_unlock_irq(&b->irq_lock);
+}
+
 void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 {
if (!READ_ONCE(b->irq_armed))
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
index 3ce5ce270b04..c2bb3a79ca9f 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
@@ -19,6 +19,9 @@ struct intel_breadcrumbs *
 intel_breadcrumbs_create(struct intel_engine_cs *irq_engine);
 void intel_breadcrumbs_free(struct intel_breadcrumbs *b);
 
+void intel_breadcrumbs_pin_irq(struct intel_breadcrumbs *b);
+void intel_breadcrumbs_unpin_irq(struct intel_breadcrumbs *b);
+
 void intel_breadcrumbs_reset(struct intel_breadcrumbs *b);
 void __intel_breadcrumbs_park(struct intel_breadcrumbs *b);
 
-- 
2.20.1



[Intel-gfx] [PATCH 16/57] drm/i915: Extract request rewinding from execlists

2021-02-01 Thread Chris Wilson
In the process of preparing to reuse the request submission logic for
other backends, lift it out of the execlists backend.

While this operates on the common structs, we do have a bit of backend
knowledge, which is harmless for !lrc but still unsightly.
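
The backend knowledge in question is the ring-wrap check carried over into
the common rewind: if we unwind so far that the new tail lands behind the
old one, the context image must be force-restored (taken verbatim from the
removed hunk below):

    /* Check in case we rollback so far we wrap [size/2] */
    if (intel_ring_direction(rq->ring, rq->tail, rq->ring->tail + 8) > 0)
            rq->context->lrc.desc |= CTX_DESC_FORCE_RESTORE;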

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_engine.h|  3 -
 .../drm/i915/gt/intel_execlists_submission.c  | 58 ++-
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h   |  3 +
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 +-
 drivers/gpu/drm/i915/i915_scheduler.c | 44 ++
 drivers/gpu/drm/i915/i915_scheduler.h |  3 +
 7 files changed, 56 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index 8d9184920c51..cc2df80eb449 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -137,9 +137,6 @@ execlists_active_unlock_bh(struct intel_engine_execlists 
*execlists)
local_bh_enable(); /* restore softirq, and kick ksoftirqd! */
 }
 
-struct i915_request *
-execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists);
-
 static inline u32
 intel_read_status_page(const struct intel_engine_cs *engine, int reg)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 51044387a8a2..b6dea80da533 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -359,56 +359,6 @@ assert_priority_queue(const struct i915_request *prev,
return rq_prio(prev) >= rq_prio(next);
 }
 
-static struct i915_request *
-__unwind_incomplete_requests(struct intel_engine_cs *engine)
-{
-   struct i915_request *rq, *rn, *active = NULL;
-   struct list_head *pl;
-   int prio = I915_PRIORITY_INVALID;
-
-   lockdep_assert_held(&engine->active.lock);
-
-   list_for_each_entry_safe_reverse(rq, rn,
-&engine->active.requests,
-sched.link) {
-   if (__i915_request_is_complete(rq)) {
-   list_del_init(&rq->sched.link);
-   continue;
-   }
-
-   __i915_request_unsubmit(rq);
-
-   GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
-   if (rq_prio(rq) != prio) {
-   prio = rq_prio(rq);
-   pl = i915_sched_lookup_priolist(engine, prio);
-   }
-   GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
-
-   list_move(&rq->sched.link, pl);
-   set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-
-   /* Check in case we rollback so far we wrap [size/2] */
-   if (intel_ring_direction(rq->ring,
-rq->tail,
-rq->ring->tail + 8) > 0)
-   rq->context->lrc.desc |= CTX_DESC_FORCE_RESTORE;
-
-   active = rq;
-   }
-
-   return active;
-}
-
-struct i915_request *
-execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
-{
-   struct intel_engine_cs *engine =
-   container_of(execlists, typeof(*engine), execlists);
-
-   return __unwind_incomplete_requests(engine);
-}
-
 static void
 execlists_context_status_change(struct i915_request *rq, unsigned long status)
 {
@@ -1080,7 +1030,7 @@ static void defer_active(struct intel_engine_cs *engine)
 {
struct i915_request *rq;
 
-   rq = __unwind_incomplete_requests(engine);
+   rq = __i915_sched_rewind_requests(engine);
if (!rq)
return;
 
@@ -1292,7 +1242,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * the preemption, some of the unwound requests may
 * complete!
 */
-   __unwind_incomplete_requests(engine);
+   __i915_sched_rewind_requests(engine);
 
last = NULL;
} else if (timeslice_expired(engine, last)) {
@@ -2287,7 +2237,7 @@ static void execlists_capture(struct intel_engine_cs 
*engine)
 * which we return it to the queue for signaling.
 *
 * By removing them from the execlists queue, we also remove the
-* requests from being processed by __unwind_incomplete_requests()
+* requests from being processed by __intel_engine_rewind_requests()
 * during the intel_engine_reset(), and so they will *not* be replayed
 * afterwards.
 *
@@ -2878,7 +2828,7 @@ static void execlists_reset_rewind(struct intel_engine_cs 
*engine, bool stalled)
/* Push back any incomplete requests for replay

[Intel-gfx] [PATCH 52/57] drm/i915/gt: Support creation of 'internal' rings

2021-02-01 Thread Chris Wilson
To support legacy ring buffer scheduling, we want a virtual ringbuffer
for each client. These rings are purely for holding the requests as they
are being constructed on the CPU and never accessed by the GPU, so they
should not be bound into the GGTT, and we can use plain old WB mapped
pages.

As they are not bound, we need to nerf a few assumptions that a rq->ring
is in the GGTT.
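
A minimal sketch of the distinction (illustrative only; the GGTT-side
wrapper below is a made-up name, the real plumbing is in intel_ring.c):

    /* An internal ring is CPU-only: map it write-back and never bind
     * it into the GGTT, since the GPU never reads from it.
     */
    if (intel_ring_is_internal(ring))
            addr = i915_gem_object_pin_map(ring->vma->obj, I915_MAP_WB);
    else
            addr = pin_and_map_in_ggtt(ring); /* hypothetical wrapper for
                                               * the usual GGTT path */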

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_context.c   |  2 +-
 .../drm/i915/gt/intel_execlists_submission.c  |  2 +-
 drivers/gpu/drm/i915/gt/intel_ring.c  | 69 ---
 drivers/gpu/drm/i915/gt/intel_ring.h  | 17 -
 drivers/gpu/drm/i915/gt/intel_ring_types.h|  2 +
 drivers/gpu/drm/i915/i915_scheduler.c |  7 +-
 6 files changed, 69 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
b/drivers/gpu/drm/i915/gt/intel_context.c
index 57b6bde2b736..c7ab4ed92da4 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -258,7 +258,7 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
}
 
CE_TRACE(ce, "pin ring:{start:%08x, head:%04x, tail:%04x}\n",
-i915_ggtt_offset(ce->ring->vma),
+intel_ring_address(ce->ring),
 ce->ring->head, ce->ring->tail);
 
handoff = true;
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index d9b5b6c9eb5d..ac288b180574 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3590,7 +3590,7 @@ static int print_ring(char *buf, int sz, struct 
i915_request *rq)
 
len = scnprintf(buf, sz,
"ring:{start:%08x, hwsp:%08x, seqno:%08x, 
runtime:%llums}, ",
-   i915_ggtt_offset(rq->ring->vma),
+   intel_ring_address(rq->ring),
tl ? tl->ggtt_offset : 0,
hwsp_seqno(rq),

DIV_ROUND_CLOSEST_ULL(intel_context_get_total_runtime_ns(rq->context),
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c 
b/drivers/gpu/drm/i915/gt/intel_ring.c
index aee0a77c77e0..521972c297a9 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -32,33 +32,42 @@ void __intel_ring_pin(struct intel_ring *ring)
 int intel_ring_pin(struct intel_ring *ring, struct i915_gem_ww_ctx *ww)
 {
struct i915_vma *vma = ring->vma;
-   unsigned int flags;
void *addr;
int ret;
 
	if (atomic_fetch_inc(&ring->pin_count))
return 0;
 
-   /* Ring wraparound at offset 0 sometimes hangs. No idea why. */
-   flags = PIN_OFFSET_BIAS | i915_ggtt_pin_bias(vma);
+   if (!intel_ring_is_internal(ring)) {
+   int type = i915_coherent_map_type(vma->vm->i915);
+   unsigned int pin;
 
-   if (i915_gem_object_is_stolen(vma->obj))
-   flags |= PIN_MAPPABLE;
-   else
-   flags |= PIN_HIGH;
+   /* Ring wraparound at offset 0 sometimes hangs. No idea why. */
+   pin = PIN_OFFSET_BIAS | i915_ggtt_pin_bias(vma);
 
-   ret = i915_ggtt_pin(vma, ww, 0, flags);
-   if (unlikely(ret))
-   goto err_unpin;
+   if (i915_gem_object_is_stolen(vma->obj))
+   pin |= PIN_MAPPABLE;
+   else
+   pin |= PIN_HIGH;
 
-   if (i915_vma_is_map_and_fenceable(vma))
-   addr = (void __force *)i915_vma_pin_iomap(vma);
-   else
-   addr = i915_gem_object_pin_map(vma->obj,
-  i915_coherent_map_type(vma->vm->i915));
-   if (IS_ERR(addr)) {
-   ret = PTR_ERR(addr);
-   goto err_ring;
+   ret = i915_ggtt_pin(vma, ww, 0, pin);
+   if (unlikely(ret))
+   goto err_unpin;
+
+   if (i915_vma_is_map_and_fenceable(vma))
+   addr = (void __force *)i915_vma_pin_iomap(vma);
+   else
+   addr = i915_gem_object_pin_map(vma->obj, type);
+   if (IS_ERR(addr)) {
+   ret = PTR_ERR(addr);
+   goto err_ring;
+   }
+   } else {
+   addr = i915_gem_object_pin_map(vma->obj, I915_MAP_WB);
+   if (IS_ERR(addr)) {
+   ret = PTR_ERR(addr);
+   goto err_ring;
+   }
}
 
i915_vma_make_unshrinkable(vma);
@@ -99,19 +108,24 @@ void intel_ring_unpin(struct intel_ring *ring)
i915_gem_ob

[Intel-gfx] [PATCH 49/57] drm/i915/gt: Use ppHWSP for unshared non-semaphore related timelines

2021-02-01 Thread Chris Wilson
When we are not using semaphores with a context/engine, we can simply
reuse the same seqno location across wraps, but we still require each
timeline to have its own address. For LRC submission, each context is
prefixed by a per-process HWSP, which provides us with a unique location
for each context-local timeline. A shared timeline that is common to
multiple contexts will continue to use a separate page.

This enables us to create position invariant contexts should we feel the
need to relocate them.

Initially they are automatically used by Broadwell/Braswell as they do
not require independent timelines.
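
As a sketch of where the seqno then lives (editor's reading of the
patch, reusing the mode check from the later breadcrumb patches; the
address computation here is an inference, not code from the series):

    static u32 seqno_address(const struct intel_context *ce,
                             const struct intel_timeline *tl)
    {
            if (tl->mode == INTEL_TIMELINE_RELATIVE_CONTEXT) /* ppHWSP */
                    return i915_ggtt_offset(ce->state) +
                           I915_GEM_HWS_SEQNO_ADDR;

            return tl->ggtt_offset; /* shared/standalone status page */
    }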

Signed-off-by: Chris Wilson 
Cc: Joonas Lahtinen 
Reviewed-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 8508b8d701c1..f9acd9e63066 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -833,6 +833,14 @@ pinned_timeline(struct intel_context *ce, struct 
intel_engine_cs *engine)
return intel_timeline_create_from_engine(engine, page_unmask_bits(tl));
 }
 
+static struct intel_timeline *
+pphwsp_timeline(struct intel_context *ce, struct i915_vma *state)
+{
+   return __intel_timeline_create(ce->engine->gt, state,
+  I915_GEM_HWS_SEQNO_ADDR |
+  INTEL_TIMELINE_RELATIVE_CONTEXT);
+}
+
 int lrc_alloc(struct intel_context *ce, struct intel_engine_cs *engine)
 {
struct intel_ring *ring;
@@ -860,8 +868,10 @@ int lrc_alloc(struct intel_context *ce, struct 
intel_engine_cs *engine)
 */
if (unlikely(ce->timeline))
tl = pinned_timeline(ce, engine);
-   else
+   else if (intel_engine_has_semaphores(engine))
tl = intel_timeline_create(engine->gt);
+   else
+   tl = pphwsp_timeline(ce, vma);
if (IS_ERR(tl)) {
err = PTR_ERR(tl);
goto err_ring;
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 14/57] drm/i915: Improve DFS for priority inheritance

2021-02-01 Thread Chris Wilson
The core of the scheduling algorithm is that we compute the topological
order of the fence DAG. Knowing that we have a DAG, we should be able to
use a DFS to compute the topological sort in linear time. However,
during the conversion of the recursive algorithm into an iterative one,
the memoization of how far we had progressed down a branch was
forgotten. The result was that instead of running in linear time, it was
running in geometric time and could easily run for a few hundred
milliseconds given a wide enough graph, not the microseconds as required.
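
With the memoization restored, each request pushed onto the explicit
stack records how far along its signalers list the walk has progressed,
so popping back to it never rescans a list from its head and the sort is
again linear in nodes plus edges. Condensed to its shape (using the
patch's stack_push/stack_pop helpers; the predicate names below are
placeholders for the checks in the patch):

    pos = &rq->sched.signalers_list;
    rq->sched.dfs.prev = NULL;
    do {
            list_for_each_continue(pos, &rq->sched.signalers_list) {
                    struct i915_request *s = signaler_of(pos); /* placeholder */

                    if (!needs_bump(s, prio)) /* prio/engine/completion checks */
                            continue;

                    rq = stack_push(s, rq, pos); /* memoize pos in rq */
                    pos = &rq->sched.signalers_list; /* descend into s */
            }

            apply_priority(rq, prio); /* post-order: signalers done first */
    } while ((rq = stack_pop(rq, &pos))); /* resume where we left off */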

Signed-off-by: Chris Wilson 
Reviewed-by: Andi Shyti 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_scheduler.c   | 58 -
 drivers/gpu/drm/i915/i915_scheduler_types.h |  6 ++-
 2 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index a56a812cbf29..cea5129560a5 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -242,6 +242,26 @@ void __i915_priolist_free(struct i915_priolist *p)
kmem_cache_free(global.slab_priorities, p);
 }
 
+static struct i915_request *
+stack_push(struct i915_request *rq,
+  struct i915_request *prev,
+  struct list_head *pos)
+{
+   prev->sched.dfs.pos = pos;
+   rq->sched.dfs.prev = prev;
+   return rq;
+}
+
+static struct i915_request *
+stack_pop(struct i915_request *rq,
+ struct list_head **pos)
+{
+   rq = rq->sched.dfs.prev;
+   if (rq)
+   *pos = rq->sched.dfs.pos;
+   return rq;
+}
+
 static inline bool need_preempt(int prio, int active)
 {
/*
@@ -306,11 +326,10 @@ static void ipi_priority(struct i915_request *rq, int 
prio)
 static void __i915_request_set_priority(struct i915_request *rq, int prio)
 {
struct intel_engine_cs *engine = rq->engine;
-   struct i915_request *rn;
+   struct list_head *pos = &rq->sched.signalers_list;
struct list_head *plist;
-   LIST_HEAD(dfs);
 
-   list_add(&rq->sched.dfs, &dfs);
+   plist = i915_sched_lookup_priolist(engine, prio);
 
/*
 * Recursively bump all dependent priorities to match the new request.
@@ -330,40 +349,31 @@ static void __i915_request_set_priority(struct 
i915_request *rq, int prio)
 * end result is a topological list of requests in reverse order, the
 * last element in the list is the request we must execute first.
 */
-   list_for_each_entry(rq, &dfs, sched.dfs) {
-   struct i915_dependency *p;
-
-   /* Also release any children on this engine that are ready */
-   GEM_BUG_ON(rq->engine != engine);
-
-   for_each_signaler(p, rq) {
+   rq->sched.dfs.prev = NULL;
+   do {
+   list_for_each_continue(pos, &rq->sched.signalers_list) {
+   struct i915_dependency *p =
+   list_entry(pos, typeof(*p), signal_link);
struct i915_request *s =
container_of(p->signaler, typeof(*s), sched);
 
-   GEM_BUG_ON(s == rq);
-
if (rq_prio(s) >= prio)
continue;
 
if (__i915_request_is_complete(s))
continue;
 
-   if (s->engine != rq->engine) {
+   if (s->engine != engine) {
ipi_priority(s, prio);
continue;
}
 
-   list_move_tail(&s->sched.dfs, &dfs);
+   /* Remember our position along this branch */
+   rq = stack_push(s, rq, pos);
+   pos = &rq->sched.signalers_list;
}
-   }
 
-   plist = i915_sched_lookup_priolist(engine, prio);
-
-   /* Fifo and depth-first replacement ensure our deps execute first */
-   list_for_each_entry_safe_reverse(rq, rn, &dfs, sched.dfs) {
-   GEM_BUG_ON(rq->engine != engine);
-
-   INIT_LIST_HEAD(&rq->sched.dfs);
+   RQ_TRACE(rq, "set-priority:%d\n", prio);
WRITE_ONCE(rq->sched.attr.priority, prio);
 
/*
@@ -377,12 +387,13 @@ static void __i915_request_set_priority(struct 
i915_request *rq, int prio)
if (!i915_request_is_ready(rq))
continue;
 
+   GEM_BUG_ON(rq->engine != engine);
if (i915_request_in_priority_queue(rq))
list_move_tail(&rq->sched.link, plist);
 
/* Defer (tasklet) submission until after all updates. */
kick_submission(engine, rq, prio);
-   }
+   } while ((rq = stack_pop(rq, &pos)));
 }
 
 #define all_signalers_checked(p, rq) \
@@ -456,7 +467,6 @@ void i915_sched_no

[Intel-gfx] [PATCH 39/57] drm/i915: Extend the priority boosting for the display with a deadline

2021-02-01 Thread Chris Wilson
For a modeset/pageflip, there is a very precise deadline by which the
frame must be completed in order to hit the vblank and be shown. While
we don't pass along that exact information, we can at least inform the
scheduler that this request-chain needs to be completed asap.
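
Should the exact vblank time be plumbed through later, the call sites
below could compute a real deadline instead of ktime_get(); a sketch of
that follow-up (drm_next_vblank_time() is a hypothetical helper, not
part of this series; assumed to return 0 on success):

    ktime_t deadline;

    if (drm_next_vblank_time(crtc, &deadline)) /* hypothetical */
            deadline = ktime_get(); /* unknown: treat as "asap" */

    i915_gem_fence_wait_priority(fence, I915_PRIORITY_DISPLAY, deadline);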

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/display/intel_display.c |  7 +--
 drivers/gpu/drm/i915/gem/i915_gem_object.h   |  5 +++--
 drivers/gpu/drm/i915/gem/i915_gem_wait.c | 21 
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
b/drivers/gpu/drm/i915/display/intel_display.c
index aca964f7ba72..ce59119ac14a 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -13702,7 +13702,8 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
 
if (new_plane_state->uapi.fence) { /* explicit fencing */
i915_gem_fence_wait_priority(new_plane_state->uapi.fence,
-I915_PRIORITY_DISPLAY);
+I915_PRIORITY_DISPLAY,
+ktime_get() /* next vblank? */);
ret = i915_sw_fence_await_dma_fence(&state->commit_ready,
new_plane_state->uapi.fence,
i915_fence_timeout(dev_priv),
@@ -13724,7 +13725,9 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
if (ret)
return ret;
 
-   i915_gem_object_wait_priority(obj, 0, I915_PRIORITY_DISPLAY);
+   i915_gem_object_wait_priority(obj, 0,
+ I915_PRIORITY_DISPLAY,
+ ktime_get() /* next vblank? */);
i915_gem_object_flush_frontbuffer(obj, ORIGIN_DIRTYFB);
 
if (!new_plane_state->uapi.fence) { /* implicit fencing */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 325766abca21..9935a2e59df0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -549,14 +549,15 @@ static inline void __start_cpu_write(struct 
drm_i915_gem_object *obj)
obj->cache_dirty = true;
 }
 
-void i915_gem_fence_wait_priority(struct dma_fence *fence, int prio);
+void i915_gem_fence_wait_priority(struct dma_fence *fence,
+ int prio, ktime_t deadline);
 
 int i915_gem_object_wait(struct drm_i915_gem_object *obj,
 unsigned int flags,
 long timeout);
 int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
  unsigned int flags,
- int prio);
+ int prio, ktime_t deadline);
 
 void __i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj,
 enum fb_op_origin origin);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c 
b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index 4d1897c347b9..162f9737965f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -92,11 +92,14 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
return timeout;
 }
 
-static void fence_set_priority(struct dma_fence *fence, int prio)
+static void
+fence_set_priority(struct dma_fence *fence, int prio, ktime_t deadline)
 {
if (dma_fence_is_signaled(fence) || !dma_fence_is_i915(fence))
return;
 
+   i915_request_set_deadline(to_request(fence),
+ i915_sched_to_ticks(deadline));
i915_request_set_priority(to_request(fence), prio);
 }
 
@@ -105,7 +108,8 @@ static inline bool __dma_fence_is_chain(const struct 
dma_fence *fence)
return fence->ops == &dma_fence_chain_ops;
 }
 
-void i915_gem_fence_wait_priority(struct dma_fence *fence, int prio)
+void i915_gem_fence_wait_priority(struct dma_fence *fence,
+ int prio, ktime_t deadline)
 {
if (dma_fence_is_signaled(fence))
return;
@@ -118,19 +122,19 @@ void i915_gem_fence_wait_priority(struct dma_fence 
*fence, int prio)
int i;
 
for (i = 0; i < array->num_fences; i++)
-   fence_set_priority(array->fences[i], prio);
+   fence_set_priority(array->fences[i], prio, deadline);
} else if (__dma_fence_is_chain(fence)) {
struct dma_fence *iter;
 
/* The chain is ordered; if we boost the last, we boost all */
dma_fence_chain_for_each(iter, fence) {
fence_set_priority(to_dma_fence_chain(iter)->fence,
-  prio);
+  prio, dead

[Intel-gfx] [PATCH 53/57] drm/i915/gt: Use client timeline address for seqno writes

2021-02-01 Thread Chris Wilson
If we allow for per-client timelines, even with legacy ring submission,
we open the door to a world full of possibilities [scheduling and
semaphores].

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/gen2_engine_cs.c  | 72 ++-
 drivers/gpu/drm/i915/gt/gen2_engine_cs.h  |  5 +-
 drivers/gpu/drm/i915/gt/gen6_engine_cs.c  | 89 +--
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c  | 23 ++---
 .../gpu/drm/i915/gt/intel_ring_submission.c   | 30 +++
 drivers/gpu/drm/i915/i915_request.h   | 13 +++
 6 files changed, 169 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c 
b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
index b491a64919c8..b3fff7a955f2 100644
--- a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
@@ -172,9 +172,77 @@ u32 *gen3_emit_breadcrumb(struct i915_request *rq, u32 *cs)
return __gen2_emit_breadcrumb(rq, cs, 16, 8);
 }
 
-u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
+static u32 *__gen4_emit_breadcrumb(struct i915_request *rq, u32 *cs,
+  int flush, int post)
 {
-   return __gen2_emit_breadcrumb(rq, cs, 8, 8);
+   struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1);
+   u32 offset = __i915_request_hwsp_offset(rq);
+
+   GEM_BUG_ON(tl->mode == INTEL_TIMELINE_RELATIVE_CONTEXT);
+
+   *cs++ = MI_FLUSH;
+
+   while (flush--) {
+   *cs++ = MI_STORE_DWORD_INDEX;
+   *cs++ = I915_GEM_HWS_SCRATCH * sizeof(u32);
+   *cs++ = rq->fence.seqno;
+   }
+
+   if (intel_timeline_is_relative(tl)) {
+   offset = offset_in_page(offset);
+   while (post--) {
+   *cs++ = MI_STORE_DWORD_INDEX;
+   *cs++ = offset;
+   *cs++ = rq->fence.seqno;
+   *cs++ = MI_NOOP;
+   }
+   } else {
+   while (post--) {
+   *cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+   *cs++ = 0;
+   *cs++ = offset;
+   *cs++ = rq->fence.seqno;
+   }
+   }
+
+   *cs++ = MI_USER_INTERRUPT;
+
+   rq->tail = intel_ring_offset(rq, cs);
+   assert_ring_tail_valid(rq->ring, rq->tail);
+
+   return cs;
+}
+
+u32 *gen4_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
+{
+   return __gen4_emit_breadcrumb(rq, cs, 8, 8);
+}
+
+int gen4_emit_init_breadcrumb_xcs(struct i915_request *rq)
+{
+   struct intel_timeline *tl = i915_request_timeline(rq);
+   u32 *cs;
+
+   GEM_BUG_ON(i915_request_has_initial_breadcrumb(rq));
+   if (!intel_timeline_has_initial_breadcrumb(tl))
+   return 0;
+
+   cs = intel_ring_begin(rq, 4);
+   if (IS_ERR(cs))
+   return PTR_ERR(cs);
+
+   *cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+   *cs++ = 0;
+   *cs++ = __i915_request_hwsp_offset(rq);
+   *cs++ = rq->fence.seqno - 1;
+
+   intel_ring_advance(rq, cs);
+
+   /* Record the updated position of the request's payload */
+   rq->infix = intel_ring_offset(rq, cs);
+
+   __set_bit(I915_FENCE_FLAG_INITIAL_BREADCRUMB, &rq->fence.flags);
+   return 0;
 }
 
 /* Just userspace ABI convention to limit the wa batch bo to a resonable size 
*/
diff --git a/drivers/gpu/drm/i915/gt/gen2_engine_cs.h 
b/drivers/gpu/drm/i915/gt/gen2_engine_cs.h
index a5cd64a65c9e..ba7567b15229 100644
--- a/drivers/gpu/drm/i915/gt/gen2_engine_cs.h
+++ b/drivers/gpu/drm/i915/gt/gen2_engine_cs.h
@@ -16,7 +16,10 @@ int gen4_emit_flush_rcs(struct i915_request *rq, u32 mode);
 int gen4_emit_flush_vcs(struct i915_request *rq, u32 mode);
 
 u32 *gen3_emit_breadcrumb(struct i915_request *rq, u32 *cs);
-u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs);
+u32 *gen4_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs);
+
+u32 *gen4_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs);
+int gen4_emit_init_breadcrumb_xcs(struct i915_request *rq);
 
 int i830_emit_bb_start(struct i915_request *rq,
   u64 offset, u32 len,
diff --git a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c 
b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
index 2f59dd3bdc18..14cab4c726ce 100644
--- a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
@@ -141,6 +141,12 @@ int gen6_emit_flush_rcs(struct i915_request *rq, u32 mode)
 
 u32 *gen6_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
 {
+   struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1);
+   u32 offset = __i915_request_hwsp_offset(rq);
+   unsigned int flags;
+
+   GEM_BUG_ON(tl->mode == INTEL_TIMELINE_RELATIVE_CONTEXT);
+
/* First we do the gen6_emit_post_sync_nonzero_flush w/a */
*cs++ = GFX_OP_PIPE_CONTROL(4);
*cs++ = PIPE_CONTROL_CS_S

[Intel-gfx] [PATCH 33/57] drm/i915: Move busywaiting control to the scheduler

2021-02-01 Thread Chris Wilson
Busy-waiting is used for preempt-to-busy by schedulers, if they so
choose. Since it is not a property of the engine, but that of the
submission backend, move the flag out of the engine and into
i915_sched_engine.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c  |  4 ++--
 .../drm/i915/gt/intel_execlists_submission.c  |  6 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c| 19 +--
 drivers/gpu/drm/i915/i915_request.h   |  5 +
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  6 ++
 5 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
index cac80af7ad1c..8791e03ebe61 100644
--- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
@@ -507,7 +507,7 @@ gen8_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 
*cs)
*cs++ = MI_USER_INTERRUPT;
 
*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
-   if (intel_engine_has_semaphores(rq->engine))
+   if (i915_request_use_busywait(rq))
cs = emit_preempt_busywait(rq, cs);
 
rq->tail = intel_ring_offset(rq, cs);
@@ -599,7 +599,7 @@ gen12_emit_fini_breadcrumb_tail(struct i915_request *rq, 
u32 *cs)
*cs++ = MI_USER_INTERRUPT;
 
*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
-   if (intel_engine_has_semaphores(rq->engine))
+   if (i915_request_use_busywait(rq))
cs = gen12_emit_preempt_busywait(rq, cs);
 
rq->tail = intel_ring_offset(rq, cs);
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index d4b6d262265a..9245499d2082 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -304,7 +304,7 @@ static bool need_preempt(const struct intel_engine_cs 
*engine,
const struct i915_sched *se = &engine->sched;
int last_prio;
 
-   if (!intel_engine_has_semaphores(engine))
+   if (!i915_sched_use_busywait(se))
return false;
 
/*
@@ -2930,6 +2930,10 @@ logical_ring_default_vfuncs(struct intel_engine_cs 
*engine)
intel_engine_has_preemption(engine))
__set_bit(I915_SCHED_HAS_TIMESLICES_BIT,
  &engine->sched.flags);
+
+   if (intel_engine_has_preemption(engine))
+   __set_bit(I915_SCHED_USE_BUSYWAIT_BIT,
+ &engine->sched.flags);
 }
 
 static void logical_ring_default_irqs(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c 
b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 279091e41b41..6d73add47109 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -679,9 +679,11 @@ static int live_lrc_gpr(void *arg)
if (err)
goto err;
 
-   err = __live_lrc_gpr(engine, scratch, true);
-   if (err)
-   goto err;
+   if (intel_engine_has_preemption(engine)) {
+   err = __live_lrc_gpr(engine, scratch, true);
+   if (err)
+   goto err;
+   }
 
 err:
st_engine_heartbeat_enable(engine);
@@ -859,9 +861,11 @@ static int live_lrc_timestamp(void *arg)
if (err)
break;
 
-   err = __lrc_timestamp(, true);
-   if (err)
-   break;
+   if (intel_engine_has_preemption(data.engine)) {
+   err = __lrc_timestamp(, true);
+   if (err)
+   break;
+   }
}
 
 err:
@@ -1508,6 +1512,9 @@ static int live_lrc_isolation(void *arg)
skip_isolation(engine))
continue;
 
+   if (!intel_engine_has_preemption(engine))
+   continue;
+
intel_engine_pm_get(engine);
for (i = 0; i < ARRAY_SIZE(poison); i++) {
int result;
diff --git a/drivers/gpu/drm/i915/i915_request.h 
b/drivers/gpu/drm/i915/i915_request.h
index 8eea25cb043e..7c29d33e7d51 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -642,4 +642,9 @@ static inline bool i915_request_use_semaphores(const struct 
i915_request *rq)
return intel_engine_has_semaphores(rq->engine);
 }
 
+static inline bool i915_request_use_busywait(const struct i915_request *rq)
+{
+   return i915_sched_use_busywait(i915_request_get_scheduler(rq));
+}
+
 #endif /* I915_REQUEST_H */
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h 
b/drivers/gpu/drm/i915/i915_scheduler_types.h
index b4a0e4e26bfd..37475024c0de 100644

[Intel-gfx] [PATCH 55/57] drm/i915/gt: Implement ring scheduler for gen4-7

2021-02-01 Thread Chris Wilson
A key problem with legacy ring buffer submission is that it is an inherent
FIFO queue across all clients; if one blocks, they all block. A
scheduler allows us to avoid that limitation, and ensures that all
clients can submit in parallel, removing the resource contention of the
global ringbuffer.

Having built the ring scheduler infrastructure over top of the global
ringbuffer submission, we now need to provide the HW knowledge required
to build command packets and implement context switching.

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gt/intel_ring_scheduler.c| 459 +-
 drivers/gpu/drm/i915/i915_reg.h   |  10 +
 2 files changed, 466 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c 
b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
index b6fcb18ef0a6..46372116011b 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
@@ -7,7 +7,12 @@
 
 #include 
 
+#include "gen2_engine_cs.h"
+#include "gen6_engine_cs.h"
+#include "gen6_ppgtt.h"
+#include "gen7_renderclear.h"
 #include "i915_drv.h"
+#include "i915_mitigations.h"
 #include "intel_breadcrumbs.h"
 #include "intel_context.h"
 #include "intel_engine_pm.h"
@@ -182,8 +187,270 @@ static void ring_copy(struct intel_ring *dst,
memcpy(out, src->vaddr + start, end - start);
 }
 
+static void mi_set_context(struct intel_ring *ring,
+  struct intel_engine_cs *engine,
+  struct intel_context *ce,
+  u32 flags)
+{
+   struct drm_i915_private *i915 = engine->i915;
+   enum intel_engine_id id;
+   const int num_engines =
+   IS_HASWELL(i915) ? engine->gt->info.num_engines - 1 : 0;
+   int len;
+   u32 *cs;
+
+   len = 4;
+   if (IS_GEN(i915, 7))
+   len += 2 + (num_engines ? 4 * num_engines + 6 : 0);
+   else if (IS_GEN(i915, 5))
+   len += 2;
+
+   cs = ring_map_dw(ring, len);
+
+   /* WaProgramMiArbOnOffAroundMiSetContext:ivb,vlv,hsw,bdw,chv */
+   if (IS_GEN(i915, 7)) {
+   *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
+   if (num_engines) {
+   struct intel_engine_cs *signaller;
+
+   *cs++ = MI_LOAD_REGISTER_IMM(num_engines);
+   for_each_engine(signaller, engine->gt, id) {
+   if (signaller == engine)
+   continue;
+
+   *cs++ = i915_mmio_reg_offset(
+  RING_PSMI_CTL(signaller->mmio_base));
+   *cs++ = _MASKED_BIT_ENABLE(
+   GEN6_PSMI_SLEEP_MSG_DISABLE);
+   }
+   }
+   } else if (IS_GEN(i915, 5)) {
+   /*
+* This w/a is only listed for pre-production ilk a/b steppings,
+* but is also mentioned for programming the powerctx. To be
+* safe, just apply the workaround; we do not use SyncFlush so
+* this should never take effect and so be a no-op!
+*/
+   *cs++ = MI_SUSPEND_FLUSH | MI_SUSPEND_FLUSH_EN;
+   }
+
+   *cs++ = MI_NOOP;
+   *cs++ = MI_SET_CONTEXT;
+   *cs++ = i915_ggtt_offset(ce->state) | flags;
+   /*
+* w/a: MI_SET_CONTEXT must always be followed by MI_NOOP
+* WaMiSetContext_Hang:snb,ivb,vlv
+*/
+   *cs++ = MI_NOOP;
+
+   if (IS_GEN(i915, 7)) {
+   if (num_engines) {
+   struct intel_engine_cs *signaller;
+   i915_reg_t last_reg = {}; /* keep gcc quiet */
+
+   *cs++ = MI_LOAD_REGISTER_IMM(num_engines);
+   for_each_engine(signaller, engine->gt, id) {
+   if (signaller == engine)
+   continue;
+
+   last_reg = RING_PSMI_CTL(signaller->mmio_base);
+   *cs++ = i915_mmio_reg_offset(last_reg);
+   *cs++ = _MASKED_BIT_DISABLE(
+   GEN6_PSMI_SLEEP_MSG_DISABLE);
+   }
+
+   /* Insert a delay before the next switch! */
+   *cs++ = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
+   *cs++ = i915_mmio_reg_offset(last_reg);
+   *cs++ = intel_gt_scratch_offset(engine->gt,
+   INTEL_GT_SCRATCH_FIELD_DEFAULT);
+   *cs++ = MI_NOOP;
+   }
+   *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+   } else if (IS_GEN(i915, 5)) {
+   

[Intel-gfx] [PATCH 10/57] drm/i915: Restructure priority inheritance

2021-02-01 Thread Chris Wilson
In anticipation of wanting to be able to apply priority inheritance (pi) from underneath an
engine's active.lock, rework the priority inheritance to primarily work
along an engine's priority queue, delegating any other engine that the
chain may traverse to a worker. This reduces the global spinlock from
governing the entire multi-engine priority inheritance depth-first search,
to a smaller lock on each engine around a single list on that engine.
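
The producer side of that hand-off is truncated in this archive; in
sketch form (the accessor and list helper are placeholders), queueing a
cross-engine bump looks like:

    static void ipi_priority(struct i915_request *rq, int prio)
    {
            struct i915_sched_ipi *ipi = request_ipi(rq); /* placeholder */
            int old = READ_ONCE(rq->sched.ipi_priority);

            do {
                    if (prio <= old)
                            return; /* an equal or higher bump is queued */
            } while (!try_cmpxchg(&rq->sched.ipi_priority, &old, prio));

            ipi_list_add(ipi, rq);     /* placeholder: publish the request */
            schedule_work(&ipi->work); /* worker applies it outside our lock */
    }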

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |   2 +
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   3 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   3 +
 drivers/gpu/drm/i915/i915_scheduler.c | 358 +++---
 drivers/gpu/drm/i915/i915_scheduler.h |   3 +
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  23 +-
 6 files changed, 250 insertions(+), 142 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 44b6e8e646ed..e55e57b6edf6 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -575,6 +575,8 @@ void intel_engine_init_execlists(struct intel_engine_cs 
*engine)
 
execlists->queue_priority_hint = INT_MIN;
execlists->queue = RB_ROOT_CACHED;
+
+   i915_sched_init_ipi(&execlists->ipi);
 }
 
 static void cleanup_status_page(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index 0b026cde9f09..48a91c0dbad6 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -114,8 +114,7 @@ static void heartbeat(struct work_struct *wrk)
 * but all other contexts, including the kernel
 * context are stuck waiting for the signal.
 */
-   } else if (intel_engine_has_scheduler(engine) &&
-  rq->sched.attr.priority < I915_PRIORITY_BARRIER) {
+   } else if (rq->sched.attr.priority < I915_PRIORITY_BARRIER) {
/*
 * Gradually raise the priority of the heartbeat to
 * give high priority work [which presumably desires
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 175695c73d5f..71ceaa5dcf40 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -20,6 +20,7 @@
 #include "i915_gem.h"
 #include "i915_pmu.h"
 #include "i915_priolist_types.h"
+#include "i915_scheduler_types.h"
 #include "i915_selftest.h"
 #include "intel_breadcrumbs_types.h"
 #include "intel_sseu.h"
@@ -257,6 +258,8 @@ struct intel_engine_execlists {
struct rb_root_cached queue;
struct rb_root_cached virtual;
 
+   struct i915_sched_ipi ipi;
+
/**
 * @csb_write: control register for Context Switch buffer
 *
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 84a55df88687..ec9da9109dc3 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -17,8 +17,6 @@ static struct i915_global_scheduler {
struct kmem_cache *slab_priorities;
 } global;
 
-static DEFINE_SPINLOCK(schedule_lock);
-
 static struct i915_sched_node *node_get(struct i915_sched_node *node)
 {
i915_request_get(container_of(node, struct i915_request, sched));
@@ -30,17 +28,124 @@ static void node_put(struct i915_sched_node *node)
i915_request_put(container_of(node, struct i915_request, sched));
 }
 
+static inline int rq_prio(const struct i915_request *rq)
+{
+   return READ_ONCE(rq->sched.attr.priority);
+}
+
+static int ipi_get_prio(struct i915_request *rq)
+{
+   if (READ_ONCE(rq->sched.ipi_priority) == I915_PRIORITY_INVALID)
+   return I915_PRIORITY_INVALID;
+
+   return xchg(&rq->sched.ipi_priority, I915_PRIORITY_INVALID);
+}
+
+static void ipi_schedule(struct work_struct *wrk)
+{
+   struct i915_sched_ipi *ipi = container_of(wrk, typeof(*ipi), work);
+   struct i915_request *rq = xchg(&ipi->list, NULL);
+
+   do {
+   struct i915_request *rn = xchg(&rq->sched.ipi_link, NULL);
+   int prio;
+
+   prio = ipi_get_prio(rq);
+
+   /*
+* For cross-engine scheduling to work we rely on one of two
+* things:
+*
+* a) The requests are using dma-fence fences and so will not
+* be scheduled until the previous engine is completed, and
+* so we cannot cross back onto the original engine and end up
+* queuing an earlier request after the first (due to the
+   

[Intel-gfx] [PATCH 07/57] drm/i915/gt: Move engine setup out of set_default_submission

2021-02-01 Thread Chris Wilson
Now that we no longer switch back and forth between guc and execlists,
we no longer need to restore the backend's vfuncs and can leave them set
after initialisation. The only catch is that we lose the submission on
wedging and still need to reset the submit_request vfunc on unwedging.
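
So the unwedge path shrinks to re-installing a single vfunc; roughly
(editor's illustration of the flow, not code from the patch):

    /* Wedging stubs out submit_request; everything else stays as set
     * at init, so unwedging only restores that one vfunc.
     */
    engine->set_default_submission(engine);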

Signed-off-by: Chris Wilson 
---
 .../drm/i915/gt/intel_execlists_submission.c  | 46 -
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  4 --
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 50 ---
 3 files changed, 44 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 45a8ac152b88..5d824e1cfcba 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3089,29 +3089,6 @@ static void execlists_set_default_submission(struct 
intel_engine_cs *engine)
engine->submit_request = execlists_submit_request;
engine->schedule = i915_schedule;
engine->execlists.tasklet.callback = execlists_submission_tasklet;
-
-   engine->reset.prepare = execlists_reset_prepare;
-   engine->reset.rewind = execlists_reset_rewind;
-   engine->reset.cancel = execlists_reset_cancel;
-   engine->reset.finish = execlists_reset_finish;
-
-   engine->park = execlists_park;
-   engine->unpark = NULL;
-
-   engine->flags |= I915_ENGINE_SUPPORTS_STATS;
-   if (!intel_vgpu_active(engine->i915)) {
-   engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
-   if (can_preempt(engine)) {
-   engine->flags |= I915_ENGINE_HAS_PREEMPTION;
-   if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
-   engine->flags |= I915_ENGINE_HAS_TIMESLICES;
-   }
-   }
-
-   if (intel_engine_has_preemption(engine))
-   engine->emit_bb_start = gen8_emit_bb_start;
-   else
-   engine->emit_bb_start = gen8_emit_bb_start_noarb;
 }
 
 static void execlists_shutdown(struct intel_engine_cs *engine)
@@ -3142,6 +3119,14 @@ logical_ring_default_vfuncs(struct intel_engine_cs 
*engine)
engine->cops = &execlists_context_ops;
engine->request_alloc = execlists_request_alloc;
 
+   engine->reset.prepare = execlists_reset_prepare;
+   engine->reset.rewind = execlists_reset_rewind;
+   engine->reset.cancel = execlists_reset_cancel;
+   engine->reset.finish = execlists_reset_finish;
+
+   engine->park = execlists_park;
+   engine->unpark = NULL;
+
engine->emit_flush = gen8_emit_flush_xcs;
engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_xcs;
@@ -3162,6 +3147,21 @@ logical_ring_default_vfuncs(struct intel_engine_cs 
*engine)
 * until a more refined solution exists.
 */
}
+
+   engine->flags |= I915_ENGINE_SUPPORTS_STATS;
+   if (!intel_vgpu_active(engine->i915)) {
+   engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
+   if (can_preempt(engine)) {
+   engine->flags |= I915_ENGINE_HAS_PREEMPTION;
+   if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
+   engine->flags |= I915_ENGINE_HAS_TIMESLICES;
+   }
+   }
+
+   if (intel_engine_has_preemption(engine))
+   engine->emit_bb_start = gen8_emit_bb_start;
+   else
+   engine->emit_bb_start = gen8_emit_bb_start_noarb;
 }
 
 static void logical_ring_default_irqs(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c 
b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 9c2c605d7a92..3cb2ce503544 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -969,14 +969,10 @@ static void gen6_bsd_submit_request(struct i915_request 
*request)
 static void i9xx_set_default_submission(struct intel_engine_cs *engine)
 {
engine->submit_request = i9xx_submit_request;
-
-   engine->park = NULL;
-   engine->unpark = NULL;
 }
 
 static void gen6_bsd_set_default_submission(struct intel_engine_cs *engine)
 {
-   i9xx_set_default_submission(engine);
engine->submit_request = gen6_bsd_submit_request;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 92688a9b6717..f72faa0b8339 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -608,35 +608,6 @@ static int guc_resume(struct intel_engine_cs *engine)
 static void guc_set_default_submission(struct intel_engine_cs *engine)
 {
engine->subm

[Intel-gfx] [PATCH 32/57] drm/i915: Move needs-breadcrumb flags to scheduler

2021-02-01 Thread Chris Wilson
Whether the scheduler depends on interrupt delivery for forward progress
is a property of the scheduler backend, not of the underlying engine, so
move the flag from inside the engine to i915_sched_engine.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_engine.h|  6 ++
 drivers/gpu/drm/i915/gt/intel_engine_types.h  | 13 +++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  2 +-
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  7 +++
 4 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index ca3a9cb06328..db5419ba1dc8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -285,4 +285,10 @@ intel_engine_has_timeslices(struct intel_engine_cs *engine)
return i915_sched_has_timeslices(intel_engine_get_scheduler(engine));
 }
 
+static inline bool
+intel_engine_needs_breadcrumb_tasklet(struct intel_engine_cs *engine)
+{
+   return i915_sched_needs_breadcrumb_tasklet(intel_engine_get_scheduler(engine));
+}
+
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 96a0aec29672..f856bd9b7dae 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -442,10 +442,9 @@ struct intel_engine_cs {
 #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
 #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
 #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
-#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(4)
-#define I915_ENGINE_IS_VIRTUAL   BIT(5)
-#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
-#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
+#define I915_ENGINE_IS_VIRTUAL   BIT(4)
+#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(5)
+#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(6)
unsigned int flags;
 
/*
@@ -540,12 +539,6 @@ intel_engine_has_semaphores(const struct intel_engine_cs 
*engine)
return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
 }
 
-static inline bool
-intel_engine_needs_breadcrumb_tasklet(const struct intel_engine_cs *engine)
-{
-   return engine->flags & I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
-}
-
 static inline bool
 intel_engine_is_virtual(const struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 887f38fb671f..e8c66d868c59 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -606,7 +606,6 @@ static void guc_default_vfuncs(struct intel_engine_cs 
*engine)
}
engine->set_default_submission = guc_set_default_submission;
 
-   engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
engine->flags |= I915_ENGINE_HAS_PREEMPTION;
 
/*
@@ -656,6 +655,7 @@ int intel_guc_submission_setup(struct intel_engine_cs 
*engine)
 
tasklet_setup(&engine->sched.tasklet, guc_submission_tasklet);
__set_bit(I915_SCHED_ACTIVE_BIT, &engine->sched.flags);
+   __set_bit(I915_SCHED_NEEDS_BREADCRUMB_BIT, &engine->sched.flags);
 
guc_default_vfuncs(engine);
guc_default_irqs(engine);
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h 
b/drivers/gpu/drm/i915/i915_scheduler_types.h
index dfb29b8c2bee..b4a0e4e26bfd 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -20,6 +20,7 @@ struct i915_request;
 enum {
I915_SCHED_ACTIVE_BIT = 0,
I915_SCHED_HAS_TIMESLICES_BIT,
+   I915_SCHED_NEEDS_BREADCRUMB_BIT,
 };
 
 /**
@@ -194,4 +195,10 @@ static inline bool i915_sched_has_timeslices(const struct 
i915_sched *se)
return test_bit(I915_SCHED_HAS_TIMESLICES_BIT, &se->flags);
 }
 
+static inline bool
+i915_sched_needs_breadcrumb_tasklet(const struct i915_sched *se)
+{
+   return test_bit(I915_SCHED_NEEDS_BREADCRUMB_BIT, &se->flags);
+}
+
 #endif /* _I915_SCHEDULER_TYPES_H_ */
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 38/57] drm/i915/gt: Specify a deadline for the heartbeat

2021-02-01 Thread Chris Wilson
As we know when we expect the heartbeat to be checked for completion,
pass this information along as its deadline. We still do not complain if
the deadline is missed, at least until we have tried a few times, but it
will allow for quicker hang detection on systems where deadlines are
adhered to.
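
One arithmetic detail in the conversion below is worth a note:

    /* interval << 20 == interval * 1048576 ~= interval * 10^6, i.e. a
     * cheap milliseconds-to-nanoseconds scaling for ktime_t; the ~4.9%
     * overshoot only pads the deadline, which is harmless here.
     */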

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index f1811e79401e..0f0bf9e4d34f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -65,6 +65,16 @@ static void heartbeat_commit(struct i915_request *rq,
__i915_request_queue(rq, attr);
 }
 
+static void set_heartbeat_deadline(struct intel_engine_cs *engine,
+  struct i915_request *rq)
+{
+   unsigned long interval;
+
+   interval = READ_ONCE(engine->props.heartbeat_interval_ms);
+   if (interval)
+   i915_request_set_deadline(rq, ktime_get() + (interval << 20));
+}
+
 static void show_heartbeat(const struct i915_request *rq,
   struct intel_engine_cs *engine)
 {
@@ -128,6 +138,8 @@ static void heartbeat(struct work_struct *wrk)
attr.priority = I915_PRIORITY_BARRIER;
 
local_bh_disable();
+   if (attr.priority == I915_PRIORITY_BARRIER)
+   i915_request_set_deadline(rq, 0);
i915_request_set_priority(rq, attr.priority);
local_bh_enable();
} else {
@@ -160,6 +172,7 @@ static void heartbeat(struct work_struct *wrk)
if (IS_ERR(rq))
goto unlock;
 
+   set_heartbeat_deadline(engine, rq);
+   heartbeat_commit(rq, &attr);
 
 unlock:
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 40/57] drm/i915/gt: Support virtual engine queues

2021-02-01 Thread Chris Wilson
Allow multiple requests to be queued onto a virtual engine, whereas
before we only allowed a single request to be queued at a time. The
advantage of keeping just one request in the queue was to ensure that we
always decided late which engine to use. However, with the introduction
of the virtual deadline we throttle submission and still only drip one
request into the sibling at a time (unless it is truly empty, but then a
second request will have an earlier deadline than the queued virtual
engine and force itself in front). This also takes advantage of the fact
that a virtual engine will remain bound while it is active, i.e. we cannot
switch to a second engine until the context is completed -- such that we
cannot be as lazy as lazy can be.

By allowing a full queue, we avoid having to synchronize via the
breadcrumb interrupt every time, letting the virtual engine reach the
full throughput of the siblings.

Signed-off-by: Chris Wilson 
---
 .../drm/i915/gt/intel_execlists_submission.c  | 435 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 146 --
 drivers/gpu/drm/i915/i915_request.c   |  12 +-
 drivers/gpu/drm/i915/i915_scheduler.c |  70 ++-
 drivers/gpu/drm/i915/i915_scheduler.h |   4 +-
 5 files changed, 281 insertions(+), 386 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 31d36057c729..9c929688a955 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -160,17 +160,6 @@ struct virtual_engine {
struct intel_context context;
struct rcu_work rcu;
 
-   /*
-* We allow only a single request through the virtual engine at a time
-* (each request in the timeline waits for the completion fence of
-* the previous before being submitted). By restricting ourselves to
-* only submitting a single request, each request is placed on to a
-* physical to maximise load spreading (by virtue of the late greedy
-* scheduling -- each real engine takes the next available request
-* upon idling).
-*/
-   struct i915_request *request;
-
/*
 * We keep a rbtree of available virtual engines inside each physical
 * engine, sorted by priority. Here we preallocate the nodes we need
@@ -274,17 +263,24 @@ static struct i915_request *first_request(const struct 
i915_sched *se)
sched.link);
 }
 
-static struct i915_request *first_virtual(const struct intel_engine_cs *engine)
+static struct virtual_engine *
+first_virtual_engine(const struct intel_engine_cs *engine)
 {
-   struct rb_node *rb;
+   return rb_entry_safe(rb_first_cached(&engine->execlists.virtual),
+struct virtual_engine,
+nodes[engine->id].rb);
+}
 
-   rb = rb_first_cached(&engine->execlists.virtual);
-   if (!rb)
+static const struct i915_request *
+first_virtual(const struct intel_engine_cs *engine)
+{
+   struct virtual_engine *ve;
+
+   ve = first_virtual_engine(engine);
+   if (!ve)
return NULL;
 
-   return READ_ONCE(rb_entry(rb,
- struct virtual_engine,
- nodes[engine->id].rb)->request);
+   return first_request(&ve->base.sched);
 }
 
 static const struct i915_request *
@@ -500,7 +496,7 @@ static void execlists_schedule_in(struct i915_request *rq, 
int idx)
trace_i915_request_in(rq, idx);
 
old = ce->inflight;
-   if (!old)
+   if (!__intel_context_inflight_count(old))
old = __execlists_schedule_in(rq);
WRITE_ONCE(ce->inflight, ptr_inc(old));
 
@@ -510,31 +506,43 @@ static void execlists_schedule_in(struct i915_request 
*rq, int idx)
 static void
 resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve)
 {
-   struct i915_sched *se = i915_request_get_scheduler(rq);
+   struct i915_sched *se = intel_engine_get_scheduler(&ve->base);
+   struct i915_sched *pv = i915_request_get_scheduler(rq);
+   struct i915_request *pos = rq;
+   struct intel_timeline *tl;
 
-   spin_lock_irq(&se->lock);
+   spin_lock_irq(&se->lock);
 
-   clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-   WRITE_ONCE(rq->engine, &ve->base);
-   ve->base.sched.submit_request(rq);
+   if (__i915_request_is_complete(rq))
+   goto unlock;
 
-   spin_unlock_irq(&se->lock);
+   tl = i915_request_active_timeline(rq);
+
+   /* Rewind back to the start of this virtual engine queue */
+   list_for_each_entry_continue_reverse(rq, &tl->requests, link) {
+   if (!i915_request_in_priority_queue(rq))
+   break;
+
+   pos = rq;
+   }
+
+   /* Resubmit the queue in execution order */
+   spin_lock(>

[Intel-gfx] [PATCH 43/57] drm/i915/gt: Delay taking irqoff for execlists submission

2021-02-01 Thread Chris Wilson
Before we take the irqsafe spinlock to dequeue requests and submit them
to HW, first check whether we need to take any action (i.e.
whether the HW is ready for some work, or if we need to preempt the
currently executing context) without taking the lock. We will then
likely skip taking the spinlock, and so reduce contention.
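
The shape is the usual double-checked submission; as a minimal sketch
(the predicates below stand in for the checks the patch keeps outside
the lock):

    if (!need_dequeue(engine)) /* unlocked peek: racy but conservative */
            return;

    local_irq_disable(); /* irq stays off until after the ELSP write */
    spin_lock(&se->lock);
    if (need_dequeue(engine)) /* re-validate: the peek may be stale */
            __dequeue_locked(engine); /* placeholder for the real body */
    spin_unlock(&se->lock);
    local_irq_enable();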

Signed-off-by: Chris Wilson 
---
 .../drm/i915/gt/intel_execlists_submission.c  | 88 ---
 1 file changed, 39 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index e8f192984e88..d4ae65af7dc1 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1014,24 +1014,6 @@ static void virtual_xfer_context(struct virtual_engine 
*ve,
}
 }
 
-static void defer_active(struct intel_engine_cs *engine)
-{
-   struct i915_request *rq;
-
-   rq = __i915_sched_rewind_requests(engine);
-   if (!rq)
-   return;
-
-   /*
-* We want to move the interrupted request to the back of
-* the round-robin list (i.e. its priority level), but
-* in doing so, we must then move all requests that were in
-* flight and were waiting for the interrupted request to
-* be run after it again.
-*/
-   __i915_sched_defer_request(engine, rq);
-}
-
 static bool
 timeslice_yield(const struct intel_engine_execlists *el,
const struct i915_request *rq)
@@ -1312,8 +1294,6 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * and context switches) submission.
 */
 
-   spin_lock(&se->lock);
-
/*
 * If the queue is higher priority than the last
 * request in the currently active context, submit afresh.
@@ -1336,24 +1316,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 rq_deadline(last),
 rq_prio(last));
record_preemption(execlists);
-
-   /*
-* Don't let the RING_HEAD advance past the breadcrumb
-* as we unwind (and until we resubmit) so that we do
-* not accidentally tell it to go backwards.
-*/
-   ring_set_paused(engine, 1);
-
-   /*
-* Note that we have not stopped the GPU at this point,
-* so we are unwinding the incomplete requests as they
-* remain inflight and so by the time we do complete
-* the preemption, some of the unwound requests may
-* complete!
-*/
-   __i915_sched_rewind_requests(engine);
-
-   last = NULL;
+   last = (void *)1;
} else if (timeslice_expired(engine, last)) {
ENGINE_TRACE(engine,
 "expired:%s last=%llx:%llu, deadline=%llu, 
now=%llu, yield?=%s\n",
@@ -1380,8 +1343,6 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * same context again, grant it a full timeslice.
 */
cancel_timer(&execlists->timer);
-   ring_set_paused(engine, 1);
-   defer_active(engine);
 
/*
 * Unlike for preemption, if we rewind and continue
@@ -1396,7 +1357,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * normal save/restore will preserve state and allow
 * us to later continue executing the same request.
 */
-   last = NULL;
+   last = (void *)3;
} else {
/*
 * Otherwise if we already have a request pending
@@ -1412,12 +1373,46 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * Even if ELSP[1] is occupied and not worthy
 * of timeslices, our queue might be.
 */
-   spin_unlock(&se->lock);
return;
}
}
}
 
+   local_irq_disable(); /* irq remains off until after ELSP write */
+   spin_lock(&se->lock);
+
+   if ((unsigned long)last & 1) {
+   bool defer = (unsigned long)last & 2;
+
+   /*
+* Don't let the RING_HEAD advance past the breadcrumb
+* as we unwind (and until we resubmit) so that we do
+* not accidentally tell it to go backwards.
+*/
+

[Intel-gfx] [PATCH 34/57] drm/i915: Move preempt-reset flag to the scheduler

2021-02-01 Thread Chris Wilson
While the HW may support preemption, whether or not the scheduler
enforces preemption by forcibly resetting the current context is
ultimately up to the scheduler.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gt/intel_engine.h   | 7 ++-
 drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 5 -
 drivers/gpu/drm/i915/i915_scheduler_types.h  | 9 +
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index db5419ba1dc8..33a29623571d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -244,12 +244,9 @@ static inline bool intel_engine_uses_guc(const struct 
intel_engine_cs *engine)
 }
 
 static inline bool
-intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
+intel_engine_has_preempt_reset(struct intel_engine_cs *engine)
 {
-   if (!IS_ACTIVE(CONFIG_DRM_I915_PREEMPT_TIMEOUT))
-   return false;
-
-   return intel_engine_has_preemption(engine);
+   return i915_sched_has_preempt_reset(intel_engine_get_scheduler(engine));
 }
 
 static inline bool
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 9245499d2082..7ec33bd73d95 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2931,9 +2931,12 @@ logical_ring_default_vfuncs(struct intel_engine_cs 
*engine)
__set_bit(I915_SCHED_HAS_TIMESLICES_BIT,
  &engine->sched.flags);
 
-   if (intel_engine_has_preemption(engine))
+   if (intel_engine_has_preemption(engine)) {
__set_bit(I915_SCHED_USE_BUSYWAIT_BIT,
  &engine->sched.flags);
+   __set_bit(I915_SCHED_HAS_PREEMPT_RESET_BIT,
+ &engine->sched.flags);
+   }
 }
 
 static void logical_ring_default_irqs(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h 
b/drivers/gpu/drm/i915/i915_scheduler_types.h
index 37475024c0de..7271a0259a56 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -20,6 +20,7 @@ struct i915_request;
 enum {
I915_SCHED_ACTIVE_BIT = 0,
I915_SCHED_HAS_TIMESLICES_BIT,
+   I915_SCHED_HAS_PREEMPT_RESET_BIT,
I915_SCHED_NEEDS_BREADCRUMB_BIT,
I915_SCHED_USE_BUSYWAIT_BIT,
 };
@@ -207,4 +208,12 @@ static inline bool i915_sched_use_busywait(const struct 
i915_sched *se)
return test_bit(I915_SCHED_USE_BUSYWAIT_BIT, &se->flags);
 }
 
+static inline bool i915_sched_has_preempt_reset(const struct i915_sched *se)
+{
+   if (!IS_ACTIVE(CONFIG_DRM_I915_PREEMPT_TIMEOUT))
+   return false;
+
+   return test_bit(I915_SCHED_HAS_PREEMPT_RESET_BIT, &se->flags);
+}
+
 #endif /* _I915_SCHEDULER_TYPES_H_ */
-- 
2.20.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 54/57] drm/i915/gt: Infrastructure for ring scheduling

2021-02-01 Thread Chris Wilson
Build a bare-bones scheduler to sit on top of the global legacy ringbuffer
submission. This virtual execlists scheme should be applicable to all
older platforms.

A key problem we have with the legacy ring buffer submission is that it
only allows for FIFO queuing. All clients share the global request queue
and must contend for its lock when submitting. As any client may need to
wait for external events, all clients must then wait. However, if we
stage each client into their own virtual ringbuffer with their own
timelines, we can copy the client requests into the global ringbuffer
only when they are ready, reordering the submission around stalls.
Furthermore, the ability to reorder gives us rudimentary priority
sorting -- although without preemption support, once something is on the
GPU it stays on the GPU, and so it is still possible for a hog to delay
a high priority request (such as updating the display). However, it does
mean that by keeping the submission queue short, the high priority
request will be next. This design resembles the old guc submission
scheduler, which reordered requests onto a global workqueue.

The implementation uses the MI_USER_INTERRUPT at the end of every
request to track completion, so is more interrupt-happy than execlists
[which has an interrupt for each context event, albeit two]. Our
interrupts on these systems are relatively heavy, and in the past we have
been able to completely starve Sandybridge with the interrupt traffic. Our
interrupt handlers are now much better (in part offloading the work to
bottom halves leaving the interrupt itself only dealing with acking the
registers) but we can still see the impact of starvation in the uneven
submission latency on a saturated system.

Overall though, the short submission queues and extra interrupts do not
appear to be affecting throughput (+-10%; some tasks even improve thanks to the
reduced request overheads) and improve latency. [Which is a massive
improvement since the introduction of Sandybridge!]
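
The core of the design, reduced to a sketch (ring_copy() appears in the
excerpts of the follow-up patch; its argument order is assumed here, and
the pick/kick helpers are placeholders):

    /* Requests wait fully built in per-client virtual rings; only when
     * runnable is the payload copied into the single real ring.
     */
    while ((rq = pick_next_ready(engine))) { /* priority/deadline order */
            ring_copy(engine->legacy.ring, rq->ring, rq->head, rq->tail);
            kick_tail(engine); /* placeholder: write the HW RING_TAIL */
    }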

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/Makefile |   1 +
 drivers/gpu/drm/i915/gt/intel_engine.h|   1 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   1 +
 .../gpu/drm/i915/gt/intel_ring_scheduler.c| 783 ++
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  17 +-
 .../gpu/drm/i915/gt/intel_ring_submission.h   |  17 +
 6 files changed, 812 insertions(+), 8 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_ring_submission.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index ce01634d4ea7..1f9c98eae605 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -115,6 +115,7 @@ gt-y += \
gt/intel_renderstate.o \
gt/intel_reset.o \
gt/intel_ring.o \
+   gt/intel_ring_scheduler.o \
gt/intel_ring_submission.o \
gt/intel_rps.o \
gt/intel_sseu.o \
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h
index 33a29623571d..bc07c96ab48c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -193,6 +193,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs 
*engine);
 int intel_engine_resume(struct intel_engine_cs *engine);
 
 int intel_ring_submission_setup(struct intel_engine_cs *engine);
+int intel_ring_scheduler_setup(struct intel_engine_cs *engine);
 
 int intel_engine_stop_cs(struct intel_engine_cs *engine);
 void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index e94c99dee5cb..9f14cc631287 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -318,6 +318,7 @@ struct intel_engine_cs {
struct {
struct intel_ring *ring;
struct intel_timeline *timeline;
+   struct intel_context *context;
} legacy;
 
/*
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c 
b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
new file mode 100644
index ..b6fcb18ef0a6
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
@@ -0,0 +1,783 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include 
+
+#include 
+
+#include "i915_drv.h"
+#include "intel_breadcrumbs.h"
+#include "intel_context.h"
+#include "intel_engine_pm.h"
+#include "intel_engine_stats.h"
+#include "intel_gt.h"
+#include "intel_gt_pm.h"
+#include "intel_gt_requests.h"
+#include "intel_reset.h"
+#include "intel_ring.h"
+#include "intel_ring_submission.h"
+#include "shmem_utils.h"
+
+/*
+ * Rough estimate of the typical request size, performing a flush

[Intel-gfx] [PATCH 45/57] drm/i915/gt: Track timeline GGTT offset separately from subpage offset

2021-02-01 Thread Chris Wilson
Currently we know that the timeline status page is at most a page in
size, and so we can preserve the lower 12bits of the offset when
relocating the status page in the GGTT. If we want to use a larger
object, such as the context state, we may not necessarily use a position
within the first page and so need more than 12b.
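
Sketched as before/after of the relocation arithmetic (with 4KiB pages,
offset_in_page() keeps only bits [11:0]):

	/* Before: subpage offset folded back into hwsp_offset, 12 bits max. */
	tl->hwsp_offset = i915_ggtt_offset(tl->hwsp_ggtt) +
			  offset_in_page(tl->hwsp_offset);

	/* After: the GGTT address is tracked separately in ggtt_offset, so
	 * the status dword may live at any offset within a larger object.
	 */
	tl->ggtt_offset = i915_ggtt_offset(tl->hwsp_ggtt) + tl->hwsp_offset;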

Signed-off-by: Chris Wilson 
Reviewed-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/gen6_engine_cs.c|  4 ++--
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c|  2 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c   |  1 -
 .../drm/i915/gt/intel_execlists_submission.c|  2 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c| 17 +++--
 drivers/gpu/drm/i915/gt/intel_timeline_types.h  |  1 +
 drivers/gpu/drm/i915/gt/selftest_engine_cs.c|  2 +-
 drivers/gpu/drm/i915/gt/selftest_rc6.c  |  2 +-
 drivers/gpu/drm/i915/gt/selftest_timeline.c | 16 
 drivers/gpu/drm/i915/i915_scheduler.c   |  2 +-
 10 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c 
b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
index ce38d1bcaba3..2f59dd3bdc18 100644
--- a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
@@ -161,7 +161,7 @@ u32 *gen6_emit_breadcrumb_rcs(struct i915_request *rq, u32 
*cs)
 PIPE_CONTROL_DC_FLUSH_ENABLE |
 PIPE_CONTROL_QW_WRITE |
 PIPE_CONTROL_CS_STALL);
-   *cs++ = i915_request_active_timeline(rq)->hwsp_offset |
+   *cs++ = i915_request_active_timeline(rq)->ggtt_offset |
PIPE_CONTROL_GLOBAL_GTT;
*cs++ = rq->fence.seqno;
 
@@ -359,7 +359,7 @@ u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 
*cs)
 PIPE_CONTROL_QW_WRITE |
 PIPE_CONTROL_GLOBAL_GTT_IVB |
 PIPE_CONTROL_CS_STALL);
-   *cs++ = i915_request_active_timeline(rq)->hwsp_offset;
+   *cs++ = i915_request_active_timeline(rq)->ggtt_offset;
*cs++ = rq->fence.seqno;
 
*cs++ = MI_USER_INTERRUPT;
diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
index d8763146e054..187f1dad1054 100644
--- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
@@ -346,7 +346,7 @@ static u32 hwsp_offset(const struct i915_request *rq)
if (cl)
return cl->ggtt_offset;
 
-   return rcu_dereference_protected(rq->timeline, 1)->hwsp_offset;
+   return rcu_dereference_protected(rq->timeline, 1)->ggtt_offset;
 }
 
 int gen8_emit_init_breadcrumb(struct i915_request *rq)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index f39f8049641c..f91c38124871 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1452,7 +1452,6 @@ void intel_engine_dump(struct intel_engine_cs *engine,
i915_sched_show(m, intel_engine_get_scheduler(engine),
i915_request_show, 8);
 
-
drm_printf(m, "\tMMIO base:  0x%08x\n", engine->mmio_base);
wakeref = intel_runtime_pm_get_if_in_use(engine->uncore->rpm);
if (wakeref) {
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index d4ae65af7dc1..d9b5b6c9eb5d 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3591,7 +3591,7 @@ static int print_ring(char *buf, int sz, struct 
i915_request *rq)
len = scnprintf(buf, sz,
"ring:{start:%08x, hwsp:%08x, seqno:%08x, 
runtime:%llums}, ",
i915_ggtt_offset(rq->ring->vma),
-   tl ? tl->hwsp_offset : 0,
+   tl ? tl->ggtt_offset : 0,
hwsp_seqno(rq),

DIV_ROUND_CLOSEST_ULL(intel_context_get_total_runtime_ns(rq->context),
  1000 * 1000));
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c 
b/drivers/gpu/drm/i915/gt/intel_timeline.c
index 1505dffbaba9..b684322c879c 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -354,13 +354,11 @@ int intel_timeline_pin(struct intel_timeline *tl, struct 
i915_gem_ww_ctx *ww)
if (err)
return err;
 
-   tl->hwsp_offset =
-   i915_ggtt_offset(tl->hwsp_ggtt) +
-   offset_in_page(tl->hwsp_offset);
+   tl->ggtt_offset = i915_ggtt_offset(tl->hwsp_ggtt) + tl->hwsp_offset;
GT_TRACE(tl->gt, "timeline:%llx using HWSP offset:%x\n",
-tl->fence_context, tl->hwsp_offset);
+  

[Intel-gfx] [PATCH 56/57] drm/i915/gt: Enable ring scheduling for gen5-7

2021-02-01 Thread Chris Wilson
Switch over from FIFO global submission to the priority-sorted
topographical scheduler. At the cost of more busy work on the CPU to
keep the GPU supplied with the next packet of requests, this allows us
to reorder requests around submission stalls and so allow low latency
under load while maintaining fairness between clients.

The downside is that we enable interrupts on all requests (unlike with
execlists, where we have an interrupt for context switches). This means
that instead of receiving an interrupt only when we are waiting for
completion, we are processing them all the time, with a noticeable
overhead of CPU time absorbed by the interrupt handler. The effect is
most pronounced on CPU-throughput limited renderers like uxa, where
performance can be degraded by 20% in the worst case. Nevertheless, this
is a pathological example of an obsolete userspace driver. (There are
also cases where uxa performs better by 20%, which is an interesting
quirk...) The glxgears-not-a-benchmark (CPU throughput bound) is one
such example of a performance hit, only affecting uxa.

The expectation is that allowing request reordering will give a much
smoother UX that greatly compensates for the reduced throughput under
high submission load (but low GPU load).

This also enables the timer based RPS for better powersaving, with the
exception of Valleyview whose PCU doesn't take kindly to our
interference.

References: 0f46832fab77 ("drm/i915: Mask USER interrupts on gen6 (until 
required)")
Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 2 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 5 -
 drivers/gpu/drm/i915/gt/intel_gt_types.h  | 1 +
 drivers/gpu/drm/i915/gt/intel_rps.c   | 6 ++
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index d3f87dc4eda3..2246b5c308dc 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -94,7 +94,7 @@ static int live_nop_switch(void *arg)
rq = i915_request_get(this);
i915_request_add(this);
}
-   if (i915_request_wait(rq, 0, HZ / 5) < 0) {
+   if (i915_request_wait(rq, 0, HZ) < 0) {
pr_err("Failed to populated %d contexts\n", nctx);
intel_gt_set_wedged(&i915->gt);
i915_request_put(rq);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index f91c38124871..c8136ded5bbe 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -875,8 +875,11 @@ int intel_engines_init(struct intel_gt *gt)
} else if (HAS_EXECLISTS(gt->i915)) {
gt->submission_method = INTEL_SUBMISSION_ELSP;
setup = intel_execlists_submission_setup;
-   } else {
+   } else if (INTEL_GEN(gt->i915) >= 5) {
gt->submission_method = INTEL_SUBMISSION_RING;
+   setup = intel_ring_scheduler_setup;
+   } else {
+   gt->submission_method = INTEL_SUBMISSION_LEGACY;
setup = intel_ring_submission_setup;
}
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h 
b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 626af37c7790..125b40f62644 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -30,6 +30,7 @@ struct intel_engine_cs;
 struct intel_uncore;
 
 enum intel_submission_method {
+   INTEL_SUBMISSION_LEGACY,
INTEL_SUBMISSION_RING,
INTEL_SUBMISSION_ELSP,
INTEL_SUBMISSION_GUC,
diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c 
b/drivers/gpu/drm/i915/gt/intel_rps.c
index 900c20a6d073..2c78d61e7ea9 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -1081,9 +1081,7 @@ static bool gen6_rps_enable(struct intel_rps *rps)
intel_uncore_write_fw(uncore, GEN6_RP_DOWN_TIMEOUT, 5);
intel_uncore_write_fw(uncore, GEN6_RP_IDLE_HYSTERSIS, 10);
 
-   rps->pm_events = (GEN6_PM_RP_UP_THRESHOLD |
- GEN6_PM_RP_DOWN_THRESHOLD |
- GEN6_PM_RP_DOWN_TIMEOUT);
+   rps->pm_events = GEN6_PM_RP_UP_THRESHOLD | GEN6_PM_RP_DOWN_THRESHOLD;
 
return rps_reset(rps);
 }
@@ -1391,7 +1389,7 @@ void intel_rps_enable(struct intel_rps *rps)
GEM_BUG_ON(rps->efficient_freq < rps->min_freq);
GEM_BUG_ON(rps->efficient_freq > rps->max_freq);
 
-   if (has_busy_stats(rps))
+   if (has_busy_stats(rps) && !IS_VALLEYVIEW(i915))
intel_rps_set_timer(rps);
else if (INTEL_GEN(i915) >= 6)
intel_rps_set_interrupts(rps);

[Intel-gfx] [PATCH 46/57] drm/i915/gt: Add timeline "mode"

2021-02-01 Thread Chris Wilson
Explicitly differentiate between the absolute and relative timelines,
and the global HWSP and ppHWSP relative offsets. When using a timeline
that is relative to a known status page, we can replace the absolute
addressing in the commands with indexed variants.

Signed-off-by: Chris Wilson 
Reviewed-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/intel_timeline.c  | 21 ---
 drivers/gpu/drm/i915/gt/intel_timeline.h  |  2 +-
 .../gpu/drm/i915/gt/intel_timeline_types.h| 10 +++--
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c 
b/drivers/gpu/drm/i915/gt/intel_timeline.c
index b684322c879c..69052495c64a 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -226,7 +226,6 @@ static int intel_timeline_init(struct intel_timeline 
*timeline,
 
timeline->gt = gt;
 
-   timeline->has_initial_breadcrumb = !hwsp;
timeline->hwsp_cacheline = NULL;
 
if (!hwsp) {
@@ -243,13 +242,29 @@ static int intel_timeline_init(struct intel_timeline 
*timeline,
return PTR_ERR(cl);
}
 
+   timeline->mode = INTEL_TIMELINE_ABSOLUTE;
timeline->hwsp_cacheline = cl;
timeline->hwsp_offset = cacheline * CACHELINE_BYTES;
 
vaddr = page_mask_bits(cl->vaddr);
} else {
-   timeline->hwsp_offset = offset;
-   vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
+   int preferred;
+
+   if (offset & INTEL_TIMELINE_RELATIVE_CONTEXT) {
+   timeline->mode = INTEL_TIMELINE_RELATIVE_CONTEXT;
+   timeline->hwsp_offset =
+   offset & ~INTEL_TIMELINE_RELATIVE_CONTEXT;
+   preferred = i915_coherent_map_type(gt->i915);
+   } else {
+   timeline->mode = INTEL_TIMELINE_RELATIVE_ENGINE;
+   timeline->hwsp_offset = offset;
+   preferred = I915_MAP_WB;
+   }
+
+   vaddr = i915_gem_object_pin_map(hwsp->obj,
+   preferred | I915_MAP_OVERRIDE);
+   if (IS_ERR(vaddr))
+   vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WC);
if (IS_ERR(vaddr))
return PTR_ERR(vaddr);
}
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.h 
b/drivers/gpu/drm/i915/gt/intel_timeline.h
index 7d6218b55df6..e1d522329757 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.h
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.h
@@ -45,7 +45,7 @@ static inline void intel_timeline_put(struct intel_timeline 
*timeline)
 static inline bool
 intel_timeline_has_initial_breadcrumb(const struct intel_timeline *tl)
 {
-   return tl->has_initial_breadcrumb;
+   return tl->mode == INTEL_TIMELINE_ABSOLUTE;
 }
 
 static inline int __intel_timeline_sync_set(struct intel_timeline *tl,
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline_types.h 
b/drivers/gpu/drm/i915/gt/intel_timeline_types.h
index c5995cc290a0..61938d103a13 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_timeline_types.h
@@ -19,6 +19,12 @@ struct i915_syncmap;
 struct intel_gt;
 struct intel_timeline_hwsp;
 
+enum intel_timeline_mode {
+   INTEL_TIMELINE_ABSOLUTE = 0,
+   INTEL_TIMELINE_RELATIVE_CONTEXT = BIT(0),
+   INTEL_TIMELINE_RELATIVE_ENGINE  = BIT(1),
+};
+
 struct intel_timeline {
u64 fence_context;
u32 seqno;
@@ -44,6 +50,8 @@ struct intel_timeline {
atomic_t pin_count;
atomic_t active_count;
 
+   enum intel_timeline_mode mode;
+
const u32 *hwsp_seqno;
struct i915_vma *hwsp_ggtt;
u32 hwsp_offset;
@@ -51,8 +59,6 @@ struct intel_timeline {
 
struct intel_timeline_cacheline *hwsp_cacheline;
 
-   bool has_initial_breadcrumb;
-
/**
 * List of breadcrumbs associated with GPU requests currently
 * outstanding.
-- 
2.20.1



[Intel-gfx] [PATCH 35/57] drm/i915: Replace priolist rbtree with a skiplist

2021-02-01 Thread Chris Wilson
Replace the priolist rbtree with a skiplist. The crucial difference is
that walking and removing the first element of a skiplist is O(1), but
O(lgN) for an rbtree, as we need to rebalance on remove. This is a
hindrance for submission latency as it occurs between picking a request
from the priolist and submitting it to hardware, as well as effectively
tripling the number of O(lgN) operations required under the irqoff lock.
This is critical to reducing the latency jitter with multiple clients.

The downsides to skiplists are that lookup/insertion is only
probabilistically O(lgN) and there is a significant memory penalty, as
each skip node is larger than the rbtree equivalent. Furthermore, we
don't use dynamic arrays for the skiplist, so the allocation is fixed,
and imposes an upper bound on the scalability wrt the number of
inflight requests.

In the following patches, we introduce a new sort key to the scheduler,
a virtual deadline. This imposes a different structure to the tree.
Using a priority sort, we have very few priority levels active at any
time, most likely just the default priority, and so the rbtree degenerates
to a single element containing the list of all ready requests. The
deadlines in contrast are very sparse, and typically each request has a
unique deadline. Instead of being able to simply walk the list during
dequeue, with the deadline scheduler we have to iterate through the bst
on the critical submission path. Skiplists are vastly superior in this
instance due to the O(1) iteration during dequeue, with very similar
characteristics [on average] to the rbtree for insertion.

This means that by using skiplists we can introduce a sparse sort key
without degrading latency on the critical submission path.
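
To see why dequeue is O(1), here is a toy pop-first for a sentinel-based
skiplist (an illustrative sketch; the level count and node layout are
assumptions, not the patch's i915_priolist):

#define TOY_LEVELS 4

struct toy_node {
	u64 deadline;
	struct toy_node *next[TOY_LEVELS];
};

/* Removing the head just relinks the sentinel at each level the node
 * appears on: a handful of pointer updates and, unlike an rbtree, no
 * rebalancing.
 */
static struct toy_node *toy_pop_first(struct toy_node *sentinel)
{
	struct toy_node *first = sentinel->next[0];
	int lvl;

	if (first == sentinel)
		return NULL; /* an empty list is circular on itself */

	for (lvl = 0; lvl < TOY_LEVELS; lvl++)
		if (sentinel->next[lvl] == first)
			sentinel->next[lvl] = first->next[lvl];

	return first;
}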

As an example, one simple case where we try to do lots of
semi-independent work without any priority management (gem_exec_parallel),
the lock hold times were:
 [worst]      [total]  [avg]
  973.05  6301584.84   0.35  # plain rbtree
  559.82  5424915.25   0.33  # best rbtree with pruning
  208.21  3898784.09   0.24  # skiplist
   34.05  5784106.01   0.32  # rbtree without deadlines
   23.35  4152999.80   0.24  # skiplist without deadlines

Based on the skiplist implementation by Dr Con Kolivas for MuQSS.

References: https://en.wikipedia.org/wiki/Skip_list
Signed-off-by: Chris Wilson 
---
 .../drm/i915/gt/intel_execlists_submission.c  |  52 ++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  31 +-
 drivers/gpu/drm/i915/i915_priolist_types.h|  64 +++-
 drivers/gpu/drm/i915/i915_scheduler.c | 288 ++
 drivers/gpu/drm/i915/i915_scheduler.h |  11 +-
 drivers/gpu/drm/i915/i915_scheduler_types.h   |   2 +-
 .../drm/i915/selftests/i915_mock_selftests.h  |   1 +
 .../gpu/drm/i915/selftests/i915_scheduler.c   |  53 +++-
 8 files changed, 383 insertions(+), 119 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 7ec33bd73d95..1a33c33c96c4 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -252,11 +252,6 @@ static void ring_set_paused(const struct intel_engine_cs 
*engine, int state)
wmb();
 }
 
-static struct i915_priolist *to_priolist(struct rb_node *rb)
-{
-   return rb_entry(rb, struct i915_priolist, node);
-}
-
 static int rq_prio(const struct i915_request *rq)
 {
return READ_ONCE(rq->sched.attr.priority);
@@ -280,15 +275,27 @@ static int effective_prio(const struct i915_request *rq)
return prio;
 }
 
+static struct i915_request *first_request(const struct i915_sched *se)
+{
+   struct i915_priolist *pl = se->queue.sentinel.next[0];
+
+   if (pl == &se->queue.sentinel)
+   return NULL;
+
+   return list_first_entry_or_null(&pl->requests,
+   struct i915_request,
+   sched.link);
+}
+
 static int queue_prio(const struct i915_sched *se)
 {
-   struct rb_node *rb;
+   struct i915_request *rq;
 
-   rb = rb_first_cached(&se->queue);
-   if (!rb)
+   rq = first_request(se);
+   if (!rq)
return INT_MIN;
 
-   return to_priolist(rb)->priority;
+   return rq_prio(rq);
 }
 
 static int virtual_prio(const struct intel_engine_execlists *el)
@@ -298,7 +305,7 @@ static int virtual_prio(const struct intel_engine_execlists 
*el)
return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN;
 }
 
-static bool need_preempt(const struct intel_engine_cs *engine,
+static bool need_preempt(struct intel_engine_cs *engine,
 const struct i915_request *rq)
 {
const struct i915_sched *se = &engine->sched;
@@ -1143,6 +1150,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
struct i915_request ** const last_port = port + execlists->port_mask;
struct 

[Intel-gfx] [PATCH 44/57] drm/i915/gt: Wrap intel_timeline.has_initial_breadcrumb

2021-02-01 Thread Chris Wilson
In preparation for removing the has_initial_breadcrumb field, add a
helper function for the existing callers.

Signed-off-by: Chris Wilson 
Reviewed-by: Mika Kuoppala 
---
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c| 2 +-
 drivers/gpu/drm/i915/gt/intel_ring_submission.c | 4 ++--
 drivers/gpu/drm/i915/gt/intel_timeline.c| 6 +++---
 drivers/gpu/drm/i915/gt/intel_timeline.h| 6 ++
 drivers/gpu/drm/i915/gt/selftest_timeline.c | 5 +++--
 5 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
index 8791e03ebe61..d8763146e054 100644
--- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
@@ -354,7 +354,7 @@ int gen8_emit_init_breadcrumb(struct i915_request *rq)
u32 *cs;
 
GEM_BUG_ON(i915_request_has_initial_breadcrumb(rq));
-   if (!i915_request_timeline(rq)->has_initial_breadcrumb)
+   if (!intel_timeline_has_initial_breadcrumb(i915_request_timeline(rq)))
return 0;
 
cs = intel_ring_begin(rq, 6);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c 
b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index a7d49ea71900..9d193acd260b 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -908,7 +908,7 @@ static int ring_request_alloc(struct i915_request *request)
int ret;
 
GEM_BUG_ON(!intel_context_is_pinned(request->context));
-   GEM_BUG_ON(i915_request_timeline(request)->has_initial_breadcrumb);
+   GEM_BUG_ON(intel_timeline_has_initial_breadcrumb(i915_request_timeline(request)));
 
/*
 * Flush enough space to reduce the likelihood of waiting after
@@ -1229,7 +1229,7 @@ int intel_ring_submission_setup(struct intel_engine_cs 
*engine)
err = PTR_ERR(timeline);
goto err;
}
-   GEM_BUG_ON(timeline->has_initial_breadcrumb);
+   GEM_BUG_ON(intel_timeline_has_initial_breadcrumb(timeline));
 
err = intel_timeline_pin(timeline, NULL);
if (err)
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c 
b/drivers/gpu/drm/i915/gt/intel_timeline.c
index 491b8df174c2..1505dffbaba9 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -444,14 +444,14 @@ void intel_timeline_exit(struct intel_timeline *tl)
 static u32 timeline_advance(struct intel_timeline *tl)
 {
GEM_BUG_ON(!atomic_read(&tl->pin_count));
-   GEM_BUG_ON(tl->seqno & tl->has_initial_breadcrumb);
+   GEM_BUG_ON(tl->seqno & intel_timeline_has_initial_breadcrumb(tl));
 
-   return tl->seqno += 1 + tl->has_initial_breadcrumb;
+   return tl->seqno += 1 + intel_timeline_has_initial_breadcrumb(tl);
 }
 
 static void timeline_rollback(struct intel_timeline *tl)
 {
-   tl->seqno -= 1 + tl->has_initial_breadcrumb;
+   tl->seqno -= 1 + intel_timeline_has_initial_breadcrumb(tl);
 }
 
 static noinline int
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.h 
b/drivers/gpu/drm/i915/gt/intel_timeline.h
index b1f81d947f8d..7d6218b55df6 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.h
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.h
@@ -42,6 +42,12 @@ static inline void intel_timeline_put(struct intel_timeline 
*timeline)
kref_put(&timeline->kref, __intel_timeline_free);
 }
 
+static inline bool
+intel_timeline_has_initial_breadcrumb(const struct intel_timeline *tl)
+{
+   return tl->has_initial_breadcrumb;
+}
+
 static inline int __intel_timeline_sync_set(struct intel_timeline *tl,
u64 context, u32 seqno)
 {
diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c 
b/drivers/gpu/drm/i915/gt/selftest_timeline.c
index d283dce5b4ac..562a450d2832 100644
--- a/drivers/gpu/drm/i915/gt/selftest_timeline.c
+++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c
@@ -665,7 +665,7 @@ static int live_hwsp_wrap(void *arg)
if (IS_ERR(tl))
return PTR_ERR(tl);
 
-   if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline)
+   if (!intel_timeline_has_initial_breadcrumb(tl) || !tl->hwsp_cacheline)
goto out_free;
 
err = intel_timeline_pin(tl, NULL);
@@ -1234,7 +1234,8 @@ static int live_hwsp_rollover_user(void *arg)
goto out;
 
tl = ce->timeline;
-   if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline)
+   if (!intel_timeline_has_initial_breadcrumb(tl) ||
+   !tl->hwsp_cacheline)
goto out;
 
timeline_rollback(tl);
-- 
2.20.1



[Intel-gfx] [PATCH 51/57] drm/i915/gt: Couple tasklet scheduling for all CS interrupts

2021-02-01 Thread Chris Wilson
If any engine asks for the tasklet to be kicked from the CS interrupt,
do so. Currently, this is used by the execlists scheduler backends to
feed in the next request to the HW, and similarly could be used by a
ring scheduler, as will be seen in the next patch.

Signed-off-by: Chris Wilson 
Reviewed-by: Mika Kuoppala 
---
 drivers/gpu/drm/i915/gt/intel_gt_irq.c | 17 -
 drivers/gpu/drm/i915/gt/intel_gt_irq.h |  3 +++
 drivers/gpu/drm/i915/gt/intel_rps.c|  2 +-
 drivers/gpu/drm/i915/i915_irq.c|  8 
 4 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c 
b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
index 6ce5bd28a23d..270dbebc4c18 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
@@ -62,6 +62,13 @@ cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
intel_engine_kick_scheduler(engine);
 }
 
+void gen2_engine_cs_irq(struct intel_engine_cs *engine)
+{
+   intel_engine_signal_breadcrumbs(engine);
+   if (intel_engine_needs_breadcrumb_tasklet(engine))
+   intel_engine_kick_scheduler(engine);
+}
+
 static u32
 gen11_gt_engine_identity(struct intel_gt *gt,
 const unsigned int bank, const unsigned int bit)
@@ -275,9 +282,9 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt)
 void gen5_gt_irq_handler(struct intel_gt *gt, u32 gt_iir)
 {
if (gt_iir & GT_RENDER_USER_INTERRUPT)
-   intel_engine_signal_breadcrumbs(gt->engine_class[RENDER_CLASS][0]);
+   gen2_engine_cs_irq(gt->engine_class[RENDER_CLASS][0]);
if (gt_iir & ILK_BSD_USER_INTERRUPT)
-   intel_engine_signal_breadcrumbs(gt->engine_class[VIDEO_DECODE_CLASS][0]);
+   gen2_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][0]);
 }
 
 static void gen7_parity_error_irq_handler(struct intel_gt *gt, u32 iir)
@@ -301,11 +308,11 @@ static void gen7_parity_error_irq_handler(struct intel_gt 
*gt, u32 iir)
 void gen6_gt_irq_handler(struct intel_gt *gt, u32 gt_iir)
 {
if (gt_iir & GT_RENDER_USER_INTERRUPT)
-   intel_engine_signal_breadcrumbs(gt->engine_class[RENDER_CLASS][0]);
+   gen2_engine_cs_irq(gt->engine_class[RENDER_CLASS][0]);
if (gt_iir & GT_BSD_USER_INTERRUPT)
-   intel_engine_signal_breadcrumbs(gt->engine_class[VIDEO_DECODE_CLASS][0]);
+   gen2_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][0]);
if (gt_iir & GT_BLT_USER_INTERRUPT)
-   intel_engine_signal_breadcrumbs(gt->engine_class[COPY_ENGINE_CLASS][0]);
+   gen2_engine_cs_irq(gt->engine_class[COPY_ENGINE_CLASS][0]);
 
if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT |
  GT_BSD_CS_ERROR_INTERRUPT |
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.h 
b/drivers/gpu/drm/i915/gt/intel_gt_irq.h
index f667e976fb2b..26c2a5ea3b23 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_irq.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.h
@@ -8,6 +8,7 @@
 
 #include 
 
+struct intel_engine_cs;
 struct intel_gt;
 
 #define GEN8_GT_IRQS (GEN8_GT_RCS_IRQ | \
@@ -18,6 +19,8 @@ struct intel_gt;
  GEN8_GT_PM_IRQ | \
  GEN8_GT_GUC_IRQ)
 
+void gen2_engine_cs_irq(struct intel_engine_cs *engine);
+
 void gen11_gt_irq_reset(struct intel_gt *gt);
 void gen11_gt_irq_postinstall(struct intel_gt *gt);
 void gen11_gt_irq_handler(struct intel_gt *gt, const u32 master_ctl);
diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c 
b/drivers/gpu/drm/i915/gt/intel_rps.c
index 405d814e9040..900c20a6d073 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -1774,7 +1774,7 @@ void gen6_rps_irq_handler(struct intel_rps *rps, u32 
pm_iir)
return;
 
if (pm_iir & PM_VEBOX_USER_INTERRUPT)
-   intel_engine_signal_breadcrumbs(gt->engine[VECS0]);
+   gen2_engine_cs_irq(gt->engine[VECS0]);
 
if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 9665cd9742a6..c244ba2c8cee 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3954,7 +3954,7 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
intel_uncore_write16(&dev_priv->uncore, GEN2_IIR, iir);
 
if (iir & I915_USER_INTERRUPT)
-   intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]);
+   gen2_engine_cs_irq(dev_priv->gt.engine[RCS0]);
 
if (iir & I915_MASTER_ERROR_INTERRUPT)
i8xx_error_irq_handler(dev_priv, eir, eir_stuck);
@@ -4062,7 +4062,7 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
in

[Intel-gfx] [PATCH 47/57] drm/i915/gt: Use indices for writing into relative timelines

2021-02-01 Thread Chris Wilson
Relative timelines are relative to either the global or per-process
HWSP, and so we can replace the absolute addressing with store-index
variants for position invariance.

Signed-off-by: Chris Wilson 
Reviewed-by: Matthew Brost 
---
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 98 +---
 drivers/gpu/drm/i915/gt/intel_timeline.h | 12 +++
 2 files changed, 82 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
index 187f1dad1054..7fd843369b41 100644
--- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
@@ -518,7 +518,19 @@ gen8_emit_fini_breadcrumb_tail(struct i915_request *rq, 
u32 *cs)
 
 static u32 *emit_xcs_breadcrumb(struct i915_request *rq, u32 *cs)
 {
-   return gen8_emit_ggtt_write(cs, rq->fence.seqno, hwsp_offset(rq), 0);
+   struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1);
+   unsigned int flags = MI_FLUSH_DW_OP_STOREDW;
+   u32 offset = hwsp_offset(rq);
+
+   if (intel_timeline_is_relative(tl)) {
+   offset = offset_in_page(offset);
+   flags |= MI_FLUSH_DW_STORE_INDEX;
+   }
+   GEM_BUG_ON(offset & 7);
+   if (!intel_timeline_in_context(tl))
+   offset |= MI_FLUSH_DW_USE_GTT;
+
+   return __gen8_emit_flush_dw(cs, rq->fence.seqno, offset, flags);
 }
 
 u32 *gen8_emit_fini_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
@@ -528,6 +540,18 @@ u32 *gen8_emit_fini_breadcrumb_xcs(struct i915_request 
*rq, u32 *cs)
 
 u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
 {
+   struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1);
+   unsigned int flags = PIPE_CONTROL_FLUSH_ENABLE | PIPE_CONTROL_CS_STALL;
+   u32 offset = hwsp_offset(rq);
+
+   if (intel_timeline_is_relative(tl)) {
+   offset = offset_in_page(offset);
+   flags |= PIPE_CONTROL_STORE_DATA_INDEX;
+   }
+   GEM_BUG_ON(offset & 7);
+   if (!intel_timeline_in_context(tl))
+   flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
+
cs = gen8_emit_pipe_control(cs,
PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
PIPE_CONTROL_DEPTH_CACHE_FLUSH |
@@ -535,26 +559,33 @@ u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request 
*rq, u32 *cs)
0);
 
/* XXX flush+write+CS_STALL all in one upsets gem_concurrent_blt:kbl */
-   cs = gen8_emit_ggtt_write_rcs(cs,
- rq->fence.seqno,
- hwsp_offset(rq),
- PIPE_CONTROL_FLUSH_ENABLE |
- PIPE_CONTROL_CS_STALL);
+   cs = __gen8_emit_write_rcs(cs, rq->fence.seqno, offset, 0, flags);
 
return gen8_emit_fini_breadcrumb_tail(rq, cs);
 }
 
 u32 *gen11_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
 {
-   cs = gen8_emit_ggtt_write_rcs(cs,
- rq->fence.seqno,
- hwsp_offset(rq),
- PIPE_CONTROL_CS_STALL |
- PIPE_CONTROL_TILE_CACHE_FLUSH |
- PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
- PIPE_CONTROL_DEPTH_CACHE_FLUSH |
- PIPE_CONTROL_DC_FLUSH_ENABLE |
- PIPE_CONTROL_FLUSH_ENABLE);
+   struct intel_timeline *tl = rcu_dereference_protected(rq->timeline, 1);
+   u32 offset = hwsp_offset(rq);
+   unsigned int flags;
+
+   flags = (PIPE_CONTROL_CS_STALL |
+PIPE_CONTROL_TILE_CACHE_FLUSH |
+PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
+PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+PIPE_CONTROL_DC_FLUSH_ENABLE |
+PIPE_CONTROL_FLUSH_ENABLE);
+
+   if (intel_timeline_is_relative(tl)) {
+   offset = offset_in_page(offset);
+   flags |= PIPE_CONTROL_STORE_DATA_INDEX;
+   }
+   GEM_BUG_ON(offset & 7);
+   if (!intel_timeline_in_context(tl))
+   flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
+
+   cs = __gen8_emit_write_rcs(cs, rq->fence.seqno, offset, 0, flags);
 
return gen8_emit_fini_breadcrumb_tail(rq, cs);
 }
@@ -617,19 +648,30 @@ u32 *gen12_emit_fini_breadcrumb_xcs(struct i915_request 
*rq, u32 *cs)
 
 u32 *gen12_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
 {
-   cs = gen12_emit_ggtt_write_rcs(cs,
-  rq->fence.seqno,
-  hwsp_offset(rq),
-  PIPE_CONTROL0_HDC_PIPELINE_FLUSH,
-  PIPE_CONTROL_CS_STALL |
- 

[Intel-gfx] [PATCH 09/57] drm/i915: Replace engine->schedule() with a known request operation

2021-02-01 Thread Chris Wilson
Looking to the future, we want to set the scheduling attributes
explicitly and so replace the generic engine->schedule() with the more
direct i915_request_set_priority().

What it loses in removing the 'schedule' name from the function, it
gains in having an explicit entry point with a stated goal.
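
Schematically, call sites go from the backend indirection to the direct
call (both forms are visible in the diff below):

	/* Before: RCU-protected vfunc, present only if the backend has a scheduler. */
	rcu_read_lock();
	if (engine->schedule)
		engine->schedule(rq, &attr);
	rcu_read_unlock();

	/* After: one explicit entry point with a stated goal. */
	i915_request_set_priority(rq, prio);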

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/display/intel_display.c  |  5 ++-
 drivers/gpu/drm/i915/gem/i915_gem_object.h|  5 ++-
 drivers/gpu/drm/i915/gem/i915_gem_wait.c  | 29 +---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  3 --
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |  4 +--
 drivers/gpu/drm/i915/gt/intel_engine_types.h  | 29 
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |  2 +-
 .../drm/i915/gt/intel_execlists_submission.c  |  3 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 33 +--
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 11 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 +-
 drivers/gpu/drm/i915/i915_request.c   | 10 +++---
 drivers/gpu/drm/i915/i915_request.h   |  5 +++
 drivers/gpu/drm/i915/i915_scheduler.c | 15 +
 drivers/gpu/drm/i915/i915_scheduler.h |  3 +-
 15 files changed, 65 insertions(+), 95 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
b/drivers/gpu/drm/i915/display/intel_display.c
index d8f10589e09e..aca964f7ba72 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -13662,7 +13662,6 @@ int
 intel_prepare_plane_fb(struct drm_plane *_plane,
   struct drm_plane_state *_new_plane_state)
 {
-   struct i915_sched_attr attr = { .priority = I915_PRIORITY_DISPLAY };
struct intel_plane *plane = to_intel_plane(_plane);
struct intel_plane_state *new_plane_state =
to_intel_plane_state(_new_plane_state);
@@ -13703,7 +13702,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
 
if (new_plane_state->uapi.fence) { /* explicit fencing */
i915_gem_fence_wait_priority(new_plane_state->uapi.fence,
-   &attr);
+I915_PRIORITY_DISPLAY);
ret = i915_sw_fence_await_dma_fence(>commit_ready,
new_plane_state->uapi.fence,

i915_fence_timeout(dev_priv),
@@ -13725,7 +13724,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
if (ret)
return ret;
 
-   i915_gem_object_wait_priority(obj, 0, &attr);
+   i915_gem_object_wait_priority(obj, 0, I915_PRIORITY_DISPLAY);
i915_gem_object_flush_frontbuffer(obj, ORIGIN_DIRTYFB);
 
if (!new_plane_state->uapi.fence) { /* implicit fencing */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 3411ad197fa6..325766abca21 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -549,15 +549,14 @@ static inline void __start_cpu_write(struct 
drm_i915_gem_object *obj)
obj->cache_dirty = true;
 }
 
-void i915_gem_fence_wait_priority(struct dma_fence *fence,
- const struct i915_sched_attr *attr);
+void i915_gem_fence_wait_priority(struct dma_fence *fence, int prio);
 
 int i915_gem_object_wait(struct drm_i915_gem_object *obj,
 unsigned int flags,
 long timeout);
 int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
  unsigned int flags,
- const struct i915_sched_attr *attr);
+ int prio);
 
 void __i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj,
 enum fb_op_origin origin);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c 
b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index 4b9856d5ba14..d79bf16083bd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -91,22 +91,12 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
return timeout;
 }
 
-static void fence_set_priority(struct dma_fence *fence,
-  const struct i915_sched_attr *attr)
+static void fence_set_priority(struct dma_fence *fence, int prio)
 {
-   struct i915_request *rq;
-   struct intel_engine_cs *engine;
-
if (dma_fence_is_signaled(fence) || !dma_fence_is_i915(fence))
return;
 
-   rq = to_request(fence);
-   engine = rq->engine;
-
-   rcu_read_lock(); /* RCU serialisation for set-wedged protection */
-   if (engine->schedule)
-   engine->schedule(rq, attr);
-   rcu_read_unlock();
+   i915_request_set_pr

[Intel-gfx] [PATCH 36/57] drm/i915: Wrap cmpxchg64 with try_cmpxchg64() helper

2021-02-01 Thread Chris Wilson
Wrap cmpxchg64 with a try_cmpxchg()-esque helper. Hiding the old-value
dance in the helper allows for cleaner code.

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_utils.h | 32 +++
 1 file changed, 32 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_utils.h 
b/drivers/gpu/drm/i915/i915_utils.h
index abd4dcd9f79c..95ead6bb1ba6 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -461,4 +461,36 @@ static inline bool timer_expired(const struct timer_list 
*t)
  */
 #define IS_ACTIVE(config) ((config) != 0)
 
+#ifndef try_cmpxchg64
+#if IS_ENABLED(CONFIG_64BIT)
+#define try_cmpxchg64(_ptr, _pold, _new) try_cmpxchg(_ptr, _pold, _new)
+#else
+#define try_cmpxchg64(_ptr, _pold, _new)   \
+({ \
+   __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);  \
+   __typeof__(*(_ptr)) __old = *_old;  \
+   __typeof__(*(_ptr)) __cur = cmpxchg64(_ptr, __old, _new);   \
+   bool success = __cur == __old;  \
+   if (unlikely(!success)) \
+   *_old = __cur;  \
+   likely(success);\
+})
+#endif
+#endif
+
+#ifndef xchg64
+#if IS_ENABLED(CONFIG_64BIT)
+#define xchg64(_ptr, _new) xchg(_ptr, _new)
+#else
+#define xchg64(_ptr, _new) \
+({ \
+   __typeof__(_ptr) __ptr = (_ptr);\
+   __typeof__(*(_ptr)) __old = *__ptr; \
+   while (!try_cmpxchg64(__ptr, &__old, (_new)))   \
+   ;   \
+   __old;  \
+})
+#endif
+#endif
+
 #endif /* !__I915_UTILS_H */
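
For illustration, the caller pattern this enables (a sketch; 'slot' and
'deadline' are made-up names, not from the patch):

	u64 old = READ_ONCE(*slot);

	do {
		if (old <= deadline)
			break; /* already as good, no write needed */
	} while (!try_cmpxchg64(slot, &old, deadline));

On failure, try_cmpxchg64() has already reloaded 'old' with the current
value, so there is no separate re-read: that is the old-value dance the
helper hides.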
-- 
2.20.1



[Intel-gfx] [PATCH 31/57] drm/i915/gt: Declare when we enabled timeslicing

2021-02-01 Thread Chris Wilson
Let userspace know if they can trust timeslicing by including it as part
of the I915_PARAM_HAS_SCHEDULER::I915_SCHEDULER_CAP_TIMESLICING

v2: Only declare timeslicing if we can safely preempt userspace.

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 26 +++--
 include/uapi/drm/i915_drm.h |  1 +
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 64eccdf32a22..50911fbe6368 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -90,13 +90,17 @@ static void sort_engines(struct drm_i915_private *i915,
 static void set_scheduler_caps(struct drm_i915_private *i915)
 {
static const struct {
-   u8 engine;
-   u8 sched;
-   } map[] = {
+   u8 flag;
+   u8 cap;
+   } engine_map[] = {
 #define MAP(x, y) { ilog2(I915_ENGINE_##x), ilog2(I915_SCHEDULER_CAP_##y) }
MAP(HAS_PREEMPTION, PREEMPTION),
MAP(HAS_SEMAPHORES, SEMAPHORES),
MAP(SUPPORTS_STATS, ENGINE_BUSY_STATS),
+#undef MAP
+   }, sched_map[] = {
+#define MAP(x, y) { ilog2(I915_SCHED_##x), ilog2(I915_SCHEDULER_CAP_##y) }
+   MAP(HAS_TIMESLICES_BIT, TIMESLICING),
 #undef MAP
};
struct intel_engine_cs *engine;
@@ -105,6 +109,7 @@ static void set_scheduler_caps(struct drm_i915_private 
*i915)
enabled = 0;
disabled = 0;
for_each_uabi_engine(engine, i915) { /* all engines must agree! */
+   struct i915_sched *se = intel_engine_get_scheduler(engine);
int i;
 
if (intel_engine_has_scheduler(engine))
@@ -114,11 +119,18 @@ static void set_scheduler_caps(struct drm_i915_private 
*i915)
disabled |= (I915_SCHEDULER_CAP_ENABLED |
 I915_SCHEDULER_CAP_PRIORITY);
 
-   for (i = 0; i < ARRAY_SIZE(map); i++) {
-   if (engine->flags & BIT(map[i].engine))
-   enabled |= BIT(map[i].sched);
+   for (i = 0; i < ARRAY_SIZE(engine_map); i++) {
+   if (engine->flags & BIT(engine_map[i].flag))
+   enabled |= BIT(engine_map[i].cap);
else
-   disabled |= BIT(map[i].sched);
+   disabled |= BIT(engine_map[i].cap);
+   }
+
+   for (i = 0; i < ARRAY_SIZE(sched_map); i++) {
+   if (se->flags & BIT(sched_map[i].flag))
+   enabled |= BIT(sched_map[i].cap);
+   else
+   disabled |= BIT(sched_map[i].cap);
}
}
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 1987e2ea79a3..cda0f391d965 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -524,6 +524,7 @@ typedef struct drm_i915_irq_wait {
 #define   I915_SCHEDULER_CAP_PREEMPTION(1ul << 2)
 #define   I915_SCHEDULER_CAP_SEMAPHORES(1ul << 3)
 #define   I915_SCHEDULER_CAP_ENGINE_BUSY_STATS (1ul << 4)
+#define   I915_SCHEDULER_CAP_TIMESLICING   (1ul << 5)
 
 #define I915_PARAM_HUC_STATUS   42
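
For reference, userspace would test the new cap along these lines
(a sketch; error handling elided):

#include <sys/ioctl.h>
#include <stdbool.h>
#include <drm/i915_drm.h>

static bool has_timeslicing(int fd)
{
	int caps = 0;
	struct drm_i915_getparam gp = {
		.param = I915_PARAM_HAS_SCHEDULER,
		.value = &caps,
	};

	if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
		return false;

	return caps & I915_SCHEDULER_CAP_TIMESLICING;
}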
 
-- 
2.20.1



[Intel-gfx] [PATCH 37/57] drm/i915: Fair low-latency scheduling

2021-02-01 Thread Chris Wilson
    N           Min           Max          Median        Avg      Stddev
  108    -4.6326643     47.797855  -0.00069639128   2.116185   7.6764049

Each point is the relative percentage change in gem_wsim's work-per-second
score [using the median result of 120 25s runs, the relative change
computed as (B/A - 1) * 100]; 0 being no change.

Reviewing the same workloads on Tigerlake,

+delta%--+
|   a|
|   a|
|   a|
|   aa a |
|    |
|    |
|aaa |
|aaa |
|aaa  a   a   aa  a a a  |
| aa a aa a a a  aa   a aaa   a a|
||___MA_||
++
    N        Min        Max      Median        Avg     Stddev
  108  -4.258712   46.83081  0.36853159  4.1415662   9.461689

The expectation is that by deliberately increasing the number of context
switches to improve fairness between clients, throughput will be
diminished. What we do see are small fluctuations around no change,
with the median result being improved throughput. The dramatic
improvement is from reintroducing the improved no-semaphore boosting,
which avoids accidentally preventing scheduling of ready workloads due
to busy spinners.

We expect to see no change in single client workloads such as games,
though running multiple applications on a desktop should have reduced
jitter, i.e. smoother input-output latency.

This scheduler is based on MuQSS by Dr Con Kolivas.
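
As a rough mental model of the deadline mechanism (an assumption-laden
sketch; prio_to_latency_ns() is a made-up helper, not the patch's exact
formula):

	/* Earlier deadline == scheduled sooner. Priority only scales the
	 * latency target, so even the lowest priority client keeps making
	 * forward progress instead of being starved outright.
	 */
	static u64 virtual_deadline(u64 now, int prio)
	{
		return now + prio_to_latency_ns(prio);
	}

Requests are then dequeued in deadline order from the skiplist
introduced earlier in the series.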

v2: More commentary, especially around where we reset the deadlines.

Testcase: igt/gem_exec_fair
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |   2 -
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   1 +
 drivers/gpu/drm/i915/gt/intel_engine_pm.c |   4 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  14 -
 .../drm/i915/gt/intel_execlists_submission.c  | 232 +-
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  30 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   5 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c|   1 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   4 -
 drivers/gpu/drm/i915/i915_priolist_types.h|   7 +-
 drivers/gpu/drm/i915/i915_request.c   |  19 +-
 drivers/gpu/drm/i915/i915_scheduler.c | 426 +-
 drivers/gpu/drm/i915/i915_scheduler.h |  16 +-
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  23 +
 drivers/gpu/drm/i915/selftests/i915_request.c |   1 +
 .../gpu/drm/i915/selftests/i915_scheduler.c   | 136 ++
 16 files changed, 657 insertions(+), 264 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 9ff597ef5aca..f39f8049641c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -572,8 +572,6 @@ void intel_engine_init_execlists(struct intel_engine_cs 
*engine)
memset(execlists->pending, 0, sizeof(execlists->pending));
execlists->active =
memset(execlists->inflight, 0, sizeof(execlists->inflight));
-
-   execlists->queue_priority_hint = INT_MIN;
 }
 
 static void cleanup_status_page(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index fce86bd4b47f..f1811e79401e 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -203,6 +203,7 @@ static int __intel_engine_pulse(struct intel_engine_cs 
*engine)
if (IS_ERR(rq))
return PTR_ERR(rq);
 
+   rq->sched.deadline = 0;
__set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
 
heartbeat_commit(rq, &attr);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 27d9d17b35cb..ef5064ea54e5 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -211,6 +211,7 @@ static bool switch_to_kernel_context(struct intel_engine_cs 
*engine)
i915_request_add_active_barriers(rq);
 
/* Install ourselves as a preemption barrier */
+   rq->sched.deadline = 0;
rq->sched.attr.priority = I915_PRIORITY_BARRIER;
if (likely(!__i915_request_commit(rq))) { /* engine shou

[Intel-gfx] [PATCH i-g-t] intel_gpu_top: Hide unused clients

2021-02-01 Thread Chris Wilson
Hide inactive clients by pressing 'i' (toggle in interactive mode).

Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
 tools/intel_gpu_top.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
index 60ff62d28..edf0dedac 100644
--- a/tools/intel_gpu_top.c
+++ b/tools/intel_gpu_top.c
@@ -1595,6 +1595,7 @@ print_imc(struct engines *engines, double t, int lines, 
int con_w, int con_h)
 }
 
 static bool class_view;
+static bool filter_idle;
 
 static int
 print_engines_header(struct engines *engines, double t,
@@ -1689,6 +1690,9 @@ print_engines_footer(struct engines *engines, double t,
pops->close_struct();
 
if (output_mode == INTERACTIVE) {
+   if (filter_idle && !c->total_runtime)
+   return;
+
if (lines++ < con_h)
printf("\n");
}
@@ -2115,6 +2119,9 @@ static void process_stdin(unsigned int timeout_us)
case 'q':
stop_top = true;
break;
+   case 'i':
+   filter_idle ^= true;
+   break;
case '1':
class_view ^= true;
break;
-- 
2.30.0



Re: [Intel-gfx] [PATCH i-g-t] intel_gpu_top: Hide unused clients

2021-02-01 Thread Chris Wilson
Quoting Tvrtko Ursulin (2021-02-01 08:36:04)
> 
> On 01/02/2021 08:21, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2021-02-01 08:14:45)
> >>
> >> On 31/01/2021 03:11, Chris Wilson wrote:
> >>> Keep new clients hidden until they utilise the GPU.
> >>>
> >>> Signed-off-by: Chris Wilson 
> >>> Cc: Tvrtko Ursulin 
> >>> ---
> >>>tools/intel_gpu_top.c | 5 -
> >>>1 file changed, 4 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
> >>> index 60ff62d28..66a8effa6 100644
> >>> --- a/tools/intel_gpu_top.c
> >>> +++ b/tools/intel_gpu_top.c
> >>> @@ -828,8 +828,11 @@ static void update_client(struct client *c, unsigned 
> >>> int pid, char *name)
> >>>c->last[i] = val[i];
> >>>}
> >>>
> >>> - c->samples++;
> >>> + if (!c->total_runtime)
> >>> + return;
> >>> +
> >>>c->status = ALIVE;
> >>> + c->samples++;
> >>>}
> >>>
> >>>static void
> >>>
> >>
> >> Not sure we need to do it at this level and not instead at presentation
> >> time.
> > 
> > My goal was just presentation :(
> 
> Something like this would be presentation time:
> 
> diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
> index 60ff62d28e5d..f7c5cc3bf19f 100644
> --- a/tools/intel_gpu_top.c
> +++ b/tools/intel_gpu_top.c
> @@ -1942,6 +1942,9 @@ print_client(struct client *c, struct engines 
> *engines, double t, int lines,
>  unsigned int i;
> 
>  if (output_mode == INTERACTIVE) {
> +   if (!c->total_runtime) /* make a key toggle? */
> +   return;
> +
>  lines++;
> 
>  printf("%6u %17s ", c->pid, c->print_name);
> 
> But it worries me a bit to do it by default.

I was about to argue, then remembered that top behaves the same (although a
process cannot exist without consuming at least some CPU resources,
whereas a client may never touch the GPU or allocate GPU memory).

But if we are borrow ideas for filtering the view from top... :)

For top, it's 'i' and there's o%CPU>x

> >> Plus, in default sort mode they would be at the end of the list,
> >> so behind the more active clients. Or you go into sort by id and they
> >> annoy you there?
> > 
> > No. I had a bunch of "Xorg" when launching steam which never became
> > anything. So I guess just a bunch of dlopen("libgl") spawning a bunch of
> > clients that we never used for anything more than gl[X]GetString, but
> > leaked the fd.
> 
> But they were at the end, so potentially even cut off given enough 
> interesting clients? Or you actually sorted by id? Or something else is 
> broken?

But they were there! Unwanted DRI3 fd, abandoned, left homeless before
being used. Even if they are used, there's still a window where we would
see "Xorg" become "realname" (granted there's still a window as we
sample pidname before totalruntime), it irks me.
-Chris


Re: [Intel-gfx] v5.11-rc5 BUG kmalloc-1k (Not tainted): Redzone overwritten

2021-02-01 Thread Chris Wilson
Quoting Jani Nikula (2021-01-28 13:23:48)
> 
> A number of our CI systems are hitting redzone overwritten errors after
> s2idle, with the errors introduced between v5.11-rc4 and v5.11-rc5. See
> snippet below, full logs for one affected machine at [1].
> 
> Known issue?

Fwiw, I think this should be fixed by

commit 08d60e5999540110576e7c1346d486220751b7f9
Author: John Ogness 
Date:   Sun Jan 24 21:33:28 2021 +0106

printk: fix string termination for record_print_text()

Commit f0e386ee0c0b ("printk: fix buffer overflow potential for
print_text()") added string termination in record_print_text().
However it used the wrong base pointer for adding the terminator.
This led to a 0-byte being written somewhere beyond the buffer.

Use the correct base pointer when adding the terminator.

Fixes: f0e386ee0c0b ("printk: fix buffer overflow potential for 
print_text()")
Reported-by: Sven Schnelle 
Signed-off-by: John Ogness 
Signed-off-by: Petr Mladek 
Link: 
https://lore.kernel.org/r/20210124202728.4718-1-john.ogn...@linutronix.de

din should be rolled forward, but there's yet another regression in rc6
breaking suspend on all machines.
-Chris

