Currently Ironlake operates under the assumption that rpm is awake (and its
error checking is disabled). As such, we have missed a few places where we
access registers without taking the rpm wakeref and thus trigger
warnings, intel_ips being one culprit.
As this involved adding a potentially
As we now employ a very heavy pm_qos around the punit access, we want to
minimise the number of synchronous requests by performing one for the
whole punit sequence rather than around individual accesses. The
sideband lock is used for this, so push the pm_qos into the sideband
lock acquisition and
Avoid looking at the magical engines[RCS] to decide if the HW and driver
support logical contexts, and instead record that knowledge during
initialisation.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_drv.h | 1 +
To no surprise (since we've flip-flopped over the use of PIN_HIGH a few
times), doing a search by address over a pathologically fragmented
address space is exceedingly slow. To protect ourselves from nearly
unbounded latency (think searching a million holes while under
struct_mutex), limit the
Lift the sideband acquisition for vlv_punit_read and vlv_punit_write
into their callers, so that we can lock the sideband once for a sequence
of operations, rather than perform the heavyweight acquisition on each
request.
Signed-off-by: Chris Wilson
---
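The effect of lifting the sideband acquisition into the callers can be
sketched in userspace as below. This is only a model: struct fake_sideband,
sb_get(), sb_put(), punit_read(), punit_write() and punit_set_bits() are
hypothetical names invented for the illustration, not the i915 functions,
and a counter stands in for the heavyweight acquisition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are the real i915 symbols. */
struct fake_sideband {
	int held;               /* non-zero while the "lock" is held */
	unsigned int acquires;  /* counts the heavyweight acquisitions */
	uint32_t regs[16];      /* pretend punit register file */
};

static void sb_get(struct fake_sideband *sb)
{
	assert(!sb->held);
	sb->held = 1;
	sb->acquires++;
}

static void sb_put(struct fake_sideband *sb)
{
	assert(sb->held);
	sb->held = 0;
}

/* The accessors no longer lock; they only check that the caller did. */
static uint32_t punit_read(struct fake_sideband *sb, unsigned int reg)
{
	assert(sb->held);
	return sb->regs[reg % 16];
}

static void punit_write(struct fake_sideband *sb, unsigned int reg,
			uint32_t val)
{
	assert(sb->held);
	sb->regs[reg % 16] = val;
}

/* A read-modify-write sequence now pays for a single acquisition. */
static uint32_t punit_set_bits(struct fake_sideband *sb, unsigned int reg,
			       uint32_t bits)
{
	uint32_t val;

	sb_get(sb);
	val = punit_read(sb, reg);
	val |= bits;
	punit_write(sb, reg, val);
	sb_put(sb);

	return val;
}
```

With the acquisition inside each accessor, the same sequence would have
taken and dropped the lock twice; here the counter advances once per
sequence.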
We now have two locks for sideband access. The general one covering
sideband access across all generations, sb_lock, and a specific one
covering sideband access via the punit on vlv/chv. After lifting the
sb_lock around the punit into the callers, the pcu_lock is now redundant
and can be separated
Ironlake does support saving and reloading context specific
registers between contexts, providing isolation of the basic GPU state
(as programmable by userspace). This allows userspace to assume that the
GPU retains their state from one batch to the next, minimising the
amount of
We want to be able to reset the GPU from inside a timer callback
(hardirq context). One step requires us to copy the default context
state over to the guilty context, which means we need to plan in advance
to have that object accessible from within an atomic context. The atomic
context prevents us
In the next patch, we want to store the intel_context pointer inside
i915_request, as it is frequently accessed via a convoluted dance when
submitting the request to hw. Having two context pointers inside
i915_request leads to confusion so first rename the existing
i915_gem_context pointer to
Since all the RPS handling code is in intel_gt_pm, move the irq handlers
there as well so that it is all contained within one file.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_drv.h | 4 -
drivers/gpu/drm/i915/i915_irq.c | 314
As we want to be able to call i915_reset_engine and co from a softirq or
timer context, we need to be irqsafe at all times. So we have to forgo
the simple spin_lock_irq for the full spin_lock_irqsave.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem.c | 6
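Why the plain _irq variant is not safe once the caller may already have
interrupts disabled can be shown with a small userspace model. A boolean
stands in for the CPU interrupt state; the model_* helpers and the two
section_* functions are assumptions made up for this sketch and only mimic
the save/restore semantics, they are not the kernel primitives.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: a flag stands in for the CPU interrupt state. */
static bool irqs_enabled = true;

static void model_lock_irq(void)
{
	irqs_enabled = false;
}

static void model_unlock_irq(void)
{
	irqs_enabled = true;	/* unconditionally re-enables */
}

static void model_lock_irqsave(bool *flags)
{
	*flags = irqs_enabled;	/* remember the state on entry */
	irqs_enabled = false;
}

static void model_unlock_irqrestore(bool flags)
{
	irqs_enabled = flags;	/* put back whatever the caller had */
}

/* Returns the interrupt state seen by the caller after the section. */
static bool section_with_irq(bool enter_enabled)
{
	irqs_enabled = enter_enabled;
	model_lock_irq();
	assert(!irqs_enabled);	/* inside the critical section */
	model_unlock_irq();
	return irqs_enabled;
}

static bool section_with_irqsave(bool enter_enabled)
{
	bool flags;

	irqs_enabled = enter_enabled;
	model_lock_irqsave(&flags);
	assert(!irqs_enabled);	/* inside the critical section */
	model_unlock_irqrestore(flags);
	return irqs_enabled;
}
```

Entering section_with_irq() with interrupts disabled leaves them enabled on
return, which is the hazard for a caller in timer or softirq context;
section_with_irqsave() hands back the state it was given.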
Currently the purgeable objects, I915_MADV_DONTNEED, are mixed in the
normal bound/unbound lists. Every shrinker pass starts with an attempt
to purge from this set of unneeded objects, which entails us doing a
walk over both lists looking for any candidates. If there are none, and
since we are
These routines are identical except in the nature of the value parameter.
For writes it is a pure in-param, but for a read, we need an out-param.
Since they differ in a single line, merge the two routines into one.
Signed-off-by: Chris Wilson
Reviewed-by: Imre Deak
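A minimal sketch of such a merge, assuming a flat array in place of the
hardware: the value is passed by pointer and acts as an in-param for a
write and an out-param for a read. fake_regs, fake_sideband_rw() and the
two thin wrappers are hypothetical names for illustration and do not
reflect the actual i915 signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_NREGS 8

/* Pretend register file; the real accessors talk to hardware. */
static uint32_t fake_regs[FAKE_NREGS];

/*
 * Single accessor: *val is read from for a write and stored to for a
 * read. Returns 0 on success, -1 on an invalid argument.
 */
static int fake_sideband_rw(unsigned int reg, uint32_t *val, bool is_read)
{
	if (reg >= FAKE_NREGS || !val)
		return -1;

	/* Any setup shared by both directions would sit here, once. */
	if (is_read)
		*val = fake_regs[reg];
	else
		fake_regs[reg] = *val;

	return 0;
}

/* Thin wrappers keep the call sites unchanged. */
static uint32_t fake_sideband_read(unsigned int reg)
{
	uint32_t val = 0;
	int err = fake_sideband_rw(reg, &val, true);

	assert(!err);
	(void)err;
	return val;
}

static int fake_sideband_write(unsigned int reg, uint32_t val)
{
	return fake_sideband_rw(reg, &val, false);
}
```

The two directions differ only in the one branch on is_read, so everything
around it is written a single time.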
Currently, we try to report to the shrinker the precise number of
objects (pages) that are available to be reaped at this moment. This
requires searching all objects with allocated pages to see if they
fulfill the search criteria, and this count is performed quite
frequently. (The shrinker tries
The choice of preemption timeout is determined by the context from which
we trigger the preemption; as such, allow the caller to specify the
desired timeout.
Effectively the other choice would be to use the shortest timeout along
the dependency chain. However, given that we would have already
Since intel_sideband_read and intel_sideband_write differ by only a
couple of lines (depending on whether we feed the value in or out),
merge the two into a single common accessor.
v2: Restore vlv_flisdsi_read() lost during rebasing.
Signed-off-by: Chris Wilson
---
With the vlv sideband fixed to avoid sleeping while we talk to the
punit, the system should be much more stable and be able to utilise the
punit without risk.
This reverts commit 6067a27d1f01 ("drm/i915: Avoid tweaking evaluation
thresholds on Baytrail v3")
References: 6067a27d1f01 ("drm/i915:
When circumstances allow, try resetting the engine directly from the
preemption timeout handler. As this is softirq context, we have to be
careful both not to sleep and not to spin on anything we may be
interrupting (e.g. the submission tasklet).
Signed-off-by: Chris Wilson
As all backends implement the same pin_count mechanism and do a
dec-and-test as their first step, pull that into the common
intel_context_unpin(). This also pulls the check into the caller, eliminating the
indirect call in the usual steady state case. The intel_context_pin()
side is a little more
As we keep an rbtree of available holes sorted by their size, we can
very easily determine if there is any hole large enough that might
satisfy the allocation request. This helps when dealing with a highly
fragmented address space and a request for a search by address.
To cache the largest size,
Having abandoned the split approach of acking then handling the GT irqs
(sacrificed to use the interrupt handler to guarantee exclusive access
to the irq data), pull the two routines into one to let the compiler
eliminate the redundant storage.
Signed-off-by: Chris Wilson
In order to support engine reset from irq (timer) context, we need to be
able to re-initialise the breadcrumbs. So we need to promote the plain
spin_lock_irq to a safe spin_lock_irqsave.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/intel_breadcrumbs.c | 5 +++--
To be useful later, enable intel_engine_dump() to be called from irq
context (i.e. by saving and restoring irq state rather than assuming
we enter with irqs enabled).
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/intel_engine_cs.c | 11 +++
1 file
Following the removal of the last workarounds, the only CSB mmio access
is for the old vGPU interface. The mmio registers presented by vGPU do
not require forcewake and can be treated as ordinary volatile memory,
i.e. they behave just like the HWSP access, only at a different location.
We can
Back in commit 27af5eea54d1 ("drm/i915: Move execlists irq handler to a
bottom half"), we came to the conclusion that running our CSB processing
and ELSP submission from inside the irq handler was a bad idea. A really
bad idea as we could impose nearly 1s latency on other users of the
system, on
Install a timer when trying to preempt on behalf of an important
context such that if the active context does not honour the preemption
request within the desired timeout, then we reset the GPU to allow the
important context to run.
v2: Install the timer on scheduling the preempt request; long
Store whether or not we need to kick the guc's execlists emulation on
the engine itself to avoid chasing the device info.
gen8_cs_irq_handler 512 428 -84
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_irq.c | 4
Use a liberal timeout of 20ms to ensure that the rendering for an
interactive pageflip is started in a timely fashion, and that
user interaction is not blocked by GPU, or CPU, hogs. This is at the cost
of resetting whoever was blocking the preemption, likely leading to that
context/process being
In the next patch, we will process the CSB events directly from the CS
interrupt handler, being called for each interrupt. Hence, we will no
longer have the need for a loop until the has-interrupt bit is clear,
and in the meantime can remove that small optimisation.
Signed-off-by: Chris Wilson
The HWACK bit more generically solves the problem of resubmitting ELSP
while the hardware is still processing the current ELSP write. We no
longer need to check port[0].count itself.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/intel_lrc.c | 2 --
1 file
In the next patch, we will begin processing the CSB from inside the
interrupt handler. This means that updating the execlists->port[] will
no longer be locked by the tasklet but by the engine->timeline.lock
instead. Pull dequeue and submit under the same lock for protection.
(An alternative,
As we are splitting processing the CSB events from submitting the ELSP,
we also need to duplicate the check that we hold a device wakeref for our
hardware access to the disjoint locations.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/intel_lrc.c | 26
One usecase would be to couple in via EGL_NV_context_priority_realtime
in userspace to provide some QoS guarantees in conjunction with setting
the highest priority.
Signed-off-by: Chris Wilson
---
drivers/gpu/drm/i915/i915_gem_context.c| 22 ++
Move the knowledge about resetting the current context tracking on the
engine from inside i915_gem_context.c into intel_engine_cs.c
Signed-off-by: Chris Wilson
Reviewed-by: Tvrtko Ursulin
---
drivers/gpu/drm/i915/i915_gem_context.c | 12
As we now only use the cached HWSP access to read the CSB buffer and no
longer use any forcewaked mmio, processing the CSB is fast and can be
done directly from inside the CS interrupt handler.
We have to rearrange the irq handler slightly as we wish to preserve the
single threaded access
We can avoid the mmio read of the CSB pointers after reset based on the
knowledge that the HW always starts writing at entry 0 in the CSB buffer.
We need to reset our CSB head tracking after GPU reset (and on
sanitization after resume) so that we are expecting to read from entry
0.
Signed-off-by:
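The head reset described above can be modelled with a small ring buffer.
The sketch assumes that the driver's head tracks the last entry it
consumed, so parking it on the final slot makes entry 0 the next one read.
CSB_ENTRIES, struct csb_model and the helper names are hypothetical, and
the tail handling is a simplification of whatever the hardware does.

```c
#include <assert.h>
#include <stdint.h>

#define CSB_ENTRIES 6	/* hypothetical ring size */

struct csb_model {
	uint32_t buf[CSB_ENTRIES];
	unsigned int head;	/* last entry consumed by the driver */
	unsigned int tail;	/* last entry written by the "hardware" */
};

/* After reset, park both on the last slot so the next access wraps to 0. */
static void csb_reset(struct csb_model *csb)
{
	csb->head = CSB_ENTRIES - 1;
	csb->tail = CSB_ENTRIES - 1;
}

/* Model of the hardware appending one event. */
static void hw_write_event(struct csb_model *csb, uint32_t event)
{
	csb->tail = (csb->tail + 1) % CSB_ENTRIES;
	csb->buf[csb->tail] = event;
}

/* Drain pending events into out[]; returns how many were copied. */
static unsigned int csb_process(struct csb_model *csb, uint32_t *out,
				unsigned int max)
{
	unsigned int n = 0;

	assert(csb->head < CSB_ENTRIES);
	while (csb->head != csb->tail && n < max) {
		csb->head = (csb->head + 1) % CSB_ENTRIES;
		out[n++] = csb->buf[csb->head];
	}

	return n;
}
```

No pointer has to be read back from the device here: csb_reset() alone
establishes where the next event will appear.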
To ease the frequent and ugly pointer dance of
&request->gem_context->engine[request->engine->id] during request
submission, store that pointer as request->hw_context. One major
advantage that we will exploit later is that this decouples the logical
context state from the engine itself.
v2: Set
As we reset the GPU on suspend/resume, we also do need to reset the
engine state tracking so call into the engine backends. This is
especially important so that we can also sanitize the state tracking
across resume.
Signed-off-by: Chris Wilson
---
Searching for an available hole by address is slow, as there is no
guarantee that a hole will be available and so we must walk over all
nodes in the rbtree before we determine the search was futile. In many
cases, the caller doesn't strictly care for the highest available hole
and was just
> -Original Message-
> From: Usyskin, Alexander
> Sent: Thursday, May 17, 2018 11:29 AM
> To: C, Ramalingam ; Shankar, Uma
> ; intel-gfx@lists.freedesktop.org; dri-
> de...@lists.freedesktop.org; seanp...@chromium.org; dan...@ffwll.ch;
>