Re: [Intel-gfx] [PATCH] drm/i915: Do NOT skip the first 4k of stolen memory for pre-allocated buffers

2018-04-07 Thread Hans de Goede

Hi,

On 06-04-18 18:06, Ville Syrjälä wrote:

On Thu, Apr 05, 2018 at 01:37:31PM +0200, Hans de Goede wrote:

Hi,

On 04-04-18 22:49, Ville Syrjälä wrote:

On Wed, Apr 04, 2018 at 10:06:29PM +0200, Hans de Goede wrote:

Hi,

On 04-04-18 17:50, Ville Syrjälä wrote:

On Fri, Mar 30, 2018 at 04:26:53PM +0200, Hans de Goede wrote:

Hi,

On 30-03-18 15:25, Hans de Goede wrote:

Hi,

On 30-03-18 14:44, Chris Wilson wrote:

Quoting Hans de Goede (2018-03-30 13:37:40)

Hi,

On 30-03-18 14:30, Chris Wilson wrote:

Quoting Hans de Goede (2018-03-30 13:27:15)

Before this commit the WaSkipStolenMemoryFirstPage workaround code was
skipping the first 4k by passing 4096 as start of the address range passed
to drm_mm_init(). This means that calling drm_mm_reserve_node() to try and
reserve the firmware framebuffer so that we can inherit it would always
fail, as the firmware framebuffer starts at address 0.

Commit d43537610470 ("drm/i915: skip the first 4k of stolen memory on
everything >= gen8") says in its commit message: "This is confirmed to fix
Skylake screen flickering issues (probably caused by the fact that we
initialized a ring in the first page of stolen, but I didn't 100% confirm
this theory)."

Which suggests that it is safe to use the first page for a linear
framebuffer as the firmware is doing.

This commit always passes 0 as start to drm_mm_init() and works around
WaSkipStolenMemoryFirstPage in i915_gem_stolen_insert_node_in_range()
by insuring the start address passed by to drm_mm_insert_node_in_range()
is always 4k or more. All entry points to i915_gem_stolen.c go through
i915_gem_stolen_insert_node_in_range(), so that any newly allocated
objects such as ring-buffers will not be allocated in the first 4k.

The one exception is i915_gem_object_create_stolen_for_preallocated()
which directly calls drm_mm_reserve_node() which now will be able to
use the first 4k.

This fixes the i915 driver no longer being able to inherit the firmware
framebuffer on gen8+, which fixes the video output changing from the
vendor logo to a black screen as soon as the i915 driver is loaded
(on systems without fbcon).


We've been told by the HW guys not to use the first page. (That's my
understanding from every time this gets questioned.)


Yet the GOP is happily using the first page. I think we may need to make
a difference here between the GPU not using the first page and the
display engine/pipeline not using the first page. Note that my patch
only influences the inheriting of the initial framebuffer as allocated
by the GOP. It does not influence any other allocations from the
reserved range, those will still all avoid the first page.

Without this patch fastboot / flickerfree support is essentially broken
on any gen8+ hardware given that one of the goals of atomic is to be
able to do flickerfree transitions I think that this warrants a closer
look then just simply saying never use the first page.


The concern is what else (i.e. nothing that we allocated ourselves) that
may be in the first page...


Given that the GOP has put its framebuffer there at least at boot there
is nothing there, otherwise it would show up on the display.

We have a whole bunch of code to inherit the BIOS fb in intel_display.c
and AFAIK that code is there because this inheriting the BIOS fb is
deemed an important feature. So I'm not happy at all with the handwavy
best to not touch it answer I'm getting to this patch.

Unless there are some clear answer from the hardware folks which specifically
say we cannot put a framebuffer there (and then why is the GOP doing it?)
then I believe that the best way forward here is to get various people to
test with this patch and the best way to do that is probably to put it
in next. Note I deliberately did not add a Cc stable.


To elaborate on this, the excluding of the first 4k of the stolen memory
region causes intel_alloc_initial_plane_obj() from intel_display.c to fail,
which in turn causes intel_find_initial_plane_obj() to call
intel_plane_disable_noatomic(intel_crtc, intel_plane); which temporarily
completely turns off the display which leads to a very ugly flickering
of the display at boot (as well as replacing the vendor logo with a
black screen).

I think we can all agree that this behavior is undesirable and even a
regression in behavior caused by the fix to exclude the first 4k.

Chris, if my patch is not an acceptable way to fix this, then how do you
suggest that we fix this?

Digging a bit deeper I found this:

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-kbl-vol16-workarounds.pdf

Which says:

"WaSkipStolenMemoryFirstPage:

WA to skip the first page of stolen
memory due to sporadic HW write on *CS Idle"

And also about FBC:

"First line of FBC getting corrupted when FBC
compressed frame buffer offset is programmed to
zero. Command streamers are doing flush writes to
base of stolen.
WA: New restriction to program FBC compressed
frame buffer offset to at least 4KB."

So using the 

Re: [Intel-gfx] ✗ Fi.CI.IGT: failure for series starting with [1/2] drm/i915: Treat i915_reset_engine() as guilty until proven innocent

2018-04-07 Thread Chris Wilson
Quoting Patchwork (2018-04-07 04:57:12)
> == Series Details ==
> 
> Series: series starting with [1/2] drm/i915: Treat i915_reset_engine() as 
> guilty until proven innocent
> URL   : https://patchwork.freedesktop.org/series/41308/
> State : failure
> 
> == Summary ==
> 
>  Possible new issues:
> 
> Test kms_atomic_transition:
> Subgroup plane-all-transition-fencing:
> fail   -> PASS   (shard-snb)
> Test kms_cursor_crc:
> Subgroup cursor-256x256-dpms:
> fail   -> PASS   (shard-snb)
> Subgroup cursor-256x256-offscreen:
> fail   -> PASS   (shard-snb)
> Test kms_cursor_legacy:
> Subgroup long-nonblocking-modeset-vs-cursor-atomic:
> pass   -> FAIL   (shard-snb)
> Test kms_draw_crc:
> Subgroup draw-method-xrgb2101010-mmap-wc-untiled:
> pass   -> SKIP   (shard-snb)
> Test kms_frontbuffer_tracking:
> Subgroup fbc-1p-offscren-pri-shrfb-draw-pwrite:
> fail   -> PASS   (shard-snb)
> Subgroup fbc-1p-primscrn-shrfb-pgflip-blt:
> fail   -> PASS   (shard-snb)
> Subgroup fbc-2p-scndscrn-spr-indfb-onoff:
> skip   -> FAIL   (shard-snb)
> Subgroup fbc-modesetfrombusy:
> fail   -> PASS   (shard-snb)
> Subgroup fbcpsr-1p-offscren-pri-indfb-draw-blt:
> fail   -> SKIP   (shard-snb)
> Subgroup fbcpsr-1p-primscrn-indfb-pgflip-blt:
> skip   -> FAIL   (shard-snb)
> Subgroup fbcpsr-1p-primscrn-pri-indfb-draw-mmap-gtt:
> fail   -> SKIP   (shard-snb)
> Subgroup fbcpsr-1p-primscrn-pri-shrfb-draw-render:
> fail   -> SKIP   (shard-snb)
> Subgroup fbcpsr-2p-scndscrn-spr-indfb-fullscreen:
> fail   -> SKIP   (shard-snb)
> Subgroup psr-1p-primscrn-spr-indfb-onoff:
> fail   -> SKIP   (shard-snb)
> Subgroup psr-2p-scndscrn-cur-indfb-draw-mmap-wc:
> fail   -> SKIP   (shard-snb)
> Test kms_mmap_write_crc:
> fail   -> PASS   (shard-snb)
> Test kms_vblank:
> Subgroup pipe-b-query-forked-busy-hang:
> fail   -> PASS   (shard-snb)

And drv_hangman passed, along with gem_eio, so it seems like it didn't
futz the hangchecker or i915_wedged too badly.

Thanks for the review, pushed.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915/psr: Chase psr.enabled only under the psr.lock

2018-04-07 Thread Chris Wilson
Quoting Rodrigo Vivi (2018-04-06 23:18:16)
> On Fri, Apr 06, 2018 at 11:12:27AM -0700, Souza, Jose wrote:
> > On Thu, 2018-04-05 at 12:49 +0100, Chris Wilson wrote:
> > > +   struct drm_crtc *crtc =
> > > +   dp_to_dig_port(intel_dp)->base.base.crtc;
> 
> I'm afraid that the issue is this pointer here. So this will only mask
> the issue.
> 
> Should we maybe stash the pipe? :/

It's not that bad. pipe cannot change until after psr_disable is called,
right? And psr_disable ensures that this worker is flushed. The current
problem is just the coordination of cancelling the worker, where we may
set psr.enabled to NULL right before the worker grabs it and
dereferences it.

So if we lock until we have the pipe, we know that dereference chain is
valid, and we know that psr_disable() cannot complete until we complete
the wait. So the pipe remains valid until we return (so long as the pipe
exists when we start).
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx