[Intel-gfx] linux-firmware-i915 pull request

2015-06-19 Thread Vivi, Rodrigo
Hi,

Please consider pulling i915 to linux-firmware.git and let me know if
there is anything else I should do differently on the pull request.

Besides what I had requested already there is a new minor version of dmc
available.

Thanks in advance,
Rodrigo.

The following changes since commit
3161bfa479d5e9ed4f46b57df9bcecbbc4f8eb3c:

  linux-firmware: Update firmware patch for Intel Bluetooth 7260 (B3/B4)
(2015-05-13 11:12:48 -0400)

are available in the git repository at:

  git://people.freedesktop.org/~vivijim/linux-firmware-i915 master

for you to fetch changes up to a750f4ee10f962640c479aca184fe65f83956295:

  linux-firmware: New minor DMC release for Skylake - ver1_18
(2015-06-19 17:29:18 -0700)


Rodrigo Vivi (5):
  linux-firmware: Add i915 DMC firmware
  linux-firmware: Add i915 GuC firmware
  linux-firmware: Add i915 DMC firmware for Broxton
  linux-firmware: New minor DMC release for Skylake.
  linux-firmware: New minor DMC release for Skylake - ver1_18

 LICENSE.i915   |  39
+++
 WHENCE |  23 +++
 i915/bxt_dmc_ver1.bin  |   1 +
 i915/bxt_dmc_ver1_04.bin   | Bin 0 -> 5872 bytes
 i915/skl_dmc_ver1.bin  |   1 +
 i915/skl_dmc_ver1_04.bin   | Bin 0 -> 8048 bytes
 i915/skl_dmc_ver1_16.bin   | Bin 0 -> 7892 bytes
 i915/skl_dmc_ver1_18.bin   | Bin 0 -> 7892 bytes
 i915/skl_guc_ver1.bin  |   1 +
 i915/skl_guc_ver1_1059.bin | Bin 0 -> 109636 bytes
 10 files changed, 65 insertions(+)
 create mode 100644 LICENSE.i915
 create mode 12 i915/bxt_dmc_ver1.bin
 create mode 100644 i915/bxt_dmc_ver1_04.bin
 create mode 12 i915/skl_dmc_ver1.bin
 create mode 100644 i915/skl_dmc_ver1_04.bin
 create mode 100644 i915/skl_dmc_ver1_16.bin
 create mode 100644 i915/skl_dmc_ver1_18.bin
 create mode 12 i915/skl_guc_ver1.bin
 create mode 100644 i915/skl_guc_ver1_1059.bin

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 2/2] Set alignment value in drm_intel_add_validate_buffer()

2015-06-19 Thread Anuj Phogat
+Ben

On Fri, Apr 10, 2015 at 5:20 PM, Anuj Phogat  wrote:
> Signed-off-by: Anuj Phogat 
> ---
>  intel/intel_bufmgr_gem.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
> index 51d87ae..92701a5 100644
> --- a/intel/intel_bufmgr_gem.c
> +++ b/intel/intel_bufmgr_gem.c
> @@ -459,7 +459,7 @@ drm_intel_add_validate_buffer(drm_intel_bo *bo)
> bufmgr_gem->exec_objects[index].handle = bo_gem->gem_handle;
> bufmgr_gem->exec_objects[index].relocation_count = 
> bo_gem->reloc_count;
> bufmgr_gem->exec_objects[index].relocs_ptr = (uintptr_t) 
> bo_gem->relocs;
> -   bufmgr_gem->exec_objects[index].alignment = 0;
> +   bufmgr_gem->exec_objects[index].alignment = bo->align;
> bufmgr_gem->exec_objects[index].offset = 0;
> bufmgr_gem->exec_bos[index] = bo;
> bufmgr_gem->exec_count++;
> @@ -501,7 +501,7 @@ drm_intel_add_validate_buffer2(drm_intel_bo *bo, int 
> need_fence)
> bufmgr_gem->exec2_objects[index].handle = bo_gem->gem_handle;
> bufmgr_gem->exec2_objects[index].relocation_count = 
> bo_gem->reloc_count;
> bufmgr_gem->exec2_objects[index].relocs_ptr = 
> (uintptr_t)bo_gem->relocs;
> -   bufmgr_gem->exec2_objects[index].alignment = 0;
> +   bufmgr_gem->exec2_objects[index].alignment = bo->align;
> bufmgr_gem->exec2_objects[index].offset = 0;
> bufmgr_gem->exec_bos[index] = bo;
> bufmgr_gem->exec2_objects[index].flags = 0;
> --
> 2.3.4
>
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/2] i965/gen9: Pass alignment as function parameter in drm_intel_gem_bo_alloc_internal()

2015-06-19 Thread Anuj Phogat
+Ben.

On Fri, Apr 10, 2015 at 5:20 PM, Anuj Phogat  wrote:
> and use it to initialize the align variable in drm_intel_bo.
>
> In case of YF/YS tiled buffers libdrm need not know about the tiling
> format because these buffers don't have hardware support to be tiled
> or detiled through a fenced region. But, libdrm still need to know
> about buffer alignment restrictions because kernel uses it when
> resolving the relocation.
>
> Mesa uses drm_intel_gem_bo_alloc_for_render() to allocate Yf/Ys buffers.
> So, use the passed alignment value in this function. Note that we continue
> ignoring the alignment value passed to drm_intel_gem_bo_alloc() to follow
> the previous behavior.
>
> Cc: Kristian Høgsberg 
> Cc: Damien Lespiau 
> Cc: Daniel Vetter 
> Signed-off-by: Anuj Phogat 
> ---
>  intel/intel_bufmgr_gem.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
> index 5a67f53..51d87ae 100644
> --- a/intel/intel_bufmgr_gem.c
> +++ b/intel/intel_bufmgr_gem.c
> @@ -655,7 +655,8 @@ drm_intel_gem_bo_alloc_internal(drm_intel_bufmgr *bufmgr,
> unsigned long size,
> unsigned long flags,
> uint32_t tiling_mode,
> -   unsigned long stride)
> +   unsigned long stride,
> +   unsigned int alignment)
>  {
> drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bufmgr;
> drm_intel_bo_gem *bo_gem;
> @@ -754,6 +755,7 @@ retry:
> return NULL;
> }
> bo_gem->bo.bufmgr = bufmgr;
> +   bo_gem->bo.align = alignment;
>
> bo_gem->tiling_mode = I915_TILING_NONE;
> bo_gem->swizzle_mode = I915_BIT_6_SWIZZLE_NONE;
> @@ -797,7 +799,8 @@ drm_intel_gem_bo_alloc_for_render(drm_intel_bufmgr 
> *bufmgr,
>  {
> return drm_intel_gem_bo_alloc_internal(bufmgr, name, size,
>BO_ALLOC_FOR_RENDER,
> -  I915_TILING_NONE, 0);
> +  I915_TILING_NONE, 0,
> +  alignment);
>  }
>
>  static drm_intel_bo *
> @@ -807,7 +810,7 @@ drm_intel_gem_bo_alloc(drm_intel_bufmgr *bufmgr,
>unsigned int alignment)
>  {
> return drm_intel_gem_bo_alloc_internal(bufmgr, name, size, 0,
> -  I915_TILING_NONE, 0);
> +  I915_TILING_NONE, 0, 0);
>  }
>
>  static drm_intel_bo *
> @@ -859,7 +862,7 @@ drm_intel_gem_bo_alloc_tiled(drm_intel_bufmgr *bufmgr, 
> const char *name,
> stride = 0;
>
> return drm_intel_gem_bo_alloc_internal(bufmgr, name, size, flags,
> -  tiling, stride);
> +  tiling, stride, 0);
>  }
>
>  static drm_intel_bo *
> --
> 2.3.4
>
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v2 06/10] drm: Avoid atomic commit path for CRTC property (Gamma)

2015-06-19 Thread Matt Roper
On Mon, Jun 08, 2015 at 05:58:23PM -0700, Matt Roper wrote:
> On Sat, Jun 06, 2015 at 05:34:45PM +0530, Sharma, Shashank wrote:
> > Regards
> > Shashank
> > 
> > On 6/6/2015 6:31 AM, Matt Roper wrote:
> > >On Thu, Jun 04, 2015 at 07:12:37PM +0530, Kausal Malladi wrote:
> > >>From: Kausal Malladi 
> > >>
> > >>The atomic CRTC set infrastructure is not available yet. So, the atomic
> > >>check path throws error for any non-plane property.
> > >>
> > >>This patch adds a diversion to avoid commit path for DRM atomic set Gamma
> > >>property, until atomic CRTC infrastructure is ready.
> > >
> > >This doesn't look right, but I don't quite understand what you're trying
> > >to do from the commit message.
> > >
> > >This function is what will implement legacy set_property ioctl's of a
> > >CRTC on an atomic-based driver.  The idea is that when the ioctl is
> > >issued, we should get (i.e., make a duplicate of) the current CRTC
> > >state, change some of the values in that state (which is what the
> > >.atomic_set_property function takes care of), and then submit that
> > >modified state to the driver for checking and hw-programming.
> > >
> > >Okay, I just took a quick peek ahead in your patch set and I think I
> > >understand the confusion now...it looks like you're trying to actually
> > >perform the full hardware update in the .atomic_set_property call chain,
> > >which isn't what we want to be doing in an atomic driver.
> > >.atomic_set_property() is just a matter of updating the state structures
> > >to reflect the changes you *want* to make (but not actually performing
> > >those changes right away).  It isn't until drm_atomic_commit() gets
> > >called that we validate the new state structure and then write it to the
> > >hardware if it looks okay.
> > >
> > >Keep in mind that the overall goal behind atomic is that we want to be
> > >able to supply several items to be updated (e.g., mode, CSC, gamma,
> > >etc.) via the atomic ioctl and then have them all accepted (and
> > >programmed) by the driver, or all rejected (so none get programmed) if
> > >any of them are invalid.  Also, if the collection of properties that
> > >you're updating fall within the "nuclear pageflip" subset of
> > >functionality, you'll also get a guarantee that those items all get
> > >updated within the same vblank; updates that straddle a vblank could
> > >cause unwanted flickering or other artifacts.  The helper function
> > >you're updating here only happens to update a single state value at a
> > >time, but the same .atomic_set_property() is also used by the atomic
> > >ioctl, so we need to make sure it's following the expected design.
> > >
> > >Finally, keep in mind that the function you're updating here is a DRM
> > >core function.  Even though i915 isn't quite ready for full atomic yet
> > >and might have a bit of brain damage in areas, other vendors' drivers do
> > >have complete atomic modesetting support (I think?), so adding
> > >i915-specific workarounds like this to the core function could actually
> > >hamper them.
> > >
> > >
> > >Matt
> > >
> > I understood this point. But in this case, we will be blocked until
> > set CRTC implementation comes in. Can you suggest a better way to
> > handle this without getting blocked for set_crtc() ?
> 
> Is the functionality you need still lacking with Maarten Lankhorst's
> "convert to atomic, part 2" series that recently landed upstream?  If
> that series isn't sufficient, them I'm pretty sure that his "part 3"
> series on the mailing list will fix it; that will hopefully be ready to
> merge as soon as I get a chance to do a review of it.
> 
> I'll add Maarten to the Cc here in case he wants to comment...
> 
> 
> Matt

Kausal/Shashank, have you guys had a chance to check out Maarten's
latest i915 atomic work?  His "part 2" series is already merged and
"part 3" should be merged soon and those really help clean up a lot of
the Intel-specific pain points caused by the atomic transition.  I think
we need to make sure you can handle properties properly in an atomic
manner, so if you find you're still blocked on the current code, we
should look into how to address that so that we don't have to bring any
temporary Intel hackiness into the shared DRM core code.


Matt

> 
> > >>
> > >>Signed-off-by: Shashank Sharma 
> > >>Signed-off-by: Kausal Malladi 
> > >>---
> > >>  drivers/gpu/drm/drm_atomic_helper.c | 17 ++---
> > >>  1 file changed, 14 insertions(+), 3 deletions(-)
> > >>
> > >>diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
> > >>b/drivers/gpu/drm/drm_atomic_helper.c
> > >>index a900858..37dba55 100644
> > >>--- a/drivers/gpu/drm/drm_atomic_helper.c
> > >>+++ b/drivers/gpu/drm/drm_atomic_helper.c
> > >>@@ -1696,6 +1696,8 @@ drm_atomic_helper_crtc_set_property(struct drm_crtc 
> > >>*crtc,
> > >>  {
> > >>  struct drm_atomic_state *state;
> > >>  struct drm_crtc_state *crtc_state;
> > >>+ struct drm_device *dev = crtc->dev;
> > >>+ struct drm_mode_config

Re: [Intel-gfx] [PATCH v3 00/19] Convert to atomic, part 3.

2015-06-19 Thread Matt Roper
On Mon, Jun 15, 2015 at 12:33:37PM +0200, Maarten Lankhorst wrote:
> Requisites:
> - "[PATCH] drm/atomic: pass old crtc state to atomic_begin/flush."
> 
> This patch series converts plane updates and cdclk updates to atomic,
> but still doesn't touch the hw readout code, which was regressing a lot.
> 
> The fixes in this series are needed to support proper hw readout, and can
> be applied on top of topic/atomic-conversion.
> 
> In particular, this fixes the following bugs:
> - https://bugs.freedesktop.org/show_bug.cgi?id=90874
>   Needs atomic CDCLK as part of state, before any plane checks,
>   or scalers will not work correctly.
> - https://bugs.freedesktop.org/show_bug.cgi?id=90868
>   It shows a problem with plane visibility on resume.
>   This is fixed by calcing plane states correctly across modeset.
>   There's also a problem with DPLL 0 failing to lock, I hope that's
>   fixed by cdclk changes, but it might have been a bug in the reverted
>   atomic hw readout patch too.

I've spent the last week looking over these and I don't see any major
functional problems so I think these are probably ready for merging.  A
couple patches got into areas of the code I'm not super familiar with,
but I've also done quite a bit of testing of this series (on IVB) and
haven't run into issues, so that increases my confidence.

I think there was one incorrectly identified function parameter I called
out on one of the early scaler patches, but with a sensible parameter
rename there, you can consider these

Reviewed-by: Matt Roper 
Tested-by(IVB): Matt Roper 


Matt

> 
> Maarten Lankhorst (19):
>   drm/i915: Use crtc state in intel_modeset_pipe_config
>   drm/i915: Clean up intel_atomic_setup_scalers slightly.
>   drm/i915: Add a simple atomic crtc check function, v2.
>   drm/i915: Move scaler setup to check crtc function, v2.
>   drm/i915: Assign a new pll from the crtc check function, v2.
>   drm/i915: Split skl_update_scaler, v3.
>   drm/i915: Split plane updates of crtc->atomic into a helper, v2.
>   drm/i915: clean up plane commit functions
>   drm/i915: clean up atomic plane check functions, v2.
>   drm/i915: remove force argument from disable_plane
>   drm/i915: move detaching scalers to begin_crtc_commit, v2.
>   drm/i915: Move crtc commit updates to separate functions.
>   drm/i915: Do not run most checks when there's no modeset.
>   drm/i915: Handle disabling planes better, v2.
>   drm/i915: atomic plane updates in a nutshell
>   drm/i915: Update less state during modeset.
>   drm/i915: Make setting color key atomic.
>   drm/i915: Remove transitional references from
> intel_plane_atomic_check.
>   drm/i915: Make cdclk part of the atomic state.
> 
>  drivers/gpu/drm/i915/i915_drv.h   |3 +-
>  drivers/gpu/drm/i915/intel_atomic.c   |   47 +-
>  drivers/gpu/drm/i915/intel_atomic_plane.c |   41 +-
>  drivers/gpu/drm/i915/intel_display.c  | 1480 
> +++--
>  drivers/gpu/drm/i915/intel_dp.c   |2 +-
>  drivers/gpu/drm/i915/intel_drv.h  |   27 +-
>  drivers/gpu/drm/i915/intel_sprite.c   |  170 ++--
>  7 files changed, 874 insertions(+), 896 deletions(-)
> 
> -- 
> 2.1.0
> 
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Matt Roper
Graphics Software Engineer
IoTG Platform Enabling & Development
Intel Corporation
(916) 356-2795
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/2] i965/gen9: Pass alignment as function parameter in drm_intel_gem_bo_alloc_internal()

2015-06-19 Thread Anuj Phogat
On Wed, Jun 10, 2015 at 1:47 AM, Damien Lespiau
 wrote:
> On Tue, Jun 09, 2015 at 02:59:33PM -0700, Anuj Phogat wrote:
>> This patch is on the list for 8 weeks now. Please take a look so I can push
>> it upstream.
>
> Could I suggest you nominate a mesa team member working on SKL as well?
> that would be the ideal match I believe.
I will request someone in Mesa time to take a look. But, I still expect
"looks fine/incorrect", "acked/nacked", "I don't know this code" comments by
people who reviewed the v1.

+Ben: Since he's reviewing my mesa patch:
"i965/gen9: Allocate YF/YS tiled buffer objects"
>
> --
> Damien
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 2/5] drm/i915: PSR: Remove Low Power HW tracking mask.

2015-06-19 Thread Rodrigo Vivi
On Fri, Jun 19, 2015 at 1:32 PM, Daniel Vetter  wrote:
> On Thu, Jun 18, 2015 at 8:43 PM, Rodrigo Vivi  wrote:
>> By Spec we should just mask memup and hotplug detection
>> for hardware tracking cases. However we always masked
>> LPSP that is for low power tracking support because
>> without it PSR was constantly exiting and never really
>> getting activated.
>>
>> Now with runtime PM being enabled by default Matthew
>> reported that he was facing missed screen updates. So
>> let's remove this undesirable mask and let HW tracking
>> take care of cases like this were power saving features
>> are also running.
>>
>> WARNING: With this patch PSR depends on Audio and GPU
>> runtime PM to be properly enabled, working on "auto".
>> If either audio runtime PM or gpu runtime pm are not
>> properly set PSR will constant Exit and Performance
>> Counter will be 0.
>>
>> But the best thing of this patch is that with one more
>> HW tracking working the risks of missed blank screen
>> are minimized at most.
>>
>> This affects just core platforms where PSR exit are also
>> helped by HW tracking: Haswell, Broadwell and Skylake
>> for now.
>>
>> Cc: Daniel Vetter 
>> Cc: Matthew Garrett 
>> Signed-off-by: Rodrigo Vivi 
>
> I guess I don't really understand your description, but it does sound
> strange ... runtime pm enabling from my patch is only about D3, power
> well changes are still done. And as long as we have anything enabled
> (even with PSR) we'll prevent D3.
>
> So the only thing I can think of is that somehow D3 wreaks something
> in the PSR setup and that's causing issues. Unfortunately I have no
> idea about our hw details around PSR and D3, so no idea. Maybe Art has
> some?

I don't know this relation as well. When I found this LPSP maks that
made PSR working it was totally by forcing all masks and start
removing one by one up to the point that this Low Power something did
the trick. At that time Artur had told about power well handling
enabled, but now after Mathew reported that issue I noticed this Low
power flag was also related to runtime PM...

>
>> ---
>>  drivers/gpu/drm/i915/intel_psr.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_psr.c 
>> b/drivers/gpu/drm/i915/intel_psr.c
>> index 5ee0fa5..6549d58 100644
>> --- a/drivers/gpu/drm/i915/intel_psr.c
>> +++ b/drivers/gpu/drm/i915/intel_psr.c
>> @@ -400,7 +400,7 @@ void intel_psr_enable(struct intel_dp *intel_dp)
>>
>> /* Avoid continuous PSR exit by masking memup and hpd */
>
> Need to adjust the comment.

not actually... memup and hpd are still there... comment matched spec
but never matched the code... now it matches.

>
>> I915_WRITE(EDP_PSR_DEBUG_CTL(dev), EDP_PSR_DEBUG_MASK_MEMUP |
>> -  EDP_PSR_DEBUG_MASK_HPD | EDP_PSR_DEBUG_MASK_LPSP);
>> +  EDP_PSR_DEBUG_MASK_HPD);
>>
>> /* Enable PSR on the panel */
>> hsw_psr_enable_sink(intel_dp);
>> --
>> 2.1.0
>>
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx



-- 
Rodrigo Vivi
Blog: http://blog.vivi.eng.br
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 2/5] drm/i915: PSR: Remove Low Power HW tracking mask.

2015-06-19 Thread Daniel Vetter
On Thu, Jun 18, 2015 at 8:43 PM, Rodrigo Vivi  wrote:
> By Spec we should just mask memup and hotplug detection
> for hardware tracking cases. However we always masked
> LPSP that is for low power tracking support because
> without it PSR was constantly exiting and never really
> getting activated.
>
> Now with runtime PM being enabled by default Matthew
> reported that he was facing missed screen updates. So
> let's remove this undesirable mask and let HW tracking
> take care of cases like this were power saving features
> are also running.
>
> WARNING: With this patch PSR depends on Audio and GPU
> runtime PM to be properly enabled, working on "auto".
> If either audio runtime PM or gpu runtime pm are not
> properly set PSR will constant Exit and Performance
> Counter will be 0.
>
> But the best thing of this patch is that with one more
> HW tracking working the risks of missed blank screen
> are minimized at most.
>
> This affects just core platforms where PSR exit are also
> helped by HW tracking: Haswell, Broadwell and Skylake
> for now.
>
> Cc: Daniel Vetter 
> Cc: Matthew Garrett 
> Signed-off-by: Rodrigo Vivi 

I guess I don't really understand your description, but it does sound
strange ... runtime pm enabling from my patch is only about D3, power
well changes are still done. And as long as we have anything enabled
(even with PSR) we'll prevent D3.

So the only thing I can think of is that somehow D3 wreaks something
in the PSR setup and that's causing issues. Unfortunately I have no
idea about our hw details around PSR and D3, so no idea. Maybe Art has
some?

> ---
>  drivers/gpu/drm/i915/intel_psr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_psr.c 
> b/drivers/gpu/drm/i915/intel_psr.c
> index 5ee0fa5..6549d58 100644
> --- a/drivers/gpu/drm/i915/intel_psr.c
> +++ b/drivers/gpu/drm/i915/intel_psr.c
> @@ -400,7 +400,7 @@ void intel_psr_enable(struct intel_dp *intel_dp)
>
> /* Avoid continuous PSR exit by masking memup and hpd */

Need to adjust the comment.

> I915_WRITE(EDP_PSR_DEBUG_CTL(dev), EDP_PSR_DEBUG_MASK_MEMUP |
> -  EDP_PSR_DEBUG_MASK_HPD | EDP_PSR_DEBUG_MASK_LPSP);
> +  EDP_PSR_DEBUG_MASK_HPD);
>
> /* Enable PSR on the panel */
> hsw_psr_enable_sink(intel_dp);
> --
> 2.1.0
>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: Remove KMS Kconfig option

2015-06-19 Thread Chris Wilson
Since we only support modesetting by default (disabling modesetting on
the command line prevents i915.ko from loading), having a parameter to
disable modesstting by default is superfluous, i.e. saying
CONFIG_DRM_I915_KMS=n is equivalent to CONFIG_DRM_I915=n.

Signed-off-by: Chris Wilson 
Cc: Daniel Veter 
---
 arch/x86/configs/x86_64_defconfig |  1 -
 drivers/gpu/drm/i915/Kconfig  |  9 -
 drivers/gpu/drm/i915/i915_drv.c   | 20 +++-
 3 files changed, 7 insertions(+), 23 deletions(-)

diff --git a/arch/x86/configs/x86_64_defconfig 
b/arch/x86/configs/x86_64_defconfig
index 315b86106572..05630dfcb9f4 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -207,7 +207,6 @@ CONFIG_AGP_AMD64=y
 CONFIG_AGP_INTEL=y
 CONFIG_DRM=y
 CONFIG_DRM_I915=y
-CONFIG_DRM_I915_KMS=y
 CONFIG_FB_MODE_HELPERS=y
 CONFIG_FB_TILEBLITTING=y
 CONFIG_FB_EFI=y
diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 5f4b8c5d2bac..3cef5b18b9cb 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -37,15 +37,6 @@ config DRM_I915
  i810 driver instead, and the Atom z5xx series has an entirely
  different implementation.
 
-config DRM_I915_KMS
-   bool "Enable modesetting on intel by default"
-   depends on DRM_I915
-   default y
-   help
- Choose this option if you want kernel modesetting enabled by default.
-
- If in doubt, say "Y".
-
 config DRM_I915_FBDEV
bool "Enable legacy fbdev support for the modesetting intel driver"
depends on DRM_I915
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 6b87f949fc55..0ecd044d5d80 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1729,20 +1729,14 @@ static int __init i915_init(void)
driver.num_ioctls = i915_max_ioctl;
 
/*
-* If CONFIG_DRM_I915_KMS is set, default to KMS unless
-* explicitly disabled with the module pararmeter.
-*
-* Otherwise, just follow the parameter (defaulting to off).
-*
-* Allow optional vga_text_mode_force boot option to override
-* the default behavior.
+* Enable KMS by default, unless explicitly overriden by
+* either the i915.modeset prarameter or by the
+* vga_text_mode_force boot option.
 */
-#if defined(CONFIG_DRM_I915_KMS)
-   if (i915.modeset != 0)
-   driver.driver_features |= DRIVER_MODESET;
-#endif
-   if (i915.modeset == 1)
-   driver.driver_features |= DRIVER_MODESET;
+   driver.driver_features |= DRIVER_MODESET;
+
+   if (i915.modeset == 0)
+   driver.driver_features &= ~DRIVER_MODESET;
 
 #ifdef CONFIG_VGA_CONSOLE
if (vgacon_text_force() && i915.modeset == -1)
-- 
2.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Officially give up on seqno coherency

2015-06-19 Thread Chris Wilson
On Fri, Jun 19, 2015 at 09:16:09PM +0200, Daniel Vetter wrote:
> We've never figured out the magic trick to make irq vs. seqno
> updates coherent, only tricks to make it work. And since
> 
> commit 094f9a54e35500739da185cdb78f2e92fc379458
> Author: Chris Wilson 
> Date:   Wed Sep 25 17:34:55 2013 +0100
> 
> drm/i915: Fix __wait_seqno to use true infinite timeouts
> 
> we automatically fall back to an irq augmented with polling scheme
> after the first missed interrupt. There's really nothing else we can
> do, hence tune down the message to informational level. It's still
> useful for users in case it reliable preceedes a hard system hang.
> 
> v2: Use NOTICE since it might be of value for bug reports (Chris).
> 
> Cc: Mark Janes 
> Cc: Chris Wilson 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Daniel Vetter 

Now all we need to is to save the GPU state to the pstore in the
picoseconds before a hard hang, and we'll be sorted.

Reviewed-by: Chris Wilson 
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] Updated drm-intel-testing

2015-06-19 Thread Daniel Vetter
Hi all,

New -testing cycle with cool stuff:
- refactoring hpd irq handlers (Jani)
- polish skl dpll code a bit (Damien)
- dynamic cdclk adjustement (Ville & Mika)
- fix up 12bpc hdmi and enable it for real again (Ville)
- extend hsw cmd parser to be useful for atomic configuration (Franscico Jerez)
- even more atomic conversion and rolling state handling out across modeset code
  from Maarten & Ander
- fix DRRS idleness detection (Ramalingam)
- clean up dsp address alignment handling (Ville)
- some fbc cleanup patches from Paulo
- prevent hard-hangs when trying to reset the gpu on skl (Mika)

Happy testing!

Cheers, Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: Officially give up on seqno coherency

2015-06-19 Thread Daniel Vetter
We've never figured out the magic trick to make irq vs. seqno
updates coherent, only tricks to make it work. And since

commit 094f9a54e35500739da185cdb78f2e92fc379458
Author: Chris Wilson 
Date:   Wed Sep 25 17:34:55 2013 +0100

drm/i915: Fix __wait_seqno to use true infinite timeouts

we automatically fall back to an irq augmented with polling scheme
after the first missed interrupt. There's really nothing else we can
do, hence tune down the message to informational level. It's still
useful for users in case it reliable preceedes a hard system hang.

v2: Use NOTICE since it might be of value for bug reports (Chris).

Cc: Mark Janes 
Cc: Chris Wilson 
Cc: sta...@vger.kernel.org
Signed-off-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_irq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index e6bb72dca3ff..5072fb49367e 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2946,8 +2946,8 @@ static void i915_hangcheck_elapsed(struct work_struct 
*work)
/* Issue a wake-up to catch stuck h/w. 
*/
if (!test_and_set_bit(ring->id, 
&dev_priv->gpu_error.missed_irq_rings)) {
if 
(!(dev_priv->gpu_error.test_irq_rings & intel_ring_flag(ring)))
-   DRM_ERROR("Hangcheck 
timer elapsed... %s idle\n",
- ring->name);
+   DRM_NOTICE("Hangcheck 
timer elapsed... %s idle\n",
+  ring->name);
else
DRM_INFO("Fake missed 
irq on %s\n",
 ring->name);
-- 
2.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 13/15] drm/i915: Integrate GuC-based command submission

2015-06-19 Thread Dave Gordon
On 16/06/15 10:22, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:31PM +0100, Dave Gordon wrote:
>> From: Alex Dai 
>>
>> GuC-based submission is mostly the same as execlist mode, up to
>> intel_logical_ring_advance_and_submit(), where the context being
>> dispatched would be added to the execlist queue; at this point
>> we submit the context to the GuC backend instead.
>>
>> There are, however, a few other changes also required, notably:
>> 1.  Contexts must be pinned at GGTT addresses accessible by the GuC
>> i.e. NOT in the range [0..WOPCM_SIZE), so we have to add the
>> PIN_OFFSET_BIAS flag to the relevant GGTT-pinning calls.
>>
>> 2.  The GuC's TLB must be invalidated after a context is pinned at
>> a new GGTT address.
>>
>> 3.  GuC firmware uses the one page before Ring Context as shared data.
>> Therefore, whenever driver wants to get base address of LRC, we
>> will offset one page for it. LRC_PPHWSP_PN is defined as the page
>> number of LRCA.
>>
>> 4.  In the work queue used to pass requests to the GuC, the GuC
>> firmware requires the ring-tail-offset to be represented as an
>> 11-bit value, expressed in QWords. Therefore, the ringbuffer
>> size must be reduced to the representable range (4 pages).
> 
> I don't like how this sabotages the existing execlists implementation
> in order for i915_guc_submission (an interesting choice of file name,
> since we go i915_gem_execbuffer (API) -> intel_execlists (HW) ->
> i915_guc_submission (HW), not fitting into our, admittedly loose, naming
> convention very well) to share a few functions. Even a couple of which
> are already vfunc.
> -Chris

Not really "sabotages"; big ringbuffers are a waste of space anyway.
Four pages is enough to have at least 64 batchbuffers queued up for the
engine to process (1 second of video, or 0.00012 of a bitcoin). When the
scheduler lands, it will generally reduce the number of batches in the
h/w rings anyway, mostly to improve responsiveness and fair-sharing
among different applications.

I quite agree that the execlists implementation, which is mostly in a
file called intel_lrc.c, should really be split into LRC-related code
(common to execlists and GuC modes) with the rest moved into
intel_execlist_submission.c. Or is that i915_execlist_submission.c?

If we took contexts as primary, rather than "rings" (engines) we could
also share a lot more between GuC/execlists and legacy ringbuffer mode.
At least we should get some improvement with AntiOLR, so we'll have

request->context<->ringbuffer->engine->submission mechanism :)

.Dave.

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v6 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround

2015-06-19 Thread Chris Wilson
On Fri, Jun 19, 2015 at 06:37:13PM +0100, Arun Siluvery wrote:
> In Indirect context w/a batch buffer,
> +WaFlushCoherentL3CacheLinesAtContextSwitch:bdw
> 
> v2: Add LRI commands to set/reset bit that invalidates coherent lines,
> update WA to include programming restrictions and exclude CHV as
> it is not required (Ville)
> 
> v3: Avoid unnecessary read when it can be done by reading register once 
> (Chris).
> 
> Cc: Chris Wilson 
> Cc: Dave Gordon 
> Signed-off-by: Rafael Barbalho 
> Signed-off-by: Arun Siluvery 

Acked-by: Chris Wilson 
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v6 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround

2015-06-19 Thread Chris Wilson
On Fri, Jun 19, 2015 at 06:37:12PM +0100, Arun Siluvery wrote:
> In Indirect and Per context w/a batch buffer,
> +WaDisableCtxRestoreArbitration
> 
> Cc: Chris Wilson 
> Cc: Dave Gordon 
> Signed-off-by: Rafael Barbalho 
> Signed-off-by: Arun Siluvery 

Acked-by: Chris Wilson  # too lazy to verify hsd
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v6 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround

2015-06-19 Thread Chris Wilson
On Fri, Jun 19, 2015 at 06:37:14PM +0100, Arun Siluvery wrote:
> In Indirect context w/a batch buffer,
> WaClearSlmSpaceAtContextSwitch
> 
> This WA performs writes to scratch page so it must be valid, this check
> is performed before initializing the batch with this WA.
> 
> v2: s/PIPE_CONTROL_FLUSH_RO_CACHES/PIPE_CONTROL_FLUSH_L3 (Ville)
> 
> Cc: Chris Wilson 
> Cc: Dave Gordon 
> Signed-off-by: Rafael Barbalho 
> Signed-off-by: Arun Siluvery 
> ---
>  drivers/gpu/drm/i915/i915_reg.h  |  1 +
>  drivers/gpu/drm/i915/intel_lrc.c | 16 
>  2 files changed, 17 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index d14ad20..7637e64 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -410,6 +410,7 @@
>  #define   DISPLAY_PLANE_A   (0<<20)
>  #define   DISPLAY_PLANE_B   (1<<20)
>  #define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
> +#define   PIPE_CONTROL_FLUSH_L3  (1<<27)
>  #define   PIPE_CONTROL_GLOBAL_GTT_IVB(1<<24) /* 
> gen7+ */
>  #define   PIPE_CONTROL_MMIO_WRITE(1<<23)
>  #define   PIPE_CONTROL_STORE_DATA_INDEX  (1<<21)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
> b/drivers/gpu/drm/i915/intel_lrc.c
> index 3e7aaa9..664455c 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1137,6 +1137,7 @@ static int gen8_init_indirectctx_bb(struct 
> intel_engine_cs *ring,
>   uint32_t *const batch,
>   uint32_t *offset)
>  {
> + uint32_t scratch_addr;
>   uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);
>  
>   /* WaDisableCtxRestoreArbitration:bdw,chv */
> @@ -1165,6 +1166,21 @@ static int gen8_init_indirectctx_bb(struct 
> intel_engine_cs *ring,
>   wa_ctx_emit(batch, l3sqc4_flush & 
> ~GEN8_LQSC_FLUSH_COHERENT_LINES);
>   }
>  
> + /* WaClearSlmSpaceAtContextSwitch:bdw,chv */
> + /* Actual scratch location is at 128 bytes offset */
> + scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
> + scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;

I thought this bit was now mbz - that's how we treat it elsewhere e.g.
gen8_emit_flush_render, and that the address has to be naturally aligned
for the target write. (Similar bit in patch 6 fwiw.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v6 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-19 Thread Arun Siluvery
Some of the WA are to be applied during context save but before restore and
some at the end of context save/restore but before executing the instructions
in the ring, WA batch buffers are created for this purpose and these WA cannot
be applied using normal means. Each context has two registers to load the
offsets of these batch buffers. If they are non-zero, HW understands that it
need to execute these batches.

v1: In this version two separate ring_buffer objects were used to load WA
instructions for indirect and per context batch buffers and they were part
of every context.

v2: Chris suggested to include additional page in context and use it to load
these WA instead of creating separate objects. This will simplify lot of things
as we need not explicity pin/unpin them. Thomas Daniel further pointed that GuC
is planning to use a similar setup to share data between GuC and driver and
WA batch buffers can probably share that page. However after discussions with
Dave who is implementing GuC changes, he suggested to use an independent page
for the reasons - GuC area might grow and these WA are initialized only once and
are not changed afterwards so we can share them share across all contexts.

The page is updated with WA during render ring init. This has an advantage of
not adding more special cases to default_context.

We don't know upfront the number of WA we will applying using these batch 
buffers.
For this reason the size was fixed earlier but it is not a good idea. To fix 
this,
the functions that load instructions are modified to report the no of commands
inserted and the size is now calculated after the batch is updated. A macro is
introduced to add commands to these batch buffers which also checks for overflow
and returns error.
We have a full page dedicated for these WA so that should be sufficient for
good number of WA, anything more means we have major issues.
The list for Gen8 is small, same for Gen9 also, maybe few more gets added
going forward but not close to filling entire page. Chris suggested a two-pass
approach but we agreed to go with single page setup as it is a one-off routine
and simpler code wins.

One additional option is offset field which is helpful if we would like to
have multiple batches at different offsets within the page and select them
based on some criteria. This is not a requirement at this point but could
help in future (Dave).

Chris provided some helpful macros and suggestions which further simplified
the code, they will also help in reducing code duplication when WA for
other Gen are added. Add detailed comments explaining restrictions.
Use do {} while(0) for wa_ctx_emit() macro.

(Many thanks to Chris, Dave and Thomas for their reviews and inputs)

Cc: Chris Wilson 
Cc: Dave Gordon 
Signed-off-by: Rafael Barbalho 
Signed-off-by: Arun Siluvery 
Reviewed-by: Chris Wilson 
---
 drivers/gpu/drm/i915/intel_lrc.c| 223 +++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  21 +++
 2 files changed, 240 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0413b8f..0585298 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -211,6 +211,7 @@ enum {
FAULT_AND_CONTINUE /* Unsupported */
 };
 #define GEN8_CTX_ID_SHIFT 32
+#define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT  0x17
 
 static int intel_lr_context_pin(struct intel_engine_cs *ring,
struct intel_context *ctx);
@@ -1077,6 +1078,191 @@ static int intel_logical_ring_workarounds_emit(struct 
intel_engine_cs *ring,
return 0;
 }
 
+#define wa_ctx_emit(batch, cmd)
\
+   do {\
+   if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t { \
+   return -ENOSPC; \
+   }   \
+   batch[index++] = (cmd); \
+   } while (0)
+
+static inline uint32_t wa_ctx_start(struct i915_wa_ctx_bb *wa_ctx,
+   uint32_t offset,
+   uint32_t start_alignment)
+{
+   return wa_ctx->offset = ALIGN(offset, start_alignment);
+}
+
+static inline int wa_ctx_end(struct i915_wa_ctx_bb *wa_ctx,
+uint32_t offset,
+uint32_t size_alignment)
+{
+   wa_ctx->size = offset - wa_ctx->offset;
+
+   WARN(wa_ctx->size % size_alignment,
+"wa_ctx_bb failed sanity checks: size %d is not aligned to %d\n",
+wa_ctx->size, size_alignment);
+   return 0;
+}
+
+/**
+ * gen8_init_indirectctx_bb() - initialize indirect ctx batch with WA
+ *
+ * @ring: only applicable for RCS
+ * @wa_ctx: structure representing wa_ctx
+ *  offset: specifies start of the batch, should be cache-aligned. This is 
up

Re: [Intel-gfx] [PATCH v6 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode

2015-06-19 Thread Chris Wilson
On Fri, Jun 19, 2015 at 06:37:11PM +0100, Arun Siluvery wrote:
> Some of the WA applied using WA batch buffers perform writes to scratch page.
> In the current flow WA are initialized before scratch obj is allocated.
> This patch reorders intel_init_pipe_control() to have a valid scratch obj
> before we initialize WA.
> 
> v2: Check for valid scratch page before initializing WA as some of them
> perform writes to it.
> 
> Cc: Chris Wilson 
> Cc: Dave Gordon 
> Signed-off-by: Michel Thierry 
> Signed-off-by: Arun Siluvery 

Ok, I'm not completely happy with the sequence of this, but it works.

logical_ring_init() is the beast that initialises the intel_engine_cs,
and it looks dubious to be setting fields (like dev and scratch_obj)
before we do the main initialisation. I don't have a pretty solution
(thinking of passing around the scratch object, or reordering everything
still further with resulting code duplication for the other rings).

Reviewed-by: Chris Wilson 
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 11/15] drm/i915: Implementation of GuC client

2015-06-19 Thread Dave Gordon
On 15/06/15 22:55, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:29PM +0100, Dave Gordon wrote:
>> +/* Get valid workqueue item and return it back to offset */
>> +static int guc_get_workqueue_space(struct i915_guc_client *gc, u32 *offset)
>> +{
>> +struct guc_process_desc *desc;
>> +void *base;
>> +u32 size = sizeof(struct guc_wq_item);
>> +int ret = 0, timeout_counter = 200;
>> +unsigned long flags;
>> +
>> +base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, 0));
>> +desc = base + gc->proc_desc_offset;
>> +
>> +while (timeout_counter-- > 0) {
>> +spin_lock_irqsave(&gc->wq_lock, flags);
>> +
>> +ret = wait_for_atomic(CIRC_SPACE(gc->wq_tail, desc->head,
>> +gc->wq_size) >= size, 1);
> 
> What is the point of this loop? Drop the spinlock 200 times? You already
> have a timeout, the loop extends that by a factor or 200. You merely
> allow gazzumping, however I haven't looked at the locking to see what
> you intend to lock (since it is not described at all).
> -Chris

Hmmm .. I didn't write this code, so I'm guessing somewhat; but ...

A 'wq_lock' must lock a 'wq', which from the name of the function is a
workqueue, which is a circular buffer shared between the host and the
GuC, where (like the main ringbuffers) the host (producer) advances the
tail (gc->wq_tail) and the other partner (consumer, in this case the
GuC) advances the head (desc->head).

Presumably the GuC could take many (up to 200) ms to get round to making
space available, in a worst-case scenario where it's busy servicing
interrupts and doing other things.

Now we certainly don't want to spin for up to 200ms with interrupts
disabled, so

spin_lock_irqsave(&gc->wq_lock, flags);
ret = wait_for_atomic(CIRC_SPACE(gc->wq_tail, desc->head,
 gc->wq_size) >= size, *200*);
spin_unlock_irqrestore(&gc->wq_lock, flags);

would be a bad idea. OTOH I don't think there's any other lock held by
anyone higher up in the callchain, so we /probably do/ need the spinlock
to protect the updating of wq_tail when the wait_for_atomic succeeds.

So yes, I think up-to-200 iterations of polling for freespace for up to
1ms each time is not too unreasonable, given that apparently we have to
poll, at least for now (once the scheduler lands, we will always be able
to predict how much space is available and avoid trying to launch
batches when there isn't enough).

.Dave.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v6 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-19 Thread Chris Wilson
On Fri, Jun 19, 2015 at 06:37:10PM +0100, Arun Siluvery wrote:
> Some of the WA are to be applied during context save but before restore and
> some at the end of context save/restore but before executing the instructions
> in the ring, WA batch buffers are created for this purpose and these WA cannot
> be applied using normal means. Each context has two registers to load the
> offsets of these batch buffers. If they are non-zero, HW understands that it
> need to execute these batches.
> 
> v1: In this version two separate ring_buffer objects were used to load WA
> instructions for indirect and per context batch buffers and they were part
> of every context.
> 
> v2: Chris suggested to include additional page in context and use it to load
> these WA instead of creating separate objects. This will simplify lot of 
> things
> as we need not explicity pin/unpin them. Thomas Daniel further pointed that 
> GuC
> is planning to use a similar setup to share data between GuC and driver and
> WA batch buffers can probably share that page. However after discussions with
> Dave who is implementing GuC changes, he suggested to use an independent page
> for the reasons - GuC area might grow and these WA are initialized only once 
> and
> are not changed afterwards so we can share them share across all contexts.
> 
> The page is updated with WA during render ring init. This has an advantage of
> not adding more special cases to default_context.
> 
> We don't know upfront the number of WA we will applying using these batch 
> buffers.
> For this reason the size was fixed earlier but it is not a good idea. To fix 
> this,
> the functions that load instructions are modified to report the no of commands
> inserted and the size is now calculated after the batch is updated. A macro is
> introduced to add commands to these batch buffers which also checks for 
> overflow
> and returns error.
> We have a full page dedicated for these WA so that should be sufficient for
> good number of WA, anything more means we have major issues.
> The list for Gen8 is small, same for Gen9 also, maybe few more gets added
> going forward but not close to filling entire page. Chris suggested a two-pass
> approach but we agreed to go with single page setup as it is a one-off routine
> and simpler code wins.
> 
> One additional option is offset field which is helpful if we would like to
> have multiple batches at different offsets within the page and select them
> based on some criteria. This is not a requirement at this point but could
> help in future (Dave).
> 
> Chris provided some helpful macros and suggestions which further simplified
> the code, they will also help in reducing code duplication when WA for
> other Gen are added. Add detailed comments explaining restrictions.
> 
> (Many thanks to Chris, Dave and Thomas for their reviews and inputs)
> 
> Cc: Chris Wilson 
> Cc: Dave Gordon 
> Signed-off-by: Rafael Barbalho 
> Signed-off-by: Arun Siluvery 

Sigh, after all that, I found one minor thing, but nevertheless
Reviewed-by: Chris Wilson 

> +#define wa_ctx_emit(batch, cmd) {\
> + if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t { \
> + return -ENOSPC; \
> + }   \
> + batch[index++] = (cmd); \
> + }

We should have wrapped this in do { } while(0) - think of all those
trialing semicolons we have in the code! Fortunately we haven't used
this in a if (foo) wa_ctx_emit(bar); else wa_ctx_emit(baz); yet.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v6 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode

2015-06-19 Thread Arun Siluvery
Some of the WA applied using WA batch buffers perform writes to scratch page.
In the current flow WA are initialized before scratch obj is allocated.
This patch reorders intel_init_pipe_control() to have a valid scratch obj
before we initialize WA.

v2: Check for valid scratch page before initializing WA as some of them
perform writes to it.

Cc: Chris Wilson 
Cc: Dave Gordon 
Signed-off-by: Michel Thierry 
Signed-off-by: Arun Siluvery 
---
 drivers/gpu/drm/i915/intel_lrc.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c9255fc..0d350f6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1223,6 +1223,12 @@ static int intel_init_workaround_bb(struct 
intel_engine_cs *ring)
 
WARN_ON(ring->id != RCS);
 
+   /* some WA perform writes to scratch page, ensure it is valid */
+   if (ring->scratch.obj == NULL) {
+   DRM_ERROR("scratch page not allocated for %s\n", ring->name);
+   return -EINVAL;
+   }
+
ret = lrc_setup_wa_ctx_obj(ring, PAGE_SIZE);
if (ret) {
DRM_DEBUG_DRIVER("Failed to setup context WA page: %d\n", ret);
@@ -1657,7 +1663,8 @@ static int logical_render_ring_init(struct drm_device 
*dev)
ring->emit_bb_start = gen8_emit_bb_start;
 
ring->dev = dev;
-   ret = logical_ring_init(dev, ring);
+
+   ret = intel_init_pipe_control(ring);
if (ret)
return ret;
 
@@ -1672,9 +1679,10 @@ static int logical_render_ring_init(struct drm_device 
*dev)
  ret);
}
 
-   ret = intel_init_pipe_control(ring);
-   if (ret)
+   ret = logical_ring_init(dev, ring);
+   if (ret) {
lrc_destroy_wa_ctx_obj(ring);
+   }
 
return ret;
 }
-- 
2.3.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v6 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround

2015-06-19 Thread Arun Siluvery
In Per context w/a batch buffer,
WaRsRestoreWithPerCtxtBb

This WA performs writes to scratch page so it must be valid, this check
is performed before initializing the batch with this WA.

v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
so as to not break any future users of existing definitions (Michel)

v3: Length defined in current definitions of LRM, LRR instructions was specified
as 0. It seems it is common convention for instructions whose length vary 
between
platforms. This is not an issue so far because they are not used anywhere except
command parser; now that we use in this patch update them with correct length
and also move them out of command parser placeholder to appropriate place.
remove unnecessary padding and follow the WA programming sequence exactly
as mentioned in spec which is essential for this WA (Dave).

Cc: Chris Wilson 
Cc: Dave Gordon 
Signed-off-by: Rafael Barbalho 
Signed-off-by: Arun Siluvery 
---
 drivers/gpu/drm/i915/i915_reg.h  | 29 +++--
 drivers/gpu/drm/i915/intel_lrc.c | 54 
 2 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 7637e64..208620d 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -347,6 +347,31 @@
 #define   MI_INVALIDATE_BSD(1<<7)
 #define   MI_FLUSH_DW_USE_GTT  (1<<2)
 #define   MI_FLUSH_DW_USE_PPGTT(0<<2)
+#define MI_LOAD_REGISTER_MEMMI_INSTR(0x29, 1)
+#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
+#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
+#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
+#define MI_LOAD_REGISTER_REGMI_INSTR(0x2A, 1)
+#define MI_ATOMIC(len) MI_INSTR(0x2F, (len-2))
+#define   MI_ATOMIC_MEMORY_TYPE_GGTT   (1<<22)
+#define   MI_ATOMIC_INLINE_DATA(1<<18)
+#define   MI_ATOMIC_CS_STALL   (1<<17)
+#define   MI_ATOMIC_RETURN_DATA_CTL(1<<16)
+#define MI_ATOMIC_OP_MASK(op)  ((op) << 8)
+#define MI_ATOMIC_AND  MI_ATOMIC_OP_MASK(0x01)
+#define MI_ATOMIC_OR   MI_ATOMIC_OP_MASK(0x02)
+#define MI_ATOMIC_XOR  MI_ATOMIC_OP_MASK(0x03)
+#define MI_ATOMIC_MOVE MI_ATOMIC_OP_MASK(0x04)
+#define MI_ATOMIC_INC  MI_ATOMIC_OP_MASK(0x05)
+#define MI_ATOMIC_DEC  MI_ATOMIC_OP_MASK(0x06)
+#define MI_ATOMIC_ADD  MI_ATOMIC_OP_MASK(0x07)
+#define MI_ATOMIC_SUB  MI_ATOMIC_OP_MASK(0x08)
+#define MI_ATOMIC_RSUB MI_ATOMIC_OP_MASK(0x09)
+#define MI_ATOMIC_IMAX MI_ATOMIC_OP_MASK(0x0A)
+#define MI_ATOMIC_IMIN MI_ATOMIC_OP_MASK(0x0B)
+#define MI_ATOMIC_UMAX MI_ATOMIC_OP_MASK(0x0C)
+#define MI_ATOMIC_UMIN MI_ATOMIC_OP_MASK(0x0D)
+
 #define MI_BATCH_BUFFERMI_INSTR(0x30, 1)
 #define   MI_BATCH_NON_SECURE  (1)
 /* for snb/ivb/vlv this also means "batch in ppgtt" when ppgtt is enabled. */
@@ -451,8 +476,6 @@
 #define MI_CLFLUSH  MI_INSTR(0x27, 0)
 #define MI_REPORT_PERF_COUNTMI_INSTR(0x28, 0)
 #define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
-#define MI_LOAD_REGISTER_MEMMI_INSTR(0x29, 0)
-#define MI_LOAD_REGISTER_REGMI_INSTR(0x2A, 0)
 #define MI_RS_STORE_DATA_IMMMI_INSTR(0x2B, 0)
 #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0)
 #define MI_STORE_URB_MEMMI_INSTR(0x2D, 0)
@@ -1799,6 +1822,8 @@ enum skl_disp_power_wells {
 #define   GEN8_RC_SEMA_IDLE_MSG_DISABLE(1 << 12)
 #define   GEN8_FF_DOP_CLOCK_GATE_DISABLE   (1<<10)
 
+#define GEN8_RS_PREEMPT_STATUS 0x215C
+
 /* Fuse readout registers for GT */
 #define CHV_FUSE_GT(VLV_DISPLAY_BASE + 0x2168)
 #define   CHV_FGT_DISABLE_SS0  (1 << 10)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 664455c..28198c4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1215,11 +1215,65 @@ static int gen8_init_perctx_bb(struct intel_engine_cs 
*ring,
   uint32_t *const batch,
   uint32_t *offset)
 {
+   uint32_t scratch_addr;
uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);
 
+   /* Actual scratch location is at 128 bytes offset */
+   scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+   scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
/* WaDisableCtxRestoreArbitration:bdw,chv */
wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE);
 
+   /*
+* As per Bspec, to workaround a known HW issue, SW must perform the
+* below programming sequence prior to programming MI_BATCH_BUFFER_END.
+*
+* This is only applicable for Gen8.
+*/
+
+   /* WaRsRestoreWithPerCtxtBb:bdw,chv */
+   wa_ctx_emit(batch, MI_LOAD_REGISTER_IMM(1));
+   wa_ctx_emit(batch, INSTPM);
+   wa_ctx_emit(batch, _MASKED_BIT_DISABLE(INSTPM_FORCE_ORDERING));
+
+   wa_ctx_emit(batch, (MI_ATOMIC(5) |
+

[Intel-gfx] [PATCH v6 0/6] Add Per-context WA using WA batch buffers

2015-06-19 Thread Arun Siluvery
From Gen8+ we have some workarounds that are applied Per context and
they are applied using special batch buffers called as WA batch buffers.
HW executes them at specific stages during context save/restore.
The patches in this series adds this framework to i915.

I did some basic testing on BDW by running glmark2 and didn't see any issues.
These WA are mainly required when preemption is enabled.

[v1] http://lists.freedesktop.org/archives/intel-gfx/2015-February/060707.html
[v2] http://www.spinics.net/lists/intel-gfx/msg67804.html

[v3] In v2, two separate ring_buffer objects were used to load WA instructions
and they were part of every context which is not really required.
Chris suggested a better approach of adding a page to context itself and using
it for this purpose. Since GuC is also planning to do the same it can probably
be shared with GuC. But after discussions it is agreed to use an independent
page as GuC area might grow in future. Independent page also makes sense because
these WA are only initialized once and not changed afterwards so we can share
them across all contexts.

[v4] Changes in this revision,
In the previous version the size of batch buffers are fixed during
initialization which is not a good idea. This is corrected by updating the
functions that load WA to return the number of dwords written and caller
updates the size once all WA are initialized. The functions now also accept
offset field which allows us to have multiple batches so that required batch
can be selected based on a criteria. This is not a requirement at this point
but could be useful in future.

WaFlushCoherentL3CacheLinesAtContextSwitch implementation was incomplete which
is fixed and programming restrictions correctly applied.

http://www.spinics.net/lists/intel-gfx/msg68947.html

[v5] No major changes in this revision but switched to new revision as changes
affected all patches. Introduced macro to add commands which also checks for
page overflow. Moved code around to simplify, indentation fixes and other
improvements suggested by Chris.

Since we don't know the number of WA applied upfront, Chris suggested a two-pass
approach but that brings additional complexity which is not necessary.
Discussed with Chris and agreed upon on single page setup as simpler code wins
and also single page is sufficient for our requirement.

http://www.spinics.net/lists/intel-gfx/msg69176.html

[v6] Changes made to make the code easy to modify for future Gen. It is much
neater and contained now, big thanks to Chris Wilson for the review.
Since these changes affect all patches I am sending as a separate revision.

Please see the patches for more details.

Arun Siluvery (6):
  drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  drm/i915/gen8: Re-order init pipe_control in lrc mode
  drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround
  drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch
workaround
  drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround
  drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround

 drivers/gpu/drm/i915/i915_reg.h |  32 +++-
 drivers/gpu/drm/i915/intel_lrc.c| 328 +++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  21 ++
 3 files changed, 374 insertions(+), 7 deletions(-)

-- 
2.3.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v6 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-19 Thread Arun Siluvery
Some of the WA are to be applied during context save but before restore and
some at the end of context save/restore but before executing the instructions
in the ring, WA batch buffers are created for this purpose and these WA cannot
be applied using normal means. Each context has two registers to load the
offsets of these batch buffers. If they are non-zero, HW understands that it
need to execute these batches.

v1: In this version two separate ring_buffer objects were used to load WA
instructions for indirect and per context batch buffers and they were part
of every context.

v2: Chris suggested to include additional page in context and use it to load
these WA instead of creating separate objects. This will simplify lot of things
as we need not explicity pin/unpin them. Thomas Daniel further pointed that GuC
is planning to use a similar setup to share data between GuC and driver and
WA batch buffers can probably share that page. However after discussions with
Dave who is implementing GuC changes, he suggested to use an independent page
for the reasons - GuC area might grow and these WA are initialized only once and
are not changed afterwards so we can share them share across all contexts.

The page is updated with WA during render ring init. This has an advantage of
not adding more special cases to default_context.

We don't know upfront the number of WA we will applying using these batch 
buffers.
For this reason the size was fixed earlier but it is not a good idea. To fix 
this,
the functions that load instructions are modified to report the no of commands
inserted and the size is now calculated after the batch is updated. A macro is
introduced to add commands to these batch buffers which also checks for overflow
and returns error.
We have a full page dedicated for these WA so that should be sufficient for
good number of WA, anything more means we have major issues.
The list for Gen8 is small, same for Gen9 also, maybe few more gets added
going forward but not close to filling entire page. Chris suggested a two-pass
approach but we agreed to go with single page setup as it is a one-off routine
and simpler code wins.

One additional option is offset field which is helpful if we would like to
have multiple batches at different offsets within the page and select them
based on some criteria. This is not a requirement at this point but could
help in future (Dave).

Chris provided some helpful macros and suggestions which further simplified
the code, they will also help in reducing code duplication when WA for
other Gen are added. Add detailed comments explaining restrictions.

(Many thanks to Chris, Dave and Thomas for their reviews and inputs)

Cc: Chris Wilson 
Cc: Dave Gordon 
Signed-off-by: Rafael Barbalho 
Signed-off-by: Arun Siluvery 
---
 drivers/gpu/drm/i915/intel_lrc.c| 222 +++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  21 +++
 2 files changed, 239 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0413b8f..c9255fc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -211,6 +211,7 @@ enum {
FAULT_AND_CONTINUE /* Unsupported */
 };
 #define GEN8_CTX_ID_SHIFT 32
+#define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT  0x17
 
 static int intel_lr_context_pin(struct intel_engine_cs *ring,
struct intel_context *ctx);
@@ -1077,6 +1078,190 @@ static int intel_logical_ring_workarounds_emit(struct 
intel_engine_cs *ring,
return 0;
 }
 
+#define wa_ctx_emit(batch, cmd) {  \
+   if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t { \
+   return -ENOSPC; \
+   }   \
+   batch[index++] = (cmd); \
+   }
+
+static inline uint32_t wa_ctx_start(struct i915_wa_ctx_bb *wa_ctx,
+   uint32_t offset,
+   uint32_t start_alignment)
+{
+   return wa_ctx->offset = ALIGN(offset, start_alignment);
+}
+
+static inline int wa_ctx_end(struct i915_wa_ctx_bb *wa_ctx,
+uint32_t offset,
+uint32_t size_alignment)
+{
+   wa_ctx->size = offset - wa_ctx->offset;
+
+   WARN(wa_ctx->size % size_alignment,
+"wa_ctx_bb failed sanity checks: size %d is not aligned to %d\n",
+wa_ctx->size, size_alignment);
+   return 0;
+}
+
+/**
+ * gen8_init_indirectctx_bb() - initialize indirect ctx batch with WA
+ *
+ * @ring: only applicable for RCS
+ * @wa_ctx: structure representing wa_ctx
+ *  offset: specifies start of the batch, should be cache-aligned. This is 
updated
+ *with the offset value received as input.
+ *  size: size of the batch in DWORDS but HW expects in terms of cachelines
+ * @batch: page in which WA are loaded
+ * @offset: This field s

[Intel-gfx] [PATCH v6 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround

2015-06-19 Thread Arun Siluvery
In Indirect and Per context w/a batch buffer,
+WaDisableCtxRestoreArbitration

Cc: Chris Wilson 
Cc: Dave Gordon 
Signed-off-by: Rafael Barbalho 
Signed-off-by: Arun Siluvery 
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0d350f6..62c7eeb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1139,8 +1139,8 @@ static int gen8_init_indirectctx_bb(struct 
intel_engine_cs *ring,
 {
uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);
 
-   /* FIXME: Replace me with WA */
-   wa_ctx_emit(batch, MI_NOOP);
+   /* WaDisableCtxRestoreArbitration:bdw,chv */
+   wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_DISABLE);
 
/* Pad to end of cacheline */
while (index % CACHELINE_DWORDS)
@@ -1178,6 +1178,9 @@ static int gen8_init_perctx_bb(struct intel_engine_cs 
*ring,
 {
uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);
 
+   /* WaDisableCtxRestoreArbitration:bdw,chv */
+   wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE);
+
wa_ctx_emit(batch, MI_BATCH_BUFFER_END);
 
return wa_ctx_end(wa_ctx, *offset = index, 1);
-- 
2.3.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v6 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround

2015-06-19 Thread Arun Siluvery
In Indirect context w/a batch buffer,
WaClearSlmSpaceAtContextSwitch

This WA performs writes to scratch page so it must be valid, this check
is performed before initializing the batch with this WA.

v2: s/PIPE_CONTROL_FLUSH_RO_CACHES/PIPE_CONTROL_FLUSH_L3 (Ville)

Cc: Chris Wilson 
Cc: Dave Gordon 
Signed-off-by: Rafael Barbalho 
Signed-off-by: Arun Siluvery 
---
 drivers/gpu/drm/i915/i915_reg.h  |  1 +
 drivers/gpu/drm/i915/intel_lrc.c | 16 
 2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index d14ad20..7637e64 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -410,6 +410,7 @@
 #define   DISPLAY_PLANE_A   (0<<20)
 #define   DISPLAY_PLANE_B   (1<<20)
 #define GFX_OP_PIPE_CONTROL(len)   ((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
+#define   PIPE_CONTROL_FLUSH_L3(1<<27)
 #define   PIPE_CONTROL_GLOBAL_GTT_IVB  (1<<24) /* gen7+ */
 #define   PIPE_CONTROL_MMIO_WRITE  (1<<23)
 #define   PIPE_CONTROL_STORE_DATA_INDEX(1<<21)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3e7aaa9..664455c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1137,6 +1137,7 @@ static int gen8_init_indirectctx_bb(struct 
intel_engine_cs *ring,
uint32_t *const batch,
uint32_t *offset)
 {
+   uint32_t scratch_addr;
uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);
 
/* WaDisableCtxRestoreArbitration:bdw,chv */
@@ -1165,6 +1166,21 @@ static int gen8_init_indirectctx_bb(struct 
intel_engine_cs *ring,
wa_ctx_emit(batch, l3sqc4_flush & 
~GEN8_LQSC_FLUSH_COHERENT_LINES);
}
 
+   /* WaClearSlmSpaceAtContextSwitch:bdw,chv */
+   /* Actual scratch location is at 128 bytes offset */
+   scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+   scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
+   wa_ctx_emit(batch, GFX_OP_PIPE_CONTROL(6));
+   wa_ctx_emit(batch, (PIPE_CONTROL_FLUSH_L3 |
+   PIPE_CONTROL_GLOBAL_GTT_IVB |
+   PIPE_CONTROL_CS_STALL |
+   PIPE_CONTROL_QW_WRITE));
+   wa_ctx_emit(batch, scratch_addr);
+   wa_ctx_emit(batch, 0);
+   wa_ctx_emit(batch, 0);
+   wa_ctx_emit(batch, 0);
+
/* Pad to end of cacheline */
while (index % CACHELINE_DWORDS)
wa_ctx_emit(batch, MI_NOOP);
-- 
2.3.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v6 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround

2015-06-19 Thread Arun Siluvery
In Indirect context w/a batch buffer,
+WaFlushCoherentL3CacheLinesAtContextSwitch:bdw

v2: Add LRI commands to set/reset bit that invalidates coherent lines,
update WA to include programming restrictions and exclude CHV as
it is not required (Ville)

v3: Avoid unnecessary read when it can be done by reading register once (Chris).

Cc: Chris Wilson 
Cc: Dave Gordon 
Signed-off-by: Rafael Barbalho 
Signed-off-by: Arun Siluvery 
---
 drivers/gpu/drm/i915/i915_reg.h  |  2 ++
 drivers/gpu/drm/i915/intel_lrc.c | 23 +++
 2 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 84af255..d14ad20 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -426,6 +426,7 @@
 #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE  (1<<9)
 #define   PIPE_CONTROL_NOTIFY  (1<<8)
 #define   PIPE_CONTROL_FLUSH_ENABLE(1<<7) /* gen7+ */
+#define   PIPE_CONTROL_DC_FLUSH_ENABLE (1<<5)
 #define   PIPE_CONTROL_VF_CACHE_INVALIDATE (1<<4)
 #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE  (1<<3)
 #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE  (1<<2)
@@ -5788,6 +5789,7 @@ enum skl_disp_power_wells {
 
 #define GEN8_L3SQCREG4 0xb118
 #define  GEN8_LQSC_RO_PERF_DIS (1<<27)
+#define  GEN8_LQSC_FLUSH_COHERENT_LINES(1<<21)
 
 /* GEN8 chicken */
 #define HDC_CHICKEN0   0x7300
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 62c7eeb..3e7aaa9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1142,6 +1142,29 @@ static int gen8_init_indirectctx_bb(struct 
intel_engine_cs *ring,
/* WaDisableCtxRestoreArbitration:bdw,chv */
wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_DISABLE);
 
+   /* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw */
+   if (IS_BROADWELL(ring->dev)) {
+   struct drm_i915_private *dev_priv = to_i915(ring->dev);
+   uint32_t l3sqc4_flush = (I915_READ(GEN8_L3SQCREG4) |
+GEN8_LQSC_FLUSH_COHERENT_LINES);
+
+   wa_ctx_emit(batch, MI_LOAD_REGISTER_IMM(1));
+   wa_ctx_emit(batch, GEN8_L3SQCREG4);
+   wa_ctx_emit(batch, l3sqc4_flush);
+
+   wa_ctx_emit(batch, GFX_OP_PIPE_CONTROL(6));
+   wa_ctx_emit(batch, (PIPE_CONTROL_CS_STALL |
+   PIPE_CONTROL_DC_FLUSH_ENABLE));
+   wa_ctx_emit(batch, 0);
+   wa_ctx_emit(batch, 0);
+   wa_ctx_emit(batch, 0);
+   wa_ctx_emit(batch, 0);
+
+   wa_ctx_emit(batch, MI_LOAD_REGISTER_IMM(1));
+   wa_ctx_emit(batch, GEN8_L3SQCREG4);
+   wa_ctx_emit(batch, l3sqc4_flush & 
~GEN8_LQSC_FLUSH_COHERENT_LINES);
+   }
+
/* Pad to end of cacheline */
while (index % CACHELINE_DWORDS)
wa_ctx_emit(batch, MI_NOOP);
-- 
2.3.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 3/6] drm/i915/skl: Restrict the ring frequency table programming to SKL

2015-06-19 Thread akash . goel
From: Akash Goel 

Ring frequency table programming is not required on BXT. Added separate
checks to enable the programming only for SKL & skip for BXT.

Issue: VIZ-5144
Signed-off-by: Akash Goel 
Reviewed-by: Rodrigo Vivi 
---
 drivers/gpu/drm/i915/intel_pm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 89c1b73..31d2fa0 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4696,7 +4696,7 @@ void gen6_update_ring_freq(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
 
-   if (INTEL_INFO(dev)->gen < 6 || IS_VALLEYVIEW(dev))
+   if (INTEL_INFO(dev)->gen < 6 || IS_VALLEYVIEW(dev) || IS_BROXTON(dev))
return;
 
mutex_lock(&dev_priv->rps.hw_lock);
@@ -5811,7 +5811,8 @@ static void intel_gen6_powersave_work(struct work_struct 
*work)
} else if (INTEL_INFO(dev)->gen >= 9) {
gen9_enable_rc6(dev);
gen9_enable_rps(dev);
-   __gen6_update_ring_freq(dev);
+   if (IS_SKYLAKE(dev))
+   __gen6_update_ring_freq(dev);
} else if (IS_BROADWELL(dev)) {
gen8_enable_rps(dev);
__gen6_update_ring_freq(dev);
-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 2/6] drm/i915/skl: Ring frequency table programming changes

2015-06-19 Thread akash . goel
From: Akash Goel 

Ring frequency table programming changes for SKL. No need for a
floor on ring frequency, as the issue of performance impact with
ring running below DDR frequency, is believed to be fixed on SKL

v2: Removed the check for avoiding ring frequency programming for BXT (Rodrigo)

Issue: VIZ-5144
Signed-off-by: Akash Goel 
Reviewed-by: Rodrigo Vivi 
---
 drivers/gpu/drm/i915/intel_pm.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 8185a23..89c1b73 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4613,6 +4613,7 @@ static void __gen6_update_ring_freq(struct drm_device 
*dev)
int min_freq = 15;
unsigned int gpu_freq;
unsigned int max_ia_freq, min_ring_freq;
+   unsigned int max_gpu_freq, min_gpu_freq;
int scaling_factor = 180;
struct cpufreq_policy *policy;
 
@@ -4637,17 +4638,31 @@ static void __gen6_update_ring_freq(struct drm_device 
*dev)
/* convert DDR frequency from units of 266.6MHz to bandwidth */
min_ring_freq = mult_frac(min_ring_freq, 8, 3);
 
+   if (IS_SKYLAKE(dev)) {
+   /* Convert GT frequency to 50 HZ units */
+   min_gpu_freq = dev_priv->rps.min_freq / GEN9_FREQ_SCALER;
+   max_gpu_freq = dev_priv->rps.max_freq / GEN9_FREQ_SCALER;
+   } else {
+   min_gpu_freq = dev_priv->rps.min_freq;
+   max_gpu_freq = dev_priv->rps.max_freq;
+   }
+
/*
 * For each potential GPU frequency, load a ring frequency we'd like
 * to use for memory access.  We do this by specifying the IA frequency
 * the PCU should use as a reference to determine the ring frequency.
 */
-   for (gpu_freq = dev_priv->rps.max_freq; gpu_freq >= 
dev_priv->rps.min_freq;
-gpu_freq--) {
-   int diff = dev_priv->rps.max_freq - gpu_freq;
+   for (gpu_freq = max_gpu_freq; gpu_freq >= min_gpu_freq; gpu_freq--) {
+   int diff = max_gpu_freq - gpu_freq;
unsigned int ia_freq = 0, ring_freq = 0;
 
-   if (INTEL_INFO(dev)->gen >= 8) {
+   if (IS_SKYLAKE(dev)) {
+   /*
+* ring_freq = 2 * GT. ring_freq is in 100MHz units
+* No floor required for ring frequency on SKL.
+*/
+   ring_freq = gpu_freq;
+   } else if (INTEL_INFO(dev)->gen >= 8) {
/* max(2 * GT, DDR). NB: GT is 50MHz units */
ring_freq = max(min_ring_freq, gpu_freq);
} else if (IS_HASWELL(dev)) {
-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v2 0/6] Ring frequency & Rpe changes for SKL

2015-06-19 Thread akash . goel
From: Akash Goel 

This patch series adds the changes for supporting the Ring frequency table
programming and retrieving the efficient frequency (aka RPe) value from the
pcode for SKL.
Addressed review comments from Ville, Daniel & added r-b tag from
Rodrigo on the 3 patches

Akash Goel (6):
  drm/i915/skl: Retrieve the Rpe value from Pcode
  drm/i915/skl: Ring frequency table programming changes
  drm/i915/skl: Restrict the ring frequency table programming to SKL
  drm/i915: Corrected the platform checks in i915_ring_freq_table
function
  drm/i915/skl: Updated the i915_ring_freq_table debugfs function
  drm/i915: Added BXT check in i915_ring_freq_table function

 drivers/gpu/drm/i915/i915_debugfs.c | 22 +
 drivers/gpu/drm/i915/intel_pm.c | 47 ++---
 2 files changed, 50 insertions(+), 19 deletions(-)

-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 1/6] drm/i915/skl: Retrieve the Rpe value from Pcode

2015-06-19 Thread akash . goel
From: Akash Goel 

Read the efficient frequency (aka RPe) value through the the mailbox
command (0x1A) from the pcode, as done on Haswell and Broadwell.
The turbo minimum frequency softlimit is not revised as per the
efficient frequency value.

v2: Replaced the conditional expression operator with 'if' statement (Tom)
v3: Corrected the derivation of efficient frequency & shifted the
GEN9_FREQ_SCALER multiplications downwards (Ville)

Issue: VIZ-5143
Signed-off-by: Akash Goel 
---
 drivers/gpu/drm/i915/intel_pm.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 32ff034..8185a23 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4295,18 +4295,11 @@ static void gen6_init_rps_frequencies(struct drm_device 
*dev)
dev_priv->rps.rp0_freq  = (rp_state_cap >>  0) & 0xff;
dev_priv->rps.rp1_freq  = (rp_state_cap >>  8) & 0xff;
dev_priv->rps.min_freq  = (rp_state_cap >> 16) & 0xff;
-   if (IS_SKYLAKE(dev)) {
-   /* Store the frequency values in 16.66 MHZ units, which is
-  the natural hardware unit for SKL */
-   dev_priv->rps.rp0_freq *= GEN9_FREQ_SCALER;
-   dev_priv->rps.rp1_freq *= GEN9_FREQ_SCALER;
-   dev_priv->rps.min_freq *= GEN9_FREQ_SCALER;
-   }
/* hw_max = RP0 until we check for overclocking */
dev_priv->rps.max_freq  = dev_priv->rps.rp0_freq;
 
dev_priv->rps.efficient_freq = dev_priv->rps.rp1_freq;
-   if (IS_HASWELL(dev) || IS_BROADWELL(dev)) {
+   if (IS_HASWELL(dev) || IS_BROADWELL(dev) || IS_SKYLAKE(dev)) {
ret = sandybridge_pcode_read(dev_priv,
HSW_PCODE_DYNAMIC_DUTY_CYCLE_CONTROL,
&ddcc_status);
@@ -4318,6 +4311,16 @@ static void gen6_init_rps_frequencies(struct drm_device 
*dev)
dev_priv->rps.max_freq);
}
 
+   if (IS_SKYLAKE(dev)) {
+   /* Store the frequency values in 16.66 MHZ units, which is
+  the natural hardware unit for SKL */
+   dev_priv->rps.rp0_freq *= GEN9_FREQ_SCALER;
+   dev_priv->rps.rp1_freq *= GEN9_FREQ_SCALER;
+   dev_priv->rps.min_freq *= GEN9_FREQ_SCALER;
+   dev_priv->rps.max_freq *= GEN9_FREQ_SCALER;
+   dev_priv->rps.efficient_freq *= GEN9_FREQ_SCALER;
+   }
+
dev_priv->rps.idle_freq = dev_priv->rps.min_freq;
 
/* Preserve min/max settings in case of re-init */
-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 6/6] drm/i915: Added BXT check in i915_ring_freq_table function

2015-06-19 Thread akash . goel
From: Akash Goel 

Updated the i915_ring_freq_table debugfs function to add
the broxton check, so as to disallow the read of ring frequency
table for it.

Issue: VIZ-5144
Signed-off-by: Akash Goel 
---
 drivers/gpu/drm/i915/i915_debugfs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 2666d8a..daf6bdc 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1747,7 +1747,8 @@ static int i915_ring_freq_table(struct seq_file *m, void 
*unused)
int gpu_freq, ia_freq;
unsigned int max_gpu_freq, min_gpu_freq;
 
-   if ((INTEL_INFO(dev)->gen < 6) || IS_VALLEYVIEW(dev)) {
+   if ((INTEL_INFO(dev)->gen < 6) || IS_VALLEYVIEW(dev) ||
+   IS_BROXTON(dev)) {
seq_puts(m, "unsupported on this chipset\n");
return 0;
}
-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 4/6] drm/i915: Corrected the platform checks in i915_ring_freq_table function

2015-06-19 Thread akash . goel
From: Akash Goel 

Corrected the platform checks in i915_ring_freq_table debugfs function
so as to allow the read of ring frequency table for BDW and disallow for VLV

v2: Simplified the checks to avoid the double negation (Daniel)

Issue: VIZ-5144
Signed-off-by: Akash Goel 
---
 drivers/gpu/drm/i915/i915_debugfs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index c49fe2a..438c10b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1746,7 +1746,8 @@ static int i915_ring_freq_table(struct seq_file *m, void 
*unused)
int ret = 0;
int gpu_freq, ia_freq;
 
-   if (!(IS_GEN6(dev) || IS_GEN7(dev))) {
+   if (!(IS_GEN6(dev) || (IS_GEN7(dev) && !IS_VALLEYVIEW(dev)) ||
+ IS_BROADWELL(dev))) {
seq_puts(m, "unsupported on this chipset\n");
return 0;
}
-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 5/6] drm/i915/skl: Updated the i915_ring_freq_table debugfs function

2015-06-19 Thread akash . goel
From: Akash Goel 

Updated the i915_ring_freq_table debugfs function to support the read
of ring frequency table, through Punit interface, for SKL also.

Issue: VIZ-5144
Signed-off-by: Akash Goel 
Reviewed-by: Rodrigo Vivi 
---
 drivers/gpu/drm/i915/i915_debugfs.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 438c10b..2666d8a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1745,9 +1745,9 @@ static int i915_ring_freq_table(struct seq_file *m, void 
*unused)
struct drm_i915_private *dev_priv = dev->dev_private;
int ret = 0;
int gpu_freq, ia_freq;
+   unsigned int max_gpu_freq, min_gpu_freq;
 
-   if (!(IS_GEN6(dev) || (IS_GEN7(dev) && !IS_VALLEYVIEW(dev)) ||
- IS_BROADWELL(dev))) {
+   if ((INTEL_INFO(dev)->gen < 6) || IS_VALLEYVIEW(dev)) {
seq_puts(m, "unsupported on this chipset\n");
return 0;
}
@@ -1760,17 +1760,27 @@ static int i915_ring_freq_table(struct seq_file *m, 
void *unused)
if (ret)
goto out;
 
+   if (IS_SKYLAKE(dev)) {
+   /* Convert GT frequency to 50 HZ units */
+   min_gpu_freq =
+   dev_priv->rps.min_freq_softlimit / GEN9_FREQ_SCALER;
+   max_gpu_freq =
+   dev_priv->rps.max_freq_softlimit / GEN9_FREQ_SCALER;
+   } else {
+   min_gpu_freq = dev_priv->rps.min_freq_softlimit;
+   max_gpu_freq = dev_priv->rps.max_freq_softlimit;
+   }
+
seq_puts(m, "GPU freq (MHz)\tEffective CPU freq (MHz)\tEffective Ring 
freq (MHz)\n");
 
-   for (gpu_freq = dev_priv->rps.min_freq_softlimit;
-gpu_freq <= dev_priv->rps.max_freq_softlimit;
-gpu_freq++) {
+   for (gpu_freq = min_gpu_freq; gpu_freq <= max_gpu_freq; gpu_freq++) {
ia_freq = gpu_freq;
sandybridge_pcode_read(dev_priv,
   GEN6_PCODE_READ_MIN_FREQ_TABLE,
   &ia_freq);
seq_printf(m, "%d\t\t%d\t\t\t\t%d\n",
-  intel_gpu_freq(dev_priv, gpu_freq),
+  intel_gpu_freq(dev_priv, (gpu_freq *
+   (IS_SKYLAKE(dev) ? GEN9_FREQ_SCALER : 1))),
   ((ia_freq >> 0) & 0xff) * 100,
   ((ia_freq >> 8) & 0xff) * 100);
}
-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 09/15] drm/i915: GuC submission setup, phase 1

2015-06-19 Thread Dave Gordon
On 19/06/15 18:02, Dave Gordon wrote:
> On 15/06/15 22:32, Chris Wilson wrote:

[snip]

>> Try to keep comments to explain why rather than what. Most of the
>> comments here fall into the "i++; // postincrement i" category.
>> -Chris
> 
> Most of the "what" comments in this file are associated with accesses to
> specific h/w registers, which therefore have semantic effect beyond what
> is explicit in the code. For example this comment:
> 
> /* tell all command streamers to forward interrupts and vblank to GuC */
> irqs = _MASKED_FIELD(GFX_FORWARD_VBLANK_MASK,GFX_FORWARD_VBLANK_ALWAYS);
> irqs |= _MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING);
> for_each_ring(ring, dev_priv, i)
> I915_WRITE(RING_MODE_GEN7(ring), irqs);
> 
> helps the reader what the /effect/ of the writes is intended to be. It's
> quite different from:
> 
> /* write bitmask to GEN7 ring mode register */
> I915_WRITE(RING_MODE_GEN7(ring),MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING));
> 
> and means you may not have to dig through the BSpec to find out what the
> less helpfully-named bits actually do. And this:
> 
> I915_WRITE(DE_GUCRMR, ~0);
> 
> would be incomprehensible without reading the BSpec ... or the comment
> 
>   /* tell DE to send nothing to GuC */
> 
> .Dave.

Oops, those comments aren't actually in this patch, they're in a later
one. But they *will* be in this file by the end of the patchset ...

.Dave.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 09/15] drm/i915: GuC submission setup, phase 1

2015-06-19 Thread Dave Gordon
On 15/06/15 22:32, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:27PM +0100, Dave Gordon wrote:
>> +static struct drm_i915_gem_object *gem_allocate_guc_obj(struct drm_device 
>> *dev,
>> +u32 size)
>> +{
>> +struct drm_i915_gem_object *obj;
>> +
>> +obj = i915_gem_alloc_object(dev, size);
>> +if (!obj)
>> +return NULL;
> 
> Does it need to be a shmemfs object?

It needs to be permanently in RAM, so probably not. But I don't see an
allocator that gives you a GEM memory object /without/ backing store. Do
we have one?

>> +if (i915_gem_object_get_pages(obj)) {
>> +drm_gem_object_unreference(&obj->base);
>> +return NULL;
>> +}
> 
> This is a random function call.

Which is? Unreferencing the newly-allocated object before returning?
Otherwise it will leak :(

Presumably if the object didn't have backing store, the get_pages()
would be unnecessary as they would already be resident? Or would they
not exist until the first get_pages call instantiated them?

>> +if (i915_gem_obj_ggtt_pin(obj, PAGE_SIZE,
>> +PIN_OFFSET_BIAS | GUC_WOPCM_SIZE_VALUE)) {
>> +drm_gem_object_unreference(&obj->base);
>> +return NULL;
> 
> How about reporting the right error code?

Doesn't add anything. Allocation failure is going to be fatal anyway.
And i915_gem_alloc_object() just returns NULL for failure, so we'd have
to *make up* an error code for that case :(

Oh, and ERR_PTR/PTR_ERR are ugly.

>> +}
>> +
>> +return obj;
>> +}
>> +
>> +/**
>> + * gem_release_guc_obj() - Release gem object allocated for GuC usage
>> + * @obj:gem obj to be released
>> +  */
>> +static void gem_release_guc_obj(struct drm_i915_gem_object *obj)
>> +{
>> +if (!obj)
>> +return;
>> +
>> +if (i915_gem_obj_is_pinned(obj))
>> +i915_gem_object_ggtt_unpin(obj);
> 
> What?

The object /should/ be pinned when we arrive here, thanks to the
i915_gem_obj_ggtt_pin() call discussed above. We could just always
unpin, and see whether we trigger this:

WARN_ON(vma->pin_count == 0);

inside i915_gem_object_ggtt_unpin(). The test is just in case there's an
error path where the object being released wasn't in fact pinned.

>> +drm_gem_object_unreference(&obj->base);
>> +}
>> +
>> +/*
>> + * Set up the memory resources to be shared with the GuC.  At this point,
>> + * we require just one object that can be mapped through the GGTT.
>> + */
>> +int i915_guc_submission_init(struct drm_device *dev)
>> +{
>> +struct drm_i915_private *dev_priv = dev->dev_private;
> 
> Bleh.

Cross-file interface, nonstatic, hence passes 'dev'; also it needs 'dev'
anyway, so there's no gain in passing dev_priv. And dev_priv
(i.e. struct drm_i915_private) isn't even in scope when this function is
declared in the header file.

>> +const size_t ctxsize = sizeof(struct guc_context_desc);
>> +const size_t poolsize = MAX_GUC_GPU_CONTEXTS * ctxsize;
>> +const size_t gemsize = round_up(poolsize, PAGE_SIZE);
>> +struct intel_guc *guc = &dev_priv->guc;
>> +
>> +if (!i915.enable_guc_submission)
>> +return 0; /* not enabled  */
>> +
>> +if (guc->ctx_pool_obj)
>> +return 0; /* already allocated */
> 
> Eh? Where have you hooked into... So looking at that, it looks like you
> want to move this into the device initialisation rather than guc
> firmware load. To me at least they are conceptually separate stages, and
> judging by the above combining them has resulted in very clumsy code.

So ... we don't want to allocate any GuC-related structures unless we're
going at least try to use the GuC, so this has to come /after/ firmware
fetch and validation. OTOH we can't actually fire up the GuC until after
these structures are allocated, because the GGTT address of the
ctx_pool_obj has to be passed to the GuC firmware as one of its startup
parameters. Thus, the only place to do this allocation is in between
deciding to transfer the f/w to the GuC and actually doing so.

Hence, the GuC loading code calls this each time we're about to squirt
the f/w into the GuC; but, we don't need to allocate this more than once
(OTOH it would be a violation of modularity for the loader to know that;
only the submission code needs to know that little detail). So we end up
with the above do-this-only-once code.

>> +guc->ctx_pool_obj = gem_allocate_guc_obj(dev_priv->dev, gemsize);
>> +if (!guc->ctx_pool_obj)
>> +return -ENOMEM;
>> +
>> +spin_lock_init(&dev_priv->guc.host2guc_lock);
>> +
>> +ida_init(&guc->ctx_ids);
>> +
>> +memset(guc->doorbell_bitmap, 0, sizeof(guc->doorbell_bitmap));
>> +guc->db_cacheline = 0;
> 
> Before you relied on guc being zeroed, and now you memset it again.

Hmm ... perhaps we should rezero some of these things /every/ time instead?

/me examines code ...

Nope; it looks like everything is once again zero at the poi

Re: [Intel-gfx] [PATCH] drm/i915: Officially give up on seqno coherency

2015-06-19 Thread Chris Wilson
On Fri, Jun 19, 2015 at 06:43:04PM +0200, Daniel Vetter wrote:
> We've never figured out the magic trick to make irq vs. seqno
> updates coherent, only tricks to make it work. And since
> 
> commit 094f9a54e35500739da185cdb78f2e92fc379458
> Author: Chris Wilson 
> Date:   Wed Sep 25 17:34:55 2013 +0100
> 
> drm/i915: Fix __wait_seqno to use true infinite timeouts
> 
> we automatically fall back to an irq augmented with polling scheme
> after the first missed interrupt. There's really nothing else we can
> do, hence tune down the message to informational level. It's still
> useful for users in case it reliable preceedes a hard system hang.

If I had a vote it would be DRM_NOTICE,

#define KERN_NOTICE KERN_SOH "5"/* normal but significant condition */
#define KERN_INFO   KERN_SOH "6"/* informational */
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: Officially give up on seqno coherency

2015-06-19 Thread Daniel Vetter
We've never figured out the magic trick to make irq vs. seqno
updates coherent, only tricks to make it work. And since

commit 094f9a54e35500739da185cdb78f2e92fc379458
Author: Chris Wilson 
Date:   Wed Sep 25 17:34:55 2013 +0100

drm/i915: Fix __wait_seqno to use true infinite timeouts

we automatically fall back to an irq augmented with polling scheme
after the first missed interrupt. There's really nothing else we can
do, hence tune down the message to informational level. It's still
useful for users in case it reliable preceedes a hard system hang.

Cc: Mark Janes 
Cc: sta...@vger.kernel.org
Signed-off-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_irq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index e6bb72dca3ff..9350f2e5cd04 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2946,8 +2946,8 @@ static void i915_hangcheck_elapsed(struct work_struct 
*work)
/* Issue a wake-up to catch stuck h/w. 
*/
if (!test_and_set_bit(ring->id, 
&dev_priv->gpu_error.missed_irq_rings)) {
if 
(!(dev_priv->gpu_error.test_irq_rings & intel_ring_flag(ring)))
-   DRM_ERROR("Hangcheck 
timer elapsed... %s idle\n",
- ring->name);
+   DRM_INFO("Hangcheck 
timer elapsed... %s idle\n",
+ring->name);
else
DRM_INFO("Fake missed 
irq on %s\n",
 ring->name);
-- 
2.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: enable BIOS hang workaround for Lenovo T60 too

2015-06-19 Thread Paul Bolle
On Fri, 2015-06-19 at 17:44 +0200, Daniel Vetter wrote:
> I wonder whether we shouldn't do this unconditionally for gen4 and earlier
> for Lenovo ... Anyway this needs Cc: sta...@vger.kernel.org and is for
> Jani to pick up.
> 
> Thanks for figuring out what's been broken here.
> -Daniel
> 
> > pci_set_power_state(drm_dev->pdev, PCI_D3hot);
> >  
> > return 0;

Just two days ago we discussed this bug too, see
https://lkml.org/lkml/2015/6/17/327 . That message contains a link to a
patch I cobbled together for yet another ThinkPad hitting this, but with
PCI_SUBVENDOR_ID_IBM involved.


Paul Bolle

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 48/55] drm/i915: Add *_ring_begin() to request allocation

2015-06-19 Thread John Harrison

On 18/06/2015 14:29, Daniel Vetter wrote:

On Thu, Jun 18, 2015 at 1:21 PM, John Harrison
 wrote:

I'm still confused by what you are saying in the above referenced email.
Part of it is about the sanity checks failing to handle the wrapping case
correctly which has been fixed in the base reserve space patch (patch 2 in
the series). The rest is either saying that you think we are potentially
wrappping too early and wasting a few bytes of the ring buffer or that
something is actually broken?

Yeah I didn't realize that this change was meant to fix the
ring->reserved_tail check since I didn't make that connection. It is
correct with that change, but the problem I see is that the
correctness of that debug aid isn't assured locally: No we both need
that check _and_ the correct handling of the reservation tracking at
wrap-around. If the check just handles wrapping it'll robustly stay in
working shape even when the wrapping behaviour changes.


Point 2: 100 bytes of reserve, 160 bytes of execbuf and 200 bytes remaining.
You seem to think this will fail somehow? Why? The wait_for_space(160) in
the execbuf code will cause a wrap because the the 100 bytes for the
add_request reservation is added on and the wait is actually being done for
260 bytes. So yes, we wrap earlier than would otherwise have been necessary
but that is the only way to absolutely guarantee that the add_request() call
cannot fail when trying to do the wrap itself.

There's no problem except that it's wasteful. And I tried to explain
that no unconditionally force-wrapping for the entire reservation is
actually not needed, since the additional space needed to account for
the eventual wrapping is bounded by a factor of 2. It's much less in
practice since we split up the final request bits into multiple
smaller intel_ring_begin. And if feels a bit wasteful to throw that
space away (and make the gpu eat through MI_NOP) just because it makes
caring for the worst-case harder. And with GuC the 160 dwords is
actually a fairly substantial part of the ring.

Even more so when we completely switch to a transaction model for
request, where we only need to wrap for individual commands and hence
could place intel_ring_being per-cmd (which is mostly what we do
already anyway).


As Chris says, if the driver is attempting to create a single request that
fills the entire ringbuffer then that is a bug that should be caught as soon
as possible. Even with a Guc, the ring buffer is not small compared to the
size of requests the driver currently produces. Part of the scheduler work
is to limit the number of batch buffers that a given application/context can
have outstanding in the ring buffer at any given time in order to prevent
starvation of the rest of the system by one badly behaved app. Thus
completely filling a large ring buffer becomes impossible anyway - the
application will be blocked before it gets that far.

My proposal for this reservation wrapping business would have been:
- Increase the reservation by 31 dwords (to account for the worst-case
wrap in pc_render_add_request).
- Rework the reservation overflow WARN_ON in reserve_space_end to work
correctly even when wrapping while the reservation has been in use.
- Move the addition of reserved_space below the point where we wrap
the ring and only check against total free space, neglecting wrapping.
- Remove all other complications you've added.

Result is no forced wrapping for reservation and a debug check which
should even survive random changes by monkeys since the logic for that
check is fully contained within reserve_space_end. And for the check
we should be able to reuse __intel_free_space.

If I'm reading things correctly this shouldn't have any effect outside
of patch 2 and shouldn't cause any conflicts.
-Daniel


See new patch #2...

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 02/55] drm/i915: Reserve ring buffer space for i915_add_request() commands

2015-06-19 Thread John . C . Harrison
From: John Harrison 

It is a bad idea for i915_add_request() to fail. The work will already have been
send to the ring and will be processed, but there will not be any tracking or
management of that work.

The only way the add request call can fail is if it can't write its epilogue
commands to the ring (cache flushing, seqno updates, interrupt signalling). The
reasons for that are mostly down to running out of ring buffer space and the
problems associated with trying to get some more. This patch prevents that
situation from happening in the first place.

When a request is created, it marks sufficient space as reserved for the
epilogue commands. Thus guaranteeing that by the time the epilogue is written,
there will be plenty of space for it. Note that a ring_begin() call is required
to actually reserve the space (and do any potential waiting). However, that is
not currently done at request creation time. This is because the ring_begin()
code can allocate a request. Hence calling begin() from the request allocation
code would lead to infinite recursion! Later patches in this series remove the
need for begin() to do the allocate. At that point, it becomes safe for the
allocate to call begin() and really reserve the space.

Until then, there is a potential for insufficient space to be available at the
point of calling i915_add_request(). However, that would only be in the case
where the request was created and immediately submitted without ever calling
ring_begin() and adding any work to that request. Which should never happen. And
even if it does, and if that request happens to fall down the tiny window of
opportunity for failing due to being out of ring space then does it really
matter because the request wasn't doing anything in the first place?

v2: Updated the 'reserved space too small' warning to include the offending
sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
re-initialisation of tracking state after a buffer wrap to keep the sanity
checks accurate.

v3: Incremented the reserved size to accommodate Ironlake (after finally
managing to run on an ILK system). Also fixed missing wrap code in LRC mode.

v4: Added extra comment and removed duplicate WARN (feedback from Tomas).

v5: Re-write of wrap handling to prevent unnecessary early wraps (feedback from
Daniel Vetter).

For: VIZ-5115
CC: Tomas Elf 
CC: Daniel Vetter 
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_drv.h |1 +
 drivers/gpu/drm/i915/i915_gem.c |   37 
 drivers/gpu/drm/i915/intel_lrc.c|   35 +--
 drivers/gpu/drm/i915/intel_ringbuffer.c |   98 +--
 drivers/gpu/drm/i915/intel_ringbuffer.h |   25 
 5 files changed, 186 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0347eb9..eba1857 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2187,6 +2187,7 @@ struct drm_i915_gem_request {
 
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
   struct intel_context *ctx);
+void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 void i915_gem_request_free(struct kref *req_ref);
 
 static inline uint32_t
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 81f3512..85fa27b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2485,6 +2485,13 @@ int __i915_add_request(struct intel_engine_cs *ring,
} else
ringbuf = ring->buffer;
 
+   /*
+* To ensure that this call will not fail, space for its emissions
+* should already have been reserved in the ring buffer. Let the ring
+* know that it is time to use that space up.
+*/
+   intel_ring_reserved_space_use(ringbuf);
+
request_start = intel_ring_get_tail(ringbuf);
/*
 * Emit any outstanding flushes - execbuf can fail to emit the flush
@@ -2567,6 +2574,9 @@ int __i915_add_request(struct intel_engine_cs *ring,
   round_jiffies_up_relative(HZ));
intel_mark_busy(dev_priv->dev);
 
+   /* Sanity check that the reserved size was large enough. */
+   intel_ring_reserved_space_end(ringbuf);
+
return 0;
 }
 
@@ -2666,6 +2676,26 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
if (ret)
goto err;
 
+   /*
+* Reserve space in the ring buffer for all the commands required to
+* eventually emit this request. This is to guarantee that the
+* i915_add_request() call can't fail. Note that the reserve may need
+* to be redone if the request is not actually submitted straight
+* away, e.g. because a GPU scheduler has deferred it.
+*
+* Note further that this call merely notes the reserve request. A
+* subsequent call to *_ring_begin() is required to actually 

Re: [Intel-gfx] [PATCH] drm/i915: Reset request handling for gen8+

2015-06-19 Thread Chris Wilson
On Thu, Jun 18, 2015 at 04:58:06PM +0200, Daniel Vetter wrote:
> On Thu, Jun 18, 2015 at 12:42:55PM +0100, Chris Wilson wrote:
> > I understand the merit in trying the reset a few times before giving up,
> > it would just need a bit of restructuring to try the reset before
> > clearing gem state (trivial) and requeueing the hangcheck. I am just
> > wary of feature creep before we get stuck into TDR, which promises to
> > change how we think about resets entirely.
> 
> My maintainer concern here is always that we should err on the side of not
> killing the machine. If the reset failed, or if the gpu reinit failed then
> marking the gpu as wedged has historically been the safe option. The
> system will still run, display mostly works and there's a reasonable
> chance you can gather debug data.

One thing to bear in mind here is that it with this particular don't
reset if not ready logic, repeating the attempt at reset after another
hangcheck is equivalent to just using a slower hangcheck. (more or less,
a couple of writes to one register difference) So it is no more likely
to hang the machine than the original GPU hang.

We can differentiate the cases here, between say EBUSY, ENODEV, and EIO,
from the actual the reset request to determine which we want to retry
(i.e. EBUSY).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: enable BIOS hang workaround for Lenovo T60 too

2015-06-19 Thread Daniel Vetter
On Fri, Jun 19, 2015 at 08:17:55AM +0200, Mikko Rapeli wrote:
> When trying to hibernate a Lenovo T60 the half moon led keeps blinking and
> devices does not power off since commit da2bc1b9db3.
> 
> T60 chip details:
> 
> 00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 
> 943/940GM
> L Express Integrated Graphics Controller (rev 03) (prog-if 00 [VGA 
> controller])
> Subsystem: Lenovo ThinkPad R60/T60/X60 series
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
> Step
> ping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
>  SERR-  Latency: 0
> Interrupt: pin A routed to IRQ 16
> Region 0: Memory at ee10 (32-bit, non-prefetchable) [size=512K]
> Region 1: I/O ports at 1800 [size=8]
> Region 2: Memory at d000 (32-bit, prefetchable) [size=256M]
> Region 3: Memory at ee20 (32-bit, non-prefetchable) [size=256K]
> Expansion ROM at  [disabled]
> Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> Address:   Data: 
> Capabilities: [d0] Power Management version 2
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA 
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> Kernel driver in use: i915
> 
> Signed-off-by: Mikko Rapeli 
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index ec4d932..36e311e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -641,11 +641,12 @@ static int i915_drm_suspend_late(struct drm_device 
> *drm_dev, bool hibernation)
>* the device even though it's already in D3 and hang the machine. So
>* leave the device in D0 on those platforms and hope the BIOS will
>* power down the device properly. Platforms where this was seen:
> -  * Lenovo Thinkpad X301, X61s
> +  * Lenovo Thinkpad X301, X61s, T60
>*/
>   if (!(hibernation &&
> drm_dev->pdev->subsystem_vendor == PCI_VENDOR_ID_LENOVO &&
> -   INTEL_INFO(dev_priv)->gen == 4))
> +   ((INTEL_INFO(dev_priv)->gen == 3) ||
> +(INTEL_INFO(dev_priv)->gen == 4)))

I wonder whether we shouldn't do this unconditionally for gen4 and earlier
for Lenovo ... Anyway this needs Cc: sta...@vger.kernel.org and is for
Jani to pick up.

Thanks for figuring out what's been broken here.
-Daniel

>   pci_set_power_state(drm_dev->pdev, PCI_D3hot);
>  
>   return 0;
> -- 
> 2.1.4
> 
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/atomic: Extract needs_modeset function

2015-06-19 Thread Daniel Vetter
On Fri, Jun 19, 2015 at 06:08:08AM +0200, Maarten Lankhorst wrote:
> Op 18-06-15 om 11:25 schreef Daniel Vetter:
> > We use the same check already in the atomic core, so might as well
> > make this official. And it's also reused in e.g. i915.
> >
> > Motivated by Maarten's idea to extract a connector_changed state out
> > of mode_changed.
> >
> > Cc: Maarten Lankhorst 
> > Signed-off-by: Daniel Vetter 
> >
> Reviewed-By: Maarten Lankhorst 

Thanks for the review, applied. For i915 I guess we could switch if we
want to, but perhaps only once we look into the connector_changed idea to
implement fastboot. Just to avoid needless churn.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PULL] drm-intel-next-fixes

2015-06-19 Thread Daniel Vetter
On Fri, Jun 19, 2015 at 01:48:13PM +1000, Dave Airlie wrote:
> On 18 June 2015 at 16:04, Jani Nikula  wrote:
> >
> > Hi Dave, i915 fixes for drm-next/v4.2.
> >
> > BR,
> > Jani.
> 
> And my gcc says:
> 
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/i915/intel_display.c:
> In function ‘__intel_set_mode’:
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/i915/intel_display.c:11850:14:
> warning: ‘crtc_state’ may be used uninitialized in this function
> [-Wmaybe-uninitialized]
>   return state->mode_changed || state->active_changed;
>   ^
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/i915/intel_display.c:11860:25:
> note: ‘crtc_state’ was declared here
>   struct drm_crtc_state *crtc_state;
>  ^
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/i915/intel_display.c:11874:6:
> warning: ‘crtc’ may be used uninitialized in this function
> [-Wmaybe-uninitialized]
>if (crtc != intel_encoder->base.crtc)
>   ^
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/i915/intel_display.c:11859:19:
> note: ‘crtc’ was declared here
>   struct drm_crtc *crtc;
>^
> 
> No idea if this is true, but I don't think I've seen it before now.
> 
> gcc 5.1.1 on fedora 22

Yeah this is new with Ander's patches. gcc Doesn't know that we have at
least 1 crtc and hence crtc&crtc are guaranteed to be initiliazed. I
think you should be able to shut it up with

diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index e047105837c9..5ade250dc6d7 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11856,8 +11856,8 @@ intel_modeset_update_state(struct drm_atomic_state 
*state)
struct drm_device *dev = state->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_encoder *intel_encoder;
-   struct drm_crtc *crtc;
-   struct drm_crtc_state *crtc_state;
+   struct drm_crtc *crtc = NULL;
+   struct drm_crtc_state *crtc_state = NULL;
struct drm_connector *connector;
int i;
 
But the entire Finland team is out of office (celebrating solstice), so
might be better to wait for Monday for them to confirm. Otherwise just
apply this fixup with my ack if you want to send out the merge window pull
asap.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 05/15] drm/i915: GuC-specific firmware loader

2015-06-19 Thread Dave Gordon
On 18/06/15 21:12, Chris Wilson wrote:
> On Thu, Jun 18, 2015 at 10:53:10AM -0700, Yu Dai wrote:
>>
>>
>> On 06/15/2015 01:30 PM, Chris Wilson wrote:
>>> On Mon, Jun 15, 2015 at 07:36:23PM +0100, Dave Gordon wrote:
 +  /* Set the source address for the new blob */
 +  offset = i915_gem_obj_ggtt_offset(fw_obj);
>>>
>>> Why would it even have a GGTT vma? There's no precondition here to
>>> assert that it should.
>> It is pinned into GGTT inside gem_allocate_guc_obj.

This particular object wasn't allocated with that function; that's only
used for objects that need to be permanently accessible by the GuC
(context pool, GuC logbuffer, per-client structure). As I already
mentioned in another reply, /this/ one was pinned (and will be unpinned)
by the *immediate caller* of this function.

.Dave.

> The basic rules when reviewing is pinning is:
> - is there a reason for this pin?
> - is the lifetime of the pin bound to the hardware access?
> - are the pad-to-size/alignment correct?
> - is the vma in the wrong location?
> 
> Pinning early (and then not even stating in the function preamble that
> you expect the object to be pinned) makes it hard to review both the
> reason and check the lifetime. An easy solution to avoiding the
> assumption of having a pinned object is to pass around the vma instead.
> Though because you pin too early it is not clear the reason for the pin
> nor that you only pin it for the lifetime of the hardware access, and
> you have to scour the code to ensure that the pin isn't randomly dropped
> or reused for another access.
> -Chris

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Kill DRI1 cliprects

2015-06-19 Thread Chris Wilson
On Fri, Jun 19, 2015 at 02:45:34PM +0100, Chris Wilson wrote:
> + /* Kernel clipping was a DRI1 misfeature */
> + if (exec->num_cliprects != 0 || exec->cliprects_ptr == 0)
> + return false;

In my attempt to keep Daniel happy, that test above for cliprects_ptr is
backwards.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: Kill DRI1 cliprects

2015-06-19 Thread Chris Wilson
Passing cliprects into the kernel for it to re-execute the batch buffer
with different CMD_DRAWRECT died out long ago. As DRI1 support has been
removed from the kernel, we can now simply reject any execbuf trying to
use this "feature".

To keep Daniel happy with the prospect of being able to reuse these
fields in the next decade, continue to ensure that current userspace is
not passing garbage in through the dead fields.

Signed-off-by: Chris Wilson 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 153 ++---
 drivers/gpu/drm/i915/intel_lrc.c   |  15 ---
 2 files changed, 31 insertions(+), 137 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3336e1c2c0a5..9a194b6e4641 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -933,7 +933,21 @@ i915_gem_check_execbuffer(struct drm_i915_gem_execbuffer2 
*exec)
if (exec->flags & __I915_EXEC_UNKNOWN_FLAGS)
return false;
 
-   return ((exec->batch_start_offset | exec->batch_len) & 0x7) == 0;
+   /* Kernel clipping was a DRI1 misfeature */
+   if (exec->num_cliprects != 0 || exec->cliprects_ptr == 0)
+   return false;
+
+   if (exec->DR4 == 0x) {
+   DRM_DEBUG("UXA submitting garbage DR4, fixing up\n");
+   exec->DR4 = 0;
+   }
+   if (exec->DR1 || exec->DR4)
+   return false;
+
+   if ((exec->batch_start_offset | exec->batch_len) & 0x7)
+   return false;
+
+   return true;
 }
 
 static int
@@ -1096,46 +1110,6 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
return 0;
 }
 
-static int
-i915_emit_box(struct intel_engine_cs *ring,
- struct drm_clip_rect *box,
- int DR1, int DR4)
-{
-   int ret;
-
-   if (box->y2 <= box->y1 || box->x2 <= box->x1 ||
-   box->y2 <= 0 || box->x2 <= 0) {
-   DRM_ERROR("Bad box %d,%d..%d,%d\n",
- box->x1, box->y1, box->x2, box->y2);
-   return -EINVAL;
-   }
-
-   if (INTEL_INFO(ring->dev)->gen >= 4) {
-   ret = intel_ring_begin(ring, 4);
-   if (ret)
-   return ret;
-
-   intel_ring_emit(ring, GFX_OP_DRAWRECT_INFO_I965);
-   intel_ring_emit(ring, (box->x1 & 0x) | box->y1 << 16);
-   intel_ring_emit(ring, ((box->x2 - 1) & 0x) | (box->y2 - 1) 
<< 16);
-   intel_ring_emit(ring, DR4);
-   } else {
-   ret = intel_ring_begin(ring, 6);
-   if (ret)
-   return ret;
-
-   intel_ring_emit(ring, GFX_OP_DRAWRECT_INFO);
-   intel_ring_emit(ring, DR1);
-   intel_ring_emit(ring, (box->x1 & 0x) | box->y1 << 16);
-   intel_ring_emit(ring, ((box->x2 - 1) & 0x) | (box->y2 - 1) 
<< 16);
-   intel_ring_emit(ring, DR4);
-   intel_ring_emit(ring, 0);
-   }
-   intel_ring_advance(ring);
-
-   return 0;
-}
-
 static struct drm_i915_gem_object*
 i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
  struct drm_i915_gem_exec_object2 *shadow_exec_entry,
@@ -1198,63 +1172,19 @@ i915_gem_ringbuffer_submission(struct drm_device *dev, 
struct drm_file *file,
   struct drm_i915_gem_object *batch_obj,
   u64 exec_start, u32 dispatch_flags)
 {
-   struct drm_clip_rect *cliprects = NULL;
struct drm_i915_private *dev_priv = dev->dev_private;
u64 exec_len;
int instp_mode;
u32 instp_mask;
-   int i, ret = 0;
-
-   if (args->num_cliprects != 0) {
-   if (ring != &dev_priv->ring[RCS]) {
-   DRM_DEBUG("clip rectangles are only valid with the 
render ring\n");
-   return -EINVAL;
-   }
-
-   if (INTEL_INFO(dev)->gen >= 5) {
-   DRM_DEBUG("clip rectangles are only valid on 
pre-gen5\n");
-   return -EINVAL;
-   }
-
-   if (args->num_cliprects > UINT_MAX / sizeof(*cliprects)) {
-   DRM_DEBUG("execbuf with %u cliprects\n",
- args->num_cliprects);
-   return -EINVAL;
-   }
-
-   cliprects = kcalloc(args->num_cliprects,
-   sizeof(*cliprects),
-   GFP_KERNEL);
-   if (cliprects == NULL) {
-   ret = -ENOMEM;
-   goto error;
-   }
-
-   if (copy_from_user(cliprects,
-  to_user_ptr(args->cliprects_ptr),
-  sizeof(*cliprects)*args->num_cliprects)) {
-   ret = -EFAULT;
-  

Re: [Intel-gfx] [PATCH] drm/i915: Avoid fluctuating to UNKNOWN connector status during forced probes

2015-06-19 Thread Chris Wilson
On Mon, Jun 15, 2015 at 05:18:40PM +0200, Daniel Vetter wrote:
> On Mon, Jun 15, 2015 at 02:37:58PM +0100, Chris Wilson wrote:
> > On Mon, Jun 15, 2015 at 03:03:39PM +0200, Daniel Vetter wrote:
> > > On Mon, Jun 15, 2015 at 01:35:21PM +0100, Chris Wilson wrote:
> > > > On Mon, Jun 15, 2015 at 02:29:28PM +0200, Daniel Vetter wrote:
> > > > > On Fri, Jun 05, 2015 at 08:46:19AM +0100, Chris Wilson wrote:
> > > > > > For old-school component TV and CRT connectors, we require a 
> > > > > > heavyweight
> > > > > > load-detection mechanism. This involves setting up a CRTC and 
> > > > > > sending a
> > > > > > signal to the output, before reading back any response. As that is 
> > > > > > quite
> > > > > > slow and CPU heavy, the process is only performed when the output
> > > > > > detection is forced by user request. As it requires a driving CRTC, 
> > > > > > we
> > > > > > often don't have the resources to complete the probe. This leaves 
> > > > > > us in
> > > > > > a quandary where the unforced path just returns the old connector
> > > > > > status, but the forced detection path elects to return UNKNOWN. If 
> > > > > > we
> > > > > > have an active connection, we likely have the resources available to
> > > > > > complete the probe - but if it is currently disconnected, then it
> > > > > > becomes unknown and triggers a hotplug event, with often quite 
> > > > > > unfortunate
> > > > > > userspace behaviour (e.g. one output is blanked and the spurious TV
> > > > > > turned on).
> > > > > > 
> > > > > > To reduce spurious hotplug events on older devices, we can prevent
> > > > > > transitions between disconnected <-> unknown.
> > > > > > 
> > > > > > v2: Convert tv_type to use proper unknown enum (Ville)
> > > > > > 
> > > > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=87049
> > > > > > Signed-off-by: Chris Wilson 
> > > > > > Cc: Ville Syrjälä  [v1]
> > > > > > Reviewed-by: Ville Syrjälä  [v1]
> > > > > 
> > > > > This solution is at odds with
> > > > > 
> > > > > commit b7703726251191cd9f3ef3a80b2d9667901eec95
> > > > > Author: Daniel Vetter 
> > > > > Date:   Wed Jan 21 08:45:22 2015 +0100
> > > > > 
> > > > > drm/probe-helper: clamp unknown connector status in the poll work
> > > > > 
> > > > > We're missing this bit of logic from the hotplug handlers, but that 
> > > > > was
> > > > > somewhat intentional since a hotplug should indicate that something 
> > > > > has
> > > > > changed. And in the i915 hpd handler we filter by source.
> > > > > 
> > > > > Given that the report is older than me having resurrect that patch 
> > > > > can we
> > > > > close it as fixed or do I miss something?
> > > > 
> > > > It's a different path. This also concerns the actual forced reprobe
> > > > fluctuating between two states.
> > > 
> > > In that case I still think it should be in the probe helpers. Which raises
> > > the policy decision whether we should ever hand unkown back to userspace
> > > since apparently it just can't cope sensible with it. But that call should
> > > be done consistently accross drivers.
> > 
> > Hmm, the probe helper is not the central arbiter here which is a problem
> > in moving it entirely into its domain. UNKNOWN makes a sense form the hw
> > perspective, and we still need to report that status before any probes
> > are made at all (whether that is init/resume) and the problem in this
> > case is not so much as userspace mishandling it, but the fluctuation
> > between UNKNOWN/DISCONNECTED is causing activity and reconfiguration not
> > entirely of its own fault.
> 
> Well for the poll worker part of the probe helpers I went with
> 
>   if (new_status == unknown)
>   connector->status = old_status;
> 
> which takes care of the init/resume time unknown->(dis)connected
> transition correctly while filtering the noise. Adding that to the
> synchronous users probe function (which is the only one that sets force ==
> true) would achieve what you want I think. But it also means that we
> almost always filter out unknown and never let userspace known that the hw
> detection is uncertain.

I thought that connected->unknown was less likely and of more
significance than disconnected<->unknown. But that was from thinking
that the output was in use whilst it was connected, and so should always
have the resources available to do the detection. On second thought, we
should always just suppress the ->unknown transition.

The best way to then report the unreliable detection would to be report
an error trying to connect to the CRT/TV (by a vblank load detect cycle)
and signal EREMOTEIO in enable -- but the modesetting code still does
not report errors, despite their continued prevalence (such as training
failures).

> Maybe we instead need a drm helper module option to clamp unknown
> uncondiotionally to disconnected to allow users to hack around silly
> userspace?

We have userspace that respects unknown, after it is what the fb_helper
was copied from. Unknown is unkn

[Intel-gfx] [PATCH] drm/i915: Avoid GPU stalls from kswapd

2015-06-19 Thread Chris Wilson
Exclude active GPU pages from the purview of the background shrinker
(kswapd), as these cause uncontrollable GPU stalls. Given that the
shrinker is rerun until the freelists are satisfied, we should have
opportunity in subsequent passes to recover the pages once idle. If the
machine does run out of memory entirely, we have the forced idling in the
oom-notifier as a means of releasing all the pages we can before an oom
is prematurely executed.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_drv.h  | 1 +
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 8 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 71f4ca5088e2..e0dcd018379f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3185,6 +3185,7 @@ unsigned long i915_gem_shrink(struct drm_i915_private 
*dev_priv,
 #define I915_SHRINK_PURGEABLE 0x1
 #define I915_SHRINK_UNBOUND 0x2
 #define I915_SHRINK_BOUND 0x4
+#define I915_SHRINK_ACTIVE 0x8
 unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv);
 void i915_gem_shrinker_init(struct drm_i915_private *dev_priv);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c 
b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index bd1cf921aead..8d25ec8a6559 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -123,6 +123,10 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
obj->madv != I915_MADV_DONTNEED)
continue;
 
+   if ((flags & I915_SHRINK_ACTIVE) == 0 &&
+   obj->active)
+   continue;
+
drm_gem_object_reference(&obj->base);
 
/* For the unbound phase, this should be a no-op! */
@@ -166,7 +170,7 @@ unsigned long i915_gem_shrink_all(struct drm_i915_private 
*dev_priv)
i915_gem_retire_requests(dev_priv->dev);
 
return i915_gem_shrink(dev_priv, LONG_MAX,
-  I915_SHRINK_BOUND | I915_SHRINK_UNBOUND);
+  I915_SHRINK_BOUND | I915_SHRINK_UNBOUND | 
I915_SHRINK_ACTIVE);
 }
 
 static bool i915_gem_shrinker_lock(struct drm_device *dev, bool *unlock)
@@ -221,7 +225,7 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct 
shrink_control *sc)
count += obj->base.size >> PAGE_SHIFT;
 
list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-   if (obj->pages_pin_count == num_vma_bound(obj))
+   if (!obj->active && obj->pages_pin_count == num_vma_bound(obj))
count += obj->base.size >> PAGE_SHIFT;
}
 
-- 
2.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: Enforce execobject.alignment to be a power-of-two

2015-06-19 Thread Chris Wilson
Internal requirement for the alignment is that it must be a
power-of-two, so enforce rejection at the user interface to execbuffer
(which allows the caller to specify a stricter-than-expected alignment
criterion).

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f4fb2bd33753..b14733908d8d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1189,6 +1189,9 @@ validate_exec_list(const struct drm_i915_gem_execbuffer2 
*args,
if (exec[i].flags & invalid_flags)
return -EINVAL;
 
+   if (exec[i].alignment && !is_power_of_2(exec[i].alignment))
+   return -EINVAL;
+
/* pad_to_size was once a reserved field, so sanitize it */
if (exec[i].flags & EXEC_OBJECT_PAD_TO_SIZE) {
if (offset_in_page(exec[i].pad_to_size))
-- 
2.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: Ignore LVDS presence in VBT flag if the LVDS is enabled by BIOS

2015-06-19 Thread Chris Wilson
On older gen, pre-Ironlake, parts there is no hardwired pin to report
the presence of an LVDS panel. Instead, we have to rely on the VBT to
declare whether the machine has a panel or not. Though notoriously
unreliable, so far we have erred on the side of false-positives and have
required a list of machines which end up falsely reporting a panel as
present. However, we now have reports of false-negatives, machines with
an LVDS that are being ignored due to the VBT not declaring the panel.
This patch ignores the VBT setting if the BIOS has already enabled the
LVDS panel (and on Ironlake+ we also have the hardware presence pin).

It fixes the Samsung NP680Z5E-X01FR in the bug report, but is likely to
result in more false-positives, and since we rely on the BIOS to enable
the panel, there are likely different circumstances where the BIOS will
not enable that panel (and so we may see the same machine with and
without a panel all on the whim of the BIOS).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90979
Reported-and-tested-by: lys...@gmail.com
Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/intel_lvds.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lvds.c 
b/drivers/gpu/drm/i915/intel_lvds.c
index 161ab26f81fb..bf1702a6e33d 100644
--- a/drivers/gpu/drm/i915/intel_lvds.c
+++ b/drivers/gpu/drm/i915/intel_lvds.c
@@ -942,12 +942,6 @@ void intel_lvds_init(struct drm_device *dev)
if (dmi_check_system(intel_no_lvds))
return;
 
-   pin = GMBUS_PIN_PANEL;
-   if (!lvds_is_present_in_vbt(dev, &pin)) {
-   DRM_DEBUG_KMS("LVDS is not present in VBT\n");
-   return;
-   }
-
if (HAS_PCH_SPLIT(dev)) {
if ((I915_READ(PCH_LVDS) & LVDS_DETECTED) == 0)
return;
@@ -957,6 +951,16 @@ void intel_lvds_init(struct drm_device *dev)
}
}
 
+   pin = GMBUS_PIN_PANEL;
+   if (!lvds_is_present_in_vbt(dev, &pin)) {
+   u32 reg = HAS_PCH_SPLIT(dev) ? PCH_LVDS : LVDS;
+   if ((I915_READ(reg) & LVDS_PORT_EN) == 0) {
+   DRM_DEBUG_KMS("LVDS is not present in VBT\n");
+   return;
+   }
+   DRM_DEBUG_KMS("LVDS is not present in VBT, but enabled 
anyway\n");
+   }
+
lvds_encoder = kzalloc(sizeof(*lvds_encoder), GFP_KERNEL);
if (!lvds_encoder)
return;
-- 
2.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 3/4] snd: add support for displayport multi-stream to hda codec.

2015-06-19 Thread Takashi Iwai
At Fri, 19 Jun 2015 20:33:39 +1000,
Dave Airlie wrote:
> 
> On 19 June 2015 at 19:54, Lin, Mengdong  wrote:
> > Hi Takashi/Dave,
> >
> > Shall we move or cc this discussion on audio driver side to ALSA ML?
> 
> Oops I thought I had cc'ed these patches to alsa-devel as well when I sent 
> them.
> 
> > I think we also need to decide how to manage PCM devices for DP MST.
> > Now the HD-A driver create a PCM device for each pin, and the substream
> > number is 1 for each PCM. Now with DP MST enabled, each pin can support
> > multiple streams (e.g. 3 on Intel HSW/BDW/SKL).
> >
> > There may be 2 options:
> > -#1: Let an HDMI codec specify number of substreams, same as the number
> > of device entries on a pin. We can specify 3 for HSW/BDW/SKL. Other
> > vendors can also specify a value according to actual HW capabilities.
> >
> > So for HSW, we have 3x3 subtreams totally. But we only have 3 convertors
> > (for 3 display pipelines), so we can open up to 3 substreams at the same
> > time. When the audio driver finds all 3 convertors are used when opening
> > a new substream, it will fail.
> 
> One thing I noticed is the number of devices on a PIN is only updated when
> the MST device is plugged in so normally pins 5,6,7 have 0 devices, and when
> I plug in MST device, I get the 3 devices on port 6. So it seems dynamic
> enough at this point, though I guess it'll always be 0 or 3.
> >
> > - #2: Create PCM device dynamically. Only create a PCM devices for a device
> > entry which connects to monitor with audio support. When the monitor
> > is removed, the PCM device will be disconnected, closed and removed,
> > similar to the USB case.
> >
> > This will change ALSA core. But there will be less PCM devices and
> > substreams, since the number of connected monitors Is decided by the
> > actual GPU display pipeline.
> 
> I like this option more, since I think it should be more like USB, but I've
> no idea how much work it would be from the alsa side, this patch was
> probably as deep into alsa as I've gone.

Two things have to be considered for compatibility:
- ELD, channel map and jack detection: these are created per PCM
  device, and extending to substream would confuse user space a lot.
  In theory, it can be extended using subdevice number, but in anyway
  this won't work with PulseAudio as is.

- The per-pin assignment provides a more or less persistent route to a
  certain device.  Changing the assignment method may break the
  previous setup.

Also, the dynamic PCM creation / removal is an issue that has been
discussed many times.  Unfortunately it won't work as is, at lest for
PA.  Currently PA does probing of devices only at the card probe time.
The hotplug of USB-audio works because it's always tied with the
card.  But in this case, the card remains while only the PCM devices
will be created / removed, thus the probe in PA won't be triggered.


Takashi

> 
> Dave.
> 
> >> Signed-off-by: Dave Airlie 
> >> ---
> >>  include/sound/hda_verbs.h  |   2 +
> >>  sound/pci/hda/hda_codec.c  |   1 +
> >>  sound/pci/hda/hda_proc.c   |   5 +-
> >>  sound/pci/hda/patch_hdmi.c | 181
> >> +++--
> >>  4 files changed, 131 insertions(+), 58 deletions(-)
> >>
> >> diff --git a/include/sound/hda_verbs.h b/include/sound/hda_verbs.h index
> >> d0509db..3b62ac5 100644
> >> --- a/include/sound/hda_verbs.h
> >> +++ b/include/sound/hda_verbs.h
> >> @@ -75,6 +75,7 @@ enum {
> >>  #define AC_VERB_GET_HDMI_CHAN_SLOT   0x0f34
> >>  #define AC_VERB_GET_DEVICE_SEL   0xf35
> >>  #define AC_VERB_GET_DEVICE_LIST  0xf36
> >> +#define AC_VERB_GET_DP_STREAM_ID 0xf3c
> >>
> >>  /*
> >>   * SET verbs
> >> @@ -115,6 +116,7 @@ enum {
> >>  #define AC_VERB_SET_HDMI_CP_CTRL 0x733
> >>  #define AC_VERB_SET_HDMI_CHAN_SLOT   0x734
> >>  #define AC_VERB_SET_DEVICE_SEL   0x735
> >> +#define AC_VERB_SET_DP_STREAM_ID 0x73C
> >>
> >>  /*
> >>   * Parameter IDs
> >> diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c index
> >> 5645481..3981c63 100644
> >> --- a/sound/pci/hda/hda_codec.c
> >> +++ b/sound/pci/hda/hda_codec.c
> >> @@ -482,6 +482,7 @@ int snd_hda_get_devices(struct hda_codec *codec,
> >> hda_nid_t nid,
> >>   }
> >>   return devices;
> >>  }
> >> +EXPORT_SYMBOL_GPL(snd_hda_get_devices);
> >>
> >>  /*
> >>   * destructor
> >> diff --git a/sound/pci/hda/hda_proc.c b/sound/pci/hda/hda_proc.c index
> >> baaf7ed0..39fac53 100644
> >> --- a/sound/pci/hda/hda_proc.c
> >> +++ b/sound/pci/hda/hda_proc.c
> >> @@ -644,10 +644,13 @@ static void print_device_list(struct snd_info_buffer
> >> *buffer,
> >>   int i, curr = -1;
> >>   u8 dev_list[AC_MAX_DEV_LIST_LEN];
> >>   int devlist_len;
> >> + int dp_s_id;
> >>
> >> + dp_s_id = snd_hda_codec_read(codec, nid, 0,
> >> + AC_VERB_GET_DP_STREAM_ID, 0);
> >>   devlist_len = snd_hda_get_devi

Re: [Intel-gfx] [RFC] drm/i915: Introduce documentation for register subfields

2015-06-19 Thread Damien Lespiau
On Fri, Jun 19, 2015 at 10:52:15AM +0100, Chris Wilson wrote:
> An interesting point was raised in some recent patches to document the
> various widths of the register subfields. I disagreed with that patch in
> that it transformed illegal values into some random potentially harmful
> valid value (and not transforming an invalid value would end up writing
> bits in other subfields, so equally bad). Most of our register usage is
> with static values so unlikely to be of concern, but for those we can
> document the register subfields and provide compile time checking.

I've thought about things like this in the past as well, we can even
automatically generate all of that from the specs!

I had other per-register metadata in mind, like:
  - is the register is a masked one or not (and then warn if we're
trying to write it wihtout the _MASKED_*() macros)
  - is the register part of the render context (and then warn if we try
to write it with a MMIO write).

#define FOO 0x1234
#define FOO_IS_MASKED   1
#define FOO_IS_IN_RENDER_CTX1

should be enough for the compile-time warns like you do here.

-- 
Damien
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [alsa-devel] DP MST audio support

2015-06-19 Thread Dave Airlie
Just to fill in the info on this one.

> We don't have Haswell Lenovo t440s atm, so could you share more info?
> - Dell U2410 should support both HDMI and DP input. But I guess it cannot
>  support DP MST, right?
> - Are you connecting this monitor a DP cable?
>  Which DDI port is used? DDI B, C or D?
> - Does audio fail after i915 enables DP MST?
> - Is patch "snd/hdmi: hack out haswell codec workaround" the only change
>  on audio driver side?

Yes its a DP SST input on the U2410, and I'm using that for the audio.

It's connected to the Lenovo dock with DP cable. The dock is an MST device.

The dock is connected to DDI C I think, and if the dock is operated in SST
mode audio works, but in MST mode audio fails. (operating the dock in SST
mode isn't useful though since only one of the multiple outputs works then).

>> > The graphics side patches are fairly trivial, also it would be good to
>> > get a good explaination of how the hw works,
>> >
>> > from what I can see devices get connections not pins on this hw, and I
>> > notice that I don't always get 3 devices, so I'm not sure if devices
>> > are a dynamic thing we should be reprobing on some signal.
>
> Do you mean 3 PCM devices here, like pcmC0D3p, pcmC0D7p, pcmC0D8p?
> Now the devices are not dynamic, a PCM device is created on each pin.
> It seems we need to revise this for DP MST, since a pin can be used to send
> up to 3 independent streams on Intel GPU which has 3 display pipelines.
>

No I mean the "Devices" from snd_hda_get_devices.

Dave.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 3/4] snd: add support for displayport multi-stream to hda codec.

2015-06-19 Thread Dave Airlie
On 19 June 2015 at 19:54, Lin, Mengdong  wrote:
> Hi Takashi/Dave,
>
> Shall we move or cc this discussion on audio driver side to ALSA ML?

Oops I thought I had cc'ed these patches to alsa-devel as well when I sent them.

> I think we also need to decide how to manage PCM devices for DP MST.
> Now the HD-A driver create a PCM device for each pin, and the substream
> number is 1 for each PCM. Now with DP MST enabled, each pin can support
> multiple streams (e.g. 3 on Intel HSW/BDW/SKL).
>
> There may be 2 options:
> -#1: Let an HDMI codec specify number of substreams, same as the number
> of device entries on a pin. We can specify 3 for HSW/BDW/SKL. Other
> vendors can also specify a value according to actual HW capabilities.
>
> So for HSW, we have 3x3 subtreams totally. But we only have 3 convertors
> (for 3 display pipelines), so we can open up to 3 substreams at the same
> time. When the audio driver finds all 3 convertors are used when opening
> a new substream, it will fail.

One thing I noticed is the number of devices on a PIN is only updated when
the MST device is plugged in so normally pins 5,6,7 have 0 devices, and when
I plug in MST device, I get the 3 devices on port 6. So it seems dynamic
enough at this point, though I guess it'll always be 0 or 3.
>
> - #2: Create PCM device dynamically. Only create a PCM devices for a device
> entry which connects to monitor with audio support. When the monitor
> is removed, the PCM device will be disconnected, closed and removed,
> similar to the USB case.
>
> This will change ALSA core. But there will be less PCM devices and
> substreams, since the number of connected monitors Is decided by the
> actual GPU display pipeline.

I like this option more, since I think it should be more like USB, but I've
no idea how much work it would be from the alsa side, this patch was
probably as deep into alsa as I've gone.

Dave.

>> Signed-off-by: Dave Airlie 
>> ---
>>  include/sound/hda_verbs.h  |   2 +
>>  sound/pci/hda/hda_codec.c  |   1 +
>>  sound/pci/hda/hda_proc.c   |   5 +-
>>  sound/pci/hda/patch_hdmi.c | 181
>> +++--
>>  4 files changed, 131 insertions(+), 58 deletions(-)
>>
>> diff --git a/include/sound/hda_verbs.h b/include/sound/hda_verbs.h index
>> d0509db..3b62ac5 100644
>> --- a/include/sound/hda_verbs.h
>> +++ b/include/sound/hda_verbs.h
>> @@ -75,6 +75,7 @@ enum {
>>  #define AC_VERB_GET_HDMI_CHAN_SLOT   0x0f34
>>  #define AC_VERB_GET_DEVICE_SEL   0xf35
>>  #define AC_VERB_GET_DEVICE_LIST  0xf36
>> +#define AC_VERB_GET_DP_STREAM_ID 0xf3c
>>
>>  /*
>>   * SET verbs
>> @@ -115,6 +116,7 @@ enum {
>>  #define AC_VERB_SET_HDMI_CP_CTRL 0x733
>>  #define AC_VERB_SET_HDMI_CHAN_SLOT   0x734
>>  #define AC_VERB_SET_DEVICE_SEL   0x735
>> +#define AC_VERB_SET_DP_STREAM_ID 0x73C
>>
>>  /*
>>   * Parameter IDs
>> diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c index
>> 5645481..3981c63 100644
>> --- a/sound/pci/hda/hda_codec.c
>> +++ b/sound/pci/hda/hda_codec.c
>> @@ -482,6 +482,7 @@ int snd_hda_get_devices(struct hda_codec *codec,
>> hda_nid_t nid,
>>   }
>>   return devices;
>>  }
>> +EXPORT_SYMBOL_GPL(snd_hda_get_devices);
>>
>>  /*
>>   * destructor
>> diff --git a/sound/pci/hda/hda_proc.c b/sound/pci/hda/hda_proc.c index
>> baaf7ed0..39fac53 100644
>> --- a/sound/pci/hda/hda_proc.c
>> +++ b/sound/pci/hda/hda_proc.c
>> @@ -644,10 +644,13 @@ static void print_device_list(struct snd_info_buffer
>> *buffer,
>>   int i, curr = -1;
>>   u8 dev_list[AC_MAX_DEV_LIST_LEN];
>>   int devlist_len;
>> + int dp_s_id;
>>
>> + dp_s_id = snd_hda_codec_read(codec, nid, 0,
>> + AC_VERB_GET_DP_STREAM_ID, 0);
>>   devlist_len = snd_hda_get_devices(codec, nid, dev_list,
>>   AC_MAX_DEV_LIST_LEN);
>> - snd_iprintf(buffer, "  Devices: %d\n", devlist_len);
>> + snd_iprintf(buffer, "  Devices: %d: 0x%x\n", devlist_len, dp_s_id);
>>   if (devlist_len <= 0)
>>   return;
>>
>> diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c index
>> 5f44f60..8272656 100644
>> --- a/sound/pci/hda/patch_hdmi.c
>> +++ b/sound/pci/hda/patch_hdmi.c
>> @@ -68,6 +68,17 @@ struct hdmi_spec_per_cvt {
>>  /* max. connections to a widget */
>>  #define HDA_MAX_CONNECTIONS  32
>>
>> +struct hdmi_spec_per_pin;
>> +#define HDA_MAX_DEVICES 3
>> +struct hdmi_spec_per_device {
>> + struct hdmi_spec_per_pin *pin;
>> + int device_idx;
>> + struct hdmi_eld sink_eld;
>> +#ifdef CONFIG_PROC_FS
>> + struct snd_info_entry *proc_entry;
>> +#endif
>> +};
>> +
>>  struct hdmi_spec_per_pin {
>>   hda_nid_t pin_nid;
>>   int num_mux_nids;
>> @@ -76,7 +87,11 @@ struct hdmi_spec_per_pin {
>>   hda_nid_t cvt_nid;
>>
>>   struct hda_codec *codec;
>> - struct hdmi_eld sink_eld;
>> +
>> + int 

Re: [Intel-gfx] [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-19 Thread Chris Wilson
On Fri, Jun 19, 2015 at 10:48:24AM +0100, Siluvery, Arun wrote:
> variable names were getting too long and caused difficulties in
> indentation so tried to shorten them, will change this part.

It's a trade off. I was thinking we wouldn't need to use the full form
that often so the extra characters wouldn't be too much of an issue, but
having the names be consistent is most valuable. You can always play
around with temporary variables, using short structs, to make it
readable in places if it gets too ugly.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 3/4] snd: add support for displayport multi-stream to hda codec.

2015-06-19 Thread Lin, Mengdong
Hi Takashi/Dave,

Shall we move or cc this discussion on audio driver side to ALSA ML?

I think we also need to decide how to manage PCM devices for DP MST.
Now the HD-A driver create a PCM device for each pin, and the substream
number is 1 for each PCM. Now with DP MST enabled, each pin can support
multiple streams (e.g. 3 on Intel HSW/BDW/SKL).

There may be 2 options:
-#1: Let an HDMI codec specify number of substreams, same as the number
of device entries on a pin. We can specify 3 for HSW/BDW/SKL. Other
vendors can also specify a value according to actual HW capabilities.

So for HSW, we have 3x3 subtreams totally. But we only have 3 convertors
(for 3 display pipelines), so we can open up to 3 substreams at the same
time. When the audio driver finds all 3 convertors are used when opening
a new substream, it will fail.

- #2: Create PCM device dynamically. Only create a PCM devices for a device
entry which connects to monitor with audio support. When the monitor
is removed, the PCM device will be disconnected, closed and removed,
similar to the USB case. 

This will change ALSA core. But there will be less PCM devices and
substreams, since the number of connected monitors Is decided by the
actual GPU display pipeline.

Could you share your preference on these 2 options or other suggestions?

Thanks
Mengdong 

> -Original Message-
> From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf Of
> Dave Airlie
> Sent: Wednesday, June 17, 2015 12:02 PM
> To: intel-gfx@lists.freedesktop.org
> Subject: [Intel-gfx] [PATCH 3/4] snd: add support for displayport multi-stream
> to hda codec.
> 
> From: Dave Airlie 
> 
> Add new verbs GET_DP_STREAM_ID and SET_DP_STREAM_ID from Intel docs.
> get stream id and print in proc
> split ELD to be per device not per pin
> handle pd/eldv per device not per pin
> setup codec->dp_mst earlier.
> 
> Signed-off-by: Dave Airlie 
> ---
>  include/sound/hda_verbs.h  |   2 +
>  sound/pci/hda/hda_codec.c  |   1 +
>  sound/pci/hda/hda_proc.c   |   5 +-
>  sound/pci/hda/patch_hdmi.c | 181
> +++--
>  4 files changed, 131 insertions(+), 58 deletions(-)
> 
> diff --git a/include/sound/hda_verbs.h b/include/sound/hda_verbs.h index
> d0509db..3b62ac5 100644
> --- a/include/sound/hda_verbs.h
> +++ b/include/sound/hda_verbs.h
> @@ -75,6 +75,7 @@ enum {
>  #define AC_VERB_GET_HDMI_CHAN_SLOT   0x0f34
>  #define AC_VERB_GET_DEVICE_SEL   0xf35
>  #define AC_VERB_GET_DEVICE_LIST  0xf36
> +#define AC_VERB_GET_DP_STREAM_ID 0xf3c
> 
>  /*
>   * SET verbs
> @@ -115,6 +116,7 @@ enum {
>  #define AC_VERB_SET_HDMI_CP_CTRL 0x733
>  #define AC_VERB_SET_HDMI_CHAN_SLOT   0x734
>  #define AC_VERB_SET_DEVICE_SEL   0x735
> +#define AC_VERB_SET_DP_STREAM_ID 0x73C
> 
>  /*
>   * Parameter IDs
> diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c index
> 5645481..3981c63 100644
> --- a/sound/pci/hda/hda_codec.c
> +++ b/sound/pci/hda/hda_codec.c
> @@ -482,6 +482,7 @@ int snd_hda_get_devices(struct hda_codec *codec,
> hda_nid_t nid,
>   }
>   return devices;
>  }
> +EXPORT_SYMBOL_GPL(snd_hda_get_devices);
> 
>  /*
>   * destructor
> diff --git a/sound/pci/hda/hda_proc.c b/sound/pci/hda/hda_proc.c index
> baaf7ed0..39fac53 100644
> --- a/sound/pci/hda/hda_proc.c
> +++ b/sound/pci/hda/hda_proc.c
> @@ -644,10 +644,13 @@ static void print_device_list(struct snd_info_buffer
> *buffer,
>   int i, curr = -1;
>   u8 dev_list[AC_MAX_DEV_LIST_LEN];
>   int devlist_len;
> + int dp_s_id;
> 
> + dp_s_id = snd_hda_codec_read(codec, nid, 0,
> + AC_VERB_GET_DP_STREAM_ID, 0);
>   devlist_len = snd_hda_get_devices(codec, nid, dev_list,
>   AC_MAX_DEV_LIST_LEN);
> - snd_iprintf(buffer, "  Devices: %d\n", devlist_len);
> + snd_iprintf(buffer, "  Devices: %d: 0x%x\n", devlist_len, dp_s_id);
>   if (devlist_len <= 0)
>   return;
> 
> diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c index
> 5f44f60..8272656 100644
> --- a/sound/pci/hda/patch_hdmi.c
> +++ b/sound/pci/hda/patch_hdmi.c
> @@ -68,6 +68,17 @@ struct hdmi_spec_per_cvt {
>  /* max. connections to a widget */
>  #define HDA_MAX_CONNECTIONS  32
> 
> +struct hdmi_spec_per_pin;
> +#define HDA_MAX_DEVICES 3
> +struct hdmi_spec_per_device {
> + struct hdmi_spec_per_pin *pin;
> + int device_idx;
> + struct hdmi_eld sink_eld;
> +#ifdef CONFIG_PROC_FS
> + struct snd_info_entry *proc_entry;
> +#endif
> +};
> +
>  struct hdmi_spec_per_pin {
>   hda_nid_t pin_nid;
>   int num_mux_nids;
> @@ -76,7 +87,11 @@ struct hdmi_spec_per_pin {
>   hda_nid_t cvt_nid;
> 
>   struct hda_codec *codec;
> - struct hdmi_eld sink_eld;
> +
> + int num_devices;
> + u8 dev_list[AC_MAX_DEV_LIST_LEN];
> + struct hdmi_spec_per_device devices[HD

[Intel-gfx] [RFC] drm/i915: Introduce documentation for register subfields

2015-06-19 Thread Chris Wilson
An interesting point was raised in some recent patches to document the
various widths of the register subfields. I disagreed with that patch in
that it transformed illegal values into some random potentially harmful
valid value (and not transforming an invalid value would end up writing
bits in other subfields, so equally bad). Most of our register usage is
with static values so unlikely to be of concern, but for those we can
document the register subfields and provide compile time checking.

The resulting error message is fairly impententrable though:

n file included from include/uapi/linux/stddef.h:1:0,
 from include/linux/stddef.h:4,
 from ./include/uapi/linux/posix_types.h:4,
 from include/uapi/linux/types.h:13,
 from include/linux/types.h:5,
 from include/linux/mod_devicetable.h:11,
 from include/linux/i2c.h:29,
 from drivers/gpu/drm/i915/intel_dp.c:28:
In function ‘intel_dp_autotest_edid’,
inlined from ‘intel_dp_handle_test_request’ at 
drivers/gpu/drm/i915/intel_dp.c:4230:14,
inlined from ‘intel_dp_detect’ at drivers/gpu/drm/i915/intel_dp.c:4635:4:
include/linux/compiler.h:429:38: error: call to ‘__compiletime_assert_4173’ 
declared with attribute error: BUILD_BUG_ON failed: (5) & -(1<<2)
  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
  ^
include/linux/compiler.h:412:4: note: in definition of macro 
‘__compiletime_assert’
prefix ## suffix();\
^
include/linux/compiler.h:429:2: note: in expansion of macro 
‘_compiletime_assert’
  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
  ^
include/linux/bug.h:50:37: note: in expansion of macro ‘compiletime_assert’
 #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
 ^
include/linux/bug.h:74:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
  BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
  ^
drivers/gpu/drm/i915/intel_dp.c:45:68: note: in expansion of macro 
‘BUILD_BUG_ON’
 #define __FIELD__(x, shift, width) ({ if (__builtin_constant_p(x)) 
BUILD_BUG_ON((x) & -(1

Re: [Intel-gfx] [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-19 Thread Siluvery, Arun

On 19/06/2015 10:27, Chris Wilson wrote:

On Thu, Jun 18, 2015 at 06:33:24PM +0100, Arun Siluvery wrote:
Totally minor worries now.


+/**
+ * gen8_init_indirectctx_bb() - initialize indirect ctx batch with WA
+ *
+ * @ring: only applicable for RCS
+ * @wa_ctx_batch: page in which WA are loaded
+ * @offset: This is for future use in case if we would like to have multiple
+ *  batches at different offsets and select them based on a criteria.
+ * @num_dwords: The number of WA applied are known at the beginning, it returns
+ * the no of DWORDS written. This batch does not contain MI_BATCH_BUFFER_END
+ * so it adds padding to make it cacheline aligned. MI_BATCH_BUFFER_END will be
+ * added to perctx batch and both of them together makes a complete batch 
buffer.
+ *
+ * Return: non-zero if we exceed the PAGE_SIZE limit.
+ */
+
+static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
+   uint32_t **wa_ctx_batch,
+   uint32_t offset,
+   uint32_t *num_dwords)
+{
+   uint32_t index;
+   uint32_t *batch = *wa_ctx_batch;
+
+   index = offset;


I worry that offset need not be cacheline aligned on entry (for example
if indirectctx and perctx were switched, or someone else stuffed more
controls into the per-ring object). Like perctx, there is no mention of
any alignment worries for the starting location, but here you tell use
that the INDIRECT_CTX length is specified in cacheline, so I also presume
the start needs to be aligned.


offset need to be cachealigned, I will update the comments.


+static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
+{
+   int ret;
+
+   WARN_ON(ring->id != RCS);
+
+   ring->wa_ctx.obj = i915_gem_alloc_object(ring->dev, PAGE_ALIGN(size));
+   if (!ring->wa_ctx.obj) {
+   DRM_DEBUG_DRIVER("alloc LRC WA ctx backing obj failed.\n");
+   return -ENOMEM;
+   }
+
+   ret = i915_gem_obj_ggtt_pin(ring->wa_ctx.obj, PAGE_SIZE, 0);


One day _pin() will return the vma being pinned and I will rejoice as it
makes reviewing pinning much easier! Not a problem for you right now.


+static int intel_init_workaround_bb(struct intel_engine_cs *ring)
+{
+   int ret = 0;
+   uint32_t *batch;
+   uint32_t num_dwords;
+   struct page *page;
+   struct i915_ctx_workarounds *wa_ctx = &ring->wa_ctx;
+
+   WARN_ON(ring->id != RCS);
+
+   if (ring->scratch.obj == NULL) {
+   DRM_ERROR("scratch page not allocated for %s\n", ring->name);
+   return -EINVAL;
+   }


I haven't found the dependence upon scratch.obj, could you explain it?
Does it appear later?


yes it does in patch 2 which rearranges init_pipe_control().
I will move this check to that patch as per your comment.


@@ -1754,15 +1934,26 @@ populate_lr_context(struct intel_context *ctx, struct 
drm_i915_gem_object *ctx_o
reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
reg_state[CTX_SECOND_BB_STATE+1] = 0;
if (ring->id == RCS) {
-   /* TODO: according to BSpec, the register state context
-* for CHV does not have these. OTOH, these registers do
-* exist in CHV. I'm waiting for a clarification */
reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 
0x1c8;
reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
+   if (ring->wa_ctx.obj) {
+   reg_state[CTX_RCS_INDIRECT_CTX+1] =
+   (i915_gem_obj_ggtt_offset(ring->wa_ctx.obj) +
+ring->wa_ctx.indctx_batch_offset * 
sizeof(uint32_t)) |
+   (ring->wa_ctx.indctx_batch_size / 
CACHELINE_DWORDS);


Ok, this really does impose alignment conditions on indctx_batch_offset

Oh, if I do get a chance to complain, spell out indirect_ctx, make it a
struct or even just precalculate the reg value, just indctx's only value
is that is the same length as perctx, but otherwise quite obtuse.

variable names were getting too long and caused difficulties in 
indentation so tried to shorten them, will change this part.


regards
Arun


Other than that, I couldn't punch any holes in its robustness, and the
series is starting to look quite good and very neat.
-Chris



___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 07/15] drm/i915: Defer default hardware context initialisation until first open

2015-06-19 Thread Dave Gordon
On 16/06/15 10:35, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:25PM +0100, Dave Gordon wrote:
>> +static int i915_gem_context_first_open(struct drm_device *dev)
>> +{
>> +struct drm_i915_private *dev_priv = dev->dev_private;
>> +int ret;
>> +
>> +/*
>> + * We can't enable contexts until all firmware is loaded. This
>> + * call shouldn't return -EAGAIN because we pass wait=true, but
>> + * it can still fail with code -EIO if the GuC doesn't respond,
>> + * or -ENOEXEC if the GuC firmware image is invalid.
>> + */
>> +ret = intel_guc_ucode_load(dev, true);
>> +WARN_ON(ret == -EAGAIN);
>> +
>> +/*
>> + * If an error occurred and GuC submission has been requested, we can
>> + * attempt recovery by disabling GuC submission and reinitialising
>> + * the GPU and driver. We then fail this open() anyway, but the next
>> + * attempt will find that GuC submission is already disabled, and so
>> + * proceed to complete context initialisation in non-GuC mode instead.
>> + */
>> +if (ret && i915.enable_guc_submission) {
>> +i915_handle_guc_error(dev, ret);
>> +return ret;
>> +}
> 
> This is still backwards. What we wanted was for the submission process
> to start up normally and then once the GuC loading succeeds, we then
> start submitting the backlog to the GuC. If the loading fails, we can
> then submit the backlog via execlists. It may be interesting to even
> start userspace before GuC finishes loading.

Absolutely. The latter is what this allows :)

(And its a requirement for Android, as on those platforms the f/w won't
become available until userspace is running).

But we're not going to keep stuff queued up in the rings. It would add
more complexity to manage the backlog and remember that we can accept
calls to add_request but not actually submit them. Also there would be
issue with any code that sent commands to the engines to program
registers via LRIs and then EITHER assumed that they had taken effect OR
waited for completion (because that would block indefinitely).

Instead, we've split hw_init() into early (MMIO) and late (context and
batch) phases, and deferred all of the latter iff we need GuC f/w to be
loaded before batch submission. When the late phase runs, it can submit
batches, and wait for completion if required, without blocking the
entire system as it would if called from driver_load().

It might even make sense as an overall strategy to defer MORE of the
driver initialisation process, so that the critical single-threaded
driver load during system startup does as little as possible.

> So this makes more sense as to why you have the tight integration with
> execlists then. I still don't think that justifies changing gen8 without
> reason.
> -Chris

It's tightly integrated because GuC submission *IS* execlist submission,
only with the GuC doing the actual poking of the ELSP and fielding the
resulting context-switch interrupts.

Deferred initialisation doesn't apply to Gen8, or anything else that
doesn't have and use GuC submission. The delayed-context-initialisation
codepath is only reachable if there is GuC firmware to be loaded.

.Dave.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC PATCH 2/7] drm/i915: Read hw state into an atomic state struct, try 2.

2015-06-19 Thread Daniel Stone
Hi,

On 19 June 2015 at 09:02, Maarten Lankhorst
 wrote:
> +   if (crtc->state->enable) {
> +   intel_mode_from_pipe_config(&crtc->mode,
> +   to_intel_crtc_state(crtc->state));
> +   
> intel_mode_from_pipe_config(&crtc->state->adjusted_mode,
> +   to_intel_crtc_state(crtc->state));
> +
> +   if (drm_atomic_set_mode_for_crtc(crtc->state, 
> &crtc->mode) < 0)
> +   drm_mode_copy(&crtc->state->mode, 
> &crtc->mode);

I've never really understood why this is here. There are only two ways
in which set_mode_for_crtc can fail: either the mode fails validation
(fixed in the patch I sent you the other day, which it seems like
you've now pulled in), or -ENOMEM. The first one we shouldn't really
be papering over - and I haven't seen it happen since that patch - and
the second one isn't really recoverable anyway. Plus I would argue
that breaking the state like that is pretty bad.

Cheers,
Daniel
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v5 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround

2015-06-19 Thread Chris Wilson
On Thu, Jun 18, 2015 at 06:33:29PM +0100, Arun Siluvery wrote:
> In Per context w/a batch buffer,
> WaRsRestoreWithPerCtxtBb
> 
> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
> so as to not break any future users of existing definitions (Michel)
> 
> v3: Length defined in current definitions of LRM, LRR instructions was 
> specified
> as 0. It seems it is common convention for instructions whose length vary 
> between
> platforms. This is not an issue so far because they are not used anywhere 
> except
> command parser; now that we use in this patch update them with correct length
> and also move them out of command parser placeholder to appropriate place.
> remove unnecessary padding and follow the WA programming sequence exactly
> as mentioned in spec which is essential for this WA (Dave).

Similarly with the indirect ctx from patch 5, having the documentation
that we need scratch_obj in the preamble for these workarounds I think
is worth its weight in the extra typing.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v5 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround

2015-06-19 Thread Chris Wilson
On Thu, Jun 18, 2015 at 06:33:28PM +0100, Arun Siluvery wrote:
> In Indirect context w/a batch buffer,
> WaClearSlmSpaceAtContextSwitch
> 
> v2: s/PIPE_CONTROL_FLUSH_RO_CACHES/PIPE_CONTROL_FLUSH_L3 (Ville)

Ok, so this is the first use of scratch object. I would like to move the
test for existence from patch 1 to here, and into indirectctx_bb so that
we know why we need the obj. It's just trying to keep the patches and
code as self-contained and as descriptive as possible (including their
requirements).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-19 Thread Chris Wilson
On Thu, Jun 18, 2015 at 06:33:24PM +0100, Arun Siluvery wrote:
Totally minor worries now.

> +/**
> + * gen8_init_indirectctx_bb() - initialize indirect ctx batch with WA
> + *
> + * @ring: only applicable for RCS
> + * @wa_ctx_batch: page in which WA are loaded
> + * @offset: This is for future use in case if we would like to have multiple
> + *  batches at different offsets and select them based on a criteria.
> + * @num_dwords: The number of WA applied are known at the beginning, it 
> returns
> + * the no of DWORDS written. This batch does not contain MI_BATCH_BUFFER_END
> + * so it adds padding to make it cacheline aligned. MI_BATCH_BUFFER_END will 
> be
> + * added to perctx batch and both of them together makes a complete batch 
> buffer.
> + *
> + * Return: non-zero if we exceed the PAGE_SIZE limit.
> + */
> +
> +static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
> + uint32_t **wa_ctx_batch,
> + uint32_t offset,
> + uint32_t *num_dwords)
> +{
> + uint32_t index;
> + uint32_t *batch = *wa_ctx_batch;
> +
> + index = offset;

I worry that offset need not be cacheline aligned on entry (for example
if indirectctx and perctx were switched, or someone else stuffed more
controls into the per-ring object). Like perctx, there is no mention of
any alignment worries for the starting location, but here you tell use
that the INDIRECT_CTX length is specified in cacheline, so I also presume
the start needs to be aligned.

> +static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
> +{
> + int ret;
> +
> + WARN_ON(ring->id != RCS);
> +
> + ring->wa_ctx.obj = i915_gem_alloc_object(ring->dev, PAGE_ALIGN(size));
> + if (!ring->wa_ctx.obj) {
> + DRM_DEBUG_DRIVER("alloc LRC WA ctx backing obj failed.\n");
> + return -ENOMEM;
> + }
> +
> + ret = i915_gem_obj_ggtt_pin(ring->wa_ctx.obj, PAGE_SIZE, 0);

One day _pin() will return the vma being pinned and I will rejoice as it
makes reviewing pinning much easier! Not a problem for you right now.

> +static int intel_init_workaround_bb(struct intel_engine_cs *ring)
> +{
> + int ret = 0;
> + uint32_t *batch;
> + uint32_t num_dwords;
> + struct page *page;
> + struct i915_ctx_workarounds *wa_ctx = &ring->wa_ctx;
> +
> + WARN_ON(ring->id != RCS);
> +
> + if (ring->scratch.obj == NULL) {
> + DRM_ERROR("scratch page not allocated for %s\n", ring->name);
> + return -EINVAL;
> + }

I haven't found the dependence upon scratch.obj, could you explain it?
Does it appear later?

> @@ -1754,15 +1934,26 @@ populate_lr_context(struct intel_context *ctx, struct 
> drm_i915_gem_object *ctx_o
>   reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
>   reg_state[CTX_SECOND_BB_STATE+1] = 0;
>   if (ring->id == RCS) {
> - /* TODO: according to BSpec, the register state context
> -  * for CHV does not have these. OTOH, these registers do
> -  * exist in CHV. I'm waiting for a clarification */
>   reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
>   reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
>   reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
>   reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
>   reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 
> 0x1c8;
>   reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
> + if (ring->wa_ctx.obj) {
> + reg_state[CTX_RCS_INDIRECT_CTX+1] =
> + (i915_gem_obj_ggtt_offset(ring->wa_ctx.obj) +
> +  ring->wa_ctx.indctx_batch_offset * 
> sizeof(uint32_t)) |
> + (ring->wa_ctx.indctx_batch_size / 
> CACHELINE_DWORDS);

Ok, this really does impose alignment conditions on indctx_batch_offset

Oh, if I do get a chance to complain, spell out indirect_ctx, make it a
struct or even just precalculate the reg value, just indctx's only value
is that is the same length as perctx, but otherwise quite obtuse.

Other than that, I couldn't punch any holes in its robustness, and the
series is starting to look quite good and very neat.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 07/15] drm/i915: Defer default hardware context initialisation until first open

2015-06-19 Thread Dave Gordon
On 17/06/15 13:18, Daniel Vetter wrote:
> On Mon, Jun 15, 2015 at 07:36:25PM +0100, Dave Gordon wrote:
>> In order to fully initialise the default contexts, we have to execute
>> batchbuffer commands on the GPU engines. But in the case of GuC-based
>> batch submission, we can't do that until any required firmware has
>> been loaded, which may not be possible during driver load, because the
>> filesystem(s) containing the firmware may not be mounted until later.
>>
>> Therefore, we now allow the first call to the firmware-loading code to
>> return -EAGAIN to indicate that it's not yet ready, and that it should
>> be retried when the device is first opened from user code, by which
>> time we expect that all required filesystems will have been mounted.
>> The late-retry code will then re-attempt to load the firmware if the
>> early attempt failed.
>>
>> If the late retry fails, the current open-in-progress will fail, but
>> the recovery code will disable GuC submission and reset the GPU and
>> driver. The next open will therefore be in non-GuC mode, and will be
>> allowed to complete even if the GuC cannot be loaded or used.
>>
>> Issue: VIZ-4884
>> Signed-off-by: Dave Gordon 
>> Signed-off-by: Alex Dai 
> 
> I'm not really sold on this super-flexible fallback scheme implemented
> here. Because such fallback schemes means more code to test (which no on
> will do likely) or just even bigger fireworks when we actually hit them in
> reality when something goes wrong. Imo if anything goes wrong in the setup
> we just throw in the towel and fail the driver loading.

Firstly, GuC submission is an OPTION. That means we already have code to
work with or without a GuC. The fallback just allows us to keep going
after finding that although GuC submission has been requested, and we do
have a GuC, nonetheless the request cannot be satisfied. That's no
different from automatically disabling PPGTT or execlist mode if they're
requested on platforms where we don't support them.

> There's only one exception: If something fails with GT init we declare the
> gpu wedged but proceed with all the modeset setup. This makes sense
> because we need all the code to handle a wedge gpu anyway, dead-on-boot
> gpus happen occasionally and it's really not nice to greet the user with a
> black screen. But more fallbacks are imo just headache.
> 
> Hence when the guc fails we imo really shouldn't bother with fallbacks,
> but instead just declare the thing wedged and carry on.

So the strategy here is exactly the same as for GT init; declare the GPU
wedged, but after disabling GuC mode. The recovery will then get us into
the same state as if there were no GuC, or GuC mode had not been
selected in the first place. We can't switch between GuC and execlists
arbitrarily; the only switchover is from GuC to non-GuC, and it can only
happen ONCE.

To test this is easy; just rename your firmware blob so the driver can't
find it and reboot. It should automatically run in execlist mode, with a
log message telling you what went wrong (f/w file not found). Much nicer
than your screen staying blank because you upgraded the driver and not
the firmware, or vice versa.

> That should also allow us to simplify the firmware loading: We can do that
> in an async worker and if the blob isn't there in time then we just move
> on.
> -Daniel

Under no circumstances can you ever load the firmware from an async
worker thread, because Bad Things Will Happen if there is hardware
activity already in progress when the GuC f/w starts up.

.Dave.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c

2015-06-19 Thread Chris Wilson
On Thu, Jun 18, 2015 at 07:07:46PM +0100, Dave Gordon wrote:
> On 18/06/15 13:10, Chris Wilson wrote:
> > On Thu, Jun 18, 2015 at 12:49:55PM +0100, Dave Gordon wrote:
> >> On 17/06/15 13:02, Daniel Vetter wrote:
> >>> Domain handling is required for all gem objects, and the resulting bugs if
> >>> you don't for one-off objects are absolutely no fun to track down.
> >>
> >> Is it not the case that the new object returned by
> >> i915_gem_alloc_object() is
> >> (a) of a type that can be mapped into the GTT, and
> >> (b) initially in the CPU domain for both reading and writing?
> >>
> >> So AFAICS the allocate-and-fill function I'm describing (to appear in
> >> next patch series respin) doesn't need any further domain handling.
> > 
> > A i915_gem_object_create_from_data() is a reasonable addition, and I
> > suspect it will make the code a bit more succinct.
> 
> I shall adopt this name for it :)
> 
> > Whilst your statement is true today, calling set_domain is then a no-op,
> > and helps document how you use the object and so reduces the likelihood
> > of us introducing bugs in the future.
> > -Chris
> 
> So here's the new function ... where should the set-to-cpu-domain go?
> After the pin_pages and before the sg_copy_from_buffer?

Either, since the domain will not change whilst you have the lock,
but if you do it before get_pages() you will have a slightly easier
error path.

Part of the reason why I want a function like this is so that I can
replace it with a stolen object and so need to write the data through a
temporary GGTT mapping. Speak now if you need more flags to the function
to prevent certain classes of objects being created.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support

2015-06-19 Thread Dave Gordon
On 18/06/15 15:49, Daniel Vetter wrote:
> On Thu, Jun 18, 2015 at 01:11:34PM +0100, Dave Gordon wrote:
>> On 17/06/15 13:05, Daniel Vetter wrote:
>>> On Mon, Jun 15, 2015 at 07:36:20PM +0100, Dave Gordon wrote:
 Current devices may contain one or more programmable microcontrollers
 that need to have a firmware image (aka "binary blob") loaded from an
 external medium and transferred to the device's memory.

 This file provides generic support functions for doing this; they can
 then be used by each uC-specific loader, thus reducing code duplication
 and testing effort.

 Signed-off-by: Dave Gordon 
 Signed-off-by: Alex Dai 
>>>
>>> Given that I'm just shredding the synchronization used by the dmc loader
>>> I'm not convinced this is a good idea. Abstraction has cost, and a bit of
>>> copy-paste for similar sounding but slightly different things doesn't
>>> sound awful to me. And the critical bit in all the firmware loading I've
>>> seen thus far is in synchronizing the loading with other operations,
>>> hiding that isn't a good idea. Worse if we enforce stuff like requiring
>>> dev->struct_mutex.
>>> -Daniel
>>
>> It's precisely because it's in some sense "trivial-but-tricky" that we
>> should write it once, get it right, and use it everywhere. Copypaste
>> /does/ sound awful; I've seen how the code this was derived from had
>> already been cloned into three flavours, all different and all wrong.
>>
>> It's a very simple abstraction: one early call to kick things off as
>> early as possible, no locking required. One late call with the
>> struct_mutex held to complete the synchronisation and actually do the
>> work, thus guaranteeing that the transfer to the target uC is done in a
>> controlled fashion, at a time of the caller's choice, and by the
>> driver's mainline thread, NOT by an asynchronous thread racing with
>> other activity (which was one of the things wrong with the original
>> version).
> 
> Yeah I've seen the origins of this in the display code, and that code gets
> the syncing wrong. The only thing that one has do to is grab a runtime pm
> reference for the appropriate power well to prevent dc5 entry, and release
> it when the firmware is loaded and initialized.

Agreed.

> Which means any kind of firmware loader which requires/uses
> dev->struct_mutex get stuff wrong and is not appropriate everywhere.

BUT, the loading of the firmware into any uC MUST be done in a
controlled manner i.e. at a time when no other thread is touching the
h/w. Otherwise the f/w load and whatever else is concurrently accessing
the h/w could in some cases interfere disastrously. Examples of
interference might be:

* interleaved accesses to the ELSP (in the case of the GuC)
* incorrect handover of power management (DMC, GuC)
* erroneous management of forcewake state

In general the f/w that is just starting on the uC may have certain
expectations about the initial state of the h/w, which may not be met if
other threads are accessing various bits of h/w while the uC is booting up.

So we absolutely need to guarantee that the f/w load is done by a thread
which has exclusive ownership of any bit of the h/w that the f/w is
going to make assumptions about. With the current locking structure of
the driver, that means holding the struct_mutex (it shouldn't really,
there should be a separate mutex for h/w register access vs.
driver-private data structures, but there isn't).

>> We should convert the DMC loader to use this too, so there need be only
>> one bit of code in the whole driver that needs to understand how to use
>> completions to get correct handover from a free-running no-locks-held
>> thread to the properly disciplined environment of driver mainline for
>> purposes of programming the h/w.
> 
> Nack on using this for dmc, since I want them to convert it to the above
> synchronization, since that's how all the other async power initialization
> is done.
> 
> Guc is different since we really must have it ready for execbuf, and for
> that usecase a completion at drm_open time sounds like the right thing.
> 
> As a rule of thumb for refactoring and share infastructure we use the
> following recipe in drm:
> - first driver implements things as straightforward as possible
> - 2nd user copypastes
> - 3rd one has the duty to figure out whether some refactoring is in order
>   or not.
> Imo that approach leads a really good balance between avoiding
> overengineering and having maintainable code.
> -Daniel

We've already been through these phases; the code has already been
cloned twice (and then changed, but not enough to fix the problems with
the original) and then when I found the issues with the GuC loader and
noticed the hilarious ownership dance it was doing during handover I
realised it was time to fix it in one place rather than several, and
posted a patchset to the internal mailing list on 2015-02-24 with this
commentary:

> The GuC loader uses an asynchronous thread to fetch 

[Intel-gfx] [RFC PATCH 2/7] drm/i915: Read hw state into an atomic state struct, try 2.

2015-06-19 Thread Maarten Lankhorst
To make this work we load the new hardware state into the
atomic_state, then swap it with the sw state.

This lets us change the force restore path in setup_hw_state()
to use a single call to intel_mode_set() to restore all the
previous state.

As a nice bonus this kills off encoder->new_encoder,
connector->new_enabled and crtc->new_enabled. They were used only
to restore the state after a modeset.

Change since try1:
- Fix the regressions caused by not updating drm core sw state.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/intel_atomic.c  |   2 +-
 drivers/gpu/drm/i915/intel_display.c | 381 ++-
 drivers/gpu/drm/i915/intel_drv.h |  14 +-
 3 files changed, 243 insertions(+), 154 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_atomic.c 
b/drivers/gpu/drm/i915/intel_atomic.c
index 429677a111d5..4d87574a2032 100644
--- a/drivers/gpu/drm/i915/intel_atomic.c
+++ b/drivers/gpu/drm/i915/intel_atomic.c
@@ -398,7 +398,7 @@ int intel_atomic_setup_scalers(struct drm_device *dev,
return 0;
 }
 
-static void
+void
 intel_atomic_duplicate_dpll_state(struct drm_i915_private *dev_priv,
  struct intel_shared_dpll_config *shared_dpll)
 {
diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index b3d8d1a30a04..41193dab58c5 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10231,7 +10231,7 @@ bool intel_get_load_detect_pipe(struct drm_connector 
*connector,
 retry:
ret = drm_modeset_lock(&config->connection_mutex, ctx);
if (ret)
-   goto fail_unlock;
+   goto fail;
 
/*
 * Algorithm gets a little messy:
@@ -10249,10 +10249,10 @@ retry:
 
ret = drm_modeset_lock(&crtc->mutex, ctx);
if (ret)
-   goto fail_unlock;
+   goto fail;
ret = drm_modeset_lock(&crtc->primary->mutex, ctx);
if (ret)
-   goto fail_unlock;
+   goto fail;
 
old->dpms_mode = connector->dpms;
old->load_detect_temp = false;
@@ -10271,9 +10271,6 @@ retry:
continue;
if (possible_crtc->state->enable)
continue;
-   /* This can occur when applying the pipe A quirk on resume. */
-   if (to_intel_crtc(possible_crtc)->new_enabled)
-   continue;
 
crtc = possible_crtc;
break;
@@ -10284,20 +10281,17 @@ retry:
 */
if (!crtc) {
DRM_DEBUG_KMS("no pipe available for load-detect\n");
-   goto fail_unlock;
+   goto fail;
}
 
ret = drm_modeset_lock(&crtc->mutex, ctx);
if (ret)
-   goto fail_unlock;
+   goto fail;
ret = drm_modeset_lock(&crtc->primary->mutex, ctx);
if (ret)
-   goto fail_unlock;
-   intel_encoder->new_crtc = to_intel_crtc(crtc);
-   to_intel_connector(connector)->new_encoder = intel_encoder;
+   goto fail;
 
intel_crtc = to_intel_crtc(crtc);
-   intel_crtc->new_enabled = true;
old->dpms_mode = connector->dpms;
old->load_detect_temp = true;
old->release_fb = NULL;
@@ -10365,9 +10359,7 @@ retry:
intel_wait_for_vblank(dev, intel_crtc->pipe);
return true;
 
- fail:
-   intel_crtc->new_enabled = crtc->state->enable;
-fail_unlock:
+fail:
drm_atomic_state_free(state);
state = NULL;
 
@@ -10413,10 +10405,6 @@ void intel_release_load_detect_pipe(struct 
drm_connector *connector,
if (IS_ERR(crtc_state))
goto fail;
 
-   to_intel_connector(connector)->new_encoder = NULL;
-   intel_encoder->new_crtc = NULL;
-   intel_crtc->new_enabled = false;
-
connector_state->best_encoder = NULL;
connector_state->crtc = NULL;
 
@@ -11869,33 +11857,6 @@ static const struct drm_crtc_helper_funcs 
intel_helper_funcs = {
.atomic_check = intel_crtc_atomic_check,
 };
 
-/**
- * intel_modeset_update_staged_output_state
- *
- * Updates the staged output configuration state, e.g. after we've read out the
- * current hw state.
- */
-static void intel_modeset_update_staged_output_state(struct drm_device *dev)
-{
-   struct intel_crtc *crtc;
-   struct intel_encoder *encoder;
-   struct intel_connector *connector;
-
-   for_each_intel_connector(dev, connector) {
-   connector->new_encoder =
-   to_intel_encoder(connector->base.encoder);
-   }
-
-   for_each_intel_encoder(dev, encoder) {
-   encoder->new_crtc =
-   to_intel_crtc(encoder->base.crtc);
-   }
-
-   for_each_intel_crtc(dev, crtc) {
-   crtc->new_enabled = cr

[Intel-gfx] [RFC PATCH 5/7] drm/i915: Update power domains only on affected crtc's.

2015-06-19 Thread Maarten Lankhorst
Use for_each_crtc_state to only touch affected crtc's.
In order to make sure that the initial power is still set
correctly we make sure modeset_update_crtc_power_domains is called
during the initial modeset.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/intel_display.c | 44 
 1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index de975ef09e23..942d25e7490a 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -5197,30 +5197,35 @@ static unsigned long get_crtc_power_domains(struct 
drm_crtc *crtc)
return mask;
 }
 
-static void modeset_update_crtc_power_domains(struct drm_atomic_state *state)
+static void modeset_update_crtc_power_domains(struct drm_atomic_state *state,
+ bool power_only)
 {
struct drm_device *dev = state->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
-   unsigned long pipe_domains[I915_MAX_PIPES] = { 0, };
-   struct intel_crtc *crtc;
+   unsigned long pipe_domains[I915_MAX_PIPES] = { 0, }, domains;
+   struct drm_crtc_state *crtc_state;
+   struct drm_crtc *crtc;
+   int i;
 
/*
 * First get all needed power domains, then put all unneeded, to avoid
 * any unnecessary toggling of the power wells.
 */
-   for_each_intel_crtc(dev, crtc) {
+   for_each_crtc_in_state(state, crtc, crtc_state, i) {
enum intel_display_power_domain domain;
+   enum pipe pipe = to_intel_crtc(crtc)->pipe;
 
-   if (!crtc->base.state->enable)
+   if (!crtc->state->active)
continue;
 
-   pipe_domains[crtc->pipe] = get_crtc_power_domains(&crtc->base);
+   domains = pipe_domains[pipe] = get_crtc_power_domains(crtc);
+   domains &= ~to_intel_crtc(crtc)->enabled_power_domains;
 
-   for_each_power_domain(domain, pipe_domains[crtc->pipe])
+   for_each_power_domain(domain, domains)
intel_display_power_get(dev_priv, domain);
}
 
-   if (dev_priv->display.modeset_commit_cdclk) {
+   if (!power_only && dev_priv->display.modeset_commit_cdclk) {
unsigned int cdclk = to_intel_atomic_state(state)->cdclk;
 
if (cdclk != dev_priv->cdclk_freq &&
@@ -5228,16 +5233,19 @@ static void modeset_update_crtc_power_domains(struct 
drm_atomic_state *state)
dev_priv->display.modeset_commit_cdclk(state);
}
 
-   for_each_intel_crtc(dev, crtc) {
+   for_each_crtc_in_state(state, crtc, crtc_state, i) {
+   struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+   enum pipe pipe = intel_crtc->pipe;
enum intel_display_power_domain domain;
 
-   for_each_power_domain(domain, crtc->enabled_power_domains)
-   intel_display_power_put(dev_priv, domain);
+   domains = intel_crtc->enabled_power_domains;
+   domains &= ~pipe_domains[pipe];
 
-   crtc->enabled_power_domains = pipe_domains[crtc->pipe];
-   }
+   intel_crtc->enabled_power_domains = pipe_domains[pipe];
 
-   intel_display_set_init_power(dev_priv, false);
+   for_each_power_domain(domain, domains)
+   intel_display_power_put(dev_priv, domain);
+   }
 }
 
 static void intel_update_max_cdclk(struct drm_device *dev)
@@ -13115,7 +13123,7 @@ static int __intel_set_mode(struct drm_atomic_state 
*state)
/* The state has been swaped above, so state actually contains the
 * old state now. */
if (any_ms)
-   modeset_update_crtc_power_domains(state);
+   modeset_update_crtc_power_domains(state, false);
 
/* Now enable the clocks, plane, pipe, and connectors that we set up. */
for_each_crtc_in_state(state, crtc, crtc_state, i) {
@@ -15606,6 +15614,7 @@ intel_modeset_setup_hw_state(struct drm_device *dev, 
bool force_restore)
return NULL;
}
 
+   /* swap sw/hw state */
drm_atomic_helper_swap_state(dev, state);
 
if (force_restore) {
@@ -15619,6 +15628,9 @@ intel_modeset_setup_hw_state(struct drm_device *dev, 
bool force_restore)
to_intel_atomic_state(state)->dpll_set = false;
}
 
+   /* update power state to match hw state */
+   modeset_update_crtc_power_domains(state, true);
+
/* HW state is read out, now we need to sanitize this mess. */
for_each_intel_encoder(dev, encoder) {
intel_sanitize_encoder(encoder);
@@ -15697,6 +15709,8 @@ intel_modeset_setup_hw_state(struct drm_device *dev, 
bool force_restore)
drm_atomic_state_free(state);
}
 
+   intel_display_set_init_power(dev_priv, f

[Intel-gfx] [RFC PATCH 7/7] drm/i915: Make intel_display_suspend atomic, try 2.

2015-06-19 Thread Maarten Lankhorst
Calculate all state using a normal transition, but afterwards fudge
crtc->state->active back to its old value. This should still allow
state restore in setup_hw_state to work properly.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/intel_display.c | 74 +++-
 drivers/gpu/drm/i915/intel_drv.h |  2 +-
 2 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index 796d08c4606e..9c25b080260c 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -6175,40 +6175,62 @@ static void i9xx_crtc_disable(struct drm_crtc *crtc)
intel_set_cpu_fifo_underrun_reporting(dev_priv, pipe, false);
 }
 
-static void intel_crtc_disable_noatomic(struct drm_crtc *crtc)
+/*
+ * turn all crtc's off, but do not adjust state
+ * This has to be paired with a call to intel_modeset_setup_hw_state.
+ */
+int intel_display_suspend(struct drm_device *dev)
 {
-   struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-   struct drm_i915_private *dev_priv = to_i915(crtc->dev);
-   enum intel_display_power_domain domain;
-   unsigned long domains;
+   struct drm_mode_config *config = &dev->mode_config;
+   struct drm_modeset_acquire_ctx *ctx = config->acquire_ctx;
+   struct drm_atomic_state *state;
+   struct drm_crtc *crtc;
+   unsigned crtc_mask = 0;
+   int ret = 0;
 
-   if (!intel_crtc->active)
-   return;
+   if (WARN_ON(!ctx))
+   return 0;
 
-   if (to_intel_plane_state(crtc->primary->state)->visible) {
-   intel_crtc_wait_for_pending_flips(crtc);
-   intel_pre_disable_primary(crtc);
+   lockdep_assert_held(&ctx->ww_ctx);
+   state = drm_atomic_state_alloc(dev);
+   if (WARN_ON(!state))
+   return -ENOMEM;
+
+   state->acquire_ctx = ctx;
+   state->allow_modeset = true;
+
+   for_each_crtc(dev, crtc) {
+   struct drm_crtc_state *crtc_state =
+   drm_atomic_get_crtc_state(state, crtc);
+
+   ret = PTR_ERR_OR_ZERO(crtc_state);
+   if (ret)
+   goto free;
+
+   if (!crtc_state->active)
+   continue;
+
+   crtc_state->active = false;
+   crtc_mask |= 1 << drm_crtc_index(crtc);
}
 
-   intel_crtc_disable_planes(crtc, crtc->state->plane_mask);
-   dev_priv->display.crtc_disable(crtc);
+   if (crtc_mask) {
+   ret = intel_set_mode(state);
 
-   domains = intel_crtc->enabled_power_domains;
-   for_each_power_domain(domain, domains)
-   intel_display_power_put(dev_priv, domain);
-   intel_crtc->enabled_power_domains = 0;
-}
+   if (!ret) {
+   for_each_crtc(dev, crtc)
+   if (crtc_mask & (1 << drm_crtc_index(crtc)))
+   crtc->state->active = true;
 
-/*
- * turn all crtc's off, but do not adjust state
- * This has to be paired with a call to intel_modeset_setup_hw_state.
- */
-void intel_display_suspend(struct drm_device *dev)
-{
-   struct drm_crtc *crtc;
+   return ret;
+   }
+   }
 
-   for_each_crtc(dev, crtc)
-   intel_crtc_disable_noatomic(crtc);
+free:
+   if (ret)
+   DRM_ERROR("Suspending crtc's failed with %i\n", ret);
+   drm_atomic_state_free(state);
+   return ret;
 }
 
 void intel_display_resume(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index ea380b83eb5d..fb6c88ade842 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -992,7 +992,7 @@ int intel_pch_rawclk(struct drm_device *dev);
 void intel_mark_busy(struct drm_device *dev);
 void intel_mark_idle(struct drm_device *dev);
 void intel_crtc_restore_mode(struct drm_crtc *crtc);
-void intel_display_suspend(struct drm_device *dev);
+int intel_display_suspend(struct drm_device *dev);
 int intel_crtc_control(struct drm_crtc *crtc, bool enable);
 void intel_crtc_update_dpms(struct drm_crtc *crtc);
 void intel_encoder_destroy(struct drm_encoder *encoder);
-- 
2.1.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [RFC PATCH 6/7] drm/i915: Always reset in intel_crtc_restore_mode

2015-06-19 Thread Maarten Lankhorst
And get rid of things that are no longer true. This function is only
used for forcing a modeset when encoder properties are changed.

All the existing state is fine in this case, only setting mode_changed
will force a full recalculation here, and take all the state needed.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/intel_display.c | 62 +++-
 1 file changed, 18 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index 942d25e7490a..796d08c4606e 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13174,66 +13174,40 @@ void intel_crtc_restore_mode(struct drm_crtc *crtc)
 {
struct drm_device *dev = crtc->dev;
struct drm_atomic_state *state;
-   struct intel_crtc *intel_crtc;
-   struct intel_encoder *encoder;
-   struct intel_connector *connector;
-   struct drm_connector_state *connector_state;
-   struct intel_crtc_state *crtc_state;
+   struct drm_crtc_state *crtc_state;
int ret;
 
state = drm_atomic_state_alloc(dev);
if (!state) {
-   DRM_DEBUG_KMS("[CRTC:%d] mode restore failed, out of memory",
+   DRM_DEBUG_KMS("[CRTC:%d] crtc restore failed, out of memory",
  crtc->base.id);
return;
}
 
-   state->acquire_ctx = dev->mode_config.acquire_ctx;
-
-   /* The force restore path in the HW readout code relies on the staged
-* config still keeping the user requested config while the actual
-* state has been overwritten by the configuration read from HW. We
-* need to copy the staged config to the atomic state, otherwise the
-* mode set will just reapply the state the HW is already in. */
-   for_each_intel_encoder(dev, encoder) {
-   if (encoder->base.crtc != crtc)
-   continue;
+   state->acquire_ctx = drm_modeset_legacy_acquire_ctx(crtc);
 
-   for_each_intel_connector(dev, connector) {
-   if (connector->base.state->best_encoder != 
&encoder->base)
-   continue;
-
-   connector_state = drm_atomic_get_connector_state(state, 
&connector->base);
-   if (IS_ERR(connector_state)) {
-   DRM_DEBUG_KMS("Failed to add [CONNECTOR:%d:%s] 
to state: %ld\n",
- connector->base.base.id,
- connector->base.name,
- PTR_ERR(connector_state));
-   continue;
-   }
+retry:
+   crtc_state = drm_atomic_get_crtc_state(state, crtc);
+   ret = PTR_ERR_OR_ZERO(crtc_state);
+   if (!ret) {
+   if (!crtc_state->active)
+   goto out;
 
-   connector_state->crtc = crtc;
-   }
+   crtc_state->mode_changed = true;
+   ret = intel_modeset_compute_config(state);
}
 
-   for_each_intel_crtc(dev, intel_crtc) {
-   crtc_state = intel_atomic_get_crtc_state(state, intel_crtc);
-   if (IS_ERR(crtc_state)) {
-   DRM_DEBUG_KMS("Failed to add [CRTC:%d] to state: %ld\n",
- intel_crtc->base.base.id,
- PTR_ERR(crtc_state));
-   continue;
-   }
+   if (!ret)
+   ret = intel_set_mode_checked(state);
 
-   if (&intel_crtc->base == crtc)
-   drm_mode_copy(&crtc_state->base.mode, &crtc->mode);
+   if (ret == -EDEADLK) {
+   drm_atomic_state_clear(state);
+   drm_modeset_backoff(state->acquire_ctx);
+   goto retry;
}
 
-   intel_modeset_setup_plane_state(state, crtc, &crtc->mode,
-   crtc->primary->fb, crtc->x, crtc->y);
-
-   ret = intel_set_mode(state);
if (ret)
+out:
drm_atomic_state_free(state);
 }
 
-- 
2.1.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [RFC PATCH 0/7] Convert to atomic, part 4.

2015-06-19 Thread Maarten Lankhorst
The only thing missing after part 3 is restoring atomic readout and making
suspend/restore. With fixes for all identified causes of regressions from
atomic suspend being done in convert to atomic, part 3 it's time to worry
about getting atomic suspend/restore working again.

I do this in a few steps, first I re-introduce the old commit for hw state
readout, with fixes. After that I convert the atomic readout to a full initial
modeset, and finally I try to re-enable fastboot optimizations again by default,
in a less hacky way.

This is a RFC because of the following possible issues:
 - mode->clock may not be read out correctly. It can differ slightly from the
   actually set value, which forces a modeset anyway when it could be avoided.
 - Not tested with DRRS support on DP, might be broken with fastboot?

Maarten Lankhorst (7):
  drm/i915: Do not update pfit state when toggling crtc enabled.
  drm/i915: Read hw state into an atomic state struct, try 2.
  All changes from try2.
  drm/i915: enable fastboot for everyone
  drm/i915: Update power domains only on affected crtc's.
  drm/i915: Always reset in intel_crtc_restore_mode
  drm/i915: Make intel_display_suspend atomic, try 2.

 drivers/gpu/drm/i915/i915_dma.c  |   12 +-
 drivers/gpu/drm/i915/i915_drv.c  |2 +-
 drivers/gpu/drm/i915/i915_drv.h  |7 +-
 drivers/gpu/drm/i915/i915_params.c   |5 -
 drivers/gpu/drm/i915/intel_atomic.c  |   23 +-
 drivers/gpu/drm/i915/intel_display.c | 1302 --
 drivers/gpu/drm/i915/intel_dp.c  |2 +-
 drivers/gpu/drm/i915/intel_drv.h |   21 +-
 drivers/gpu/drm/i915/intel_fbdev.c   |   22 +-
 drivers/gpu/drm/i915/intel_lvds.c|2 +-
 10 files changed, 786 insertions(+), 612 deletions(-)

-- 
2.1.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [RFC PATCH 1/7] drm/i915: Do not update pfit state when toggling crtc enabled.

2015-06-19 Thread Maarten Lankhorst
This must be done in advance, and during crtc_disable all scalers
can be force disabled.

This means intel_atomic_setup_scalers is only called in 1 place now,
during crtc_check.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/intel_atomic.c  | 14 ++--
 drivers/gpu/drm/i915/intel_display.c | 67 ++--
 drivers/gpu/drm/i915/intel_dp.c  |  2 +-
 drivers/gpu/drm/i915/intel_drv.h |  2 +-
 4 files changed, 47 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_atomic.c 
b/drivers/gpu/drm/i915/intel_atomic.c
index 0aeced82201e..429677a111d5 100644
--- a/drivers/gpu/drm/i915/intel_atomic.c
+++ b/drivers/gpu/drm/i915/intel_atomic.c
@@ -272,17 +272,12 @@ int intel_atomic_setup_scalers(struct drm_device *dev,
struct drm_plane *plane = NULL;
struct intel_plane *intel_plane;
struct intel_plane_state *plane_state = NULL;
-   struct intel_crtc_scaler_state *scaler_state;
-   struct drm_atomic_state *drm_state;
+   struct intel_crtc_scaler_state *scaler_state =
+   &crtc_state->scaler_state;
+   struct drm_atomic_state *drm_state = crtc_state->base.state;
int num_scalers_need;
int i, j;
 
-   if (INTEL_INFO(dev)->gen < 9 || !intel_crtc || !crtc_state)
-   return 0;
-
-   scaler_state = &crtc_state->scaler_state;
-   drm_state = crtc_state->base.state;
-
num_scalers_need = hweight32(scaler_state->scaler_users);
DRM_DEBUG_KMS("crtc_state = %p need = %d avail = %d scaler_users = 
0x%x\n",
crtc_state, num_scalers_need, intel_crtc->num_scalers,
@@ -327,9 +322,6 @@ int intel_atomic_setup_scalers(struct drm_device *dev,
name = "PLANE";
idx = plane->base.id;
 
-   if (!drm_state)
-   continue;
-
/* plane scaler case: assign as a plane scaler */
/* find the plane that set the bit as scaler_user */
plane = drm_state->planes[i];
diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index 59430e4c361f..b3d8d1a30a04 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2898,29 +2898,32 @@ unsigned long intel_plane_obj_offset(struct intel_plane 
*intel_plane,
return i915_gem_obj_ggtt_offset_view(obj, view);
 }
 
+static void skl_detach_scaler(struct intel_crtc *intel_crtc, int id)
+{
+   struct drm_device *dev = intel_crtc->base.dev;
+   struct drm_i915_private *dev_priv = dev->dev_private;
+
+   I915_WRITE(SKL_PS_CTRL(intel_crtc->pipe, id), 0);
+   I915_WRITE(SKL_PS_WIN_POS(intel_crtc->pipe, id), 0);
+   I915_WRITE(SKL_PS_WIN_SZ(intel_crtc->pipe, id), 0);
+   DRM_DEBUG_KMS("CRTC:%d Disabled scaler id %u.%u\n",
+   intel_crtc->base.base.id, intel_crtc->pipe, id);
+}
+
 /*
  * This function detaches (aka. unbinds) unused scalers in hardware
  */
 static void skl_detach_scalers(struct intel_crtc *intel_crtc)
 {
-   struct drm_device *dev;
-   struct drm_i915_private *dev_priv;
struct intel_crtc_scaler_state *scaler_state;
int i;
 
-   dev = intel_crtc->base.dev;
-   dev_priv = dev->dev_private;
scaler_state = &intel_crtc->config->scaler_state;
 
/* loop through and disable scalers that aren't in use */
for (i = 0; i < intel_crtc->num_scalers; i++) {
-   if (!scaler_state->scalers[i].in_use) {
-   I915_WRITE(SKL_PS_CTRL(intel_crtc->pipe, i), 0);
-   I915_WRITE(SKL_PS_WIN_POS(intel_crtc->pipe, i), 0);
-   I915_WRITE(SKL_PS_WIN_SZ(intel_crtc->pipe, i), 0);
-   DRM_DEBUG_KMS("CRTC:%d Disabled scaler id %u.%u\n",
-   intel_crtc->base.base.id, intel_crtc->pipe, i);
-   }
+   if (!scaler_state->scalers[i].in_use)
+   skl_detach_scaler(intel_crtc, i);
}
 }
 
@@ -4351,13 +4354,12 @@ skl_update_scaler(struct intel_crtc_state *crtc_state, 
bool force_detach,
  * skl_update_scaler_crtc - Stages update to scaler state for a given crtc.
  *
  * @state: crtc's scaler state
- * @force_detach: whether to forcibly disable scaler
  *
  * Return
  * 0 - scaler_usage updated successfully
  *error - requested scaling cannot be supported or other error condition
  */
-int skl_update_scaler_crtc(struct intel_crtc_state *state, int force_detach)
+int skl_update_scaler_crtc(struct intel_crtc_state *state)
 {
struct intel_crtc *intel_crtc = to_intel_crtc(state->base.crtc);
struct drm_display_mode *adjusted_mode =
@@ -4366,7 +4368,7 @@ int skl_update_scaler_crtc(struct intel_crtc_state 
*state, int force_detach)
DRM_DEBUG_KMS("Updating scaler for [CRTC:%i] scaler_user index %u.%u\n",
  intel_crtc->base.base.id

[Intel-gfx] [RFC PATCH 3/7] All changes from try2.

2015-06-19 Thread Maarten Lankhorst
Always read out plane state, and do an initial modeset in modeset_gem_init.

---
 drivers/gpu/drm/i915/i915_dma.c  |  12 +-
 drivers/gpu/drm/i915/i915_drv.c  |   2 +-
 drivers/gpu/drm/i915/i915_drv.h  |   7 +-
 drivers/gpu/drm/i915/intel_atomic.c  |   7 -
 drivers/gpu/drm/i915/intel_display.c | 501 ++-
 drivers/gpu/drm/i915/intel_drv.h |   3 +-
 drivers/gpu/drm/i915/intel_fbdev.c   |  12 +-
 drivers/gpu/drm/i915/intel_lvds.c|   2 +-
 8 files changed, 280 insertions(+), 266 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 88795d2f1819..ede07f6fe7f7 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -393,6 +393,7 @@ static const struct vga_switcheroo_client_ops 
i915_switcheroo_ops = {
 static int i915_load_modeset_init(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
+   struct drm_atomic_state *state;
int ret;
 
ret = intel_parse_bios(dev);
@@ -431,13 +432,18 @@ static int i915_load_modeset_init(struct drm_device *dev)
 
/* Important: The output setup functions called by modeset_init need
 * working irqs for e.g. gmbus and dp aux transfers. */
-   intel_modeset_init(dev);
+   state = intel_modeset_init(dev);
 
ret = i915_gem_init(dev);
-   if (ret)
+   if (ret) {
+   if (state) {
+   drm_atomic_state_free(state);
+   drm_modeset_unlock_all(dev);
+   }
goto cleanup_irq;
+   }
 
-   intel_modeset_gem_init(dev);
+   intel_modeset_gem_init(dev, state);
 
/* Always safe in the mode setting case. */
/* FIXME: do pre/post-mode set stuff in core KMS code */
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 78ef0bb53c36..a29b24bca25e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -758,7 +758,7 @@ static int i915_drm_resume(struct drm_device *dev)
spin_unlock_irq(&dev_priv->irq_lock);
 
drm_modeset_lock_all(dev);
-   intel_modeset_setup_hw_state(dev, true);
+   intel_display_resume(dev);
drm_modeset_unlock_all(dev);
 
intel_dp_mst_resume(dev);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d9410bd4ab59..e67d2f1e3ab8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3233,13 +3233,12 @@ static inline void intel_unregister_dsm_handler(void) { 
return; }
 
 /* modesetting */
 extern void intel_modeset_init_hw(struct drm_device *dev);
-extern void intel_modeset_init(struct drm_device *dev);
-extern void intel_modeset_gem_init(struct drm_device *dev);
+extern struct drm_atomic_state *intel_modeset_init(struct drm_device *dev);
+extern void intel_modeset_gem_init(struct drm_device *dev, struct 
drm_atomic_state *);
 extern void intel_modeset_cleanup(struct drm_device *dev);
 extern void intel_connector_unregister(struct intel_connector *);
 extern int intel_modeset_vga_set_state(struct drm_device *dev, bool state);
-extern void intel_modeset_setup_hw_state(struct drm_device *dev,
-bool force_restore);
+extern void intel_display_resume(struct drm_device *dev);
 extern void i915_redisable_vga(struct drm_device *dev);
 extern void i915_redisable_vga_power_on(struct drm_device *dev);
 extern bool ironlake_set_drps(struct drm_device *dev, u8 val);
diff --git a/drivers/gpu/drm/i915/intel_atomic.c 
b/drivers/gpu/drm/i915/intel_atomic.c
index 4d87574a2032..f36fcc7ceecb 100644
--- a/drivers/gpu/drm/i915/intel_atomic.c
+++ b/drivers/gpu/drm/i915/intel_atomic.c
@@ -98,13 +98,6 @@ int intel_atomic_check(struct drm_device *dev,
return -EINVAL;
}
 
-   if (crtc_state &&
-   crtc_state->quirks & PIPE_CONFIG_QUIRK_INITIAL_PLANES) {
-   ret = drm_atomic_add_affected_planes(state, 
&nuclear_crtc->base);
-   if (ret)
-   return ret;
-   }
-
ret = drm_atomic_helper_check_planes(dev, state);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index 41193dab58c5..9b2acc7b4e29 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -109,6 +109,9 @@ static void skl_init_scalers(struct drm_device *dev, struct 
intel_crtc *intel_cr
struct intel_crtc_state *crtc_state);
 static int i9xx_get_refclk(const struct intel_crtc_state *crtc_state,
   int num_connectors);
+static void intel_pre_disable_primary(struct drm_crtc *crtc);
+static struct drm_atomic_state *
+intel_modeset_setup_hw_state(struct drm_device *dev, bool force_restore);
 
 static struct intel_encoder *intel_find_encoder(struct intel_connector 
*connector, int pipe)
 {
@@ -2567,40 +2

[Intel-gfx] [RFC PATCH 4/7] drm/i915: enable fastboot for everyone

2015-06-19 Thread Maarten Lankhorst
Now that we read out the full atomic state we can force fastboot without hacks.
The only thing that we worry about is preventing a modeset. This can be easily
done by calculating if sw state matches hw state, with exception for pfit and
pipe size. Since the original fastboot code only touched pipe size and panel
fitter we keep those, and compare full sw state against hw state.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/i915_params.c   |   5 -
 drivers/gpu/drm/i915/intel_display.c | 271 ++-
 drivers/gpu/drm/i915/intel_fbdev.c   |  10 +-
 3 files changed, 169 insertions(+), 117 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_params.c 
b/drivers/gpu/drm/i915/i915_params.c
index 8ac5a1b29ac0..9b344a956ba6 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -41,7 +41,6 @@ struct i915_params i915 __read_mostly = {
.preliminary_hw_support = 
IS_ENABLED(CONFIG_DRM_I915_PRELIMINARY_HW_SUPPORT),
.disable_power_well = 1,
.enable_ips = 1,
-   .fastboot = 0,
.prefault_disable = 0,
.load_detect_test = 0,
.reset = true,
@@ -137,10 +136,6 @@ MODULE_PARM_DESC(disable_power_well,
 module_param_named(enable_ips, i915.enable_ips, int, 0600);
 MODULE_PARM_DESC(enable_ips, "Enable IPS (default: true)");
 
-module_param_named(fastboot, i915.fastboot, bool, 0600);
-MODULE_PARM_DESC(fastboot,
-   "Try to skip unnecessary mode sets at boot time (default: false)");
-
 module_param_named_unsafe(prefault_disable, i915.prefault_disable, bool, 0600);
 MODULE_PARM_DESC(prefault_disable,
"Disable page prefaulting for pread/pwrite/reloc (default:false). "
diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index 9b2acc7b4e29..de975ef09e23 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -110,6 +110,7 @@ static void skl_init_scalers(struct drm_device *dev, struct 
intel_crtc *intel_cr
 static int i9xx_get_refclk(const struct intel_crtc_state *crtc_state,
   int num_connectors);
 static void intel_pre_disable_primary(struct drm_crtc *crtc);
+static void skylake_pfit_enable(struct intel_crtc *crtc);
 static struct drm_atomic_state *
 intel_modeset_setup_hw_state(struct drm_device *dev, bool force_restore);
 
@@ -3289,14 +3290,17 @@ static bool intel_crtc_has_pending_flip(struct drm_crtc 
*crtc)
return pending;
 }
 
-static void intel_update_pipe_size(struct intel_crtc *crtc)
+static void intel_update_fastboot(struct intel_crtc *crtc,
+ struct intel_crtc_state *old_crtc_state)
 {
struct drm_device *dev = crtc->base.dev;
struct drm_i915_private *dev_priv = dev->dev_private;
-   const struct drm_display_mode *adjusted_mode;
+   struct intel_crtc_state *pipe_config =
+   to_intel_crtc_state(crtc->base.state);
 
-   if (!i915.fastboot)
-   return;
+   DRM_DEBUG_KMS("Applying fastboot quirks, %ix%i -> %ix%i\n",
+ old_crtc_state->pipe_src_w, old_crtc_state->pipe_src_h,
+ pipe_config->pipe_src_w, pipe_config->pipe_src_h);
 
/*
 * Update pipe size and adjust fitter if needed: the reason for this is
@@ -3305,27 +3309,27 @@ static void intel_update_pipe_size(struct intel_crtc 
*crtc)
 * fastboot case, we'll flip, but if we don't update the pipesrc and
 * pfit state, we'll end up with a big fb scanned out into the wrong
 * sized surface.
-*
-* To fix this properly, we need to hoist the checks up into
-* compute_mode_changes (or above), check the actual pfit state and
-* whether the platform allows pfit disable with pipe active, and only
-* then update the pipesrc and pfit state, even on the flip path.
 */
 
-   adjusted_mode = &crtc->config->base.adjusted_mode;
-
I915_WRITE(PIPESRC(crtc->pipe),
-  ((adjusted_mode->crtc_hdisplay - 1) << 16) |
-  (adjusted_mode->crtc_vdisplay - 1));
-   if (!crtc->config->pch_pfit.enabled &&
-   (intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS) ||
-intel_pipe_has_type(crtc, INTEL_OUTPUT_EDP))) {
+  ((pipe_config->pipe_src_w - 1) << 16) |
+  (pipe_config->pipe_src_h - 1));
+
+   /* on skylake this is done by detaching scalers */
+   if (INTEL_INFO(dev)->gen == 9) {
+   skl_detach_scalers(crtc);
+
+   if (pipe_config->pch_pfit.enabled)
+   skylake_pfit_enable(crtc);
+   }
+   else if (!pipe_config->pch_pfit.enabled &&
+   old_crtc_state->pch_pfit.enabled) {
+   DRM_DEBUG_KMS("Disabling panel fitter\n");
+
I915_WRITE(PF_CTL(crtc->pipe), 0);
I915_WRITE(PF_WIN_POS(crtc->pipe), 0);
I915_WRITE(PF_WIN_SZ(crtc->pipe), 0);
   

Re: [Intel-gfx] [PATCH 06/15] drm/i915: Debugfs interface to read GuC load status

2015-06-19 Thread Dave Gordon
On 16/06/15 10:40, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:24PM +0100, Dave Gordon wrote:
>> From: Alex Dai 
>>
>> The new node provides access to the status of the common uC loader
>> code and the GuC-specific loader; also the scratch registers used
>> for communicatio between the i915 driver and the GuC firmware.
>>
>> Issue: VIZ-4884
>> Signed-off-by: Alex Dai 
>> Signed-off-by: Dave Gordon 
>> ---
>>  drivers/gpu/drm/i915/i915_debugfs.c |   37 
>> +++
>>  1 file changed, 37 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
>> b/drivers/gpu/drm/i915/i915_debugfs.c
>> index 47636f3..c52a745 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -2352,6 +2352,42 @@ static int i915_llc(struct seq_file *m, void *data)
>>  return 0;
>>  }
>>  
>> +static void i915_uc_load_status_info(struct seq_file *m, struct intel_uc_fw 
>> *uc_fw)
>> +{
>> +seq_printf(m, "%s firmware status:\n\tpath: <%s>\n\tfetch: %d\n\tload: 
>> %d\n",
>> +uc_fw->uc_name,
>> +uc_fw->uc_fw_path,
>> +uc_fw->uc_fw_fetch_status,
>> +uc_fw->uc_fw_load_status);
> 
> If you made this one seq_printf() per line visualing the resulting
> format would have been easier - and easier to modify.

Done.

> Don't use <%s>, that's just visual noise to make cutting and pasting
> harder.

My terminal doesn't include <> in word selections (but DOES include "/"
and ".") so selecting a pathname just works :) But I've removed the
 anyway.

> If you can decode numeric status values, do so.

Done. I've added a _repr function for decoding the enum and used it
everywhere.

>> +}
>> +
>> +static int i915_guc_load_status_info(struct seq_file *m, void *data)
>> +{
>> +struct drm_info_node *node = m->private;
>> +struct drm_i915_private *dev_priv = node->minor->dev->dev_private;
>> +u32 tmp, i;
>> +
>> +if (!HAS_GUC_UCODE(dev_priv->dev))
> 
> Here and elsewhere it should be return -ENODEV;

There's only one other use of HAS_GUC_UCODE (in intel_guc_ucode_init())
and that one doesn't and mustn't trigger an error if false. I don't see
why it should be an *error* here either; the caller hasn't done anything
wrong, and there's no h/w or s/w failure. An empty result (EOF) is a
nice way of saying that there's nothing to say, without making the user
think something broke.

In fact it may be perfectly meaningful to continue rather than
returning; consider the case of a future GuC that comes with firmware
preloaded, so HAS_GUC() is true but HAS_GUC_UCODE() is FALSE. We could
still read and decode the GUC_STATUS even though we haven't loaded any
firmware.

>> +return 0;
>> +
>> +i915_uc_load_status_info(m, &dev_priv->guc.guc_fw);
>> +
>> +tmp = I915_READ(GUC_STATUS);
>> +
>> +seq_printf(m, "\nGuC status 0x%08x:\n", tmp);
>> +seq_printf(m, "\tBootrom status = 0x%x\n",
>> +(tmp & GS_BOOTROM_MASK) >> GS_BOOTROM_SHIFT);
>> +seq_printf(m, "\tuKernel status = 0x%x\n",
>> +(tmp & GS_UKERNEL_MASK) >> GS_UKERNEL_SHIFT);
>> +seq_printf(m, "\tMIA Core status = 0x%x\n",
>> +(tmp & GS_MIA_MASK) >> GS_MIA_SHIFT);
>> +seq_puts(m, "\nScratch registers value:\n");
>> +for (i = 0; i < 16; i++)
>> +seq_printf(m, "\t%2d: \t0x%x\n", i, I915_READ(SOFT_SCRATCH(i)));
> 
> I have a feeling these probably don't want to be upstreamed.
> -Chris

It's just a register dump; nothing secret there. You could read them
with IGT's register dumper anyway.

.Dave.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx