Re: [Intel-gfx] [PATCH 1/3] drm/i915: Support for pre-populating the object with system pages

2015-08-25 Thread Siluvery, Arun

On 24/08/2015 12:58, ankitprasad.r.sha...@intel.com wrote:

From: Ankitprasad Sharma ankitprasad.r.sha...@intel.com

This patch provides support for the User to populate the object
with system pages at its creation time. Since this can be safely
performed without holding the 'struct_mutex', it would help to reduce
the time 'struct_mutex' is kept locked especially during the exec-buffer
path, where it is generally held for the longest time.

Signed-off-by: Ankitprasad Sharma ankitprasad.r.sha...@intel.com
---
  drivers/gpu/drm/i915/i915_dma.c |  2 +-
  drivers/gpu/drm/i915/i915_gem.c | 51 +++--
  include/uapi/drm/i915_drm.h | 11 -
  3 files changed, 45 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 8319e07..955aa16 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -171,7 +171,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
value = HAS_RESOURCE_STREAMER(dev);
break;
case I915_PARAM_CREATE_VERSION:
-   value = 2;
+   value = 3;
break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c44bd05..3904feb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -46,6 +46,7 @@ static void
  i915_gem_object_retire__write(struct drm_i915_gem_object *obj);
  static void
  i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring);
+static int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj);

  static bool cpu_cache_is_coherent(struct drm_device *dev,
  enum i915_cache_level level)
@@ -414,6 +415,18 @@ i915_gem_create(struct drm_file *file,
if (obj == NULL)
return -ENOMEM;

+   if (flags & I915_CREATE_POPULATE) {
+   struct drm_i915_private *dev_priv = dev->dev_private;
+
+   ret = __i915_gem_object_get_pages(obj);
+   if (ret)
+   return ret;
+
+   mutex_lock(&dev->struct_mutex);
+   list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
+   mutex_unlock(&dev->struct_mutex);
+   }
+
ret = drm_gem_handle_create(file, &obj->base, handle);


If I915_CREATE_POPULATE is set, don't we have to release the pages when 
this call fails?
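A minimal sketch of one way the create path could unwind, assuming the pages should be released explicitly when the handle creation fails. Everything below (struct fake_obj, fake_get_pages(), fake_create(), and the lock_depth counter standing in for struct_mutex) is hypothetical and only models the ordering: populate without the lock, take the lock just for the list insertion, and undo both steps in reverse order on failure. It also drops the initial reference when population itself fails, which the early return in the quoted hunk does not appear to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these do not match the real i915 structures. */
struct fake_obj {
	bool has_pages;   /* pages populated from the backing store */
	bool on_unbound;  /* linked on the (modelled) unbound list */
	int refcount;
};

static int lock_depth; /* models struct_mutex: > 0 while "held" */

static int fake_get_pages(struct fake_obj *obj, bool fail)
{
	assert(lock_depth == 0);   /* population runs without the lock */
	if (fail)
		return -12;        /* -ENOMEM */
	obj->has_pages = true;
	return 0;
}

static void fake_put_pages(struct fake_obj *obj)
{
	obj->has_pages = false;
}

/* Returns 0 on success; on failure, undoes whatever was done so far. */
static int fake_create(struct fake_obj *obj, bool populate,
		       bool fail_pages, bool fail_handle)
{
	int ret;

	obj->refcount = 1;         /* reference from the allocation */
	if (populate) {
		ret = fake_get_pages(obj, fail_pages);
		if (ret)
			goto err_unref;

		lock_depth++;              /* mutex_lock() */
		obj->on_unbound = true;    /* list_add_tail() */
		lock_depth--;              /* mutex_unlock() */
	}

	if (fail_handle) {             /* handle creation failed */
		ret = -22;
		goto err_pages;
	}
	return 0;

err_pages:
	if (obj->has_pages) {
		lock_depth++;
		obj->on_unbound = false;   /* list_del() */
		lock_depth--;
		fake_put_pages(obj);
	}
err_unref:
	obj->refcount--;
	return ret;
}
```

Whether the explicit release is actually needed depends on what the object's free path already does when the last reference is dropped; the sketch simply makes the unwind visible.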


regards
Arun


/* drop reference from allocate - handle holds it now */
drm_gem_object_unreference_unlocked(&obj->base);
@@ -2328,6 +2341,31 @@ err_pages:
return ret;
  }

+static int
+__i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
+{
+   const struct drm_i915_gem_object_ops *ops = obj->ops;
+   int ret;
+
+   WARN_ON(obj->pages);
+
+   if (obj->madv != I915_MADV_WILLNEED) {
+   DRM_DEBUG("Attempting to obtain a purgeable object\n");
+   return -EFAULT;
+   }
+
+   BUG_ON(obj->pages_pin_count);
+
+   ret = ops->get_pages(obj);
+   if (ret)
+   return ret;
+
+   obj->get_page.sg = obj->pages->sgl;
+   obj->get_page.last = 0;
+
+   return 0;
+}
+
  /* Ensure that the associated pages are gathered from the backing storage
   * and pinned into our object. i915_gem_object_get_pages() may be called
   * multiple times before they are released by a single call to
@@ -2339,28 +2377,17 @@ int
  i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
  {
struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-   const struct drm_i915_gem_object_ops *ops = obj->ops;
int ret;

if (obj->pages)
return 0;

-   if (obj->madv != I915_MADV_WILLNEED) {
-   DRM_DEBUG("Attempting to obtain a purgeable object\n");
-   return -EFAULT;
-   }
-
-   BUG_ON(obj->pages_pin_count);
-
-   ret = ops->get_pages(obj);
+   ret = __i915_gem_object_get_pages(obj);
if (ret)
return ret;

list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);

-   obj->get_page.sg = obj->pages->sgl;
-   obj->get_page.last = 0;
-
return 0;
  }

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index f71f75c..26ea715 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -457,20 +457,19 @@ struct drm_i915_gem_create {
__u32 handle;
__u32 pad;
/**
-* Requested flags (currently used for placement
-* (which memory domain))
+* Requested flags
 *
 * You can request that the object be created from special memory
 * rather than regular system pages using this parameter. Such
 * irregular objects may have certain restrictions (such as CPU
 * access to a stolen object is verboten).
-*
-* This can be 

Re: [Intel-gfx] [PATCH v1 2/2] drm/i915/gen9: Disable gather at set shader bit

2015-08-12 Thread Siluvery, Arun

On 12/08/2015 16:41, Dave Gordon wrote:

On 11/08/15 15:44, Arun Siluvery wrote:

From Gen9, Push constant instruction parsing behaviour varies according to
whether set shader is enabled or not. If we want legacy behaviour, it
can be achieved by disabling set shader.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89959

Cc: Ben Widawsky benjamin.widaw...@intel.com
Cc: Joonas Lahtinen joonas.lahti...@linux.intel.com
Cc: Mika Kuoppala mika.kuopp...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_reg.h |  5 +
  drivers/gpu/drm/i915/intel_ringbuffer.c | 10 ++
  2 files changed, 15 insertions(+)


[snip]


diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index cf61262..7d284ed 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -983,6 +983,16 @@ static int gen9_init_workarounds(struct intel_engine_cs *ring)
  tmp |= HDC_FORCE_CSR_NON_COHERENT_OVR_DISABLE;
  WA_SET_BIT_MASKED(HDC_CHICKEN0, tmp);

+/* Chicken bits to disable set shader are in multiple places;
+ * set bits in all required registers to disable it correctly
+ */
+WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2, GEN9_DISABLE_GATHER_SET_SHADER_SLICE);
+if ((IS_SKYLAKE(dev) && INTEL_REVID(dev) <= SKL_REVID_D0) ||
+(IS_BROXTON(dev) && INTEL_REVID(dev) == BXT_REVID_A0))
+WA_SET_BIT_MASKED(RS_CHICKEN, RS_CHICKEN_DISABLE_GATHER_AT_SHADER);
+else
+WA_SET_BIT_MASKED(CS_RCS_BE, CS_RCS_DISABLE_GATHER_AT_SHADER);
+
+
  return 0;
  }


This workaround isn't tagged with a specific /* WaXyz:chip */ comment.
Also, the style isn't consistent with the other paragraphs earlier in
this function: those have braces round the body part even when there's
only one line of code, possibly to make it clear where the WA comment
applies (of course, this is why the buggy WA_REG() macro wasn't spotted
earlier).
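Dave's point about braces hiding a macro bug is the classic multi-statement macro hazard. The sketch below uses made-up ADD_UNSAFE/ADD_SAFE macros rather than the real WA_REG() body (which is not quoted here): an unbraced if only governs the first statement of a two-statement expansion, and wrapping the body in do { } while (0) turns it back into a single statement.

```c
#include <assert.h>

static int count, total;

/* Unsafe: expands to two statements, so an unbraced if only guards
 * the first one. (Illustrative; not the real WA_REG() body.) */
#define ADD_UNSAFE(v)  count++; total += (v)

/* Contained: do { } while (0) makes the expansion a single statement. */
#define ADD_SAFE(v)    do { count++; total += (v); } while (0)

static void add_if_unsafe(int cond, int v)
{
	if (cond)
		ADD_UNSAFE(v);   /* "total += v" runs even when cond is 0 */
}

static void add_if_safe(int cond, int v)
{
	if (cond)
		ADD_SAFE(v);
}
```

With braces around every if body, as in the surrounding workaround paragraphs, both macros behave the same, which is why such a bug can go unnoticed.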

So, maybe prettify this a bit, if possible? The code actually looks
correct, just ugly.

Oh, and keep patch 1 even if you decide to abandon this one!



Hi Dave,

This patch can be ignored if we use the patch below:
[Intel-gfx] [PATCH] lib/rendercopy_gen9: Setup Push constant pointer
before sending BTP commands

http://lists.freedesktop.org/archives/intel-gfx/2015-August/073483.html

I think the correct option would be to ignore this patch.

regards
Arun


.Dave.




___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] lib/rendercopy_gen9: WaBindlessSurfaceStateModifyEnable

2015-08-12 Thread Siluvery, Arun

On 11/08/2015 13:25, Mika Kuoppala wrote:

Don't set the size of bindless surface state on rendercopy.
In doing so, take into account the workaround for setting
the command size.

This was tried while hunting for
https://bugs.freedesktop.org/show_bug.cgi?id=89959, but no
impact was found.
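The header change from (19 - 2) to (19 - 1 - 2) follows the usual rule that a 3D command header carries "total DWords - 2" in its low bits. A small sketch of that arithmetic, assuming GEN6_STATE_BASE_ADDRESS encodes as 0x61010000 (3D command type, opcode 1, sub-opcode 1) and that the length field is bits 7:0; sba_header() and sba_length_matches() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed header value for GEN6_STATE_BASE_ADDRESS. */
#define GEN6_STATE_BASE_ADDRESS 0x61010000u
#define SBA_FULL_DWORDS         19u  /* Gen9 packet incl. bindless state */

/* 3D commands encode "total dwords - 2" in bits 7:0 of the header. */
static uint32_t sba_header(unsigned total_dwords)
{
	return GEN6_STATE_BASE_ADDRESS | (total_dwords - 2u);
}

/* Consistency check: does the header length match what was emitted? */
static int sba_length_matches(uint32_t header, unsigned emitted_dwords)
{
	return (header & 0xffu) + 2u == emitted_dwords;
}
```

A check like sba_length_matches() run against the dwords actually written is one cheap way to catch a header and payload that disagree.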

Cc: Arun Siluvery arun.siluv...@linux.intel.com
Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com
---
  lib/rendercopy_gen9.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/lib/rendercopy_gen9.c b/lib/rendercopy_gen9.c
index 0766192..4a4a604 100644
--- a/lib/rendercopy_gen9.c
+++ b/lib/rendercopy_gen9.c
@@ -511,7 +511,11 @@ gen7_emit_push_constants(struct intel_batchbuffer *batch) {

  static void
  gen9_emit_state_base_address(struct intel_batchbuffer *batch) {
-   OUT_BATCH(GEN6_STATE_BASE_ADDRESS | (19 - 2));
+
+   /* WaBindlessSurfaceStateModifyEnable:skl,bxt */
+   /* The length has to be one less if we don't modify
+  bindless state */
+   OUT_BATCH(GEN6_STATE_BASE_ADDRESS | (19 - 1 - 2));

/* general */
OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
@@ -544,9 +548,9 @@ gen9_emit_state_base_address(struct intel_batchbuffer *batch) {
OUT_BATCH(1 << 12 | 1);

/* Bindless surface state base address */
-   OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
OUT_BATCH(0);
-   OUT_BATCH(0xf000);
+   OUT_BATCH(0);
+   OUT_BATCH(0);
  }

  static void



Agrees with spec and looks good to me. No impact observed with 
gem_concurrent_blit subtests.


Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

regards
Arun



Re: [Intel-gfx] [PATCH v1 0/2] Enable legacy behaviour for Push constants

2015-08-12 Thread Siluvery, Arun

On 11/08/2015 21:58, Timo Aaltonen wrote:

On 11.08.2015 17:44, Arun Siluvery wrote:

Patch1 fixes a simple compile error in Patch2
Patch2 fixes gpu hang observed with a subtest of gem_concurrent_blit.

Arun Siluvery (1):
   drm/i915/gen9: Disable gather at set shader bit

Mika Kuoppala (1):
   drm/i915: Contain the WA_REG macro

  drivers/gpu/drm/i915/i915_reg.h |  5 +
  drivers/gpu/drm/i915/intel_ringbuffer.c | 14 --
  2 files changed, 17 insertions(+), 2 deletions(-)



prw-blt-overwrite-source-read-rcs-forked runs fine with these, tested on
SKL-Y & -H

Tested-by: Timo Aaltonen timo.aalto...@canonical.com



This patch can be ignored if the following patch is applied,

[Intel-gfx] [PATCH] lib/rendercopy_gen9: Setup Push constant pointer
before sending BTP commands

http://lists.freedesktop.org/archives/intel-gfx/2015-August/073483.html

regards
Arun



Re: [Intel-gfx] [PATCH] drm/i915:gen9: Add WA for disable gather at set shader bit

2015-08-10 Thread Siluvery, Arun

On 08/08/2015 06:35, Ben Widawsky wrote:

On Fri, Aug 07, 2015 at 06:33:37PM +0100, Arun Siluvery wrote:

This WA doesn't have a name. According to the spec, the driver needs to reset
the 'disable gather at set shader' bit in the per-context WA batch. Note
that the default value of this bit is already '0', but we still need to
apply this WA because userspace may set it. If userspace really needs it
to be set, it has to do so in every batch.
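The per-context reset described above boils down to one MI_LOAD_REGISTER_IMM writing COMMON_SLICE_CHICKEN2 with a masked-disable value. A standalone sketch follows; MI_INSTR, MI_LOAD_REGISTER_IMM and the MASKED_* macros are modelled on the i915_reg.h definitions (_MASKED_FIELD/_MASKED_BIT_DISABLE) and should be treated as assumptions, and emit_reset_gather_bit() is a hypothetical stand-in for the wa_ctx_emit() calls in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Encodings modelled on i915_reg.h (assumptions for this sketch). */
#define MI_INSTR(opcode, flags)    (((uint32_t)(opcode) << 23) | (uint32_t)(flags))
#define MI_LOAD_REGISTER_IMM(x)    MI_INSTR(0x22, 2 * (x) - 1)
/* Masked registers: bits 31:16 select which of bits 15:0 are written. */
#define MASKED_FIELD(mask, value)  (((uint32_t)(mask) << 16) | (uint32_t)(value))
#define MASKED_BIT_DISABLE(a)      MASKED_FIELD((a), 0)

#define COMMON_SLICE_CHICKEN2                 0x7014
#define GEN9_DISABLE_GATHER_SET_SHADER_SLICE  (1 << 12)

/* Append one LRI that clears the bit; returns the new dword index. */
static unsigned emit_reset_gather_bit(uint32_t *batch, unsigned index)
{
	batch[index++] = MI_LOAD_REGISTER_IMM(1);
	batch[index++] = COMMON_SLICE_CHICKEN2;
	batch[index++] = MASKED_BIT_DISABLE(GEN9_DISABLE_GATHER_SET_SHADER_SLICE);
	return index;
}
```

Because the write only carries the mask bit for bit 12 with a zero value, it clears that bit without touching the rest of the register.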

Cc: Ben Widawsky benjamin.widaw...@intel.com
Cc: Mika Kuoppala mika.kuopp...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_reg.h  | 1 +
  drivers/gpu/drm/i915/intel_lrc.c | 9 +
  2 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index ea46d68..838537f 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -5834,6 +5834,7 @@ enum skl_disp_power_wells {
  # define GEN7_CSC1_RHWO_OPT_DISABLE_IN_RCC ((1<<10) | (1<<26))
  # define GEN9_RHWO_OPTIMIZATION_DISABLE   (1<<14)
  #define COMMON_SLICE_CHICKEN2 0x7014
+#define  GEN9_DISABLE_GATHER_SET_SHADER_SLICE   (1<<12)
  # define GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE (1<<0)

  #define HIZ_CHICKEN   0x7018
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4c40614..df3bb98 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1302,6 +1302,15 @@ static int gen9_init_perctx_bb(struct intel_engine_cs *ring,
struct drm_device *dev = ring->dev;
uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);

+   /* WaNoName:skl,bxt
+* This WA has no name, according to the spec driver needs to reset
+* disable gather at set shader slice bit in per ctx batch
+*/
+   wa_ctx_emit(batch, index, MI_LOAD_REGISTER_IMM(1));
+   wa_ctx_emit(batch, index, COMMON_SLICE_CHICKEN2);
+   wa_ctx_emit(batch, index,
+   _MASKED_BIT_DISABLE(GEN9_DISABLE_GATHER_SET_SHADER_SLICE));
+
/* WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken:skl,bxt */
if ((IS_SKYLAKE(dev) && (INTEL_REVID(dev) <= SKL_REVID_B0)) ||
(IS_BROXTON(dev) && (INTEL_REVID(dev) == BXT_REVID_A0))) {


Hmm. I thought we needed this, but looking at the User Mode Privileged
Commands of the spec, it seems like this register is not allowed to be written.
So unless this register is put in a whitelist somewhere in the future, I think
it's safe to drop this patch.


We need to whitelist a few registers for a preemption-related WA; this one can
be added to the whitelist if userspace really needs to write to it.


regards
Arun


As a preventative measure, I don't see this as harmful - but I don't feel I have
any authority to suggest whether we keep this in or not.





Re: [Intel-gfx] [PATCH] drm/i915: Check idle to active before processing CSQ

2015-08-07 Thread Siluvery, Arun

On 07/08/2015 12:52, Daniel Vetter wrote:

On Fri, Aug 07, 2015 at 11:15:56AM +0300, Mika Kuoppala wrote:

Daniel Vetter dan...@ffwll.ch writes:


On Thu, Aug 06, 2015 at 05:09:17PM +0300, Mika Kuoppala wrote:

If the idle-to-active bit is set, the rest of the fields
in the CSQ are not valid.

Bail out early in this case to prevent the rest of the
loop from inspecting stale values.
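A sketch of the early bail-out, assuming status bits modelled on the GEN8_CTX_STATUS_* definitions in intel_lrc.c (IDLE_ACTIVE at bit 0, COMPLETE at bit 4); struct csb_entry and count_completions() are hypothetical. The quoted commit message does not show whether the real loop skips just the affected entry or stops entirely; this version skips the entry.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout, modelled on GEN8_CTX_STATUS_* in intel_lrc.c. */
#define CTX_STATUS_IDLE_ACTIVE  (1u << 0)
#define CTX_STATUS_PREEMPTED    (1u << 1)
#define CTX_STATUS_COMPLETE     (1u << 4)

struct csb_entry {
	uint32_t status;
	uint32_t ctx_id;
};

/* Count completed contexts, ignoring entries whose fields are invalid. */
static unsigned count_completions(const struct csb_entry *e, unsigned n)
{
	unsigned done = 0;

	for (unsigned i = 0; i < n; i++) {
		/* Idle-to-active: the rest of this entry is not valid. */
		if (e[i].status & CTX_STATUS_IDLE_ACTIVE)
			continue;
		if (e[i].status & CTX_STATUS_COMPLETE)
			done++;
	}
	return done;
}
```

The first entry in a usage like the one below has a stale COMPLETE bit alongside IDLE_ACTIVE, and is correctly not counted.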

Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com




looks good to me, didn't observe any impact with this patch.
Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

regards
Arun


Same questions here too, what's the impact. E.g. if you only found this by
bspec/code inspection then it's for -next, but if it's to fix some known
breakage then it's for -fixes + cc: stable.



To this and the masked write one: Both of these were found
when I was trying to find out root cause for skl hangs.

They are both for -next. Both are in the correctness
department wrt bspec and I haven't observed any other
impact.

Point taken on being more verbose.


Thanks I added a note about this to the first patch and merged it. This
one here still seems to miss an r-b.
-Daniel





Re: [Intel-gfx] [PATCH] drm/i915/skl WaDisableSbeCacheDispatchPortSharing

2015-08-06 Thread Siluvery, Arun

On 06/08/2015 14:51, Mika Kuoppala wrote:

Add WaDisableSbeCacheDispatchPortSharing:skl

Cc: Arun Siluvery arun.siluv...@linux.intel.com
Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com
---
  drivers/gpu/drm/i915/intel_ringbuffer.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1c14233..1a10358 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1059,6 +1059,13 @@ static int skl_init_workarounds(struct intel_engine_cs *ring)
  HDC_FENCE_DEST_SLM_DISABLE |
  HDC_BARRIER_PERFORMANCE_DISABLE);

+   /* WaDisableSbeCacheDispatchPortSharing:skl */
+   if (INTEL_REVID(dev) <= SKL_REVID_F0) {
+   WA_SET_BIT_MASKED(
+   GEN7_HALF_SLICE_CHICKEN1,
+   GEN7_SBE_SS_CACHE_DISPATCH_PORT_SHARING_DISABLE);
+   }
+

This seems to be applicable to BXT as well, until B0.

regards
Arun


return skl_tune_iz_hashing(ring);
  }






Re: [Intel-gfx] [PATCH] drm/i915/skl WaDisableSbeCacheDispatchPortSharing

2015-08-06 Thread Siluvery, Arun

On 06/08/2015 15:45, Mika Kuoppala wrote:

Siluvery, Arun arun.siluv...@linux.intel.com writes:


On 06/08/2015 14:51, Mika Kuoppala wrote:

Add WaDisableSbeCacheDispatchPortSharing:skl

Cc: Arun Siluvery arun.siluv...@linux.intel.com
Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com
---
   drivers/gpu/drm/i915/intel_ringbuffer.c | 7 +++
   1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1c14233..1a10358 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1059,6 +1059,13 @@ static int skl_init_workarounds(struct intel_engine_cs *ring)
  HDC_FENCE_DEST_SLM_DISABLE |
  HDC_BARRIER_PERFORMANCE_DISABLE);

+   /* WaDisableSbeCacheDispatchPortSharing:skl */
+   if (INTEL_REVID(dev) <= SKL_REVID_F0) {
+   WA_SET_BIT_MASKED(
+   GEN7_HALF_SLICE_CHICKEN1,
+   GEN7_SBE_SS_CACHE_DISPATCH_PORT_SHARING_DISABLE);
+   }
+

This seems to be applicable to BXT as well, until B0.



Yes, we have that in bxt_init_workarounds. I figured it is
cleaner to have the revision check for each platform in its
respective setup function.
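Keeping each platform's stepping check in its own setup function, as Mika prefers, can be sketched as below. The REV_* values, struct fake_dev and the helpers are hypothetical stand-ins for the real SKL_REVID_*/BXT_REVID_* constants and device checks, and "until B0" is read here as "up to and including B0", which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stepping values; the real constants may differ. */
enum { REV_A0 = 0, REV_B0 = 1, REV_C0 = 2, REV_D0 = 3, REV_E0 = 4, REV_F0 = 5 };

struct fake_dev {
	bool is_skl;
	bool is_bxt;
	int revid;
};

/* Inclusive revision range check. */
static bool revid_between(const struct fake_dev *d, int lo, int hi)
{
	return d->revid >= lo && d->revid <= hi;
}

/* Each platform keeps its own range, next to its other workarounds. */
static bool skl_wants_sbe_wa(const struct fake_dev *d)
{
	return d->is_skl && revid_between(d, REV_A0, REV_F0);
}

static bool bxt_wants_sbe_wa(const struct fake_dev *d)
{
	return d->is_bxt && revid_between(d, REV_A0, REV_B0);
}
```

The design choice is that a stepping range changes for one platform without touching the other's code path.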


fine with this.
Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

regards
Arun



-Mika



regards
Arun


return skl_tune_iz_hashing(ring);
   }








Re: [Intel-gfx] [PATCH v2 2/2] drm/i915:gen9: Add disable gather at set shader w/a

2015-08-05 Thread Siluvery, Arun

On 05/08/2015 15:45, Mika Kuoppala wrote:

Arun Siluvery arun.siluv...@linux.intel.com writes:


This WA is implemented in init_context as well as in WA batch init.
There are also some dependent bits that need to be set in other registers
for this to be complete.

v2: behaviour of disable gather at set shader bit can be specified by
two different registers, use a better option (Ben).



For me it looks like there are 2 orthogonal goals for this patch.

I think the actual workaround should be one patch, the resetting
of the set shader bit and the patch named accordingly.

Then the set shader initialization in a different patch,
if there is justification for it (that I have not managed yet
to find).


I agree it needs to be split into two patches.



But let's concentrate on the workaround itself...


Cc: Ben Widawsky benjamin.widaw...@intel.com
Cc: Joonas Lahtinen joonas.lahti...@linux.intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_reg.h |  5 +
  drivers/gpu/drm/i915/intel_lrc.c|  8 
  drivers/gpu/drm/i915/intel_ringbuffer.c | 18 ++
  3 files changed, 31 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 8991cd5..8719a5a 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1721,6 +1721,10 @@ enum skl_disp_power_wells {
  #define   MEM_DISPLAY_TRICKLE_FEED_DISABLE (1<<2) /* 85x only */
  #define FW_BLC 0x020d8
  #define FW_BLC2   0x020dc
+#define GEN7_RS_CHICKEN  0x20DC
+#define   GEN9_RS_CHICKEN_DISABLE_GATHER_AT_SHADER (1<<2)
+#define GEN7_FF_SLICE_CHICKEN1 0x20E0
+#define   GEN9_PER_CTX_DISABLE_GATHER_CONTROL  (1<<15)
  #define FW_BLC_SELF   0x020e0 /* 915+ only */
  #define   FW_BLC_SELF_EN_MASK  (1<<31)
  #define   FW_BLC_SELF_FIFO_MASK (1<<16) /* 945 only */
@@ -5836,6 +5840,7 @@ enum skl_disp_power_wells {
  # define GEN7_CSC1_RHWO_OPT_DISABLE_IN_RCC ((1<<10) | (1<<26))
  # define GEN9_RHWO_OPTIMIZATION_DISABLE   (1<<14)
  #define COMMON_SLICE_CHICKEN2 0x7014
+#define  GEN9_DISABLE_GATHER_SET_SHADER_SLICE   (1<<12)
  # define GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE (1<<0)

  #define HIZ_CHICKEN   0x7018
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9faad82..d3a03f3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1292,6 +1292,14 @@ static int gen9_init_perctx_bb(struct intel_engine_cs *ring,
struct drm_device *dev = ring->dev;
uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);

+   /* WA to reset disable gather at set shader slice bit */


I am thinking how we could also alert the reader that this workaround
needs to be revisited when it has been given a name.
By adding WaNoName:skl,bxt along with the comment above?


+   if (IS_SKYLAKE(dev)) {


As Ben noted, documentation is rather sparse. But given the reference
to previous problems with this bit being saved/restored in the wrong order,
we can conclude that this should apply to BXT as well.


+   wa_ctx_emit(batch, index, MI_LOAD_REGISTER_IMM(1));
+   wa_ctx_emit(batch, index, COMMON_SLICE_CHICKEN2);
+   wa_ctx_emit(batch, index,
+   _MASKED_BIT_DISABLE(GEN9_DISABLE_GATHER_SET_SHADER_SLICE));
+   }
+


The actual value of 'disable set shader' is orthogonal to, and beyond the scope
of, this workaround, so the rest should be stripped out into a different patch.

As you mentioned on IRC, we first need to know whether we need to disable
the set shader or not (set 0x7014[12:12] to 1), because the WA only
talks about resetting this bit in the per-ctx batch.
The following bits can be ignored if there is no need to set that bit in
the first place.


With reference to gem_ringfill: on my system it only completes without
any hang if I apply this patch in full, but on some systems this patch
doesn't seem to be necessary.


regards
Arun


-Mika


/* WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken:skl,bxt */
if ((IS_SKYLAKE(dev) && (INTEL_REVID(dev) <= SKL_REVID_B0)) ||
(IS_BROXTON(dev) && (INTEL_REVID(dev) == BXT_REVID_A0))) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index dcd1b8f..5e8e5f9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -985,6 +985,17 @@ static int gen9_init_workarounds(struct intel_engine_cs *ring)
tmp |= HDC_FORCE_CSR_NON_COHERENT_OVR_DISABLE;
WA_SET_BIT_MASKED(HDC_CHICKEN0, tmp);

+   /* WA to gather at set shader - skl,bxt
+* These are dependent bits need to be set for the WA.
+*/
+   if ((IS_SKYLAKE(dev)  INTEL_REVID(dev)  SKL_REVID_D0) ||
+   (IS_BROXTON(dev)  INTEL_REVID(dev)  BXT_REVID_A0)) {
+   WA_SET_BIT_MASKED(GEN7_FF_SLICE_CHICKEN1,

Re: [Intel-gfx] [PATCH v1 2/2] drm/i915:gen9: Add disable gather at set shader w/a

2015-08-04 Thread Siluvery, Arun

On 04/08/2015 00:21, Ben Widawsky wrote:

On Mon, Aug 03, 2015 at 08:24:57PM +0100, Arun Siluvery wrote:

This WA is implemented in init_context as well as WA batch init.
There are also some dependent bits need to be set in other registers
for this to be complete.

Cc: Ben Widawsky benjamin.widaw...@intel.com
Cc: Joonas Lahtinen joonas.lahti...@linux.intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_reg.h |  3 +++
  drivers/gpu/drm/i915/intel_lrc.c|  8 
  drivers/gpu/drm/i915/intel_ringbuffer.c | 16 
  3 files changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 8991cd5..24b8bb9 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1720,7 +1720,9 @@ enum skl_disp_power_wells {
  #define   MEM_DISPLAY_A_TRICKLE_FEED_DISABLE (1<<2) /* 830/845 only */
  #define   MEM_DISPLAY_TRICKLE_FEED_DISABLE (1<<2) /* 85x only */
  #define FW_BLC 0x020d8
+#define   GEN9_DISABLE_GATHER_AT_SET_SHADER (1<<7)
  #define FW_BLC2   0x020dc
+#define   GEN9_RS_CHICKEN_DISABLE_GATHER_AT_SHADER (1<<2)


Neither of these belong here. BLC is for backlight. Create a new define if we
don't have one.

#define RS_CHICKEN  0x20dc

I thought of reusing the existing define, but created a new one as you suggested.




  #define FW_BLC_SELF   0x020e0 /* 915+ only */
  #define   FW_BLC_SELF_EN_MASK  (1<<31)
  #define   FW_BLC_SELF_FIFO_MASK (1<<16) /* 945 only */
@@ -5836,6 +5838,7 @@ enum skl_disp_power_wells {
  # define GEN7_CSC1_RHWO_OPT_DISABLE_IN_RCC ((1<<10) | (1<<26))
  # define GEN9_RHWO_OPTIMIZATION_DISABLE   (1<<14)
  #define COMMON_SLICE_CHICKEN2 0x7014
+#define  GEN9_DISABLE_GATHER_SET_SHADER_SLICE   (1<<12)
  # define GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE (1<<0)

  #define HIZ_CHICKEN   0x7018
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9faad82..d3a03f3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1292,6 +1292,14 @@ static int gen9_init_perctx_bb(struct intel_engine_cs *ring,
struct drm_device *dev = ring->dev;
uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);

+   /* WA to reset disable gather at set shader slice bit */
+   if (IS_SKYLAKE(dev)) {
+   wa_ctx_emit(batch, index, MI_LOAD_REGISTER_IMM(1));
+   wa_ctx_emit(batch, index, COMMON_SLICE_CHICKEN2);
+   wa_ctx_emit(batch, index,
+   _MASKED_BIT_DISABLE(GEN9_DISABLE_GATHER_SET_SHADER_SLICE));
+   }
+


Shouldn't this be for BXT as well? Also, why bother with the revid check below
and not here?


spec says only SKL+




/* WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken:skl,bxt */
if ((IS_SKYLAKE(dev) && (INTEL_REVID(dev) <= SKL_REVID_B0)) ||
(IS_BROXTON(dev) && (INTEL_REVID(dev) == BXT_REVID_A0))) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index dcd1b8f..4fc4b5e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -985,6 +985,15 @@ static int gen9_init_workarounds(struct intel_engine_cs *ring)
tmp |= HDC_FORCE_CSR_NON_COHERENT_OVR_DISABLE;
WA_SET_BIT_MASKED(HDC_CHICKEN0, tmp);

+   /* WA to gather at set shader - skl,bxt
+* These are dependent bits need to be set for the WA.
+*/
+   if (IS_SKYLAKE(dev)  (INTEL_REVID(dev)  SKL_REVID_D0) ||
+   (IS_BROXTON(dev)  INTEL_REVID(dev)  BXT_REVID_A0)) {
+   WA_SET_BIT_MASKED(FW_BLC, GEN9_DISABLE_GATHER_AT_SET_SHADER);
+   WA_SET_BIT_MASKED(FW_BLC2, GEN9_RS_CHICKEN_DISABLE_GATHER_AT_SHADER);
+   }
+
return 0;
  }

@@ -1068,6 +1077,13 @@ static int skl_init_workarounds(struct intel_engine_cs *ring)
  HDC_FENCE_DEST_SLM_DISABLE |
  HDC_BARRIER_PERFORMANCE_DISABLE);

+   /* WA to Disable gather at set shader - skl
+* This bit needs to be reset in the per ctx WA batch and it is also
+* dependent on other bits in a different register; all of them need
+* to be set for the WA to be complete.
+*/
+   WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2, GEN9_DISABLE_GATHER_SET_SHADER_SLICE);
+
return skl_tune_iz_hashing(ring);
  }



I wouldn't set both 0x20dc and 0x20d8; I am not sure what implications that has.
Instead, set or read bit 15 of 0x20e0 and then just set one. To me, it seems
like the best way to do this is to set 1<<15 of 0x20e0, and then use bit 2 of
0x20dc for the workaround. We don't need per-context control of something we
always have to disable anyway.
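Ben's proposed sequence can be modelled with a small masked-register simulation. This assumes both 0x20e0 and 0x20dc behave as masked registers (bits 31:16 of a write select which of bits 15:0 change), as the use of WA_SET_BIT_MASKED in the patch suggests; masked_write() and apply_gather_wa() are illustrative helpers, not driver code, and the bit positions are the ones named in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Enable form of a masked write: set the mask bit and the value bit. */
#define MASKED_BIT_ENABLE(b)  ((((uint32_t)(b)) << 16) | (uint32_t)(b))

#define GEN9_PER_CTX_DISABLE_GATHER_CONTROL       (1u << 15) /* 0x20e0 */
#define GEN9_RS_CHICKEN_DISABLE_GATHER_AT_SHADER  (1u << 2)  /* 0x20dc */

/* Only bits whose mask (bits 31:16) is set take the new value. */
static uint32_t masked_write(uint32_t old, uint32_t val)
{
	uint32_t mask = val >> 16;

	return ((old & ~mask) | (val & mask)) & 0xffffu;
}

/* First set bit 15 of 0x20e0, then set the single bit 2 of 0x20dc. */
static void apply_gather_wa(uint32_t *ff_slice_chicken1, uint32_t *rs_chicken)
{
	*ff_slice_chicken1 = masked_write(*ff_slice_chicken1,
			MASKED_BIT_ENABLE(GEN9_PER_CTX_DISABLE_GATHER_CONTROL));
	*rs_chicken = masked_write(*rs_chicken,
			MASKED_BIT_ENABLE(GEN9_RS_CHICKEN_DISABLE_GATHER_AT_SHADER));
}
```

Because each write names only its own bit in the mask, pre-existing bits in either register survive unchanged.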


changed it to use 0x20e0

regards
Arun


Re: [Intel-gfx] [PATCH v1 1/2] drm/i915:skl: Add WaEnableGapsTsvCreditFix

2015-08-04 Thread Siluvery, Arun

On 04/08/2015 09:58, Mika Kuoppala wrote:

Ben Widawsky benjamin.widaw...@intel.com writes:


On Mon, Aug 03, 2015 at 08:24:56PM +0100, Arun Siluvery wrote:

Cc: Ben Widawsky benjamin.widaw...@intel.com
Cc: Joonas Lahtinen joonas.lahti...@linux.intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_reg.h | 3 +++
  drivers/gpu/drm/i915/intel_pm.c | 6 ++
  2 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 77967ca..8991cd5 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -6849,6 +6849,9 @@ enum skl_disp_power_wells {
  #define GEN7_MISCCPCTL(0x9424)
  #define   GEN7_DOP_CLOCK_GATE_ENABLE  (1<<0)

+#define GEN8_GARBCNTL   0xB004
+#define   GEN9_GAPS_TSV_CREDIT_DISABLE  (1<<7)
+
  /* IVYBRIDGE DPF */
  #define GEN7_L3CDERRST1   0xB008 /* L3CD Error Status 1 */
  #define HSW_L3CDERRST11   0xB208 /* L3CD Error Status register 1 slice 1 */
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index c23cab6..9152113 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -106,6 +106,12 @@ static void skl_init_clock_gating(struct drm_device *dev)
/* WaDisableLSQCROPERFforOCL:skl */
I915_WRITE(GEN8_L3SQCREG4, I915_READ(GEN8_L3SQCREG4) |
   GEN8_LQSC_RO_PERF_DIS);
+
+   /* WaEnableGapsTsvCreditFix:skl */
+   if (IS_SKYLAKE(dev) && (INTEL_REVID(dev) >= SKL_REVID_C0)) {
+   I915_WRITE(GEN8_GARBCNTL, (I915_READ(GEN8_GARBCNTL) |
+  GEN9_GAPS_TSV_CREDIT_DISABLE));
+   }
  }

  static void bxt_init_clock_gating(struct drm_device *dev)


FWIW, the docs make it sound like BIOS should be doing this. Did you verify we
actually don't have the bit set with more recent BKC?



I have a pretty recent BIOS and the bit was not set.


I checked on this; it should be done in the driver.

regards
Arun




Tested-by: Ben Widawsky b...@bwidawsk.net
Reviewed-by: Ben Widawsky b...@bwidawsk.net



Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90854
Tested-by: Mika Kuoppala mika.kuopp...@intel.com


--
Ben Widawsky, Intel Open Source Technology Center






Re: [Intel-gfx] [PATCH v2 2/4] drm/i915: Add provision to extend Golden context batch

2015-07-20 Thread Siluvery, Arun

On 17/07/2015 21:03, Chris Wilson wrote:

On Fri, Jul 17, 2015 at 07:13:32PM +0100, Arun Siluvery wrote:

The Golden batch carries 3D state at the beginning so that HW starts with
a known state. It is carried as a binary blob which is auto-generated from
source. The idea was it would be easier to maintain and keep the complexity
out of the kernel, which makes sense as we don't really touch it. However, if
you really need to update it, then you need to update the generator source and
keep the binary blob in sync with it.

There is a need to patch this in bxt to send one additional command to enable
a feature. A solution was to patch the binary data with some additional
data structures (included as part of auto-generator source) but it was
unnecessarily complicated.

Chris suggested the idea of having a secondary batch and execute two batch
buffers. It has clear advantages as we needn't touch the base golden batch,
can customize secondary/auxiliary batch depending on Gen and can be carried
in the driver with no dependencies.

This patch adds support for this auxiliary batch which is inserted at the
end of golden batch and is completely independent from it. Thanks to Mika
for the preliminary review.
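The layout computed by render_state_setup() in the hunk further down can be mirrored in a standalone sketch: pad the golden batch to a cacheline with MI_NOOP, record the aux batch offset, append MI_BATCH_BUFFER_END, and round the size up to 8 bytes. The constants (16-dword cacheline, 1024-dword page, MI_BATCH_BUFFER_END = 0x0a << 23) and the helper names are assumptions for illustration, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Values assumed for this sketch. */
#define CACHELINE_DWORDS    16u            /* 64-byte cacheline */
#define BATCH_DWORDS        1024u          /* one 4 KiB page */
#define MI_NOOP             0u
#define MI_BATCH_BUFFER_END (0x0au << 23)
#define ALIGN_UP(x, a)      (((x) + (a) - 1u) & ~((a) - 1u))

struct aux_info {
	unsigned offset; /* byte offset of the aux batch in the page */
	unsigned size;   /* byte length passed to the dispatch */
};

/* Append an aux batch (here just MI_BATCH_BUFFER_END) after the first
 * 'used' dwords. Returns -1 instead of writing past the page, like the
 * -ENOSPC check in the patch's OUT_BATCH(). */
static int append_aux_batch(uint32_t *d, unsigned used, struct aux_info *aux)
{
	unsigned i = used;

	/* Pad with MI_NOOP so the aux batch starts on a cacheline. */
	while (i % CACHELINE_DWORDS) {
		if (i >= BATCH_DWORDS)
			return -1;
		d[i++] = MI_NOOP;
	}
	aux->offset = i * 4u;

	if (i >= BATCH_DWORDS)
		return -1;
	d[i++] = MI_BATCH_BUFFER_END;

	/* Round the length up to a multiple of 8 bytes (Gen2 rule). */
	aux->size = ALIGN_UP(i * 4u - aux->offset, 8u);
	return 0;
}

/* The patch only dispatches the aux batch when it is larger than 8 bytes. */
static int should_dispatch(const struct aux_info *aux)
{
	return aux->size > 8u;
}
```

With only MI_BATCH_BUFFER_END in the aux batch, the size is 4 bytes rounded up to 8, so a size > 8 test like the one in i915_gem_render_state_init() skips the dispatch until real commands are added.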

v2: Strictly conform to the batch size requirements to cover Gen2 and
add comments to clarify overflow check in macro (Chris, Mika).

Cc: Mika Kuoppala mika.kuopp...@intel.com
Cc: Chris Wilson ch...@chris-wilson.co.uk
Cc: Armin Reese armin.c.re...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_gem_render_state.c | 45 
  drivers/gpu/drm/i915/i915_gem_render_state.h |  2 ++
  drivers/gpu/drm/i915/intel_lrc.c |  6 
  3 files changed, 53 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index b6492fe..5026a62 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -73,6 +73,24 @@ free_gem:
return ret;
  }

+/*
+ * Macro to add commands to auxiliary batch.
+ * This macro only checks for page overflow before inserting the commands,
+ * this is sufficient as the null state generator makes the final batch
+ * with two passes to build command and state separately. At this point
+ * the size of both are known and it compacts them by relocating the state
+ * right after the commands taking care of alignment, so we should have
+ * sufficient space below them for adding new commands.
+ */
+#define OUT_BATCH(batch, i, val)   \
+   do {\
+   if (WARN_ON((i) >= PAGE_SIZE / sizeof(u32))) {   \
+   ret = -ENOSPC;  \
+   goto err_out;   \
+   }   \
+   (batch)[(i)++] = (val); \
+   } while(0)
+
  static int render_state_setup(struct render_state *so)
  {
const struct intel_renderstate_rodata *rodata = so->rodata;
@@ -110,6 +128,21 @@ static int render_state_setup(struct render_state *so)

d[i++] = s;
}
+
+   while (i % CACHELINE_DWORDS)
+   OUT_BATCH(d, i, MI_NOOP);
+
+   so->aux_batch_offset = i * sizeof(u32);
+
+   OUT_BATCH(d, i, MI_BATCH_BUFFER_END);
+   so->aux_batch_size = (i * sizeof(u32)) - so->aux_batch_offset;
+
+   /*
+* Since we are sending length, we need to strictly conform to
+* all requirements. For Gen2 this must be a multiple of 8.
+*/
+   so-aux_batch_size = ALIGN(so-aux_batch_size, 8);
+
kunmap(page);

ret = i915_gem_object_set_to_gtt_domain(so->obj, false);
@@ -128,6 +161,8 @@ err_out:
return ret;
  }

+#undef OUT_BATCH
+
  void i915_gem_render_state_fini(struct render_state *so)
  {
i915_gem_object_ggtt_unpin(so->obj);
@@ -176,6 +211,16 @@ int i915_gem_render_state_init(struct drm_i915_gem_request *req)
if (ret)
goto out;

+   if (so.aux_batch_size > 8) {
+   ret = req->ring->dispatch_execbuffer(req,
+(so.ggtt_offset +
+ so.aux_batch_offset),
+so.aux_batch_size,
+I915_DISPATCH_SECURE);
+   if (ret)
+   goto out;
+   }
+
i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), req);

  out:
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.h b/drivers/gpu/drm/i915/i915_gem_render_state.h
index 7aa7372..79de101 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.h
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.h
@@ -37,6 +37,8 @@ struct render_state {
struct drm_i915_gem_object *obj;
u64 
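
The auxiliary-batch layout built in render_state_setup() above (pad the golden batch to a cacheline with MI_NOOP, record the offset, append MI_BATCH_BUFFER_END, then round the length up to a multiple of 8 bytes) can be modelled in plain user-space C. This is a minimal sketch, not driver code: the page size, the 64-byte cacheline and the MI_INSTR()/MI_NOOP/MI_BATCH_BUFFER_END encodings are assumptions mirrored from the i915 headers, and aux_layout, emit_dword() and append_aux_batch() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values, mirrored from the i915 headers for illustration only. */
#define PAGE_BYTES           4096u
#define PAGE_DWORDS          (PAGE_BYTES / sizeof(uint32_t))
#define CACHELINE_DWORDS     (64u / sizeof(uint32_t))
#define MI_INSTR(op, flags)  (((uint32_t)(op) << 23) | (flags))
#define MI_NOOP              MI_INSTR(0, 0)
#define MI_BATCH_BUFFER_END  MI_INSTR(0x0a, 0)
#define ALIGN_UP(x, a)       (((x) + (a) - 1) / (a) * (a))

/* Hypothetical result type: byte offset and byte length of the aux batch. */
struct aux_layout {
	uint32_t offset;
	uint32_t size;
};

/* Like OUT_BATCH: refuse to write past the end of the page. */
static int emit_dword(uint32_t *batch, size_t *i, uint32_t val)
{
	if (*i >= PAGE_DWORDS)
		return -1;
	batch[(*i)++] = val;
	return 0;
}

/* i = number of dwords already used by the golden batch. */
static int append_aux_batch(uint32_t *batch, size_t i, struct aux_layout *aux)
{
	/* Start the aux batch on a cacheline boundary. */
	while (i % CACHELINE_DWORDS)
		if (emit_dword(batch, &i, MI_NOOP))
			return -1;

	aux->offset = (uint32_t)(i * sizeof(uint32_t));

	if (emit_dword(batch, &i, MI_BATCH_BUFFER_END))
		return -1;

	/* The length is passed to the hardware, so keep it a multiple of 8. */
	aux->size = (uint32_t)ALIGN_UP(i * sizeof(uint32_t) - aux->offset, 8u);
	return 0;
}
```

With a golden batch of 21 dwords, append_aux_batch() pads up to dword 32, so the aux batch starts at byte 128; it holds only MI_BATCH_BUFFER_END, so its aligned size is 8 bytes. That is the case the patch skips with `so.aux_batch_size > 8`: an aux batch containing nothing but the terminator is not dispatched.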

Re: [Intel-gfx] [PATCH] drm/i915: Change SRM, LRM instructions to use correct length

2015-07-20 Thread Siluvery, Arun

On 16/07/2015 16:19, Arun Siluvery wrote:

MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_MEM are not really variable-length
instructions, unlike MI_LOAD_REGISTER_IMM, which expects (reg, value) pairs,
so use a fixed length for these instructions.

Cc: Dave Gordon david.s.gor...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---


Ping? Any reviewers?

regards
Arun


  drivers/gpu/drm/i915/i915_cmd_parser.c | 8 
  drivers/gpu/drm/i915/i915_reg.h| 8 
  drivers/gpu/drm/i915/intel_display.c   | 4 ++--
  drivers/gpu/drm/i915/intel_lrc.c   | 4 ++--
  4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 430571b..3771922 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -124,14 +124,14 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
CMD(  MI_STORE_DWORD_INDEX, SMI,   !F,  0xFF,   R  ),
CMD(  MI_LOAD_REGISTER_IMM(1),  SMI,   !F,  0xFF,   W,
  .reg = { .offset = 1, .mask = 0x007C, .step = 2 }),
-   CMD(  MI_STORE_REGISTER_MEM(1), SMI,   !F,  0xFF,   W | B,
+   CMD(  MI_STORE_REGISTER_MEM,SMI,F,  1, W | B,
  .reg = { .offset = 1, .mask = 0x007C },
  .bits = {{
.offset = 0,
.mask = MI_GLOBAL_GTT,
.expected = 0,
  }},  ),
-   CMD(  MI_LOAD_REGISTER_MEM(1), SMI,   !F,  0xFF,   W | B,
+   CMD(  MI_LOAD_REGISTER_MEM, SMI,F,  1, W | B,
  .reg = { .offset = 1, .mask = 0x007C },
  .bits = {{
.offset = 0,
@@ -1021,7 +1021,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
 * only MI_LOAD_REGISTER_IMM commands.
 */
if (reg_addr == OACONTROL) {
-   if (desc->cmd.value == MI_LOAD_REGISTER_MEM(1)) {
+   if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n");
return false;
}
@@ -1035,7 +1035,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
 * allowed mask/value pair given in the whitelist entry.
 */
if (reg->mask) {
-   if (desc->cmd.value == MI_LOAD_REGISTER_MEM(1)) {
+   if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
DRM_DEBUG_DRIVER("CMD: Rejected LRM to masked register 0x%08X\n",
 reg_addr);
return false;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index bd13494..cc3cb3e 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -342,8 +342,8 @@
   */
  #define MI_LOAD_REGISTER_IMM(x)   MI_INSTR(0x22, 2*(x)-1)
  #define   MI_LRI_FORCE_POSTED (1<<12)
-#define MI_STORE_REGISTER_MEM(x) MI_INSTR(0x24, 2*(x)-1)
-#define MI_STORE_REGISTER_MEM_GEN8(x) MI_INSTR(0x24, 3*(x)-1)
+#define MI_STORE_REGISTER_MEM        MI_INSTR(0x24, 1)
+#define MI_STORE_REGISTER_MEM_GEN8   MI_INSTR(0x24, 2)
  #define   MI_SRM_LRM_GLOBAL_GTT   (1<<22)
  #define MI_FLUSH_DW   MI_INSTR(0x26, 1) /* for GEN6 */
  #define   MI_FLUSH_DW_STORE_INDEX (1<<21)
@@ -354,8 +354,8 @@
  #define   MI_INVALIDATE_BSD   (1<<7)
  #define   MI_FLUSH_DW_USE_GTT (1<<2)
  #define   MI_FLUSH_DW_USE_PPGTT   (0<<2)
-#define MI_LOAD_REGISTER_MEM(x) MI_INSTR(0x29, 2*(x)-1)
-#define MI_LOAD_REGISTER_MEM_GEN8(x) MI_INSTR(0x29, 3*(x)-1)
+#define MI_LOAD_REGISTER_MEM  MI_INSTR(0x29, 1)
+#define MI_LOAD_REGISTER_MEM_GEN8  MI_INSTR(0x29, 2)
  #define MI_BATCH_BUFFER   MI_INSTR(0x30, 1)
  #define   MI_BATCH_NON_SECURE (1)
  /* for snb/ivb/vlv this also means batch in ppgtt when ppgtt is enabled. */
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 472c544..a78c823 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11053,10 +11053,10 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
DERRMR_PIPEB_PRI_FLIP_DONE |
DERRMR_PIPEC_PRI_FLIP_DONE));
if (IS_GEN8(dev))
-   intel_ring_emit(ring, MI_STORE_REGISTER_MEM_GEN8(1) |
+   intel_ring_emit(ring, MI_STORE_REGISTER_MEM_GEN8 |
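
The fixed-length defines in this patch are value-preserving: MI_INSTR() shifts the opcode left by 23 and puts the dword length (total dwords minus 2) in the low bits, so the old MI_LOAD_REGISTER_MEM(1) and the new MI_LOAD_REGISTER_MEM produce the same header. A small standalone check, assuming the MI_INSTR() definition from i915_reg.h, an 8-bit length field and an opcode mask of 0xff800000; the OLD_ prefix and both helper functions are hypothetical.

```c
#include <stdint.h>

/* Assumed MI header encoding, as in i915_reg.h: opcode << 23 | dword length. */
#define MI_INSTR(opcode, flags) (((uint32_t)(opcode) << 23) | (flags))

/* Parameterised forms removed by the patch (hypothetical OLD_ prefix). */
#define OLD_MI_LOAD_REGISTER_MEM(x)      MI_INSTR(0x29, 2*(x)-1)
#define OLD_MI_LOAD_REGISTER_MEM_GEN8(x) MI_INSTR(0x29, 3*(x)-1)

/* Fixed-length forms added by the patch. */
#define MI_LOAD_REGISTER_MEM      MI_INSTR(0x29, 1)
#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)

/* The length field counts dwords minus 2; an 8-bit field is assumed here. */
static uint32_t mi_total_dwords(uint32_t header)
{
	return (header & 0xffu) + 2;
}

/* Compare two headers with the length bits masked out (assumed mask). */
static int same_opcode(uint32_t a, uint32_t b)
{
	const uint32_t opcode_mask = 0xff800000u;

	return (a & opcode_mask) == (b & opcode_mask);
}
```

mi_total_dwords() shows the gen7 form is 3 dwords (header, register, 32-bit address) and the gen8 form is 4 (the address grows to 64 bits). same_opcode() illustrates why matching with the length bits masked out treats both forms as the same instruction.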
  

Re: [Intel-gfx] [PATCH v1 3/4] drm/i915:bxt: Enable Pooled EU support

2015-07-17 Thread Siluvery, Arun

On 17/07/2015 17:27, Chris Wilson wrote:

On Fri, Jul 17, 2015 at 05:08:53PM +0100, Arun Siluvery wrote:

This mode allows EUs to be assigned to pools.
The command to enable this mode is sent in auxiliary golden context batch
as this is only issued once with each context initialization. Thanks to
Mika for the preliminary review.


A quick explanation for why this has to be in the kernel would be nice.
Privileged instruction?


The purpose of the auxiliary batch is explained in patch 2, but I can add
some explanation for this one as well.




Not fond of the split between this and patch 4. Patch 4 introduces one
feature flag that looks different to the one we use here to enable
support.
I will keep patch 4 separate as it deals with libdrm changes, but use the
feature flag in this one.


regards
Arun


-Chris



___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v2 3/4] drm/i915/bxt: Add get_param to query Pooled EU availability

2015-07-17 Thread Siluvery, Arun

On 17/07/2015 19:13, Arun Siluvery wrote:

User space clients need to know when the pooled EU feature is present
and enabled on the hardware so that they can adapt work submissions.
Create a new device info flag for this purpose, and create a new GETPARAM
entry to allow user space to query its setting.

Set has_pooled_eu to true in the Broxton static device info - Broxton
supports the feature in hardware and the driver will enable it by
default.

Signed-off-by: Jeff McGee jeff.mc...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---


Please ignore this patch; it has been squashed into patch 4 (drm/i915:bxt:
Enable Pooled EU support) to keep all enabling changes in the same place.
Otherwise we would have announced support to userspace before enabling it
in the kernel.


regards
Arun


  drivers/gpu/drm/i915/i915_dma.c | 3 +++
  drivers/gpu/drm/i915/i915_drv.c | 1 +
  drivers/gpu/drm/i915/i915_drv.h | 5 -
  include/uapi/drm/i915_drm.h | 1 +
  4 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 5e63076..6c31beb 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -170,6 +170,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
case I915_PARAM_HAS_RESOURCE_STREAMER:
value = HAS_RESOURCE_STREAMER(dev);
break;
+   case I915_PARAM_HAS_POOLED_EU:
+   value = HAS_POOLED_EU(dev);
+   break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index e44dc0d..213f74d 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -389,6 +389,7 @@ static const struct intel_device_info intel_broxton_info = {
.num_pipes = 3,
.has_ddi = 1,
.has_fbc = 1,
+   .has_pooled_eu = 1,
GEN_DEFAULT_PIPEOFFSETS,
IVB_CURSOR_OFFSETS,
  };
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 768d1db..32850a8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -775,7 +775,8 @@ struct intel_csr {
func(supports_tv) sep \
func(has_llc) sep \
func(has_ddi) sep \
-   func(has_fpga_dbg)
+   func(has_fpga_dbg) sep \
+   func(has_pooled_eu)

  #define DEFINE_FLAG(name) u8 name:1
  #define SEP_SEMICOLON ;
@@ -2549,6 +2550,8 @@ struct drm_i915_cmd_table {
  #define HAS_RESOURCE_STREAMER(dev) (IS_HASWELL(dev) || \
INTEL_INFO(dev)->gen >= 8)

+#define HAS_POOLED_EU(dev) (INTEL_INFO(dev)->has_pooled_eu)
+
  #define INTEL_PCH_DEVICE_ID_MASK  0xff00
  #define INTEL_PCH_IBX_DEVICE_ID_TYPE  0x3b00
  #define INTEL_PCH_CPT_DEVICE_ID_TYPE  0x1c00
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index e7c29f1..9649577 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -356,6 +356,7 @@ typedef struct drm_i915_irq_wait {
  #define I915_PARAM_EU_TOTAL 34
  #define I915_PARAM_HAS_GPU_RESET   35
  #define I915_PARAM_HAS_RESOURCE_STREAMER 36
+#define I915_PARAM_HAS_POOLED_EU 37

  typedef struct drm_i915_getparam {
int param;
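
The has_pooled_eu flag is added to an X-macro list: one list of func(name) entries is expanded with DEFINE_FLAG and SEP_SEMICOLON to declare one-bit fields, and the same list can be expanded again with a different func. The user-space sketch below shows that pattern together with a GETPARAM-style switch. It is a simplified stand-in, not the driver's code: the flag list is trimmed to four entries, getparam() is hypothetical, only the parameter number 37 is taken from the patch, and the fields use unsigned int rather than the driver's u8 for portable bit-fields.

```c
#include <errno.h>

/* Same shape as the driver's flag list: func(name) entries joined by sep. */
#define DEV_INFO_FOR_EACH_FLAG(func, sep) \
	func(has_llc) sep \
	func(has_ddi) sep \
	func(has_fpga_dbg) sep \
	func(has_pooled_eu)

#define DEFINE_FLAG(name) unsigned int name:1
#define SEP_SEMICOLON ;
#define COUNT_FLAG(name) + 1
#define SEP_EMPTY

struct device_info {
	DEV_INFO_FOR_EACH_FLAG(DEFINE_FLAG, SEP_SEMICOLON);
};

/* A second expansion of the same list: the number of flags. */
enum { NUM_DEV_INFO_FLAGS = 0 DEV_INFO_FOR_EACH_FLAG(COUNT_FLAG, SEP_EMPTY) };

#define PARAM_HAS_POOLED_EU 37	/* value from the patch's uapi hunk */

static const struct device_info broxton_like_info = {
	.has_ddi = 1,
	.has_pooled_eu = 1,
};

/* Simplified GETPARAM: unknown parameters are rejected with -EINVAL. */
static int getparam(const struct device_info *info, int param, int *value)
{
	switch (param) {
	case PARAM_HAS_POOLED_EU:
		*value = info->has_pooled_eu;
		return 0;
	default:
		return -EINVAL;
	}
}
```

Because the flag is a one-bit field, the value a query like this reports is always 0 or 1.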





Re: [Intel-gfx] [PATCH] drm/i915: Do kunmap if renderstate parsing fails

2015-07-16 Thread Siluvery, Arun

On 16/07/2015 15:36, Mika Kuoppala wrote:

Kunmap the renderstate page on error path.

Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com
---
  drivers/gpu/drm/i915/i915_gem_render_state.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index a0201fc..b6492fe 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -96,8 +96,10 @@ static int render_state_setup(struct render_state *so)
s = lower_32_bits(r);
if (so->gen >= 8) {
if (i + 1 >= rodata->batch_items ||
-   rodata->batch[i + 1] != 0)
-   return -EINVAL;
+   rodata->batch[i + 1] != 0) {
+   ret = -EINVAL;
+   goto err_out;
+   }

d[i++] = s;
s = upper_32_bits(r);
@@ -120,6 +122,10 @@ static int render_state_setup(struct render_state *so)
}

return 0;
+
+err_out:
+   kunmap(page);
+   return ret;
  }

  void i915_gem_render_state_fini(struct render_state *so)



Looks good to me,
Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

regards
Arun
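
The fix above follows the usual kernel single-exit pattern: every failure after kmap() jumps to a label that performs the matching kunmap(), so no return path can leak the mapping. A user-space sketch of the same shape; map_page()/unmap_page() are hypothetical stand-ins for kmap()/kunmap(), and the "reserved dword must be zero" rule is only loosely modelled on the gen8 check in render_state_setup().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kmap()/kunmap() that count live mappings. */
static int live_mappings;

static uint32_t *map_page(uint32_t *page)
{
	live_mappings++;
	return page;
}

static void unmap_page(uint32_t *page)
{
	(void)page;
	live_mappings--;
}

/*
 * Copy src into the page. Entries come in (value, reserved) pairs and the
 * reserved dword must be zero.
 */
static int setup_page(uint32_t *page, const uint32_t *src, size_t n)
{
	uint32_t *d = map_page(page);
	int ret;

	for (size_t i = 0; i < n; i += 2) {
		if (i + 1 >= n || src[i + 1] != 0) {
			ret = -EINVAL;
			goto err_out;	/* still mapped: must unmap */
		}
		d[i] = src[i];
		d[i + 1] = 0;
	}

	unmap_page(page);
	return 0;

err_out:
	unmap_page(page);
	return ret;
}
```

Whichever path is taken, live_mappings returns to zero, which is the property the kunmap() on the error path restores.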



Re: [Intel-gfx] [RFC 0/2] Add Pooled EU support

2015-07-13 Thread Siluvery, Arun

On 11/07/2015 20:09, Chris Wilson wrote:

On Sat, Jul 11, 2015 at 08:05:05PM +0100, Chris Wilson wrote:

On Fri, Jul 10, 2015 at 06:35:18PM +0100, Arun Siluvery wrote:

These patches enable Pooled EU support for BXT; they were implemented
by Armin Reese. I am sending these patches in their current form for comments.

These patches modify the golden batch to carry a set of modification values
that let us change the commands based on Gen. The commands to enable
Pooled EU are inserted after MI_BATCH_BUFFER_END. If the given Gen
supports this feature, the modification values are used to replace
MI_BATCH_BUFFER_END so that the commands to enable Pooled EU are sent. These
commands need to be part of this batch because they are to be
initialized only once. Userspace will have an option to query the
availability of this feature; those changes are not included in
this series.


Would it not just be simpler to execute 2 batches? First holding the
basic and common state for the gen, the second using subgen. That we
have a chunk of binary data is nasty, but at least we can point to the
generator and be able to decipher it and recreate it as required. Doing
binary patching on top, on that path lies madness.


I like this idea of sending 2 batches if that is acceptable. In this 
case we don't have to touch the golden batch and hence the generator 
tool and also not worry about using the correct binary blob as header.


The setup in this case would be:

1. send golden batch
2. prepare and send batch to configure pooled EU as per subslice and EU count


Why do we have a separate tool in the first place? Is it not possible to
carry all of this in code, or are there restrictions on doing so?




What is the minimum instruction sequence required to be able to setup the
default EU state? Is it small enough that carrying it as code in the
kernel is viable (and readable)?

Setting up the pooled EU configuration takes only a few instructions, so it
can be added to the driver.

(That actually is critical here as currently we have to juggle multiple
sources and look very carefully at what is being patched - I am not
confident that we will not introduce mistakes in a week's time, let
alone a year or two.)


The alternative is to just say that the patch table is also
autogenerated and for that to be simple and clear, and far more
documented, as it relies on a strict protocol.


The patch table is also auto generated using intel_null_state_gen tool 
but it is patched based on Gen.


regards
Arun


-Chris





Re: [Intel-gfx] drm/i915: Update WaFlushCoherentL3CacheLinesAtContextSwitch

2015-07-10 Thread Siluvery, Arun

On 10/07/2015 09:25, Dan Carpenter wrote:

Hello Arun Siluvery,

The patch 9e00084750c0: drm/i915: Update
WaFlushCoherentL3CacheLinesAtContextSwitch from Jul 3, 2015, leads
to the following static checker warning:

drivers/gpu/drm/i915/intel_lrc.c:1188 gen8_init_indirectctx_bb()
warn: unsigned 'index' is never less than zero.

drivers/gpu/drm/i915/intel_lrc.c
   1174  static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
   1175  struct i915_wa_ctx_bb *wa_ctx,
   1176  uint32_t *const batch,
   1177  uint32_t *offset)
   1178  {
   1179  uint32_t scratch_addr;
   1180  uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);
   1181
   1182  /* WaDisableCtxRestoreArbitration:bdw,chv */
   1183  wa_ctx_emit(batch, index, MI_ARB_ON_OFF | MI_ARB_DISABLE);
   1184
   1185  /* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw */
   1186  if (IS_BROADWELL(ring->dev)) {
   1187  index = gen8_emit_flush_coherentl3_wa(ring, batch, index);
   1188  if (index < 0)
 ^
Never true.


Thank you for reporting this; I will change it as below:

int ret = gen8_emit_flush_coherentl3_wa(ring, batch, index);
if (ret < 0)
	return ret;
index = ret;

regards
Arun



   1189  return index;
   1190  }
   1191

regards,
dan carpenter
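
The checker fires because a negative errno stored in a uint32_t becomes a large positive number, so `index < 0` can never be true. The proposed fix keeps the return value in a signed int until it has been checked. A minimal sketch of that pattern; emit_wa() is a hypothetical helper standing in for gen8_emit_flush_coherentl3_wa(), returning either the new dword index or a negative errno.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: emit two dwords, return the new index or -ENOSPC. */
static int emit_wa(uint32_t *batch, uint32_t index, uint32_t max_dwords)
{
	if (index + 2 > max_dwords)
		return -ENOSPC;
	batch[index++] = 0x1;
	batch[index++] = 0x2;
	return (int)index;
}

/* Fixed pattern: check the signed return value before using it as an index. */
static int init_bb(uint32_t *batch, uint32_t max_dwords, uint32_t *end)
{
	uint32_t index = 0;
	int ret = emit_wa(batch, index, max_dwords);

	if (ret < 0)
		return ret;
	index = ret;

	*end = index;
	return 0;
}
```

Had the result been assigned straight to the unsigned index, -ENOSPC would have turned into a huge index and the error would have been silently lost.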





Re: [Intel-gfx] [PATCH v2 1/4] drm/i915: Enable WA batch buffers for Gen9

2015-07-10 Thread Siluvery, Arun

On 10/07/2015 16:52, Mika Kuoppala wrote:

Arun Siluvery arun.siluv...@linux.intel.com writes:


This patch only enables support for Gen9, the actual WA will be
initialized in subsequent patches.

The WARN that we use to warn user if WA batch support is not available
for a particular Gen is replaced with DRM_ERROR as warning here doesn't
really add much value.

v2: include all infrastructure bits in this patch so that subsequent
changes only correspond to the WA added (Chris)

Cc: Imre Deak imre.d...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com


The wa_ctx_emits need index as second param now. With those,

Reviewed-by: Mika Kuoppala mika.kuopp...@intel.com


Thanks Mika, I will send the updated patches.
Is your r-b tag for all the patches?
Should I include the tag when sending the patches?

regards
Arun


---
  drivers/gpu/drm/i915/intel_lrc.c | 50 +---
  1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 23ff018..1e88b3b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1269,6 +1269,35 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
return wa_ctx_end(wa_ctx, *offset = index, 1);
  }

+static int gen9_init_indirectctx_bb(struct intel_engine_cs *ring,
+   struct i915_wa_ctx_bb *wa_ctx,
+   uint32_t *const batch,
+   uint32_t *offset)
+{
+   uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);
+
+   /* FIXME: Replace me with WA */
+   wa_ctx_emit(batch, MI_NOOP);
+
+   /* Pad to end of cacheline */
+   while (index % CACHELINE_DWORDS)
+   wa_ctx_emit(batch, MI_NOOP);
+
+   return wa_ctx_end(wa_ctx, *offset = index, CACHELINE_DWORDS);
+}
+
+static int gen9_init_perctx_bb(struct intel_engine_cs *ring,
+  struct i915_wa_ctx_bb *wa_ctx,
+  uint32_t *const batch,
+  uint32_t *offset)
+{
+   uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);
+
+   wa_ctx_emit(batch, MI_BATCH_BUFFER_END);
+
+   return wa_ctx_end(wa_ctx, *offset = index, 1);
+}
+
  static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
  {
int ret;
@@ -1310,10 +1339,11 @@ static int intel_init_workaround_bb(struct intel_engine_cs *ring)
WARN_ON(ring->id != RCS);

/* update this when WA for higher Gen are added */
-   if (WARN(INTEL_INFO(ring->dev)->gen > 8,
-"WA batch buffer is not initialized for Gen%d\n",
-INTEL_INFO(ring->dev)->gen))
+   if (INTEL_INFO(ring->dev)->gen > 9) {
+   DRM_ERROR("WA batch buffer is not initialized for Gen%d\n",
+ INTEL_INFO(ring->dev)->gen);
return 0;
+   }

/* some WA perform writes to scratch page, ensure it is valid */
if (ring->scratch.obj == NULL) {
@@ -1345,6 +1375,20 @@ static int intel_init_workaround_bb(struct intel_engine_cs *ring)
  &offset);
if (ret)
goto out;
+   } else if (INTEL_INFO(ring->dev)->gen == 9) {
+   ret = gen9_init_indirectctx_bb(ring,
+  &wa_ctx->indirect_ctx,
+  batch,
+  &offset);
+   if (ret)
+   goto out;
+
+   ret = gen9_init_perctx_bb(ring,
+ &wa_ctx->per_ctx,
+ batch,
+ &offset);
+   if (ret)
+   goto out;
}

  out:
--
1.9.1
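
From how they are called here, wa_ctx_start() appears to align the starting offset, wa_ctx_emit() appends one dword (taking the index as its second argument, as Mika points out), and wa_ctx_end() records the size and checks it against a size alignment. The user-space sketch below reproduces that flow for the two gen9 stubs. The wa_bb struct, the WA_EMIT macro and the helper bodies are simplified assumptions rather than the intel_lrc.c implementations, and reporting misalignment as -EINVAL is a choice made only for the sketch.

```c
#include <errno.h>
#include <stdint.h>

#define PAGE_DWORDS         1024u	/* 4 KiB page of dwords (assumed) */
#define CACHELINE_DWORDS    16u		/* 64-byte cacheline (assumed) */
#define MI_NOOP             0u
#define MI_BATCH_BUFFER_END (0x0au << 23)
#define ALIGN_UP(x, a)      (((x) + (a) - 1) / (a) * (a))

/* Hypothetical stand-in for struct i915_wa_ctx_bb; both fields in dwords. */
struct wa_bb {
	uint32_t offset;
	uint32_t size;
};

/* Append one dword, returning -ENOSPC from the caller if the page is full. */
#define WA_EMIT(batch, index, cmd)			\
	do {						\
		if ((index) >= PAGE_DWORDS)		\
			return -ENOSPC;			\
		(batch)[(index)++] = (cmd);		\
	} while (0)

static uint32_t wa_start(struct wa_bb *bb, uint32_t offset, uint32_t align)
{
	return bb->offset = ALIGN_UP(offset, align);
}

static int wa_end(struct wa_bb *bb, uint32_t offset, uint32_t size_align)
{
	bb->size = offset - bb->offset;
	return (bb->size % size_align) ? -EINVAL : 0;
}

static int init_indirectctx(struct wa_bb *bb, uint32_t *batch, uint32_t *offset)
{
	uint32_t index = wa_start(bb, *offset, CACHELINE_DWORDS);

	WA_EMIT(batch, index, MI_NOOP);	/* placeholder for a real WA */

	/* Pad to the end of the cacheline. */
	while (index % CACHELINE_DWORDS)
		WA_EMIT(batch, index, MI_NOOP);

	return wa_end(bb, *offset = index, CACHELINE_DWORDS);
}

static int init_perctx(struct wa_bb *bb, uint32_t *batch, uint32_t *offset)
{
	uint32_t index = wa_start(bb, *offset, CACHELINE_DWORDS);

	WA_EMIT(batch, index, MI_BATCH_BUFFER_END);

	return wa_end(bb, *offset = index, 1);
}
```

Starting from offset 0, the indirect-context batch ends up exactly one cacheline (16 dwords) long, which is why its size alignment is CACHELINE_DWORDS, while the per-context batch is a single MI_BATCH_BUFFER_END at dword 16 and only needs 1-dword alignment.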







Re: [Intel-gfx] [PATCHv7] drm/i915: Added Programming of the MOCS

2015-07-08 Thread Siluvery, Arun

On 07/07/2015 20:13, Francisco Jerez wrote:

From: Peter Antoine peter.anto...@intel.com

This change adds the programming of the MOCS registers to the gen 9+
platforms. This change set programs the MOCS register values to a set
of values that are defined to be optimal.

It creates a fixed register set that is programmed across the different
engines so that all engines have the same table. This is done as the
main RCS context only holds the registers for itself and the shared
L3 values. By trying to keep the registers consistent across the
different engines it should make the programming for the registers
consistent.

v2:
-'static const' for private data structures and style changes.(Matt Turner)
v3:
- Make the tables slightly more readable. (Damien Lespiau)
- Updated tables fix performance regression.
v4:
- Code formatting. (Chris Wilson)
- re-privatised mocs code. (Daniel Vetter)
v5:
- Changed the name of a function. (Chris Wilson)
v6:
- re-based
- Added Mesa table entry (skylake & broxton) (Francisco Jerez)
- Tidied up the readability defines (Francisco Jerez)
- NUMBER of entries defines wrong. (Jim Bish)
- Added comments to clear up the meaning of the tables (Jim Bish)

Signed-off-by: Peter Antoine peter.anto...@intel.com

v7 (Francisco Jerez):
- Don't write L3-specific MOCS_ESC/SCC values into the e/LLC control
   tables.  Prefix L3-specific defines consistently with L3_ and
   e/LLC-specific defines with LE_ to avoid this kind of confusion in
   the future.
- Change L3CC WT define back to RESERVED (matches my hardware
   documentation and the original patch, probably a misunderstanding
   of my own previous comment).
- Drop Android tables, define new minimal tables more suitable for the
   open source stack.
- Add comment that the MOCS tables are part of the kernel ABI.
- Move intel_logical_ring_begin() and _advance() calls one level down
   (Chris Wilson).
- Minor formatting and style fixes.

Signed-off-by: Francisco Jerez curroje...@riseup.net
---
  drivers/gpu/drm/i915/Makefile |   1 +
  drivers/gpu/drm/i915/i915_reg.h   |   9 ++
  drivers/gpu/drm/i915/intel_lrc.c  |  11 +-
  drivers/gpu/drm/i915/intel_lrc.h  |   1 +
  drivers/gpu/drm/i915/intel_mocs.c | 324 ++
  drivers/gpu/drm/i915/intel_mocs.h |  57 +++
  6 files changed, 401 insertions(+), 2 deletions(-)
  create mode 100644 drivers/gpu/drm/i915/intel_mocs.c
  create mode 100644 drivers/gpu/drm/i915/intel_mocs.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index de21965..e52e012 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -36,6 +36,7 @@ i915-y += i915_cmd_parser.o \
  i915_trace_points.o \
  intel_hotplug.o \
  intel_lrc.o \
+ intel_mocs.o \
  intel_ringbuffer.o \
  intel_uncore.o

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 2a29bcc..9b17260 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -7906,4 +7906,13 @@ enum skl_disp_power_wells {
  #define _PALETTE_A (dev_priv->info.display_mmio_offset + 0xa000)
  #define _PALETTE_B (dev_priv->info.display_mmio_offset + 0xa800)

+/* MOCS (Memory Object Control State) registers */
+#define GEN9_LNCFCMOCS0     (0xB020)    /* L3 Cache Control base */
+
+#define GEN9_GFX_MOCS_0     (0xc800)    /* Graphics MOCS base register*/
+#define GEN9_MFX0_MOCS_0    (0xc900)    /* Media 0 MOCS base register*/
+#define GEN9_MFX1_MOCS_0    (0xcA00)    /* Media 1 MOCS base register*/
+#define GEN9_VEBOX_MOCS_0   (0xcB00)    /* Video MOCS base register*/
+#define GEN9_BLT_MOCS_0     (0xcc00)    /* Blitter MOCS base register*/
+
  #endif /* _I915_REG_H_ */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d4f8b43..466d17c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -135,6 +135,7 @@
  #include <drm/drmP.h>
  #include <drm/i915_drm.h>
  #include "i915_drv.h"
+#include "intel_mocs.h"

  #define GEN9_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE)
  #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
@@ -772,8 +773,7 @@ static int logical_ring_prepare(struct drm_i915_gem_request *req, int bytes)
   *
   * Return: non-zero if the ringbuffer is not ready to be written to.
   */
-static int intel_logical_ring_begin(struct drm_i915_gem_request *req,
-   int num_dwords)
+int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
  {
struct drm_i915_private *dev_priv;
int ret;
@@ -1675,6 +1675,13 @@ static int gen8_init_rcs_context(struct drm_i915_gem_request *req)
if (ret)
return ret;

+   /*
+* Failing to program the MOCS is non-fatal. The system will not
+* run at peak performance, so generate a warning and carry on.
+*/
+   if 
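
Programming a table like the MOCS one comes down to an MI_LOAD_REGISTER_IMM(n) header followed by n (register, value) pairs. The sketch below builds such a command for GEN9_GFX_MOCS_0 using the MI_LOAD_REGISTER_IMM(x) encoding from i915_reg.h. It assumes consecutive entries sit 4 bytes apart from the base register; the table values and emit_mocs_lri() are hypothetical, and the patch's real tables and intel_mocs.c code are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

#define MI_INSTR(op, flags)      (((uint32_t)(op) << 23) | (flags))
#define MI_LOAD_REGISTER_IMM(x)  MI_INSTR(0x22, 2*(x)-1)
#define GEN9_GFX_MOCS_0          0xc800u	/* base register from the patch */

/* Hypothetical control values, NOT the tables from the patch. */
static const uint32_t mocs_table[] = { 0x00000009u, 0x00000038u, 0x0000003bu };
#define MOCS_ENTRIES (sizeof(mocs_table) / sizeof(mocs_table[0]))

/* Emit one LRI that writes every table entry; returns dwords written. */
static size_t emit_mocs_lri(uint32_t *out, uint32_t base)
{
	size_t n = 0;

	out[n++] = MI_LOAD_REGISTER_IMM(MOCS_ENTRIES);
	for (size_t i = 0; i < MOCS_ENTRIES; i++) {
		out[n++] = base + 4u * (uint32_t)i;	/* assumed register stride */
		out[n++] = mocs_table[i];		/* value for that register */
	}
	return n;
}
```

For three entries the command is 7 dwords, so the header's length field is 5 (total dwords minus 2), matching the `2*(x)-1` in the macro.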

Re: [Intel-gfx] [PATCH] drm/i915: Update WaFlushCoherentL3CacheLinesAtContextSwitch

2015-07-06 Thread Siluvery, Arun

On 06/07/2015 12:52, Dave Gordon wrote:

On 03/07/15 16:42, Chris Wilson wrote:

On Fri, Jul 03, 2015 at 02:27:31PM +0100, Arun Siluvery wrote:

In this WA we need to set GEN8_L3SQCREG4[21:21] and reset it after the PIPE_CONTROL
instruction, but there is a slight complication as this is applied in the WA batch,
where the values are only initialized once.
Dave identified an issue with the current implementation where the register value
is read once at the beginning and then reused; this patch corrects this by saving
the register value to memory, updating the register with the bit of interest, and
restoring the original value afterwards.

This implementation uses MI_LOAD_REGISTER_MEM, which is currently only used
by the command parser and was using a default length of 0. This is now updated
with the correct length and moved to the appropriate place.

Cc: Chris Wilson ch...@chris-wilson.co.uk
Cc: Dave Gordon david.s.gor...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
   drivers/gpu/drm/i915/i915_cmd_parser.c |  6 +--
   drivers/gpu/drm/i915/i915_reg.h|  3 +-
   drivers/gpu/drm/i915/intel_lrc.c   | 72 +-
   3 files changed, 58 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 306d9e4..430571b 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -131,7 +131,7 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
.mask = MI_GLOBAL_GTT,
.expected = 0,
  }},  ),
-   CMD(  MI_LOAD_REGISTER_MEM, SMI,   !F,  0xFF,   W | B,
+   CMD(  MI_LOAD_REGISTER_MEM(1), SMI,   !F,  0xFF,   W | B,
  .reg = { .offset = 1, .mask = 0x007C },
  .bits = {{
.offset = 0,
@@ -1021,7 +1021,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
 * only MI_LOAD_REGISTER_IMM commands.
 */
if (reg_addr == OACONTROL) {
-   if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
+   if (desc->cmd.value == MI_LOAD_REGISTER_MEM(1)) {


I had a double take here, but it all comes out in the wash. For one
moment, I thought the cmd matching had changed, but that has the length
masked out.

Reviewed-by: Chris Wilson ch...@cris-wilson.co.uk

Who will start to complain about all the extra frequent register writes,
probably into common power wells
-Chris


Hmm ... that is quite confusing, especially as the actual opcode in the
instruction stream will be MI_LOAD_REGISTER_MEM(2) on GEN8+. It might

True, but the cmd parser is only used up to GEN7.

regards
Arun


almost be better to use MI_LOAD_REGISTER_MEM(0) to emphasise that the
length field is a wildcard and not something that will be matched exactly.

.Dave.
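
The issue Dave raised is about when the register is sampled. If the batch bakes in a value read while the WA batch is being built, every later execution writes that stale value back; saving the live value to memory (SRM), writing the flush value (LRI) and reloading from memory (LRM) restores whatever the register held at execution time. The toy model below is not i915 code: one register, one scratch dword and three hypothetical operations stand in for SRM/LRI/LRM, the PIPE_CONTROL flush is omitted, and FLUSH_BIT uses bit 21 from the commit message.

```c
#include <stdint.h>

/* Toy model, not i915 code: one MMIO register plus one scratch dword. */
struct toy_gpu {
	uint32_t reg;
	uint32_t scratch;
};

enum toy_op { TOY_SRM, TOY_LRI, TOY_LRM };

struct toy_cmd {
	enum toy_op op;
	uint32_t imm;	/* only used by TOY_LRI */
};

#define FLUSH_BIT (1u << 21)	/* GEN8_L3SQCREG4[21:21] per the commit message */

static void toy_exec(struct toy_gpu *gpu, const struct toy_cmd *cmd, int n)
{
	for (int i = 0; i < n; i++) {
		switch (cmd[i].op) {
		case TOY_SRM:	/* store register to memory */
			gpu->scratch = gpu->reg;
			break;
		case TOY_LRI:	/* load register with an immediate */
			gpu->reg = cmd[i].imm;
			break;
		case TOY_LRM:	/* load register from memory */
			gpu->reg = gpu->scratch;
			break;
		}
	}
}

/* Old approach: the value is sampled once, when the batch is built. */
static int build_baked(struct toy_cmd *cmd, uint32_t reg_at_build)
{
	cmd[0] = (struct toy_cmd){ TOY_LRI, reg_at_build | FLUSH_BIT };
	/* (flush omitted) */
	cmd[1] = (struct toy_cmd){ TOY_LRI, reg_at_build };
	return 2;
}

/* New approach: save and restore the live value at execution time. */
static int build_save_restore(struct toy_cmd *cmd, uint32_t flush_value)
{
	cmd[0] = (struct toy_cmd){ TOY_SRM, 0 };
	cmd[1] = (struct toy_cmd){ TOY_LRI, flush_value };
	/* (flush omitted) */
	cmd[2] = (struct toy_cmd){ TOY_LRM, 0 };
	return 3;
}
```

If the register changes from 0x10 to 0x30 after the batch is built, the baked sequence writes 0x10 back, while the save/restore sequence leaves 0x30 in place.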





Re: [Intel-gfx] [PATCH 1/4] drm/i915: Enable WA batch buffers for Gen9

2015-07-03 Thread Siluvery, Arun

On 03/07/2015 17:57, Chris Wilson wrote:

On Fri, Jul 03, 2015 at 05:53:38PM +0100, Arun Siluvery wrote:

This patch only enables support for Gen9, the actual WA will be
initialized in subsequent patches.

The WARN that we use to warn user if WA batch support is not available
for a particular Gen is replaced with DRM_ERROR as warning here doesn't
really add much value.

Cc: Imre Deak imre.d...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/intel_lrc.c | 41 +---
  1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 23ff018..927f395 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1269,6 +1269,26 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
return wa_ctx_end(wa_ctx, *offset = index, 1);
  }

+static int gen9_init_indirectctx_bb(struct intel_engine_cs *ring,
+   struct i915_wa_ctx_bb *wa_ctx,
+   uint32_t *const batch,
+   uint32_t *offset)
+{
+   /* FIXME: Replace me with WA */


Do the same int index = wa_ctx_begin();

wa_ctx_emit(MI_BATCH_BUFFER_END) (and MI_NOOP for perctx)

return wa_ctx_end()

you did for gen8. That way the series doesn't suddenly break halfway
through (or just after the first patch) and we can check the
infrastructure in situ, and the actual wa separately later.


(forgot to reply-all)

right, will update it along with other review comments, thanks.

regards
Arun


-Chris





Re: [Intel-gfx] [PATCH resend 3/5] drm/i915: Enable resource streamer on Execlists

2015-06-26 Thread Siluvery, Arun

On 16/06/2015 11:39, Abdiel Janulgue wrote:

GEN8 and above use Execlists by default instead of the legacy
ringbuffer for batch execution. This patch enables the resource
streamer bits when required.

Patch is based on the initial work by Minu Mathai minu.mat...@intel.com
This version also adds the required bits to enable GEN8 Resource
Streamer context save and restore for Execlists.

Cc: ville.syrj...@linux.intel.com
Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com
---
  drivers/gpu/drm/i915/intel_lrc.c | 8 ++--
  drivers/gpu/drm/i915/intel_lrc.h | 1 +
  2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fcb074b..b015e96 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1172,7 +1172,10 @@ static int gen8_emit_bb_start(struct intel_ringbuffer *ringbuf,
return ret;

/* FIXME(BDW): Address space and security selectors. */
-   intel_logical_ring_emit(ringbuf, MI_BATCH_BUFFER_START_GEN8 | (ppgtt<<8));
+   intel_logical_ring_emit(ringbuf, MI_BATCH_BUFFER_START_GEN8 |
+   (ppgtt<<8) |
+   (dispatch_flags & I915_DISPATCH_RS ?
+MI_BATCH_RESOURCE_STREAMER : 0));
intel_logical_ring_emit(ringbuf, lower_32_bits(offset));
intel_logical_ring_emit(ringbuf, upper_32_bits(offset));
intel_logical_ring_emit(ringbuf, MI_NOOP);
@@ -1726,7 +1729,8 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
reg_state[CTX_CONTEXT_CONTROL] = RING_CONTEXT_CONTROL(ring);
reg_state[CTX_CONTEXT_CONTROL+1] =
_MASKED_BIT_ENABLE(CTX_CTRL_INHIBIT_SYN_CTX_SWITCH |
-   CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT);
+  CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT |
+  CTX_CTRL_RS_CTX_ENABLE);
reg_state[CTX_RING_HEAD] = RING_HEAD(ring->mmio_base);
reg_state[CTX_RING_HEAD+1] = 0;
reg_state[CTX_RING_TAIL] = RING_TAIL(ring->mmio_base);
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index adb731e4..de6087a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -32,6 +32,7 @@
  #define RING_CONTEXT_CONTROL(ring)    ((ring)->mmio_base+0x244)
  #define CTX_CTRL_INHIBIT_SYN_CTX_SWITCH   (1 << 3)
  #define CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT   (1 << 0)
+#define   CTX_CTRL_RS_CTX_ENABLE    (1 << 1)
  #define RING_CONTEXT_STATUS_BUF(ring) ((ring)->mmio_base+0x370)
  #define RING_CONTEXT_STATUS_PTR(ring) ((ring)->mmio_base+0x3a0)



looks good to me,
Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

regards
Arun
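
CTX_CONTEXT_CONTROL is written with _MASKED_BIT_ENABLE(), so adding CTX_CTRL_RS_CTX_ENABLE should only touch bit 1 of the register. In a masked register the upper 16 bits select which of the lower 16 bits a write may change. A small model, assuming the usual `(mask << 16) | value` encoding; MASKED_BIT_ENABLE/MASKED_BIT_DISABLE and apply_masked_write() are local illustrations rather than the driver's macros, while the three bit positions are taken from the intel_lrc.h hunk.

```c
#include <stdint.h>

/* Assumed masked-register encoding: bits 31:16 select, bits 15:0 carry values. */
#define MASKED_BIT_ENABLE(a)  (((uint32_t)(a) << 16) | (uint32_t)(a))
#define MASKED_BIT_DISABLE(a) ((uint32_t)(a) << 16)

/* Bit positions from the intel_lrc.h hunk. */
#define CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT (1u << 0)
#define CTX_CTRL_RS_CTX_ENABLE              (1u << 1)
#define CTX_CTRL_INHIBIT_SYN_CTX_SWITCH     (1u << 3)

/* How a masked write is applied: only the selected bits change. */
static uint32_t apply_masked_write(uint32_t old, uint32_t write)
{
	uint32_t select = write >> 16;

	return (old & ~select) | (write & select);
}
```

Under this encoding, enabling the three context-control bits yields 0x000b000b, and bits outside the select mask (such as 0x0100 in an existing value) are left untouched.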



Re: [Intel-gfx] [PATCH v6 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround

2015-06-23 Thread Siluvery, Arun

On 22/06/2015 17:59, Siluvery, Arun wrote:

On 22/06/2015 17:21, Ville Syrjälä wrote:

On Fri, Jun 19, 2015 at 06:37:15PM +0100, Arun Siluvery wrote:

In Per context w/a batch buffer,
WaRsRestoreWithPerCtxtBb

This WA performs writes to the scratch page, so it must be valid; this check
is performed before initializing the batch with this WA.

v2: This patch modifies the definitions of MI_LOAD_REGISTER_MEM and
MI_LOAD_REGISTER_REG; add GEN8-specific defines for these instructions
so as not to break any future users of the existing definitions (Michel).

v3: The length in the current definitions of the LRM and LRR instructions was
specified as 0. This seems to be a common convention for instructions whose
length varies between platforms. It has not been an issue so far because they
are not used anywhere except the command parser; now that this patch uses them,
update them with the correct length and move them out of the command parser
placeholder to the appropriate place. Remove unnecessary padding and follow the
WA programming sequence exactly as described in the spec, which is essential
for this WA (Dave).

Cc: Chris Wilson ch...@chris-wilson.co.uk
Cc: Dave Gordon david.s.gor...@intel.com
Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
   drivers/gpu/drm/i915/i915_reg.h  | 29 +++--
   drivers/gpu/drm/i915/intel_lrc.c | 54 
   2 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 7637e64..208620d 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -347,6 +347,31 @@
   #define   MI_INVALIDATE_BSD		(1<<7)
   #define   MI_FLUSH_DW_USE_GTT	(1<<2)
   #define   MI_FLUSH_DW_USE_PPGTT	(0<<2)
+#define MI_LOAD_REGISTER_MEM	MI_INSTR(0x29, 1)
+#define MI_LOAD_REGISTER_MEM_GEN8	MI_INSTR(0x29, 2)
+#define   MI_LRM_USE_GLOBAL_GTT	(1<<22)
+#define   MI_LRM_ASYNC_MODE_ENABLE	(1<<21)
+#define MI_LOAD_REGISTER_REG	MI_INSTR(0x2A, 1)
+#define MI_ATOMIC(len)	MI_INSTR(0x2F, (len-2))
+#define   MI_ATOMIC_MEMORY_TYPE_GGTT	(1<<22)
+#define   MI_ATOMIC_INLINE_DATA		(1<<18)
+#define   MI_ATOMIC_CS_STALL		(1<<17)
+#define   MI_ATOMIC_RETURN_DATA_CTL	(1<<16)
+#define MI_ATOMIC_OP_MASK(op)		((op) << 8)
+#define MI_ATOMIC_AND  MI_ATOMIC_OP_MASK(0x01)
+#define MI_ATOMIC_OR   MI_ATOMIC_OP_MASK(0x02)
+#define MI_ATOMIC_XOR  MI_ATOMIC_OP_MASK(0x03)
+#define MI_ATOMIC_MOVE MI_ATOMIC_OP_MASK(0x04)
+#define MI_ATOMIC_INC  MI_ATOMIC_OP_MASK(0x05)
+#define MI_ATOMIC_DEC  MI_ATOMIC_OP_MASK(0x06)
+#define MI_ATOMIC_ADD  MI_ATOMIC_OP_MASK(0x07)
+#define MI_ATOMIC_SUB  MI_ATOMIC_OP_MASK(0x08)
+#define MI_ATOMIC_RSUB MI_ATOMIC_OP_MASK(0x09)
+#define MI_ATOMIC_IMAX MI_ATOMIC_OP_MASK(0x0A)
+#define MI_ATOMIC_IMIN MI_ATOMIC_OP_MASK(0x0B)
+#define MI_ATOMIC_UMAX MI_ATOMIC_OP_MASK(0x0C)
+#define MI_ATOMIC_UMIN MI_ATOMIC_OP_MASK(0x0D)
+
   #define MI_BATCH_BUFFER  MI_INSTR(0x30, 1)
   #define   MI_BATCH_NON_SECURE		(1)
   /* for snb/ivb/vlv this also means batch in ppgtt when ppgtt is enabled. */
@@ -451,8 +476,6 @@
   #define MI_CLFLUSH  MI_INSTR(0x27, 0)
   #define MI_REPORT_PERF_COUNT	MI_INSTR(0x28, 0)
   #define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
-#define MI_LOAD_REGISTER_MEM	MI_INSTR(0x29, 0)
-#define MI_LOAD_REGISTER_REG	MI_INSTR(0x2A, 0)
   #define MI_RS_STORE_DATA_IMM	MI_INSTR(0x2B, 0)
   #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0)
   #define MI_STORE_URB_MEM	MI_INSTR(0x2D, 0)
@@ -1799,6 +1822,8 @@ enum skl_disp_power_wells {
   #define   GEN8_RC_SEMA_IDLE_MSG_DISABLE  (1 << 12)
   #define   GEN8_FF_DOP_CLOCK_GATE_DISABLE (1<<10)

+#define GEN8_RS_PREEMPT_STATUS 0x215C
+
   /* Fuse readout registers for GT */
   #define CHV_FUSE_GT  (VLV_DISPLAY_BASE + 0x2168)
   #define   CHV_FGT_DISABLE_SS0	(1 << 10)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 664455c..28198c4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1215,11 +1215,65 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
   uint32_t *const batch,
   uint32_t *offset)
   {
+   uint32_t scratch_addr;
uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);

+   /* Actual scratch location is at 128 bytes offset */
+   scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+   scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
/* WaDisableCtxRestoreArbitration:bdw,chv */
wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE);

+   /*
+* As per Bspec, to workaround a known HW issue, SW must perform the
+* below programming sequence prior to programming MI_BATCH_BUFFER_END.
+*
+* This is only applicable for Gen8

Re: [Intel-gfx] [PATCH] drm/i915/gen9: fix error path in intel_init_workaround_bb

2015-06-23 Thread Siluvery, Arun

On 23/06/2015 15:36, Imre Deak wrote:

On ti, 2015-06-23 at 15:31 +0100, Chris Wilson wrote:

On Tue, Jun 23, 2015 at 05:26:13PM +0300, Imre Deak wrote:

On the GEN!=8 error path we call kmap_atomic() which returns in atomic
context and then lrc_destroy_wa_ctx_obj() which can be called only in
process context. Fix this by preserving the correct cleanup order on
this error path.

Also convert the WARN to DRM_ERROR, as the stack trace isn't really useful.

Signed-off-by: Imre Deak imre.d...@intel.com
---
  drivers/gpu/drm/i915/intel_lrc.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1b50dd7..8bff1a2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1289,10 +1289,14 @@ static int intel_init_workaround_bb(struct intel_engine_cs *ring)
if (ret)
goto out;
} else {
-	WARN(INTEL_INFO(ring->dev)->gen >= 8,
-	     "WA batch buffer is not initialized for Gen%d\n",
-	     INTEL_INFO(ring->dev)->gen);
+	if (INTEL_INFO(ring->dev)->gen >= 8)
+		DRM_ERROR("WA batch buffer is not initialized for Gen%d\n",
+			  INTEL_INFO(ring->dev)->gen);
+


Do this test upfront, then we don't have multiple error paths.
http://paste.debian.net/255769
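A minimal userspace sketch of the ordering under discussion. `fake_kmap_atomic()`, `fake_kunmap_atomic()`, `fake_destroy_wa_ctx_obj()` and `fake_init_bb()` are hypothetical stand-ins for `kmap_atomic()`, `kunmap_atomic()`, `lrc_destroy_wa_ctx_obj()` and the batch init functions; they only record whether the object is freed while still in "atomic" context. The generation check is done up front, as Chris suggests, so unsupported generations never map anything, and the single error path unmaps before it frees.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins that track "atomic" context and destroy misuse. */
static int in_atomic;
static int destroyed;
static int destroyed_in_atomic;

static void *fake_kmap_atomic(void) { in_atomic = 1; return &in_atomic; }
static void fake_kunmap_atomic(void *vaddr) { (void)vaddr; in_atomic = 0; }
static int fake_init_bb(int fail) { return fail ? -ENOSPC : 0; }

static void fake_destroy_wa_ctx_obj(void)
{
	destroyed = 1;
	if (in_atomic)
		destroyed_in_atomic = 1; /* bug: only allowed in process context */
}

static int init_workaround_bb(int gen, int fail)
{
	void *batch;
	int ret;

	/* Decide up front: unsupported gens map nothing, so no extra error path. */
	if (gen != 8) {
		if (gen > 8)
			fprintf(stderr, "WA batch buffer is not initialized for Gen%d\n", gen);
		return 0;
	}

	batch = fake_kmap_atomic();
	ret = fake_init_bb(fail);
	fake_kunmap_atomic(batch);         /* leave atomic context first... */
	if (ret)
		fake_destroy_wa_ctx_obj(); /* ...then free in process context */
	return ret;
}
```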


I didn't bother moving it; I suppose GEN9 support will be added soon
anyway, and we get a bit more test coverage on GEN9 meanwhile. But if you
insist, I can move it.


Hi Imre,

I sent the following patch with the changes suggested by Chris.
https://patchwork.kernel.org/patch/6661891/
Since you sent it first, my patch can be ignored if your patch is updated.

regards
Arun




-Chris









Re: [Intel-gfx] [PATCH] drm/i915/gen9: fix error path in intel_init_workaround_bb

2015-06-23 Thread Siluvery, Arun

On 23/06/2015 17:01, Chris Wilson wrote:

On Tue, Jun 23, 2015 at 06:58:42PM +0300, Imre Deak wrote:

On ti, 2015-06-23 at 16:44 +0100, Chris Wilson wrote:

On Tue, Jun 23, 2015 at 06:18:21PM +0300, Imre Deak wrote:

On ti, 2015-06-23 at 16:13 +0100, Siluvery, Arun wrote:

On 23/06/2015 15:36, Imre Deak wrote:

On ti, 2015-06-23 at 15:31 +0100, Chris Wilson wrote:

On Tue, Jun 23, 2015 at 05:26:13PM +0300, Imre Deak wrote:

On the GEN!=8 error path we call kmap_atomic() which returns in atomic
context and then lrc_destroy_wa_ctx_obj() which can be called only in
process context. Fix this by preserving the correct cleanup order on
this error path.

Also convert the WARN to DRM_ERROR, as the stack trace isn't really useful.

Signed-off-by: Imre Deak imre.d...@intel.com
---
   drivers/gpu/drm/i915/intel_lrc.c | 10 +++---
   1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1b50dd7..8bff1a2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1289,10 +1289,14 @@ static int intel_init_workaround_bb(struct intel_engine_cs *ring)
if (ret)
goto out;
} else {
-	WARN(INTEL_INFO(ring->dev)->gen >= 8,
-	     "WA batch buffer is not initialized for Gen%d\n",
-	     INTEL_INFO(ring->dev)->gen);
+	if (INTEL_INFO(ring->dev)->gen >= 8)
+		DRM_ERROR("WA batch buffer is not initialized for Gen%d\n",
+			  INTEL_INFO(ring->dev)->gen);
+


Do this test upfront, then we don't have multiple error paths.
http://paste.debian.net/255769


I didn't bother moving it; I suppose GEN9 support will be added soon
anyway, and we get a bit more test coverage on GEN9 meanwhile. But if you
insist, I can move it.


Hi Imre,

I sent the following patch with the changes suggested by Chris.
https://patchwork.kernel.org/patch/6661891/
Since you sent it first, my patch can be ignored if your patch is updated.


I'm fine with applying your patch, but I would ask you to convert the WARN
to DRM_ERROR. The stack trace doesn't add much to the error message, and the
WARN is needlessly verbose now on BXT, SKL...


I presumed Arun chose WARN because we are missing w/a and wanted
someone to step forward and provide the fixes?


IMO it's unnecessarily verbose: during development, when loading the
driver, I know that things are mostly OK if I can't see any such
backtraces. But no strong opinion; I can also change this locally.


An error message can easily get lost, and it is also not an error to skip
these WA, which is why we continue. I thought a WARN would get more
attention and help get the missing WA added quickly.


regards
Arun



An alternative would be to provide the stub wa_bb emission functions so
that future wa need only start with a plain copy'n'paste.
-Chris





Re: [Intel-gfx] [PATCH v6 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-22 Thread Siluvery, Arun

On 22/06/2015 16:36, Daniel Vetter wrote:

On Fri, Jun 19, 2015 at 06:50:36PM +0100, Chris Wilson wrote:

On Fri, Jun 19, 2015 at 06:37:10PM +0100, Arun Siluvery wrote:

Some of the WA are to be applied during context save but before restore, and
some at the end of context save/restore but before executing the instructions
in the ring. These WA cannot be applied by normal means, so WA batch buffers
are created for this purpose. Each context has two registers that hold the
offsets of these batch buffers; if they are non-zero, the HW executes these
batches.

v1: In this version two separate ring_buffer objects were used to load WA
instructions for indirect and per context batch buffers and they were part
of every context.

v2: Chris suggested including an additional page in the context and using it to
load these WA instead of creating separate objects. This simplifies a lot of
things, as we need not explicitly pin/unpin them. Thomas Daniel further pointed
out that GuC is planning to use a similar setup to share data between GuC and
driver, and the WA batch buffers could probably share that page. However, after
some discussion, Dave (who is implementing the GuC changes) suggested using an
independent page, for two reasons: the GuC area might grow, and these WA are
initialized only once and not changed afterwards, so they can be shared across
all contexts.

The page is updated with WA during render ring init. This has the advantage of
not adding more special cases to default_context.

We don't know up front how many WA we will be applying using these batch
buffers. For this reason the size was fixed earlier, but that is not a good
idea. To fix this, the functions that load instructions are modified to report
the number of commands inserted, and the size is now calculated after the batch
is updated. A macro is introduced to add commands to these batch buffers; it
also checks for overflow and returns an error.
We have a full page dedicated to these WA, which should be sufficient for a
good number of WA; anything more means we have major issues.
The list for Gen8 is small, and the same goes for Gen9; maybe a few more will
be added going forward, but nowhere close to filling the entire page. Chris
suggested a two-pass approach, but we agreed to go with a single-page setup as
it is a one-off routine and simpler code wins.

One additional option is the offset field, which is helpful if we would like to
have multiple batches at different offsets within the page and select between
them based on some criteria. This is not a requirement at this point but could
help in the future (Dave).

Chris provided some helpful macros and suggestions which further simplified
the code; they will also help reduce code duplication when WA for other Gens
are added. Add detailed comments explaining the restrictions.

(Many thanks to Chris, Dave and Thomas for their reviews and inputs)

Cc: Chris Wilson ch...@chris-wilson.co.uk
Cc: Dave Gordon david.s.gor...@intel.com
Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com


Sigh, after all that, I found one minor thing, but nevertheless
Reviewed-by: Chris Wilson ch...@chris-wilson.co.uk


+#define wa_ctx_emit(batch, cmd) {					\
+	if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) {	\
+		return -ENOSPC;						\
+	}								\
+	batch[index++] = (cmd);						\
+	}


We should have wrapped this in do { } while(0). Think of all those
trailing semicolons we have in the code! Fortunately we haven't used
this in an if (foo) wa_ctx_emit(bar); else wa_ctx_emit(baz); yet.


Uh yes, this is a critical one. Arun, can you please do a follow-up patch
to wrap your macro in a do {} while(0) like Chris suggested? I'll apply
the patches meanwhile.
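A small userspace sketch of why the wrapper matters. `PAGE_SIZE` and `WARN_ON()` are simplified stand-ins for the kernel definitions, and `emit_choice()` is a hypothetical caller. As in the driver, the macro relies on a local named `index` and returns `-ENOSPC` from the enclosing function on overflow; `do { } while (0)` turns the body into a single statement that absorbs the caller's trailing `;`, so an unbraced `if`/`else` compiles as intended.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins for kernel definitions (assumptions for this sketch). */
#define PAGE_SIZE 4096u
#define WARN_ON(cond) ((cond) ? (fprintf(stderr, "WARN_ON(%s)\n", #cond), 1) : 0)

/* Uses the caller's local 'index'; returns -ENOSPC from the caller when
 * the page is full. do { } while (0) makes it a single statement. */
#define wa_ctx_emit(batch, cmd)						\
	do {								\
		if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) {	\
			return -ENOSPC;					\
		}							\
		batch[index++] = (cmd);					\
	} while (0)

static uint32_t page[PAGE_SIZE / sizeof(uint32_t)];

/* Hypothetical caller: with the brace-only macro, the ';' after the first
 * wa_ctx_emit() would end the if statement and the 'else' would not compile. */
static int emit_choice(int use_a, uint32_t *pindex)
{
	uint32_t index = *pindex;

	if (use_a)
		wa_ctx_emit(page, 0xAu);
	else
		wa_ctx_emit(page, 0xBu);

	*pindex = index;
	return 0;
}
```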


Hi Daniel,

I already sent the updated patch.
I think I got the message-id wrong; the updated patch I sent is
showing up as the last message in this series.


regards
Arun



Thanks, Daniel





Re: [Intel-gfx] [PATCH v6 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround

2015-06-22 Thread Siluvery, Arun

On 22/06/2015 17:21, Ville Syrjälä wrote:

On Fri, Jun 19, 2015 at 06:37:15PM +0100, Arun Siluvery wrote:

In Per context w/a batch buffer,
WaRsRestoreWithPerCtxtBb

This WA performs writes to the scratch page, so the page must be valid; this
is checked before the batch is initialized with this WA.

v2: This patch modifies the definitions of MI_LOAD_REGISTER_MEM and
MI_LOAD_REGISTER_REG; add GEN8-specific defines for these instructions
so as not to break any future users of the existing definitions (Michel)

v3: The length in the current definitions of the LRM and LRR instructions was
specified as 0. This seems to be the common convention for instructions whose
length varies between platforms. It has not been an issue so far because they
are not used anywhere except the command parser; now that this patch uses them,
update them with the correct length and move them out of the command parser
placeholder to the appropriate place. Remove unnecessary padding and follow the
WA programming sequence exactly as given in the spec, which is essential for
this WA (Dave).

Cc: Chris Wilson ch...@chris-wilson.co.uk
Cc: Dave Gordon david.s.gor...@intel.com
Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_reg.h  | 29 +++--
  drivers/gpu/drm/i915/intel_lrc.c | 54 
  2 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 7637e64..208620d 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -347,6 +347,31 @@
  #define   MI_INVALIDATE_BSD		(1<<7)
  #define   MI_FLUSH_DW_USE_GTT	(1<<2)
  #define   MI_FLUSH_DW_USE_PPGTT	(0<<2)
+#define MI_LOAD_REGISTER_MEM	MI_INSTR(0x29, 1)
+#define MI_LOAD_REGISTER_MEM_GEN8	MI_INSTR(0x29, 2)
+#define   MI_LRM_USE_GLOBAL_GTT	(1<<22)
+#define   MI_LRM_ASYNC_MODE_ENABLE	(1<<21)
+#define MI_LOAD_REGISTER_REG	MI_INSTR(0x2A, 1)
+#define MI_ATOMIC(len)	MI_INSTR(0x2F, (len-2))
+#define   MI_ATOMIC_MEMORY_TYPE_GGTT	(1<<22)
+#define   MI_ATOMIC_INLINE_DATA		(1<<18)
+#define   MI_ATOMIC_CS_STALL		(1<<17)
+#define   MI_ATOMIC_RETURN_DATA_CTL	(1<<16)
+#define MI_ATOMIC_OP_MASK(op)		((op) << 8)
+#define MI_ATOMIC_AND  MI_ATOMIC_OP_MASK(0x01)
+#define MI_ATOMIC_OR   MI_ATOMIC_OP_MASK(0x02)
+#define MI_ATOMIC_XOR  MI_ATOMIC_OP_MASK(0x03)
+#define MI_ATOMIC_MOVE MI_ATOMIC_OP_MASK(0x04)
+#define MI_ATOMIC_INC  MI_ATOMIC_OP_MASK(0x05)
+#define MI_ATOMIC_DEC  MI_ATOMIC_OP_MASK(0x06)
+#define MI_ATOMIC_ADD  MI_ATOMIC_OP_MASK(0x07)
+#define MI_ATOMIC_SUB  MI_ATOMIC_OP_MASK(0x08)
+#define MI_ATOMIC_RSUB MI_ATOMIC_OP_MASK(0x09)
+#define MI_ATOMIC_IMAX MI_ATOMIC_OP_MASK(0x0A)
+#define MI_ATOMIC_IMIN MI_ATOMIC_OP_MASK(0x0B)
+#define MI_ATOMIC_UMAX MI_ATOMIC_OP_MASK(0x0C)
+#define MI_ATOMIC_UMIN MI_ATOMIC_OP_MASK(0x0D)
+
  #define MI_BATCH_BUFFER   MI_INSTR(0x30, 1)
  #define   MI_BATCH_NON_SECURE (1)
  /* for snb/ivb/vlv this also means batch in ppgtt when ppgtt is enabled. */
@@ -451,8 +476,6 @@
  #define MI_CLFLUSH  MI_INSTR(0x27, 0)
  #define MI_REPORT_PERF_COUNT	MI_INSTR(0x28, 0)
  #define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
-#define MI_LOAD_REGISTER_MEM	MI_INSTR(0x29, 0)
-#define MI_LOAD_REGISTER_REG	MI_INSTR(0x2A, 0)
  #define MI_RS_STORE_DATA_IMM	MI_INSTR(0x2B, 0)
  #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0)
  #define MI_STORE_URB_MEM	MI_INSTR(0x2D, 0)
@@ -1799,6 +1822,8 @@ enum skl_disp_power_wells {
  #define   GEN8_RC_SEMA_IDLE_MSG_DISABLE   (1 << 12)
  #define   GEN8_FF_DOP_CLOCK_GATE_DISABLE  (1<<10)

+#define GEN8_RS_PREEMPT_STATUS 0x215C
+
  /* Fuse readout registers for GT */
  #define CHV_FUSE_GT   (VLV_DISPLAY_BASE + 0x2168)
  #define   CHV_FGT_DISABLE_SS0 (1 << 10)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 664455c..28198c4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1215,11 +1215,65 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
   uint32_t *const batch,
   uint32_t *offset)
  {
+   uint32_t scratch_addr;
uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);

+   /* Actual scratch location is at 128 bytes offset */
+   scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+   scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
/* WaDisableCtxRestoreArbitration:bdw,chv */
wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE);

+   /*
+* As per Bspec, to workaround a known HW issue, SW must perform the
+* below programming sequence prior to programming MI_BATCH_BUFFER_END.
+*
+* This is only applicable for Gen8.
+*/
+
+   /* WaRsRestoreWithPerCtxtBb:bdw,chv */


This w/a doesn't seem to be 

Re: [Intel-gfx] [PATCH v6 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-22 Thread Siluvery, Arun

On 22/06/2015 16:41, Daniel Vetter wrote:

On Fri, Jun 19, 2015 at 07:07:01PM +0100, Arun Siluvery wrote:

Some of the WA are to be applied during context save but before restore, and
some at the end of context save/restore but before executing the instructions
in the ring. These WA cannot be applied by normal means, so WA batch buffers
are created for this purpose. Each context has two registers that hold the
offsets of these batch buffers; if they are non-zero, the HW executes these
batches.

v1: In this version two separate ring_buffer objects were used to load WA
instructions for indirect and per context batch buffers and they were part
of every context.

v2: Chris suggested including an additional page in the context and using it to
load these WA instead of creating separate objects. This simplifies a lot of
things, as we need not explicitly pin/unpin them. Thomas Daniel further pointed
out that GuC is planning to use a similar setup to share data between GuC and
driver, and the WA batch buffers could probably share that page. However, after
some discussion, Dave (who is implementing the GuC changes) suggested using an
independent page, for two reasons: the GuC area might grow, and these WA are
initialized only once and not changed afterwards, so they can be shared across
all contexts.

The page is updated with WA during render ring init. This has the advantage of
not adding more special cases to default_context.

We don't know up front how many WA we will be applying using these batch
buffers. For this reason the size was fixed earlier, but that is not a good
idea. To fix this, the functions that load instructions are modified to report
the number of commands inserted, and the size is now calculated after the batch
is updated. A macro is introduced to add commands to these batch buffers; it
also checks for overflow and returns an error.
We have a full page dedicated to these WA, which should be sufficient for a
good number of WA; anything more means we have major issues.
The list for Gen8 is small, and the same goes for Gen9; maybe a few more will
be added going forward, but nowhere close to filling the entire page. Chris
suggested a two-pass approach, but we agreed to go with a single-page setup as
it is a one-off routine and simpler code wins.

One additional option is the offset field, which is helpful if we would like to
have multiple batches at different offsets within the page and select between
them based on some criteria. This is not a requirement at this point but could
help in the future (Dave).

Chris provided some helpful macros and suggestions which further simplified
the code; they will also help reduce code duplication when WA for other Gens
are added. Add detailed comments explaining the restrictions.
Use do {} while (0) for the wa_ctx_emit() macro.

(Many thanks to Chris, Dave and Thomas for their reviews and inputs)

Cc: Chris Wilson ch...@chris-wilson.co.uk
Cc: Dave Gordon david.s.gor...@intel.com
Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
Reviewed-by: Chris Wilson ch...@chris-wilson.co.uk


Why did you resend this one? I don't spot any updates in the commit
message. Also, when resending, please reply to the corresponding previous
version of that patch, not to the cover letter of the series.


Hi Daniel,

This is the updated patch with do {} while (0).
I picked the cover letter's message-id by mistake, which is why it was
sent as a reply to the cover letter instead of the corresponding patch.


regards
Arun


-Daniel


---
  drivers/gpu/drm/i915/intel_lrc.c| 223 +++-
  drivers/gpu/drm/i915/intel_ringbuffer.h |  21 +++
  2 files changed, 240 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0413b8f..0585298 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -211,6 +211,7 @@ enum {
FAULT_AND_CONTINUE /* Unsupported */
  };
  #define GEN8_CTX_ID_SHIFT 32
+#define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT  0x17

  static int intel_lr_context_pin(struct intel_engine_cs *ring,
struct intel_context *ctx);
@@ -1077,6 +1078,191 @@ static int intel_logical_ring_workarounds_emit(struct intel_engine_cs *ring,
return 0;
  }

+#define wa_ctx_emit(batch, cmd)						\
+	do {								\
+		if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) {	\
+			return -ENOSPC;					\
+		}							\
+		batch[index++] = (cmd);					\
+	} while (0)
+
+static inline uint32_t wa_ctx_start(struct i915_wa_ctx_bb *wa_ctx,
+   uint32_t offset,
+   uint32_t start_alignment)
+{
+   return wa_ctx->offset = 

Re: [Intel-gfx] [PATCH v6 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround

2015-06-22 Thread Siluvery, Arun

On 19/06/2015 19:09, Chris Wilson wrote:

On Fri, Jun 19, 2015 at 06:37:14PM +0100, Arun Siluvery wrote:

In Indirect context w/a batch buffer,
WaClearSlmSpaceAtContextSwitch

This WA performs writes to the scratch page, so the page must be valid; this
is checked before the batch is initialized with this WA.

v2: s/PIPE_CONTROL_FLUSH_RO_CACHES/PIPE_CONTROL_FLUSH_L3 (Ville)

Cc: Chris Wilson ch...@chris-wilson.co.uk
Cc: Dave Gordon david.s.gor...@intel.com
Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_reg.h  |  1 +
  drivers/gpu/drm/i915/intel_lrc.c | 16 
  2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index d14ad20..7637e64 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -410,6 +410,7 @@
  #define   DISPLAY_PLANE_A   (0<<20)
  #define   DISPLAY_PLANE_B   (1<<20)
  #define GFX_OP_PIPE_CONTROL(len)  ((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
+#define   PIPE_CONTROL_FLUSH_L3	(1<<27)
  #define   PIPE_CONTROL_GLOBAL_GTT_IVB (1<<24) /* gen7+ */
  #define   PIPE_CONTROL_MMIO_WRITE (1<<23)
  #define   PIPE_CONTROL_STORE_DATA_INDEX   (1<<21)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3e7aaa9..664455c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1137,6 +1137,7 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
uint32_t *const batch,
uint32_t *offset)
  {
+   uint32_t scratch_addr;
uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);

/* WaDisableCtxRestoreArbitration:bdw,chv */
@@ -1165,6 +1166,21 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
wa_ctx_emit(batch, l3sqc4_flush & ~GEN8_LQSC_FLUSH_COHERENT_LINES);
}

+   /* WaClearSlmSpaceAtContextSwitch:bdw,chv */
+   /* Actual scratch location is at 128 bytes offset */
+   scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+   scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;


I thought this bit was now mbz - that's how we treat it elsewhere e.g.
gen8_emit_flush_render, and that the address has to be naturally aligned
for the target write. (Similar bit in patch 6 fwiw.)

You are correct, this bit is MBZ.

Daniel, could you please remove this line when applying the patches?
Sorry for the additional work.

 +  scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
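To see why the extra OR is harmful, here is a small sketch. It assumes `PIPE_CONTROL_GLOBAL_GTT` is bit 2 of the address DWord (as defined in i915_reg.h), 64-byte cachelines, and an 8-byte (QWord) post-sync write, so the target address must have bits 2:0 clear. `wa_scratch_addr()` and `qword_aligned()` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHELINE_BYTES         64
#define PIPE_CONTROL_GLOBAL_GTT (1 << 2) /* legacy flag in the address DWord */

/* Hypothetical helper mirroring the patch: the usable scratch location
 * starts 128 bytes (two cachelines) into the scratch object. */
static uint32_t wa_scratch_addr(uint32_t scratch_gtt_offset)
{
	return scratch_gtt_offset + 2 * CACHELINE_BYTES;
}

/* Assumes a QWord write target, which must be 8-byte aligned. */
static bool qword_aligned(uint32_t addr)
{
	return (addr & 7) == 0;
}
```

Setting bit 2 moves the address off its natural 8-byte alignment, which is consistent with treating the bit as MBZ in the address, as gen8_emit_flush_render already does.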

regards
Arun


-Chris





Re: [Intel-gfx] [PATCH v6 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround

2015-06-22 Thread Siluvery, Arun

On 19/06/2015 18:37, Arun Siluvery wrote:

In Per context w/a batch buffer,
WaRsRestoreWithPerCtxtBb

This WA performs writes to the scratch page, so the page must be valid; this
is checked before the batch is initialized with this WA.

v2: This patch modifies the definitions of MI_LOAD_REGISTER_MEM and
MI_LOAD_REGISTER_REG; add GEN8-specific defines for these instructions
so as not to break any future users of the existing definitions (Michel)

v3: The length in the current definitions of the LRM and LRR instructions was
specified as 0. This seems to be the common convention for instructions whose
length varies between platforms. It has not been an issue so far because they
are not used anywhere except the command parser; now that this patch uses them,
update them with the correct length and move them out of the command parser
placeholder to the appropriate place. Remove unnecessary padding and follow the
WA programming sequence exactly as given in the spec, which is essential for
this WA (Dave).

Cc: Chris Wilson ch...@chris-wilson.co.uk
Cc: Dave Gordon david.s.gor...@intel.com
Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_reg.h  | 29 +++--
  drivers/gpu/drm/i915/intel_lrc.c | 54 
  2 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 7637e64..208620d 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -347,6 +347,31 @@
  #define   MI_INVALIDATE_BSD		(1<<7)
  #define   MI_FLUSH_DW_USE_GTT	(1<<2)
  #define   MI_FLUSH_DW_USE_PPGTT	(0<<2)
+#define MI_LOAD_REGISTER_MEM	MI_INSTR(0x29, 1)
+#define MI_LOAD_REGISTER_MEM_GEN8	MI_INSTR(0x29, 2)
+#define   MI_LRM_USE_GLOBAL_GTT	(1<<22)
+#define   MI_LRM_ASYNC_MODE_ENABLE	(1<<21)
+#define MI_LOAD_REGISTER_REG	MI_INSTR(0x2A, 1)
+#define MI_ATOMIC(len)	MI_INSTR(0x2F, (len-2))
+#define   MI_ATOMIC_MEMORY_TYPE_GGTT	(1<<22)
+#define   MI_ATOMIC_INLINE_DATA		(1<<18)
+#define   MI_ATOMIC_CS_STALL		(1<<17)
+#define   MI_ATOMIC_RETURN_DATA_CTL	(1<<16)
+#define MI_ATOMIC_OP_MASK(op)		((op) << 8)
+#define MI_ATOMIC_AND  MI_ATOMIC_OP_MASK(0x01)
+#define MI_ATOMIC_OR   MI_ATOMIC_OP_MASK(0x02)
+#define MI_ATOMIC_XOR  MI_ATOMIC_OP_MASK(0x03)
+#define MI_ATOMIC_MOVE MI_ATOMIC_OP_MASK(0x04)
+#define MI_ATOMIC_INC  MI_ATOMIC_OP_MASK(0x05)
+#define MI_ATOMIC_DEC  MI_ATOMIC_OP_MASK(0x06)
+#define MI_ATOMIC_ADD  MI_ATOMIC_OP_MASK(0x07)
+#define MI_ATOMIC_SUB  MI_ATOMIC_OP_MASK(0x08)
+#define MI_ATOMIC_RSUB MI_ATOMIC_OP_MASK(0x09)
+#define MI_ATOMIC_IMAX MI_ATOMIC_OP_MASK(0x0A)
+#define MI_ATOMIC_IMIN MI_ATOMIC_OP_MASK(0x0B)
+#define MI_ATOMIC_UMAX MI_ATOMIC_OP_MASK(0x0C)
+#define MI_ATOMIC_UMIN MI_ATOMIC_OP_MASK(0x0D)
+
  #define MI_BATCH_BUFFER   MI_INSTR(0x30, 1)
  #define   MI_BATCH_NON_SECURE (1)
  /* for snb/ivb/vlv this also means batch in ppgtt when ppgtt is enabled. */
@@ -451,8 +476,6 @@
  #define MI_CLFLUSH  MI_INSTR(0x27, 0)
  #define MI_REPORT_PERF_COUNT	MI_INSTR(0x28, 0)
  #define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
-#define MI_LOAD_REGISTER_MEM	MI_INSTR(0x29, 0)
-#define MI_LOAD_REGISTER_REG	MI_INSTR(0x2A, 0)
  #define MI_RS_STORE_DATA_IMM	MI_INSTR(0x2B, 0)
  #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0)
  #define MI_STORE_URB_MEM	MI_INSTR(0x2D, 0)
@@ -1799,6 +1822,8 @@ enum skl_disp_power_wells {
  #define   GEN8_RC_SEMA_IDLE_MSG_DISABLE   (1 << 12)
  #define   GEN8_FF_DOP_CLOCK_GATE_DISABLE  (1<<10)

+#define GEN8_RS_PREEMPT_STATUS 0x215C
+
  /* Fuse readout registers for GT */
  #define CHV_FUSE_GT   (VLV_DISPLAY_BASE + 0x2168)
  #define   CHV_FGT_DISABLE_SS0 (1 << 10)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 664455c..28198c4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1215,11 +1215,65 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
   uint32_t *const batch,
   uint32_t *offset)
  {
+   uint32_t scratch_addr;
uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);

+   /* Actual scratch location is at 128 bytes offset */
+   scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+   scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+


Daniel, could you please remove this line when applying this patch?
Sorry for the additional work.

 +  scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;

regards
Arun


/* WaDisableCtxRestoreArbitration:bdw,chv */
wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE);

+   /*
+* As per Bspec, to workaround a known HW issue, SW must perform the
+* below programming sequence prior to programming MI_BATCH_BUFFER_END.
+*
+* This is only applicable for Gen8.

Re: [Intel-gfx] [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-19 Thread Siluvery, Arun

On 19/06/2015 10:27, Chris Wilson wrote:

On Thu, Jun 18, 2015 at 06:33:24PM +0100, Arun Siluvery wrote:
Totally minor worries now.


+/**
+ * gen8_init_indirectctx_bb() - initialize indirect ctx batch with WA
+ *
+ * @ring: only applicable for RCS
+ * @wa_ctx_batch: page in which WA are loaded
+ * @offset: This is for future use in case if we would like to have multiple
+ *  batches at different offsets and select them based on a criteria.
+ * @num_dwords: The number of WA applied are known at the beginning, it returns
+ * the no of DWORDS written. This batch does not contain MI_BATCH_BUFFER_END
+ * so it adds padding to make it cacheline aligned. MI_BATCH_BUFFER_END will be
+ * added to perctx batch and both of them together makes a complete batch buffer.
+ *
+ * Return: non-zero if we exceed the PAGE_SIZE limit.
+ */
+
+static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
+   uint32_t **wa_ctx_batch,
+   uint32_t offset,
+   uint32_t *num_dwords)
+{
+   uint32_t index;
+   uint32_t *batch = *wa_ctx_batch;
+
+   index = offset;


I worry that offset need not be cacheline aligned on entry (for example
if indirectctx and perctx were switched, or someone else stuffed more
controls into the per-ring object). Like perctx, there is no mention of
any alignment worries for the starting location, but here you tell us
that the INDIRECT_CTX length is specified in cachelines, so I also presume
the start needs to be aligned.


The offset needs to be cacheline aligned; I will update the comments.
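A sketch of the two alignment rules being discussed, assuming 64-byte cachelines as elsewhere in i915: the start offset (in DWords) is rounded up to a cacheline boundary, and a batch that does not end in MI_BATCH_BUFFER_END is padded with MI_NOOP to a whole number of cachelines. `cacheline_align_dwords()` and `pad_to_cacheline()` are hypothetical helpers; the body of the driver's own `wa_ctx_start()` is not reproduced here.

```c
#include <stdint.h>

#define CACHELINE_BYTES  64
#define CACHELINE_DWORDS (CACHELINE_BYTES / sizeof(uint32_t))
#define MI_NOOP          0u /* MI_INSTR(0, 0) */

/* Hypothetical helper: round a DWord offset up to a cacheline boundary. */
static uint32_t cacheline_align_dwords(uint32_t offset)
{
	return (uint32_t)((offset + CACHELINE_DWORDS - 1) /
			  CACHELINE_DWORDS * CACHELINE_DWORDS);
}

/* Hypothetical helper: pad with MI_NOOP until 'index' reaches a cacheline
 * boundary, so the batch size can be expressed in whole cachelines. */
static uint32_t pad_to_cacheline(uint32_t *batch, uint32_t index)
{
	while (index % CACHELINE_DWORDS)
		batch[index++] = MI_NOOP;
	return index;
}
```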


+static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
+{
+   int ret;
+
+   WARN_ON(ring->id != RCS);
+
+   ring->wa_ctx.obj = i915_gem_alloc_object(ring->dev, PAGE_ALIGN(size));
+   if (!ring->wa_ctx.obj) {
+   DRM_DEBUG_DRIVER("alloc LRC WA ctx backing obj failed.\n");
+   return -ENOMEM;
+   }
+
+   ret = i915_gem_obj_ggtt_pin(ring->wa_ctx.obj, PAGE_SIZE, 0);


One day _pin() will return the vma being pinned and I will rejoice as it
makes reviewing pinning much easier! Not a problem for you right now.


+static int intel_init_workaround_bb(struct intel_engine_cs *ring)
+{
+   int ret = 0;
+   uint32_t *batch;
+   uint32_t num_dwords;
+   struct page *page;
+   struct i915_ctx_workarounds *wa_ctx = &ring->wa_ctx;
+
+   WARN_ON(ring->id != RCS);
+
+   if (ring->scratch.obj == NULL) {
+   DRM_ERROR("scratch page not allocated for %s\n", ring->name);
+   return -EINVAL;
+   }


I haven't found the dependence upon scratch.obj, could you explain it?
Does it appear later?


Yes, it does, in patch 2, which rearranges init_pipe_control().
I will move this check to that patch as per your comment.


@@ -1754,15 +1934,26 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
reg_state[CTX_SECOND_BB_STATE+1] = 0;
if (ring->id == RCS) {
-   /* TODO: according to BSpec, the register state context
-* for CHV does not have these. OTOH, these registers do
-* exist in CHV. I'm waiting for a clarification */
reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 0x1c8;
reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
+   if (ring->wa_ctx.obj) {
+   reg_state[CTX_RCS_INDIRECT_CTX+1] =
+   (i915_gem_obj_ggtt_offset(ring->wa_ctx.obj) +
+ring->wa_ctx.indctx_batch_offset * sizeof(uint32_t)) |
+   (ring->wa_ctx.indctx_batch_size / CACHELINE_DWORDS);


Ok, this really does impose alignment conditions on indctx_batch_offset.

Oh, and if I do get a chance to complain: spell out indirect_ctx, make it a
struct, or even just precalculate the reg value. The only value of indctx
is that it is the same length as perctx; otherwise it is quite obtuse.

The variable names were getting too long and caused indentation
difficulties, so I tried to shorten them; I will change this part.


regards
Arun


Other than that, I couldn't punch any holes in its robustness, and the
series is starting to look quite good and very neat.
-Chris



___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Initialize HWS page address after GPU reset

2015-06-18 Thread Siluvery, Arun

On 15/06/2015 06:20, Daniel Vetter wrote:

On Wed, Jun 3, 2015 at 6:14 PM, Ville Syrjälä
ville.syrj...@linux.intel.com wrote:

I was going to suggest removing the same thing from the
lrc_setup_hardware_status_page(), but after another look it seems we
sometimes call .init_hw() before the context setup. Would be nice to
have a more consistent sequence for init and reset. But anyway the patch
looks OK to me. I verified that we indeed lose this register on GPU
reset.


Yep, this is a mess. And historically _any_ difference between driver
load and gpu reset (or resume fwiw) has led to hilarious bugs, so
this difference is really troubling to me. Arun, can you please work
on a patch to unify the setup sequence here, so that both driver load
and gpu resets work the same way? By the time we're calling gem_init_hw
the default context should have been created already, and hence we
should be able to write to HWS_PGA in ring->init_hw only.



Hi Daniel,

I think the problem in this case was that the code to initialize the HWS
page after reset was missing for Gen8+. For Gen7 we do this as part of
ring->init_hw.


Gen7:
i915_reset()
+-- i915_gem_init_hw()
    +-- ring->init_hw(), which is init_render_ring()
        +-- init_ring_common()
            +-- intel_ring_setup_status_page()

Gen8:
i915_reset()
+-- i915_gem_init_hw()
    +-- ring->init_hw(), which is gen8_init_render_ring()
        +-- gen8_init_common_ring() - I added the changes in this function.

We could probably use intel_ring_setup_status_page() for both cases;
does it have to be Gen7-specific?



Also I wonder about resume, where's the HWS_PGA restore for that case?

It is covered.

i915_drm_resume()
+-- i915_gem_init_hw()

regards
Arun


-Daniel




Re: [Intel-gfx] [PATCH v4 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-17 Thread Siluvery, Arun

On 17/06/2015 19:48, Siluvery, Arun wrote:

On 16/06/2015 21:25, Chris Wilson wrote:

On Tue, Jun 16, 2015 at 08:25:20PM +0100, Arun Siluvery wrote:

+static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
+   uint32_t offset,
+   uint32_t *num_dwords)
+{
+   uint32_t index;
+   struct page *page;
+   uint32_t *cmd;
+
+   page = i915_gem_object_get_page(ring->wa_ctx.obj, 0);
+   cmd = kmap_atomic(page);
+
+   index = offset;
+
+   /* FIXME: fill one cacheline with NOOPs.
+* Replace these instructions with WA
+*/
+   while (index < (offset + 16))
+   cmd[index++] = MI_NOOP;
+
+   /*
+* MI_BATCH_BUFFER_END is not required in Indirect ctx BB because
+* execution depends on the length specified in terms of cache lines
+* in the register CTX_RCS_INDIRECT_CTX
+*/
+
+   kunmap_atomic(cmd);
+
+   if (index > (PAGE_SIZE / sizeof(uint32_t)))
+   return -EINVAL;


Check before you GPF!

You just overran the buffer and corrupted memory, if you didn't succeed
in trapping a segfault.

To be generic, align to the cacheline then check you have enough room
for your own data.
-Chris
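Chris's "align, then check you have enough room" suggestion can be sketched as follows: round the write position up to a cacheline first, then verify that the whole sequence fits before writing any of it, so nothing past the end of the page is ever touched. This assumes a single 4 KiB page (1024 dwords) and 16-dword cachelines; `wa_bb_reserve()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define WA_PAGE_DWORDS      (4096u / 4u)  /* assumed 4 KiB page */
#define WA_CACHELINE_DWORDS 16u           /* assumed 64-byte cacheline */

/*
 * Hypothetical helper: round *index up to a cacheline, then check that
 * @count more dwords fit in the page. On success, pad the gap with
 * MI_NOOP (0) and return 0; otherwise return -ENOSPC and leave both the
 * buffer and *index untouched.
 */
static int wa_bb_reserve(uint32_t *batch, uint32_t *index, uint32_t count)
{
	uint32_t start = (*index + WA_CACHELINE_DWORDS - 1) &
			 ~(WA_CACHELINE_DWORDS - 1);

	if (start > WA_PAGE_DWORDS || count > WA_PAGE_DWORDS - start)
		return -ENOSPC;

	while (*index < start)
		batch[(*index)++] = 0;	/* MI_NOOP */
	return 0;
}
```

The caller then writes its `count` dwords without further checks, because the room was proven up front.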


Hi Chris,

The placement of the condition is not correct. I don't completely follow
your suggestion; could you please elaborate? Here we don't know upfront
how much more data will be written.
I have made the changes below to check after writing every command and
return an error as soon as we reach the end.

#define wa_ctx_emit(batch, cmd) {   \
 if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) { \
  kunmap_atomic(batch);  \
  return -ENOSPC;\
  }  \
  batch[index++] = (cmd);\
  }
Is this acceptable?
I think this is the only remaining issue; all other comments are addressed.

One other improvement is possible: mapping/unmapping the page can be kept
in the common path. I will update the patch accordingly.
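For comparison, the per-dword bound check proposed in `wa_ctx_emit()` can also be written as a function that tests the bound before each store and latches the error, so the caller checks once at the end (and unmaps in one common place) instead of the macro hiding a `return` from the enclosing function. A sketch assuming a one-page batch; `struct wa_bb` and `wa_bb_emit()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define WA_PAGE_DWORDS (4096u / 4u)  /* assumed 4 KiB page */

/* Hypothetical batch cursor; once err is set it stays set. */
struct wa_bb {
	uint32_t *cmd;
	uint32_t index;
	int err;
};

static void wa_bb_emit(struct wa_bb *bb, uint32_t dw)
{
	if (bb->err)
		return;
	if (bb->index >= WA_PAGE_DWORDS) {	/* check before the store */
		bb->err = -ENOSPC;
		return;
	}
	bb->cmd[bb->index++] = dw;
}
```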


regards
Arun








Re: [Intel-gfx] [PATCH v4 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-17 Thread Siluvery, Arun

On 16/06/2015 21:25, Chris Wilson wrote:

On Tue, Jun 16, 2015 at 08:25:20PM +0100, Arun Siluvery wrote:

+static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
+   uint32_t offset,
+   uint32_t *num_dwords)
+{
+   uint32_t index;
+   struct page *page;
+   uint32_t *cmd;
+
+   page = i915_gem_object_get_page(ring->wa_ctx.obj, 0);
+   cmd = kmap_atomic(page);
+
+   index = offset;
+
+   /* FIXME: fill one cacheline with NOOPs.
+* Replace these instructions with WA
+*/
+   while (index < (offset + 16))
+   cmd[index++] = MI_NOOP;
+
+   /*
+* MI_BATCH_BUFFER_END is not required in Indirect ctx BB because
+* execution depends on the length specified in terms of cache lines
+* in the register CTX_RCS_INDIRECT_CTX
+*/
+
+   kunmap_atomic(cmd);
+
+   if (index > (PAGE_SIZE / sizeof(uint32_t)))
+   return -EINVAL;


Check before you GPF!

You just overran the buffer and corrupted memory, if you didn't succeed
in trapping a segfault.

To be generic, align to the cacheline then check you have enough room
for your own data.
-Chris


Hi Chris,

The placement of the condition is not correct. I don't completely follow
your suggestion; could you please elaborate? Here we don't know upfront
how much more data will be written.
I have made the changes below to check after writing every command and
return an error as soon as we reach the end.


#define wa_ctx_emit(batch, cmd) {   \
   if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) { \
kunmap_atomic(batch);  \
return -ENOSPC;\
}  \
batch[index++] = (cmd);\
}
Is this acceptable?
I think this is the only remaining issue; all other comments are addressed.

regards
Arun



Re: [Intel-gfx] [PATCH v4 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-17 Thread Siluvery, Arun

On 17/06/2015 21:21, Chris Wilson wrote:

On Wed, Jun 17, 2015 at 07:48:16PM +0100, Siluvery, Arun wrote:

On 16/06/2015 21:25, Chris Wilson wrote:

On Tue, Jun 16, 2015 at 08:25:20PM +0100, Arun Siluvery wrote:

+static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
+   uint32_t offset,
+   uint32_t *num_dwords)
+{
+   uint32_t index;
+   struct page *page;
+   uint32_t *cmd;
+
+   page = i915_gem_object_get_page(ring->wa_ctx.obj, 0);
+   cmd = kmap_atomic(page);
+
+   index = offset;
+
+   /* FIXME: fill one cacheline with NOOPs.
+* Replace these instructions with WA
+*/
+   while (index < (offset + 16))
+   cmd[index++] = MI_NOOP;
+
+   /*
+* MI_BATCH_BUFFER_END is not required in Indirect ctx BB because
+* execution depends on the length specified in terms of cache lines
+* in the register CTX_RCS_INDIRECT_CTX
+*/
+
+   kunmap_atomic(cmd);
+
+   if (index > (PAGE_SIZE / sizeof(uint32_t)))
+   return -EINVAL;


Check before you GPF!

You just overran the buffer and corrupted memory, if you didn't succeed
in trapping a segfault.

To be generic, align to the cacheline then check you have enough room
for your own data.
-Chris


Hi Chris,

The placement of the condition is not correct. I don't completely follow
your suggestion; could you please elaborate? Here we don't know
upfront how much more data will be written.


Hmm, are we anticipating an unbounded number of workarounds? At some
point you have to have a rough upper bound in order to do the bo
allocation. If we are really unsure, then we do need to split this into
two passes, one to count the number of dwords and the second to allocate
and actually fill the cmd[].
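Chris's two-pass alternative can be sketched like this: the same emitter runs once with no buffer just to count dwords, the allocation is sized from that count, and a second run fills it, so no bound checks are needed while writing. `emit_wa()` is hypothetical and emits placeholder dwords rather than real workarounds, and `calloc()` stands in for the GEM object allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical emitter: counts only when @cmd is NULL, otherwise writes.
 * Returns the number of dwords the workaround sequence needs.
 */
static uint32_t emit_wa(uint32_t *cmd)
{
	static const uint32_t seq[] = { 0x1, 0x2, 0x3, 0x0 }; /* placeholders */
	uint32_t n = 0;

	for (size_t i = 0; i < sizeof(seq) / sizeof(seq[0]); i++) {
		if (cmd)
			cmd[n] = seq[i];
		n++;
	}
	return n;
}

/* Pass 1 sizes the buffer, pass 2 fills it. */
static uint32_t *build_wa_batch(uint32_t *out_dwords)
{
	uint32_t n = emit_wa(NULL);
	uint32_t *batch = calloc(n, sizeof(*batch));

	if (batch)
		*out_dwords = emit_wa(batch);
	return batch;
}
```

The cost is running every emitter twice and keeping both passes in sync, which is the extra complexity weighed in the reply below.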

Since we have a full page dedicated to this, it should be sufficient
for a good number of WA; if we need more than one page, we have major
issues.
The list for Gen8 is small, and the same goes for Gen9; maybe a few more
get added going forward, but nowhere close to filling an entire page. Some
of them will even be restricted to specific steppings/revisions. For these
reasons I think a single-page setup is sufficient.
Do you anticipate any other use cases that require allocating more than
one page?


A two-pass approach can be implemented, but it adds complexity that may
not be needed in this case. Please let me know your thoughts.



I have made the changes below to check after writing every command and
return an error as soon as we reach the end.

#define wa_ctx_emit(batch, cmd) {   \
if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) { \
 kunmap_atomic(batch);  \
 return -ENOSPC;\
 }  \
 batch[index++] = (cmd);\
 }
Is this acceptable?
I think this is the only remaining issue; all other comments are addressed.


It's the lesser of evils for sure. Still feel dubious that we don't know
upfront how much data we need to allocate.

Yes, but with a single-pass approach, do you see any way it can be improved?

regards
Arun


-Chris





Re: [Intel-gfx] [PATCH v4 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-17 Thread Siluvery, Arun

On 16/06/2015 21:33, Chris Wilson wrote:

On Tue, Jun 16, 2015 at 08:25:20PM +0100, Arun Siluvery wrote:

+static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
+{
+   int ret;
+   struct drm_device *dev = ring->dev;


You only use it once, keeping it as a local seems counter-intuitive.


+   WARN_ON(ring->id != RCS);
+
+   size = roundup(size, PAGE_SIZE);


Out of curiosity, is gcc smart enough to turn this into an ALIGN()?


Replaced with PAGE_ALIGN(size).
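On the roundup() vs ALIGN() question: for a power-of-two alignment, the generic round-up (divide, then multiply) and the mask form used by ALIGN()/PAGE_ALIGN() give identical results, and the mask form needs no division. A userspace sketch with hypothetical macro names mirroring the kernel ones and an assumed 4096-byte page; it says nothing about what gcc actually emits.

```c
#include <assert.h>
#include <stdint.h>

#define MY_PAGE_SIZE 4096u  /* assumed page size, a power of two */

/* Generic form: correct for any non-zero alignment, uses a division. */
#define MY_ROUNDUP(x, y)    ((((x) + (y) - 1) / (y)) * (y))
/* Mask form: only valid when a is a power of two. */
#define MY_ALIGN_POW2(x, a) (((x) + (a) - 1) & ~((a) - 1))

static uint32_t page_align(uint32_t size)
{
	return MY_ALIGN_POW2(size, MY_PAGE_SIZE);
}
```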




+   ring->wa_ctx.obj = i915_gem_alloc_object(dev, size);
+   if (!ring->wa_ctx.obj) {
+   DRM_DEBUG_DRIVER("alloc LRC WA ctx backing obj failed.\n");
+   return -ENOMEM;
+   }
+
+   ret = i915_gem_obj_ggtt_pin(ring->wa_ctx.obj, GEN8_LR_CONTEXT_ALIGN, 0);


Strange choice of alignment since we pass around cacheline offsets.

This is left over from the initial version, where it was part of the
context; sorry, I missed it. Replaced with PAGE_SIZE.



+   if (ret) {
+   DRM_DEBUG_DRIVER("pin LRC WA ctx backing obj failed: %d\n",
+ret);
+   drm_gem_object_unreference(&ring->wa_ctx.obj->base);
+   return ret;
+   }
+
+   return 0;
+}
+
+static void lrc_destroy_wa_ctx_obj(struct intel_engine_cs *ring)
+{
+   WARN_ON(ring->id != RCS);
+
+   i915_gem_object_ggtt_unpin(ring->wa_ctx.obj);
+   drm_gem_object_unreference(&ring->wa_ctx.obj->base);
+   ring->wa_ctx.obj = NULL;
+}
+
  /**
   * intel_logical_ring_cleanup() - deallocate the Engine Command Streamer
   *
@@ -1474,7 +1612,29 @@ static int logical_render_ring_init(struct drm_device *dev)
if (ret)
return ret;

-   return intel_init_pipe_control(ring);
+   if (INTEL_INFO(ring->dev)->gen >= 8) {
+   ret = lrc_setup_wa_ctx_obj(ring, PAGE_SIZE);
+   if (ret) {
+   DRM_DEBUG_DRIVER("Failed to setup context WA page: %d\n",
+ret);
+   return ret;
+   }
+
+   ret = intel_init_workaround_bb(ring);
+   if (ret) {
+   lrc_destroy_wa_ctx_obj(ring);
+   DRM_ERROR("WA batch buffers are not initialized: %d\n",
+ ret);
+   }
+   }
+
+   ret = intel_init_pipe_control(ring);


Did you consider stuffing it into the spare area of the pipe control
scratch bo? :)

Not exactly, but I think it is better to keep them separate. This is not
because a single page would be insufficient (even if we add more WA in
future) but for logical reasons: if there is any error while initializing
these WA, we destroy the page and continue, which cannot be done with the
scratch page.


regards
Arun



-Chris





Re: [Intel-gfx] [PATCH v2 1/7] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-15 Thread Siluvery, Arun

On 15/06/2015 11:41, Daniel Vetter wrote:

On Thu, Jun 04, 2015 at 03:30:56PM +0100, Siluvery, Arun wrote:

On 02/06/2015 19:47, Dave Gordon wrote:

On 02/06/15 19:36, Siluvery, Arun wrote:

On 01/06/2015 11:22, Daniel, Thomas wrote:


Indeed, allocating an extra scratch page in the context would simplify
vma/mm management. A trick might be to allocate the scratch page at the
start, then offset the lrc regs etc - that would then be consistent
amongst gen and be easy enough to extend if we need more per-context
scratch space in future.
-Chris


Yes, I think we already have another use for more per-context space at
the start.  The GuC is planning to do this.  Arun, you probably should
work with Alex Dai and Dave Gordon to avoid conflicts here.

Thomas.



Thanks for the heads-up Thomas.
I have discussed this with Dave and we agreed to share this page;
the GuC probably doesn't need the whole page, so the first half is reserved
for its use and the second half is used for WA.

I have modified my patches to use the context page for applying these WA and
don't see any issues.

During the discussions Dave proposed another approach. Even though these
WA are called per-context, they are only initialized once and not changed
afterwards; the same set of WA is applied to each context. So instead of
adding them to each context, does it make sense to create a separate
page and share it across all contexts? Of course, the GuC will add a new
page to the context anyway, so I might as well share that page.

Chris/Dave, do you see any problems with sharing the page with the GuC, or
would you prefer to allocate a separate page for these WA and share it
across all contexts? Please give your comments.

regards
Arun


I think we have to consider which is more future-proof i.e. which is
least likely:
(1) the area shared with the GuC grows (definitely still in flux), or
(2) workarounds need to be context-specific (possible, but unlikely)

So I'd prefer a single area set up just once to contain the pre- and
post-context restore workaround batches. If necessary, the one area
could contain multiple batches at different offsets, so we could point
different contexts at different (shared) batches as required. I think
they're unlikely to actually need per-context customisation[*], but
there might be a need for different workarounds according to workload
type or privilege level or some other criterion ... ?

.Dave.

[*] unless they need per-context memory addresses coded into them?



Considering that these WA are initialized only once and not changed
afterwards, and that the GuC area will probably grow in future and may run
into the space used by WA, an independent single-area setup makes sense.
I also checked the spec, and it is not clear whether any customization will
be required for different contexts.
I have modified the patches to set up a single page with WA when the
default_context is initialized, and this page is used by all contexts.

I will send the patches, but please let me know if there are any other comments.


Yeah if the wa batches aren't ctx specific, then there's really no need to
allocate one of them per ctx. One global buffer with all the wa combined
should really be all we need.
-Daniel


Hi Daniel,

Agreed; this is already taken into account in the next revision, v3
(already sent to the mailing list).


I can see you are still going through the list, but when you get there,
please let me know if you have any other comments.


regards
Arun



Re: [Intel-gfx] [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround

2015-06-15 Thread Siluvery, Arun

On 12/06/2015 18:03, Dave Gordon wrote:

On 12/06/15 12:58, Siluvery, Arun wrote:

On 09/06/2015 19:43, Dave Gordon wrote:

On 05/06/15 14:57, Arun Siluvery wrote:

In Per context w/a batch buffer,
WaRsRestoreWithPerCtxtBb

v2: This patch modifies definitions of MI_LOAD_REGISTER_MEM and
MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
so as to not break any future users of existing definitions (Michel)

Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
   drivers/gpu/drm/i915/i915_reg.h  | 26 ++
   drivers/gpu/drm/i915/intel_lrc.c | 59

   2 files changed, 85 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 33b0ff1..6928162 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h

[snip]

   #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
   #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
+#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
+#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
+#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
+#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)


Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
a two-operand instruction, each of which is a one-word MMIO register
address, hence always 3 words total. The length bias is 2, so the
so-called 'flags' field must be 1. The original definition (where the
second argument of the MI_INSTR macro is 0) shouldn't work.

So just correct the original definition of MI_LOAD_REGISTER_REG; this
isn't something that's actually changed on GEN8.


I did notice that the original instructions were odd, but I thought I might
be wrong, so I created new ones to avoid disturbing the originals.
OK, I will just correct the original one and reuse it.


While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.


ok.
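Dave's length arithmetic can be checked directly. Assuming the i915 encoding `MI_INSTR(opcode, flags) = (opcode << 23) | flags` and a length field that stores the total dword count minus 2, the 3-dword MI_LOAD_REGISTER_REG needs length 1, the 4-dword Gen8 MI_LOAD_REGISTER_MEM (64-bit address) needs length 2, and the 3-dword pre-Gen8 form needs length 1. `mi_header()` is an illustrative helper, not a driver function.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors i915's MI_INSTR(): opcode from bit 23 up, flags/length below. */
#define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))

/* Hypothetical helper: MI length field is (total dwords - 2). */
static uint32_t mi_header(uint32_t opcode, uint32_t total_dwords)
{
	assert(total_dwords >= 2);
	return MI_INSTR(opcode, total_dwords - 2);
}
```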

   #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
   #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
   #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)


And these are wrong too! In fact all of these instructions have been
added under a comment which says Commands used only by the command
parser. Looks like they were added as placeholders without the proper
length fields, and then people have started using them as though they
were complete definitions :(

Time to update them all, perhaps ...

These are not related to this patch, so they can be taken up in a
different patch.


As a minimum, you should move these updated #defines out of the section
under the comment Commands used only by the command parser and put
them in the appropriate place in the regular list of MI_ commands,
preferably in numerical order. Then the ones that are genuinely only
used by the command parser could be left for another patch ...


[snip]


+/*
+ * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
+ * MI_BATCH_BUFFER_END instructions in this sequence need to be
+ * in the same cacheline.
+ */
+while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
+cmd[index++] = MI_NOOP;
+
+cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
+MI_LRM_USE_GLOBAL_GTT |
+MI_LRM_ASYNC_MODE_ENABLE;
+cmd[index++] = INSTPM;
+cmd[index++] = scratch_addr;
+cmd[index++] = 0;
+
+/*
+ * BSpec says there should not be any commands programmed
+ * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
+ * do not add any new commands
+ */
+cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
+cmd[index++] = GEN8_RS_PREEMPT_STATUS;
+cmd[index++] = GEN8_RS_PREEMPT_STATUS;
+
   /* padding */
   while (index < end)
   cmd[index++] = MI_NOOP;



Where's the MI_BATCH_BUFFER_END referred to in the comment?


MI_BATCH_BUFFER_END is just below the while loop; it is in patch [v3 1/6].
Since the diff context is only a few lines, it didn't show up in the diff.


The second comment above says no commands between LOAD_REG_REG and
BB_END, so the point of my comment was that the BB_END is *NOT*
immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!

True, but they are no-ops, so they shouldn't really affect anything. I
guess the spec means no valid commands.




And therefore also, these instructions do *not* all end up in the same
cacheline, thus contradicting the first comment above.

I don't understand why. As per the requirement, the commands from the
first MI_LOAD_REGISTER_MEM_GEN8 (after the while loop) through BB_END will
be part of the same cacheline (in this case the second line).




Padding *after* a BB_END would be redundant.


Yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of
abruptly terminating the batch, which is why I am padding with no-ops; I
can change this if that is preferred.


.Dave.





Re: [Intel-gfx] [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-15 Thread Siluvery, Arun

On 15/06/2015 16:22, Daniel Vetter wrote:

On Fri, Jun 05, 2015 at 12:00:54PM +0100, Chris Wilson wrote:

On Fri, Jun 05, 2015 at 11:34:01AM +0100, Arun Siluvery wrote:

+   /* FIXME: fill unused locations with NOOPs.
+* Replace these instructions with WA
+*/
+while (index < end)
+   reg_state[index++] = MI_NOOP;


I found calling it reg_state was very confusing. Maybe batch, bb, data or cmd?


Concurred, reg_state sounds like an mmio dump not a batchbuffer. wa_batch
would be my naming bikeshed, but I'd go with either. If this is all that
needs changing I can do that while applying patches.
-Daniel


I have already changed this to cmd.
There are some more comments from Dave which I am addressing now, I will 
send them soon.


regards
Arun



Re: [Intel-gfx] [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround

2015-06-15 Thread Siluvery, Arun

On 15/06/2015 18:29, Dave Gordon wrote:

On 15/06/15 15:10, Siluvery, Arun wrote:

On 12/06/2015 18:03, Dave Gordon wrote:

On 12/06/15 12:58, Siluvery, Arun wrote:

On 09/06/2015 19:43, Dave Gordon wrote:

On 05/06/15 14:57, Arun Siluvery wrote:

In Per context w/a batch buffer,
WaRsRestoreWithPerCtxtBb

v2: This patch modifies definitions of MI_LOAD_REGISTER_MEM and
MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
so as to not break any future users of existing definitions (Michel)

Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
drivers/gpu/drm/i915/i915_reg.h  | 26 ++
drivers/gpu/drm/i915/intel_lrc.c | 59

2 files changed, 85 insertions(+)


[snip]


+/*
+ * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
+ * MI_BATCH_BUFFER_END instructions in this sequence need to be
+ * in the same cacheline.
+ */
+while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
+cmd[index++] = MI_NOOP;
+
+cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
+MI_LRM_USE_GLOBAL_GTT |
+MI_LRM_ASYNC_MODE_ENABLE;
+cmd[index++] = INSTPM;
+cmd[index++] = scratch_addr;
+cmd[index++] = 0;
+
+/*
+ * BSpec says there should not be any commands programmed
+ * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
+ * do not add any new commands
+ */
+cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
+cmd[index++] = GEN8_RS_PREEMPT_STATUS;
+cmd[index++] = GEN8_RS_PREEMPT_STATUS;
+
/* padding */
while (index < end)
cmd[index++] = MI_NOOP;



Where's the MI_BATCH_BUFFER_END referred to in the comment?


MI_BATCH_BUFFER_END is just below the while loop; it is in patch [v3 1/6].
Since the diff context is only a few lines, it didn't show up in the diff.


The second comment above says no commands between LOAD_REG_REG and
BB_END, so the point of my comment was that the BB_END is *NOT*
immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!


True, but they are no-ops, so they shouldn't really affect anything. I
guess the spec means no valid commands.


I guess the spec means NO COMMANDS. NOOP is a perfectly valid command,
and I've even seen cases where a workaround specifically requires a
NOOP with the set-no-op-id-register bit set to fix some particular bug.
The only special thing about NOOP is that it doesn't get captured in IPEHR.

I think the w/a requires this:

0%CLSIZE: ... LRM (reg, addr, 0) LRR (reg, reg) BB_END ... (63%CLSIZE)

no gaps, no insertions, all together, all on one cacheline. Those
instructions take up 8 DWords (32 bytes) so the sequence doesn't
necessarily have to start on a cacheline boundary, as long as it's
entirely within the same line. But it's simpler to start on a new line.
You seem to have:

0%CLSIZE: LRM (reg, mem, 0) LRR (reg, reg) NOP NOP NOP BB_END

so the condition in the comment is not fulfilled. If this works, maybe
the comment is wrong.
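Dave's constraint is easy to state in code: an n-dword sequence starting at dword index `start` stays within one cacheline if and only if `start % 16 + n <= 16`, assuming 64-byte cachelines and a cacheline-aligned buffer. LRM (4 dwords) + LRR (3) + BB_END (1) is 8 dwords, so it fits starting at any offset from 0 to 8 within a line. `fits_in_one_cacheline()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHELINE_DWORDS 16u  /* assumed 64-byte cacheline */

/* True if dwords [start, start + n) all lie in the same cacheline. */
static bool fits_in_one_cacheline(uint32_t start, uint32_t n)
{
	return n > 0 && (start % CACHELINE_DWORDS) + n <= CACHELINE_DWORDS;
}
```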


And therefore also, these instructions do *not* all end up in the same
cacheline, thus contradicting the first comment above.


I don't understand why. As per the requirement the commands from the
first MI_LOAD_REGISTER_MEM_GEN8 (after while) through BB_END will be
part of same cacheline (in this case second line).


OK, they're all in the same line; I didn't look back at the full context
enough and thought 'end' would point to the end of the buffer, rather
than the end of the cacheline .. because it /does/ point to the end of
the buffer, it just happens to be the end of the very same cacheline as
well.

So I really don't like the way the sizes of the two workaround batches
have been defined in terms of cache lines. Also I'm not keen on one bit
of code allocating the object and defining the sizes of the sub-areas
within it, and then separate functions filling in each of the sequences
within these areas, knowing that the areas are /just the right size/.
It would be simpler to maintain if the size in cachelines values in
lrc_setup_ctx_wa_obj() didn't have to be hand-edited to stay in sync
with the number of instructions written by gen8_init_perctx_bb() and
gen8_init_indirectctx_bb().

How about having each of these return the number of bytes they've
appended to the (u32 *)buffer that they've been given, and let the
caller manage mapping/unmapping, alignment, padding, etc, and fill in
the size fields accordingly *after* the content has been defined?
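Dave's proposal can be sketched as follows: each fill function appends to the buffer it is handed and returns how many dwords it wrote, and only the caller aligns, pads and records offsets and sizes afterwards, so nothing has to be hand-edited when a workaround is added. The function names, 64-byte cacheline and placeholder dwords are assumptions for illustration; real code would also bound-check against the page size.

```c
#include <assert.h>
#include <stdint.h>

#define CL_DWORDS 16u                       /* assumed 64-byte cacheline */
#define MI_NOOP 0u
#define MI_BATCH_BUFFER_END (0x0Au << 23)   /* MI_INSTR(0x0a, 0) */

struct wa_bb_layout { uint32_t offset, size; };  /* both in dwords */

/* Hypothetical fillers: append commands, return dwords written. */
static uint32_t fill_indirect_ctx(uint32_t *cmd)
{
	cmd[0] = 0x11111111u;  /* placeholder for a real workaround */
	cmd[1] = 0x22222222u;
	return 2;
}

static uint32_t fill_per_ctx(uint32_t *cmd)
{
	cmd[0] = 0x33333333u;  /* placeholder */
	cmd[1] = MI_BATCH_BUFFER_END;
	return 2;
}

static uint32_t pad(uint32_t *cmd, uint32_t index)
{
	while (index % CL_DWORDS)
		cmd[index++] = MI_NOOP;
	return index;
}

/* The caller owns alignment and records sizes after the content exists. */
static uint32_t build(uint32_t *cmd, struct wa_bb_layout *ind,
		      struct wa_bb_layout *per)
{
	uint32_t index = 0;

	ind->offset = index;
	index += fill_indirect_ctx(cmd + index);
	index = pad(cmd, index);	/* INDIRECT_CTX length is in cachelines */
	ind->size = index - ind->offset;

	per->offset = index;		/* already cacheline aligned */
	index += fill_per_ctx(cmd + index);
	per->size = index - per->offset;	/* no padding after BB_END */
	return index;
}
```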


This is an issue; having to edit the sizes whenever more WA are added is
not good. It can be changed as you suggested.


regards
Arun



.Dave.


Padding *after* a BB_END would be redundant.


Yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of
abruptly terminating the batch, which is why I am padding with no-ops; I
can change this if that is preferred.


.Dave.







Re: [Intel-gfx] [PATCH v3 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround

2015-06-12 Thread Siluvery, Arun

On 05/06/2015 15:48, Ville Syrjälä wrote:

On Fri, Jun 05, 2015 at 02:56:48PM +0100, Arun Siluvery wrote:

In Indirect context w/a batch buffer,
+WaFlushCoherentL3CacheLinesAtContextSwitch

Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_reg.h  | 1 +
  drivers/gpu/drm/i915/intel_lrc.c | 8 
  2 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 84af255..5203c79 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -426,6 +426,7 @@
  #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE (1<<9)
  #define   PIPE_CONTROL_NOTIFY (1<<8)
  #define   PIPE_CONTROL_FLUSH_ENABLE   (1<<7) /* gen7+ */
+#define   PIPE_CONTROL_DC_FLUSH_ENABLE (1<<5)
  #define   PIPE_CONTROL_VF_CACHE_INVALIDATE    (1<<4)
  #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE (1<<3)
  #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE (1<<2)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a71eb81..9d8cf65c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1101,6 +1101,14 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
/* WaDisableCtxRestoreArbitration:bdw,chv */
cmd[index++] = MI_ARB_ON_OFF | MI_ARB_DISABLE;

+   /* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw,chv */
+   cmd[index++] = GFX_OP_PIPE_CONTROL(6);
+   cmd[index++] = PIPE_CONTROL_DC_FLUSH_ENABLE;
+   cmd[index++] = 0;
+   cmd[index++] = 0;
+   cmd[index++] = 0;
+   cmd[index++] = 0;
+


This looks incomplete. Seems like you should have LRIs around this
guy to enable/disable the L3SQCREG4 coherent line flush bit.

And chv shouldn't do coherent L3, so this might not be needed there.



I checked with the HW team, and yes, I need to add LRIs to enable/disable
the L3SQCREG4 coherent line flush bit.

As you mentioned, it is not required for CHV.


Also do we need a CS stall here too?
DC Flush Enable 5 Requires stall bit ([20] of DW) set for all GPGPU and Media 
Workloads.

I didn't check the restrictions of this bit; I will check again and correct
this.
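For reference, a sketch of the patch's 6-dword PIPE_CONTROL with the CS stall bit (bit 20 of DW1, per the quoted restriction) set alongside DC flush enable (bit 5). The `GFX_OP_PIPE_CONTROL()` encoding is reproduced here as an assumption mirroring i915_reg.h and should be checked against BSpec. The L3SQCREG4 LRIs Ville mentions are deliberately left out because the thread does not give their register details; `emit_dc_flush()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror i915_reg.h of this era; verify against BSpec. */
#define GFX_OP_PIPE_CONTROL(len) \
	((0x3u << 29) | (0x3u << 27) | (0x2u << 24) | ((len) - 2u))
#define PIPE_CONTROL_CS_STALL        (1u << 20)
#define PIPE_CONTROL_DC_FLUSH_ENABLE (1u << 5)

/* Writes a 6-dword PIPE_CONTROL doing a DC flush with CS stall. */
static uint32_t emit_dc_flush(uint32_t *cmd, uint32_t index)
{
	cmd[index++] = GFX_OP_PIPE_CONTROL(6u);
	cmd[index++] = PIPE_CONTROL_DC_FLUSH_ENABLE | PIPE_CONTROL_CS_STALL;
	cmd[index++] = 0;	/* address low: no post-sync write */
	cmd[index++] = 0;	/* address high */
	cmd[index++] = 0;	/* immediate data low */
	cmd[index++] = 0;	/* immediate data high */
	return index;
}
```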


regards
Arun


Supposedly we should add the DC flush to the normal ring flush hooks
too. But that's a separate issue.


/* padding */
  while (index < end)
cmd[index++] = MI_NOOP;
--
2.3.0







Re: [Intel-gfx] [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround

2015-06-12 Thread Siluvery, Arun

On 09/06/2015 19:43, Dave Gordon wrote:

On 05/06/15 14:57, Arun Siluvery wrote:

In Per context w/a batch buffer,
WaRsRestoreWithPerCtxtBb

v2: This patch modifies definitions of MI_LOAD_REGISTER_MEM and
MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
so as to not break any future users of existing definitions (Michel)

Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_reg.h  | 26 ++
  drivers/gpu/drm/i915/intel_lrc.c | 59 
  2 files changed, 85 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 33b0ff1..6928162 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h

[snip]

  #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
  #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
+#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
+#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
+#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
+#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)


Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
a two-operand instruction, each of which is a one-word MMIO register
address, hence always 3 words total. The length bias is 2, so the
so-called 'flags' field must be 1. The original definition (where the
second argument of the MI_INSTR macro is 0) shouldn't work.

So just correct the original definition of MI_LOAD_REGISTER_REG; this
isn't something that's actually changed on GEN8.

I did notice that the original definitions looked odd, but I thought I might
be wrong, so I created new ones to avoid disturbing the originals.

OK, I will just correct the original one and reuse it.


While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.
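To make the length arithmetic concrete: the low bits of an MI header hold the command length in dwords minus a bias of 2. This is a minimal sketch, assuming MI_INSTR() matches the i915_reg.h definition (opcode shifted left by 23, ORed with the flags) and assuming the length field sits in bits 7:0; mi_cmd_dwords() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Modelled on i915_reg.h: opcode in the upper bits, flags/length below. */
#define MI_INSTR(opcode, flags) (((uint32_t)(opcode) << 23) | (flags))

/* Corrected lengths as argued in the review:
 * LRR = header + 2 register addresses = 3 dwords -> field 1.
 * LRM pre-GEN8 = header + register + 32-bit address = 3 dwords -> field 1.
 * LRM GEN8+ = header + register + 64-bit address = 4 dwords -> field 2. */
#define MI_LOAD_REGISTER_REG      MI_INSTR(0x2A, 1)
#define MI_LOAD_REGISTER_MEM      MI_INSTR(0x29, 1)
#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)

/* Total dwords of a command: length field plus the bias of 2
 * (assumes an 8-bit length field; widths vary by command). */
static unsigned int mi_cmd_dwords(uint32_t header)
{
	return (header & 0xff) + 2;
}
```

With a flags argument of 0, as in the original definitions, the HW would decode a 2-dword command, which is why those defines cannot be correct for either instruction.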


ok.

  #define MI_RS_STORE_DATA_IMM MI_INSTR(0x2B, 0)
  #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0)
  #define MI_STORE_URB_MEM MI_INSTR(0x2D, 0)


And these are wrong too! In fact all of these instructions have been
added under a comment which says "Commands used only by the command
parser". Looks like they were added as placeholders without the proper
length fields, and then people have started using them as though they
were complete definitions :(

Time to update them all, perhaps ...

These are not related to this patch, so they can be taken up in a
different patch.


[snip]


+   /*
+* BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
+* MI_BATCH_BUFFER_END instructions in this sequence need to be
+* in the same cacheline.
+*/
+   while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
+   cmd[index++] = MI_NOOP;
+
+   cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
+   MI_LRM_USE_GLOBAL_GTT |
+   MI_LRM_ASYNC_MODE_ENABLE;
+   cmd[index++] = INSTPM;
+   cmd[index++] = scratch_addr;
+   cmd[index++] = 0;
+
+   /*
+* BSpec says there should not be any commands programmed
+* between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
+* do not add any new commands
+*/
+   cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
+   cmd[index++] = GEN8_RS_PREEMPT_STATUS;
+   cmd[index++] = GEN8_RS_PREEMPT_STATUS;
+
/* padding */
  while (index < end)
cmd[index++] = MI_NOOP;



Where's the MI_BATCH_BUFFER_END referred to in the comment?


MI_BATCH_BUFFER_END is just below the while loop; it was added in patch [v3 1/6].
Since the diff context is only a few lines, it didn't show up in the diff.
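The alignment requirement in the patch comment can be sketched as follows. This is an illustration, not the patch's code: it computes alignment from the dword index, assuming the batch itself starts cacheline aligned (the patch instead checks the pointer cmd + index), and it assumes CACHELINE_BYTES is 64. The helper names are hypothetical. With GEN8 LRM (4 dwords), LRR (3 dwords) and MI_BATCH_BUFFER_END (1 dword) the tail is 8 dwords, so once aligned it fits in a single 64-byte line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64
#define MI_NOOP 0u

/* Next dword index at or after `index` that starts a cacheline,
 * assuming the batch base address is cacheline aligned. */
static size_t cacheline_align_dw(size_t index)
{
	size_t dw_per_line = CACHELINE_BYTES / sizeof(uint32_t);

	return (index + dw_per_line - 1) / dw_per_line * dw_per_line;
}

/* True if dwords first_dw..last_dw (inclusive) share one cacheline. */
static int same_cacheline(size_t first_dw, size_t last_dw)
{
	return (first_dw * sizeof(uint32_t)) / CACHELINE_BYTES ==
	       (last_dw * sizeof(uint32_t)) / CACHELINE_BYTES;
}

/* Pad with MI_NOOP up to the next cacheline, then copy the tail
 * (LRM + LRR + MI_BATCH_BUFFER_END) so it lands in one cacheline.
 * Returns the new write index. */
static size_t emit_cacheline_tail(uint32_t *cmd, size_t index,
				  const uint32_t *tail, size_t tail_len)
{
	size_t start = cacheline_align_dw(index);

	while (index < start)
		cmd[index++] = MI_NOOP;
	assert(tail_len > 0 && same_cacheline(start, start + tail_len - 1));
	for (size_t i = 0; i < tail_len; i++)
		cmd[index++] = tail[i];
	return index;
}
```

Without the padding, an 8-dword tail starting at dword 12 would straddle two cachelines, which is exactly what the BSpec note forbids.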

regards
Arun



.Dave.






Re: [Intel-gfx] [PATCH v3 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode

2015-06-09 Thread Siluvery, Arun

On 09/06/2015 16:27, Dave Gordon wrote:

On 05/06/15 11:34, Arun Siluvery wrote:

Some of the WA applied using WA batch buffers perform writes to the scratch page.
In the current flow, the WA are initialized before the scratch obj is allocated.
This patch reorders intel_init_pipe_control() so that we have a valid scratch obj
before we initialize the WA.

Signed-off-by: Michel Thierry michel.thie...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/intel_lrc.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0b3422a..20c56e4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1562,11 +1562,16 @@ static int logical_render_ring_init(struct drm_device *dev)
ring->emit_bb_start = gen8_emit_bb_start;

ring->dev = dev;
+
+   ret = intel_init_pipe_control(ring);
+   if (ret)
+   return ret;
+
ret = logical_ring_init(dev, ring);
if (ret)
return ret;

-   return intel_init_pipe_control(ring);
+   return 0;
  }


You could squash the last several lines down to just:

return logical_ring_init(dev, ring);

.Dave.



Yes, but this was updated based on Chris's suggestion to allocate the
wa_ctx page during ring init itself; in the updated version that is not
possible, as we also need to free that page if ring_init fails.
I sent the updated patches in the same series so as to collate all
reviews instead of resending them as a separate series.
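Why the tail cannot be collapsed into a single return of logical_ring_init() can be sketched with the usual goto-unwind pattern. All names below are hypothetical stand-ins, and the ordering (pipe control first, then the WA page, then ring init) is an assumption based on this discussion; real code would also need to undo the pipe control setup on failure.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the engine; the flags record which
 * resources are live so the unwind path can be checked. */
struct ring_sketch {
	int scratch_ready;
	int wa_page_live;
	int fail_ring_init;	/* test knob: make ring init fail */
};

static int init_pipe_control_sketch(struct ring_sketch *r)
{
	r->scratch_ready = 1;	/* scratch page the WA batches write to */
	return 0;
}

static int init_wa_page_sketch(struct ring_sketch *r)
{
	if (!r->scratch_ready)
		return -EINVAL;	/* WA setup needs a valid scratch obj */
	r->wa_page_live = 1;
	return 0;
}

static void fini_wa_page_sketch(struct ring_sketch *r)
{
	r->wa_page_live = 0;
}

static int ring_init_sketch(struct ring_sketch *r)
{
	return r->fail_ring_init ? -ENOMEM : 0;
}

static int render_ring_init_sketch(struct ring_sketch *r)
{
	int ret;

	ret = init_pipe_control_sketch(r);
	if (ret)
		return ret;

	ret = init_wa_page_sketch(r);
	if (ret)
		return ret;

	ret = ring_init_sketch(r);
	if (ret)
		goto err_wa;	/* a bare "return ring_init()" would leak the page */

	return 0;

err_wa:
	fini_wa_page_sketch(r);
	return ret;
}
```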


regards
Arun








Re: [Intel-gfx] [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-05 Thread Siluvery, Arun

On 05/06/2015 11:56, Chris Wilson wrote:

On Fri, Jun 05, 2015 at 11:34:01AM +0100, Arun Siluvery wrote:

Some of the WA are to be applied during context save but before restore, and
some at the end of context save/restore but before executing the instructions
in the ring. WA batch buffers are created for this purpose, as these WA cannot
be applied using normal means. Each context has two registers to load the
offsets of these batch buffers. If they are non-zero, the HW understands that it
needs to execute these batches.

v1: In this version two separate ring_buffer objects were used to load WA
instructions for indirect and per context batch buffers and they were part
of every context.

v2: Chris suggested including an additional page in the context and using it to load
these WA instead of creating separate objects. This simplifies a lot of things,
as we need not explicitly pin/unpin them. Thomas Daniel further pointed out that GuC
is planning to use a similar setup to share data between GuC and driver, and the
WA batch buffers could probably share that page. However, after discussions with
Dave, who is implementing the GuC changes, he suggested using an independent page
for two reasons: the GuC area might grow, and these WA are initialized only once and
are not changed afterwards, so we can share them across all contexts.
This version uses this approach.
(Thanks to Chris, Dave and Thomas for their inputs)
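A minimal model of the arrangement described above: one shared page holds both WA batches, and each context only records where they live. The struct and function names are hypothetical, and the real register encoding (which may carry more than a plain address) is not modelled; the only rule taken from the text is that a zero value means the HW skips that batch.

```c
#include <assert.h>
#include <stdint.h>

/* One WA batch inside the shared page; size 0 means "not present". */
struct wa_bb_sketch {
	uint32_t offset;	/* byte offset within the shared page */
	uint32_t size;		/* bytes, 0 if unused */
};

/* Hypothetical model of the single page shared by all contexts. */
struct wa_ctx_sketch {
	uint32_t ggtt_base;	/* GGTT address of the page, assumed non-zero */
	struct wa_bb_sketch indirect_ctx;	/* runs during restore */
	struct wa_bb_sketch per_ctx;		/* runs after restore */
};

/* Value a context would carry for one WA batch: its GGTT address,
 * or 0 so that the HW skips that batch. */
static uint32_t wa_bb_reg_value(const struct wa_ctx_sketch *wa,
				const struct wa_bb_sketch *bb)
{
	return bb->size ? wa->ggtt_base + bb->offset : 0;
}
```

Because every context derives its values from the same shared page, the batches only need to be written once, which is the property the v2 changelog relies on.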


Having moved to a shared wa_ctx for all lrc, I think it makes sense to
then do the allocation during ring_init itself, next to the scratch/hws
status pages. The advantage is that we don't then need to add more
special cases to the default ctx on RCS, and its permanence is far more
prominent. It will also be more consistent with calling it ring->wa_ctx.

Since you have it already plumbed into ring init/fini, why is it partly
done during default ctx init? Maybe all that is required is a little bit of
code and changelog explanation.
-Chris

OK, it is possible to do the allocation and setup in logical_ring_init()
itself. I wanted to group it with the other WA that are set up in
init_context().


I will also change s/reg_state/cmd.

regards
Arun




Re: [Intel-gfx] [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-05 Thread Siluvery, Arun

On 05/06/2015 12:36, Chris Wilson wrote:

On Fri, Jun 05, 2015 at 12:24:58PM +0100, Siluvery, Arun wrote:

ok, it is possible to do the allocation and setup in
logical_ring_init() itself. I wanted to group it with other wa which
are setup in init_context().


Phew, I had worried I had missed something. The issue the current split
between ring init and default context init raises in my mind is that the
content has context dependencies upon it - whereas the w/a so far can be
reused globally. So imo the current split is more confusing than just
creating the w/a buffer entirely during ring init.
-Chris

I am moving it to logical_render_ring_init() as these are only 
applicable for RCS.


regards
Arun



Re: [Intel-gfx] [PATCH v2 1/7] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-04 Thread Siluvery, Arun

On 02/06/2015 19:47, Dave Gordon wrote:

On 02/06/15 19:36, Siluvery, Arun wrote:

On 01/06/2015 11:22, Daniel, Thomas wrote:


Indeed, allocating an extra scratch page in the context would simplify
vma/mm management. A trick might be to allocate the scratch page at the
start, then offset the lrc regs etc - that would then be consistent
amongst gen and be easy enough to extend if we need more per-context
scratch space in future.
-Chris


Yes, I think we already have another use for more per-context space at
the start.  The GuC is planning to do this.  Arun, you probably should
work with Alex Dai and Dave Gordon to avoid conflicts here.

Thomas.



Thanks for the heads-up Thomas.
I have discussed with Dave and agreed to share this page;
GuC probably doesn't need the whole page, so the first half is reserved for its
use and the second half is used for WA.

I have modified my patches to use the context page for applying these WA and
don't see any issues.

During the discussions Dave proposed another approach. Even though these
WA are called per context, they are only initialized once and not changed
afterwards, and the same set of WA is applied for each context. So instead of
adding them to each context, does it make sense to create a separate
page and share it across all contexts? But of course GuC will anyway add a
new page to the context, so I might as well share that page.

Chris/Dave, do you see any problems with sharing the page with GuC, or do you
prefer to allocate a separate page for these WA and share it across all
contexts? Please give your comments.

regards
Arun


I think we have to consider which is more future-proof i.e. which is
least likely:
(1) the area shared with the GuC grows (definitely still in flux), or
(2) workarounds need to be context-specific (possible, but unlikely)

So I'd prefer a single area set up just once to contain the pre- and
post-context restore workaround batches. If necessary, the one area
could contain multiple batches at different offsets, so we could point
different contexts at different (shared) batches as required. I think
they're unlikely to actually need per-context customisation[*], but
there might be a need for different workarounds according to workload
type or privilege level or some other criterion ... ?
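Dave's suggestion of one area holding several batches at different offsets could be managed with a simple bump allocator. This is a hypothetical sketch, assuming a single 4096-byte page and 64-byte cacheline alignment for each batch; none of these names exist in i915.

```c
#include <assert.h>
#include <stdint.h>

#define WA_PAGE_SIZE    4096u
#define CACHELINE_BYTES 64u

/* Hypothetical bookkeeping for the shared WA area. */
struct wa_area {
	uint32_t used;	/* bytes consumed so far */
};

/* Reserve a cacheline-aligned slot of `bytes` for one batch.
 * Returns its offset in the page, or UINT32_MAX if it does not fit. */
static uint32_t wa_area_reserve(struct wa_area *a, uint32_t bytes)
{
	uint32_t off = (a->used + CACHELINE_BYTES - 1) & ~(CACHELINE_BYTES - 1);

	if (bytes > WA_PAGE_SIZE || off > WA_PAGE_SIZE - bytes)
		return UINT32_MAX;
	a->used = off + bytes;
	return off;
}
```

Each context would then be pointed at whichever offsets it needs, so different workload types could share different batches from the same page.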

.Dave.

[*] unless they need per-context memory addresses coded into them?



Considering that these WA are initialized only once and not changed
afterwards, and that the GuC area will probably grow in future and may run
into the space used by the WA, an independent single-area setup makes sense.
I also checked the spec, and it is not clear whether any customization is
going to be required for different contexts.
I have modified the patches to set up a single page with the WA when the
default_context is initialized, and this page is used by all contexts.


I will send patches but please let me know if there are any other comments.

regards
Arun







Re: [Intel-gfx] [PATCH v2 1/7] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-02 Thread Siluvery, Arun

On 01/06/2015 11:22, Daniel, Thomas wrote:


Indeed, allocating an extra scratch page in the context would simplify
vma/mm management. A trick might be to allocate the scratch page at the
start, then offset the lrc regs etc - that would then be consistent
amongst gen and be easy enough to extend if we need more per-context
scratch space in future.
-Chris


Yes, I think we already have another use for more per-context space at the 
start.  The GuC is planning to do this.  Arun, you probably should work with 
Alex Dai and Dave Gordon to avoid conflicts here.

Thomas.



Thanks for the heads-up Thomas.
I have discussed with Dave and agreed to share this page;
GuC probably doesn't need the whole page, so the first half is reserved for its
use and the second half is used for WA.


I have modified my patches to use context page for applying these WA and 
don't see any issues.


During the discussions Dave proposed another approach. Even though these
WA are called per context, they are only initialized once and not changed
afterwards, and the same set of WA is applied for each context. So instead of
adding them to each context, does it make sense to create a separate
page and share it across all contexts? But of course GuC will anyway add a
new page to the context, so I might as well share that page.


Chris/Dave, do you see any problems with sharing the page with GuC, or do you
prefer to allocate a separate page for these WA and share it across all
contexts? Please give your comments.


regards
Arun




Re: [Intel-gfx] [PATCH v2 1/7] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

2015-06-01 Thread Siluvery, Arun

On 29/05/2015 19:16, Chris Wilson wrote:

On Fri, May 29, 2015 at 07:03:19PM +0100, Arun Siluvery wrote:

This patch adds functions to set up WA batch buffers, but they are not yet
enabled in this patch. Some of the WA are to be applied during context save
but before restore, and some at the end of context save/restore but before
executing the instructions in the ring. WA batch buffers are created for
this purpose, as these WA cannot be applied using normal means.

Signed-off-by: Namrta namrta.salo...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_drv.h  |   3 ++
  drivers/gpu/drm/i915/intel_lrc.c | 101 +++
  2 files changed, 104 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 731b5ce..dd4b31d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -814,6 +814,9 @@ struct intel_context {

/* Execlists */
bool rcs_initialized;
+   struct intel_ringbuffer *indirect_ctx_wa_bb;
+   struct intel_ringbuffer *per_ctx_wa_bb;


Eh? They are only command sequences whose starting addresses you encode
into the execlists context. Why have you allocated a ringbuf not an
object? Why have you allocated 2 pages when you only need one, and could
even find space elsewhere in the context?


ringbuf is only used so that I can use logical_ring_*(); an object could also
be used.
A single page is enough, but since we have two batch buffers and need to
provide offsets in two different registers, two pages were used to
simplify things. I guess we can manage with a single page; I will try this.


Your idea of using space in the context itself simplifies many things, but
the context size varies across Gens. Is it safe to pick the last page, or
should we increase the size by one more page and use that to load these
instructions? I think using an additional page is safer, as it avoids the risk
of the HW overwriting that page; or do you have any other recommendation? I
will first try it and see if it works.
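The "one extra page" option amounts to simple size arithmetic. A sketch under assumptions: PAGE_BYTES is 4096, the gen-specific context image size is passed in, and the HW only saves and restores within that image, so a page appended after it is never overwritten. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096u

static size_t round_up_page(size_t n)
{
	return (n + PAGE_BYTES - 1) / PAGE_BYTES * PAGE_BYTES;
}

/* The WA page sits right after the page-rounded context image, so it
 * is outside the region the HW saves (under the stated assumption). */
static size_t wa_page_offset(size_t ctx_image_bytes)
{
	return round_up_page(ctx_image_bytes);
}

/* Total size to allocate for the context object: image + one WA page. */
static size_t ctx_obj_size(size_t ctx_image_bytes)
{
	return wa_page_offset(ctx_image_bytes) + PAGE_BYTES;
}
```

Placing the WA page after the image, rather than reusing the image's last page, keeps the layout valid however the image size changes between Gens.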




And these should be pinned alongside the context *not permanently*.

Right, I will correct this, but it won't be required if we use the
space in the context.




I want a debug mode that limits us to say 16M of GGTT space so that
these address space leaks are easier to demonstrate in practice.
-Chris



regards
Arun



Re: [Intel-gfx] [PATCH v2 1/2] drm/i915/gen8: The WA BB framework is enabled.

2015-03-02 Thread Siluvery, Arun

On 02/03/2015 11:02, Arun Siluvery wrote:

Please ignore this one. I used the message ID of the cover letter instead of
v1 of this patch. The latest patches are sent in reply to their initial revisions.


regards
Arun


From: Namrta namrta.salo...@intel.com

This can be used to enable the WA BB infrastructure for features like
RC6, SSEU, and in-between context save/restore, etc.

A patch that needs a WA BB will have to declare the wa_bb obj
using the function here, fill the WA BB with the required commands,
and update the address of the WA BB at the appropriate place.

v2: Move function to the right place to keep diffs clearer in the
patch that uses this function (Michel)

Change-Id: I9cc49ae7426560215e7b6a6d10ba411caeb9321b
Signed-off-by: Namrta namrta.salo...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
Reviewed-by: Michel Thierry michel.thie...@intel.com
---
  drivers/gpu/drm/i915/intel_lrc.c | 32 
  1 file changed, 32 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9c851d8..ea37a56 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1107,6 +1107,38 @@ static int intel_logical_ring_workarounds_emit(struct intel_engine_cs *ring,
return 0;
  }

+static struct intel_ringbuffer *
+create_wa_bb(struct intel_engine_cs *ring, uint32_t bb_size)
+{
+   struct drm_device *dev = ring->dev;
+   struct intel_ringbuffer *ringbuf;
+   int ret;
+
+   ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL);
+   if (!ringbuf)
+   return NULL;
+
+   ringbuf->ring = ring;
+
+   ringbuf->size = roundup(bb_size, PAGE_SIZE);
+   ringbuf->effective_size = ringbuf->size;
+   ringbuf->head = 0;
+   ringbuf->tail = 0;
+   ringbuf->space = ringbuf->size;
+   ringbuf->last_retired_head = -1;
+
+   ret = intel_alloc_ringbuffer_obj(dev, ringbuf);
+   if (ret) {
+   DRM_DEBUG_DRIVER(
+   "Failed to allocate ringbuf obj for wa_bb%s: %d\n",
+   ring->name, ret);
+   kfree(ringbuf);
+   return NULL;
+   }
+
+   return ringbuf;
+}
+
  static int gen8_init_common_ring(struct intel_engine_cs *ring)
  {
struct drm_device *dev = ring->dev;





Re: [Intel-gfx] [PATCH 2/2] drm/i915/gen8: Apply Per-context workarounds using W/A batch buffers

2015-03-02 Thread Siluvery, Arun

On 02/03/2015 10:10, Michel Thierry wrote:



On 25/02/15 17:54, Arun Siluvery wrote:

Some of the workarounds are to be applied during context save but before
restore and some at the end of context save/restore but before executing
the instructions in the ring. Workaround batch buffers are created for
this purpose as they cannot be applied using normal means. HW executes
them at specific stages during context save/restore.

In this method we initialize a batch buffer with the w/a commands, and its address
is supplied using the context offset pointers when a context is initialized.

This patch introduces indirect and per-context batch buffers, through which the
following workarounds are applied. These are required to fix issues
observed with preemption-related workloads.

In Indirect context w/a batch buffer,
+WaDisableCtxRestoreArbitration
+WaFlushCoherentL3CacheLinesAtContextSwitch
+WaClearSlmSpaceAtContextSwitch

In Per context w/a batch buffer,
+WaDisableCtxRestoreArbitration
+WaRsRestoreWithPerCtxtBb

v2: Use GTT address type for all privileged instructions, update as
per dynamic pinning changes, minor simplifications, rename variables
as follows to keep lines under 80 chars and rebase.
s/indirect_ctx_wa_ringbuf/indirect_ctx_wa_bb
s/per_ctx_wa_ringbuf/per_ctx_wa_bb

v3: Modify WA BB initialization to Gen specific.

Change-Id: I0cedb536b7f6d9f10ba9e81ba625848e7bab603c
Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
   drivers/gpu/drm/i915/i915_drv.h |   3 +
   drivers/gpu/drm/i915/i915_reg.h |  30 +++-
   drivers/gpu/drm/i915/intel_lrc.c| 302 +++-
   drivers/gpu/drm/i915/intel_ringbuffer.h |   3 +
   4 files changed, 297 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d42040f..86cdb52 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -774,6 +774,9 @@ struct intel_context {

/* Execlists */
bool rcs_initialized;
+   struct intel_ringbuffer *indirect_ctx_wa_bb;
+   struct intel_ringbuffer *per_ctx_wa_bb;
+
struct {
struct drm_i915_gem_object *state;
struct intel_ringbuffer *ringbuf;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 55143cb..eb41d7f 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -347,6 +347,26 @@
   #define   MI_INVALIDATE_BSD  (1<<7)
   #define   MI_FLUSH_DW_USE_GTT (1<<2)
   #define   MI_FLUSH_DW_USE_PPGTT  (0<<2)
+#define MI_ATOMIC(len) MI_INSTR(0x2F, (len-2))
+#define   MI_ATOMIC_MEMORY_TYPE_GGTT   (1<<22)
+#define   MI_ATOMIC_INLINE_DATA (1<<18)
+#define   MI_ATOMIC_CS_STALL   (1<<17)
+#define   MI_ATOMIC_RETURN_DATA_CTL (1<<16)
+#define MI_ATOMIC_OP_MASK(op)  ((op) << 8)
+#define MI_ATOMIC_AND  MI_ATOMIC_OP_MASK(0x01)
+#define MI_ATOMIC_OR   MI_ATOMIC_OP_MASK(0x02)
+#define MI_ATOMIC_XOR  MI_ATOMIC_OP_MASK(0x03)
+#define MI_ATOMIC_MOVE MI_ATOMIC_OP_MASK(0x04)
+#define MI_ATOMIC_INC  MI_ATOMIC_OP_MASK(0x05)
+#define MI_ATOMIC_DEC  MI_ATOMIC_OP_MASK(0x06)
+#define MI_ATOMIC_ADD  MI_ATOMIC_OP_MASK(0x07)
+#define MI_ATOMIC_SUB  MI_ATOMIC_OP_MASK(0x08)
+#define MI_ATOMIC_RSUB MI_ATOMIC_OP_MASK(0x09)
+#define MI_ATOMIC_IMAX MI_ATOMIC_OP_MASK(0x0A)
+#define MI_ATOMIC_IMIN MI_ATOMIC_OP_MASK(0x0B)
+#define MI_ATOMIC_UMAX MI_ATOMIC_OP_MASK(0x0C)
+#define MI_ATOMIC_UMIN MI_ATOMIC_OP_MASK(0x0D)
+
   #define MI_BATCH_BUFFER  MI_INSTR(0x30, 1)
   #define   MI_BATCH_NON_SECURE(1)
   /* for snb/ivb/vlv this also means batch in ppgtt when ppgtt is enabled. */
@@ -410,6 +430,7 @@
   #define   DISPLAY_PLANE_A   (0<<20)
   #define   DISPLAY_PLANE_B   (1<<20)
   #define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
+#define   PIPE_CONTROL_FLUSH_RO_CACHES (1<<27)

I think the consensus is to rename this to PIPE_CONTROL_FLUSH_L3, isn't it?


Yes, it will be renamed to PIPE_CONTROL_FLUSH_L3 in v2.


   #define   PIPE_CONTROL_GLOBAL_GTT_IVB (1<<24) /* gen7+ */
   #define   PIPE_CONTROL_MMIO_WRITE (1<<23)
   #define   PIPE_CONTROL_STORE_DATA_INDEX  (1<<21)
@@ -426,6 +447,7 @@
   #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE (1<<9)
   #define   PIPE_CONTROL_NOTIFY (1<<8)
   #define   PIPE_CONTROL_FLUSH_ENABLE  (1<<7) /* gen7+ */
+#define   PIPE_CONTROL_DC_FLUSH_ENABLE (1<<5)
   #define   PIPE_CONTROL_VF_CACHE_INVALIDATE   (1<<4)
   #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE (1<<3)
   #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE (1<<2)
@@ -449,8 +471,10 @@
   #define MI_CLFLUSH  MI_INSTR(0x27, 0)
   #define MI_REPORT_PERF_COUNT MI_INSTR(0x28, 0)
   #define   

Re: [Intel-gfx] [PATCH v2 2/2] drm/i915/gen8: Apply Per-context workarounds using W/A batch buffers

2015-03-02 Thread Siluvery, Arun

On 02/03/2015 17:43, Daniel Vetter wrote:

On Mon, Mar 02, 2015 at 11:07:20AM +, Arun Siluvery wrote:

Some of the workarounds are to be applied during context save but before
restore and some at the end of context save/restore but before executing
the instructions in the ring. Workaround batch buffers are created for
this purpose as they cannot be applied using normal means. HW executes
them at specific stages during context save/restore.

In this method we initialize a batch buffer with the w/a commands, and its address
is supplied using the context offset pointers when a context is initialized.

This patch introduces indirect and per-context batch buffers, through which the
following workarounds are applied. These are required to fix issues
observed with preemption-related workloads.

In Indirect context w/a batch buffer,
+WaDisableCtxRestoreArbitration
+WaFlushCoherentL3CacheLinesAtContextSwitch
+WaClearSlmSpaceAtContextSwitch

In Per context w/a batch buffer,
+WaDisableCtxRestoreArbitration
+WaRsRestoreWithPerCtxtBb

v2: Use GTT address type for all privileged instructions, update as
per dynamic pinning changes, minor simplifications, rename variables
as follows to keep lines under 80 chars and rebase.
s/indirect_ctx_wa_ringbuf/indirect_ctx_wa_bb
s/per_ctx_wa_ringbuf/per_ctx_wa_bb

v3: Modify WA BB initialization to Gen specific.

v4: s/PIPE_CONTROL_FLUSH_RO_CACHES/PIPE_CONTROL_FLUSH_L3 (Ville)
This patch modifies the definitions of MI_LOAD_REGISTER_MEM and
MI_LOAD_REGISTER_REG; add GEN8-specific defines for these instructions
so as not to break any future users of the existing definitions (Michel)

Change-Id: I0cedb536b7f6d9f10ba9e81ba625848e7bab603c
Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_drv.h |   3 +
  drivers/gpu/drm/i915/i915_reg.h |  28 
  drivers/gpu/drm/i915/intel_lrc.c| 231 +++-
  drivers/gpu/drm/i915/intel_ringbuffer.h |   3 +
  4 files changed, 258 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d42040f..86cdb52 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -774,6 +774,9 @@ struct intel_context {

/* Execlists */
bool rcs_initialized;
+   struct intel_ringbuffer *indirect_ctx_wa_bb;
+   struct intel_ringbuffer *per_ctx_wa_bb;


Why is this per-ctx and not per-engine? Also your patch splitting doesn't


Since we apply them on a per-context basis, I think intel_context is a better
place; also, they are only applicable to RCS. There is no reason why
they cannot be added to the engine structure; if you think that is a better
place, I can make the changes accordingly.



make that much sense: Patch 1 only adds a static function without any
users (resulting in gcc being unhappy). Imo a better split would be:
- wire up wa batch/ring allocation/freeing functions
- wire up the changes to the lrc initial reg state code
- one patch per w/a entry you add

I am not the author of the first patch, and I tried to retain it as is, but
that has not resulted in a clean split. I will split them as suggested.


regards
Arun



Cheers, Daniel


+
struct {
struct drm_i915_gem_object *state;
struct intel_ringbuffer *ringbuf;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 55143cb..3048494 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -347,6 +347,26 @@
  #define   MI_INVALIDATE_BSD   (1<<7)
  #define   MI_FLUSH_DW_USE_GTT (1<<2)
  #define   MI_FLUSH_DW_USE_PPGTT   (0<<2)
+#define MI_ATOMIC(len) MI_INSTR(0x2F, (len-2))
+#define   MI_ATOMIC_MEMORY_TYPE_GGTT   (1<<22)
+#define   MI_ATOMIC_INLINE_DATA (1<<18)
+#define   MI_ATOMIC_CS_STALL   (1<<17)
+#define   MI_ATOMIC_RETURN_DATA_CTL (1<<16)
+#define MI_ATOMIC_OP_MASK(op)  ((op) << 8)
+#define MI_ATOMIC_AND  MI_ATOMIC_OP_MASK(0x01)
+#define MI_ATOMIC_OR   MI_ATOMIC_OP_MASK(0x02)
+#define MI_ATOMIC_XOR  MI_ATOMIC_OP_MASK(0x03)
+#define MI_ATOMIC_MOVE MI_ATOMIC_OP_MASK(0x04)
+#define MI_ATOMIC_INC  MI_ATOMIC_OP_MASK(0x05)
+#define MI_ATOMIC_DEC  MI_ATOMIC_OP_MASK(0x06)
+#define MI_ATOMIC_ADD  MI_ATOMIC_OP_MASK(0x07)
+#define MI_ATOMIC_SUB  MI_ATOMIC_OP_MASK(0x08)
+#define MI_ATOMIC_RSUB MI_ATOMIC_OP_MASK(0x09)
+#define MI_ATOMIC_IMAX MI_ATOMIC_OP_MASK(0x0A)
+#define MI_ATOMIC_IMIN MI_ATOMIC_OP_MASK(0x0B)
+#define MI_ATOMIC_UMAX MI_ATOMIC_OP_MASK(0x0C)
+#define MI_ATOMIC_UMIN MI_ATOMIC_OP_MASK(0x0D)
+
  #define MI_BATCH_BUFFER   MI_INSTR(0x30, 1)
  #define   MI_BATCH_NON_SECURE (1)
  /* for snb/ivb/vlv this also means batch in ppgtt when ppgtt is enabled. */
@@ -410,6 +430,7 @@
  #define   DISPLAY_PLANE_A   (0<<20)
  #define   DISPLAY_PLANE_B   (1<<20)
  #define GFX_OP_PIPE_CONTROL(len)  ((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
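As an illustration of how the new MI_ATOMIC defines compose into a command header, here is a sketch that emits an atomic increment of a GGTT dword. It assumes the form without inline data is 3 dwords (header plus a 64-bit address); emit_atomic_inc() is hypothetical, and MI_ATOMIC's len argument is parenthesised here for macro hygiene, unlike in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Modelled on i915_reg.h and the defines added by this patch. */
#define MI_INSTR(opcode, flags)    (((uint32_t)(opcode) << 23) | (flags))
#define MI_ATOMIC(len)             MI_INSTR(0x2F, ((len) - 2))
#define MI_ATOMIC_MEMORY_TYPE_GGTT (1u << 22)
#define MI_ATOMIC_CS_STALL         (1u << 17)
#define MI_ATOMIC_OP_MASK(op)      ((uint32_t)(op) << 8)
#define MI_ATOMIC_INC              MI_ATOMIC_OP_MASK(0x05)

/* Emit an atomic increment of a GGTT dword; returns dwords written.
 * Assumed layout: header, address low 32 bits, address high 32 bits. */
static unsigned int emit_atomic_inc(uint32_t *cs, uint64_t ggtt_addr)
{
	cs[0] = MI_ATOMIC(3) | MI_ATOMIC_MEMORY_TYPE_GGTT |
		MI_ATOMIC_CS_STALL | MI_ATOMIC_INC;
	cs[1] = (uint32_t)ggtt_addr;
	cs[2] = (uint32_t)(ggtt_addr >> 32);
	return 3;
}
```

The opcode, length, address-type, stall and operation fields occupy disjoint bits, so they can simply be ORed together.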

Re: [Intel-gfx] [PATCH] drm/i915: Skip Stolen Memory first page.

2015-02-03 Thread Siluvery, Arun

On 01/08/2014 17:34, Jesse Barnes wrote:

On Thu, 31 Jul 2014 12:08:20 -0700
Rodrigo Vivi rodrigo.v...@intel.com wrote:


WA to skip the first page of stolen memory due to sporadic HW write on *CS Idle

v2: Improve variable names and fix allocated size.

Reviewed-by: Ben Widawsky b...@bwidawsk.net
Signed-off-by: Rodrigo Vivi rodrigo.v...@intel.com
---
  drivers/gpu/drm/i915/i915_gem_stolen.c | 15 ++-
  1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 21c025a..82035b0 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -289,7 +289,8 @@ void i915_gem_cleanup_stolen(struct drm_device *dev)
  int i915_gem_init_stolen(struct drm_device *dev)
  {
struct drm_i915_private *dev_priv = dev->dev_private;
-   int bios_reserved = 0;
+   int start_rsvd = 0;
+   int end_rsvd = 0;

  #ifdef CONFIG_INTEL_IOMMU
if (intel_iommu_gfx_mapped && INTEL_INFO(dev)->gen < 8) {
@@ -308,15 +309,19 @@ int i915_gem_init_stolen(struct drm_device *dev)
DRM_DEBUG_KMS("found %zd bytes of stolen memory at %08lx\n",
  dev_priv->gtt.stolen_size, dev_priv->mm.stolen_base);

+   /* WaSkipStolenMemoryFirstPage */
+   if (INTEL_INFO(dev)->gen >= 8)
+   start_rsvd = 4096;
+
if (IS_VALLEYVIEW(dev))
-   bios_reserved = 1024*1024; /* top 1M on VLV/BYT */
+   end_rsvd = 1024*1024; /* top 1M on VLV/BYT */

-   if (WARN_ON(bios_reserved > dev_priv->gtt.stolen_size))
+   if (WARN_ON((start_rsvd + end_rsvd) > dev_priv->gtt.stolen_size))
return 0;

/* Basic memrange allocator for stolen space */
-   drm_mm_init(&dev_priv->mm.stolen, 0, dev_priv->gtt.stolen_size -
-   bios_reserved);
+   drm_mm_init(&dev_priv->mm.stolen, start_rsvd,
+   dev_priv->gtt.stolen_size - start_rsvd - end_rsvd);

return 0;
  }
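The range arithmetic in the stolen-memory patch can be checked in isolation. This sketch mirrors the patch's logic: skip 4 KiB at the start on gen8+, 1 MiB at the top on VLV/BYT, and give up if the reserved areas exceed the stolen size. stolen_usable_range() and struct stolen_range are hypothetical, and -1 stands in for the patch's WARN_ON path, which returns 0 without initializing the allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: the range handed to the stolen allocator. */
struct stolen_range {
	size_t start;
	size_t size;
};

static int stolen_usable_range(size_t stolen_size, int gen, int is_vlv,
			       struct stolen_range *out)
{
	size_t start_rsvd = 0, end_rsvd = 0;

	if (gen >= 8)
		start_rsvd = 4096;		/* WaSkipStolenMemoryFirstPage */
	if (is_vlv)
		end_rsvd = 1024 * 1024;		/* top 1M on VLV/BYT */

	if (start_rsvd + end_rsvd > stolen_size)
		return -1;			/* the WARN_ON case */

	out->start = start_rsvd;
	out->size = stolen_size - start_rsvd - end_rsvd;
	return 0;
}
```

Passing start_rsvd as the drm_mm start, rather than shrinking only the size, is what keeps the first page out of every allocation.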


Beyond the fastboot stuff Ville has already mentioned, the early
allocation of the existing fb from stolen will prevent us from
clobbering the currently displayed buffer with the contents of the
ringbuffers and whatever else we allocate out of stolen at early boot.

We might be able to avoid that by doing stolen allocations top down, or
by reserving the displayed fb even if we can't allocate an obj for it,
only freeing it after our first mode set.

Can you file a bug or JIRA for that to make sure we don't lose track of
the fastboot & boot corruption issues after this fix lands?


Reviving an old thread:
is there any particular reason why this patch has not been merged to nightly?
Is it known to cause any other regressions?

regards
Arun




Thanks,





[Intel-gfx] Significance of Golden context

2015-02-02 Thread Siluvery, Arun

Hi,

Could someone explain the significance of the Null context/Golden state?
I understand we are initializing 3D state in this batch, and we send it
at the beginning to start the HW in a known state, but what are the
implications of not doing this? What kind of issues can we expect if we
don't do this? How is this golden state determined?


As a test, I disabled this for Gen8 and could boot Android without any
issues.


regards
Arun


Re: [Intel-gfx] [PATCH 1/4] drm/i915: Implement Wa4x4STCOptimizationDisable:chv

2015-01-21 Thread Siluvery, Arun

On 21/01/2015 17:37, ville.syrj...@linux.intel.com wrote:

From: Ville Syrjälä ville.syrj...@linux.intel.com

Wa4x4STCOptimizationDisable got only implemented for BDW, but according
to the w/a database CHV needs it too, so add it.

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
  drivers/gpu/drm/i915/intel_ringbuffer.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index d7aa5c4..2a1a178 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -851,6 +851,10 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
 */
WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);

+   /* Wa4x4STCOptimizationDisable:chv */
+   WA_SET_BIT_MASKED(CACHE_MODE_1,
+ GEN8_4x4_STC_OPTIMIZATION_DISABLE);
+
/* Improve HiZ throughput on CHV. */
WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);
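WA_SET_BIT_MASKED() and WA_CLR_BIT_MASKED() target masked registers such as CACHE_MODE_1, where the upper 16 bits of a write act as a write-enable mask for the lower 16. This is a sketch of that behaviour: MASKED_BIT_ENABLE/MASKED_BIT_DISABLE are modelled on i915's _MASKED_BIT_ENABLE/_MASKED_BIT_DISABLE, masked_reg_apply() is a hypothetical model of the HW, and the bit value 0x0040 used in the checks is arbitrary, not the real GEN8_4x4_STC_OPTIMIZATION_DISABLE position.

```c
#include <assert.h>
#include <stdint.h>

/* Modelled on i915's _MASKED_BIT_ENABLE / _MASKED_BIT_DISABLE. */
#define MASKED_BIT_ENABLE(a)  (((a) << 16) | (a))
#define MASKED_BIT_DISABLE(a) ((a) << 16)

/* Hypothetical model of a masked register write: only bits whose
 * write-enable (upper half) is set take the new value. */
static uint16_t masked_reg_apply(uint16_t cur, uint32_t write)
{
	uint16_t mask = (uint16_t)(write >> 16);	/* write-enable bits */
	uint16_t val = (uint16_t)(write & 0xffff);	/* new values */

	return (uint16_t)((cur & ~mask) | (val & mask));
}
```

This is why the workaround list can set or clear one bit without a read-modify-write: bits outside the mask are left untouched.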



Looks good to me.
I only tested Wa4x4STCOptimizationDisable on Android; no issues observed.

For the whole series,
Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

regards
Arun



Re: [Intel-gfx] [PATCH 04/10] drm/i915: Pixel Clock changes for DSI dual link

2014-12-05 Thread Siluvery, Arun

On 05/12/2014 16:33, Singh, Gaurav K wrote:


On 12/4/2014 2:57 PM, Jani Nikula wrote:

On Thu, 04 Dec 2014, Gaurav K Singh gaurav.k.si...@intel.com wrote:

For dual-link MIPI panels, each port needs half of the pixel clock. Pixel overlap
can be enabled if the panel needs it; in that case, the pixel clock is
increased to account for the extra pixels.
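The clock split can be worked through with a small calculation. This is a hypothetical sketch, not the formula in intel_dsi.c: it assumes each port carries half of every line plus `overlap` extra pixels at an unchanged line rate, so the per-port clock is pclk * (htotal / 2 + overlap) / htotal, rounded up. dsi_port_pclk_khz() is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Per-port pixel clock in kHz for a dual-link DSI panel, under the
 * assumption stated above (half a line plus overlap pixels per port). */
static uint32_t dsi_port_pclk_khz(uint32_t pclk_khz, uint32_t htotal,
				  uint32_t overlap)
{
	uint64_t num = (uint64_t)pclk_khz * (htotal / 2 + overlap);

	return (uint32_t)((num + htotal - 1) / htotal);	/* round up */
}
```

For a 148500 kHz mode with htotal 2200, this gives 74250 kHz per port without overlap and 74520 kHz with 4 overlap pixels.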


Just a question: why do we need pixel overlap?
I couldn't find more details in the spec, other than that some extra
pixels are sent when overlap is set.


regards
Arun



v2 : Address review comments by Jani
   - Removed the bit mask used for ->dual_link
   - Used DSI instead of MIPI for #define variables

Signed-off-by: Gaurav K Singh gaurav.k.si...@intel.com
---
   drivers/gpu/drm/i915/i915_reg.h|4 
   drivers/gpu/drm/i915/intel_bios.h  |3 ++-
   drivers/gpu/drm/i915/intel_dsi.c   |8 
   drivers/gpu/drm/i915/intel_dsi.h   |6 ++
   drivers/gpu/drm/i915/intel_dsi_panel_vbt.c |   21 +
   5 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index c981f5d..87149ba 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -6029,6 +6029,10 @@ enum punit_power_well {
   #define GEN8_PMINTR_REDIRECT_TO_NON_DISP (1<<31)
   #define VLV_PWRDWNUPCTL  0xA294

+#define VLV_CHICKEN_3  0x7040C
+#define  PIXEL_OVERLAP_CNT_MASK    (3 << 30)
+#define  PIXEL_OVERLAP_CNT_SHIFT   30

I didn't find this register, but does it not need + VLV_DISPLAY_BASE?

Given that I can't find the register my review is pretty shallow, but I
don't spot anything obviously wrong either. With these caveats,

Reviewed-by: Jani Nikula jani.nik...@intel.com

This register is available in BSpec, though its bit definitions have not been
updated there yet. It was also communicated by the BIOS team.

+
   #define GEN6_PMISR   0x44020
   #define GEN6_PMIMR   0x44024 /* rps_lock */
   #define GEN6_PMIIR   0x44028
diff --git a/drivers/gpu/drm/i915/intel_bios.h b/drivers/gpu/drm/i915/intel_bios.h
index de01167..a6a8710 100644
--- a/drivers/gpu/drm/i915/intel_bios.h
+++ b/drivers/gpu/drm/i915/intel_bios.h
@@ -818,7 +818,8 @@ struct mipi_config {
   #define DUAL_LINK_PIXEL_ALT  2
u16 dual_link:2;
u16 lane_cnt:2;
-   u16 rsvd3:12;
+   u16 pixel_overlap:3;
+   u16 rsvd3:9;

u16 rsvd4;

diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
index dbe52e9..4e18abd 100644
--- a/drivers/gpu/drm/i915/intel_dsi.c
+++ b/drivers/gpu/drm/i915/intel_dsi.c
@@ -111,6 +111,14 @@ static void intel_dsi_port_enable(struct intel_encoder *encoder)
enum port port;
u32 temp;

+   if (intel_dsi->dual_link == DSI_DUAL_LINK_FRONT_BACK) {
+   temp = I915_READ(VLV_CHICKEN_3);
+   temp &= ~PIXEL_OVERLAP_CNT_MASK |
+   intel_dsi->pixel_overlap <<
+   PIXEL_OVERLAP_CNT_SHIFT;
+   I915_WRITE(VLV_CHICKEN_3, temp);
+   }
+
for_each_dsi_port(port, intel_dsi-ports) {
temp = I915_READ(MIPI_PORT_CTRL(port));

diff --git a/drivers/gpu/drm/i915/intel_dsi.h b/drivers/gpu/drm/i915/intel_dsi.h
index f2cc2fc..8fe2064 100644
--- a/drivers/gpu/drm/i915/intel_dsi.h
+++ b/drivers/gpu/drm/i915/intel_dsi.h
@@ -28,6 +28,11 @@
   #include <drm/drm_crtc.h>
   #include "intel_drv.h"

+/* Dual Link support */
+#define DSI_DUAL_LINK_NONE 0
+#define DSI_DUAL_LINK_FRONT_BACK   1
+#define DSI_DUAL_LINK_PIXEL_ALT    2
+
   struct intel_dsi_device {
unsigned int panel_id;
const char *name;
@@ -105,6 +110,7 @@ struct intel_dsi {

u8 escape_clk_div;
u8 dual_link;
+   u8 pixel_overlap;
u32 port_bits;
u32 bw_timer;
u32 dphy_reg;
diff --git a/drivers/gpu/drm/i915/intel_dsi_panel_vbt.c b/drivers/gpu/drm/i915/intel_dsi_panel_vbt.c
index f60146f..f8c2269 100644
--- a/drivers/gpu/drm/i915/intel_dsi_panel_vbt.c
+++ b/drivers/gpu/drm/i915/intel_dsi_panel_vbt.c
@@ -288,6 +288,7 @@ static bool generic_init(struct intel_dsi_device *dsi)
intel_dsi->lane_count = mipi_config->lane_cnt + 1;
intel_dsi->pixel_format = mipi_config->videomode_color_format << 7;
intel_dsi->dual_link = mipi_config->dual_link;
+   intel_dsi->pixel_overlap = mipi_config->pixel_overlap;

if (intel_dsi->dual_link)
intel_dsi->ports = ((1 << PORT_A) | (1 << PORT_C));
@@ -310,6 +311,20 @@ static bool generic_init(struct intel_dsi_device *dsi)

pclk = mode->clock;

+   /* In dual link mode each port needs half of pixel clock */
+   if (intel_dsi->dual_link) {
+   pclk = pclk / 2;
+
+   /* we can enable pixel_overlap if 

Re: [Intel-gfx] [PATCH 04/10] drm/i915: Pixel Clock changes for DSI dual link

2014-12-05 Thread Siluvery, Arun

On 05/12/2014 17:36, Jani Nikula wrote:

On Fri, 05 Dec 2014, Siluvery, Arun arun.siluv...@linux.intel.com wrote:

On 05/12/2014 16:33, Singh, Gaurav K wrote:


On 12/4/2014 2:57 PM, Jani Nikula wrote:

On Thu, 04 Dec 2014, Gaurav K Singh gaurav.k.si...@intel.com wrote:

For dual link MIPI Panels, each port needs half of pixel clock. Pixel overlap
can be enabled if needed by panel, then in that case, pixel clock will be
increased for extra pixels.


Just a question: why do we need pixel overlap?
I couldn't find more details in the spec, other than that some extra pixels
are sent when overlap is set.



From the host perspective a dual link (or dual channel) DSI device is
two independent peripheral devices. On the peripheral side the display
has to combine the input from the two links (which may be two
independent DSI blocks on the peripheral as well) into one contiguous
display. I don't know the details, but I'm guessing pixel overlap just
makes it easier for the peripheral implementation to get it all
together.


Thank you for the details.
I am just wondering how a few extra pixels help on the display side, unless
they are fixed values that act as some kind of marker to synchronize the
two halves.


regards
Arun



BR,
Jani.




Re: [Intel-gfx] [PATCH] drm/i915: Free resources correctly if we cannot map status page during ctx create

2014-11-17 Thread Siluvery, Arun

On 17/11/2014 15:54, Daniel, Thomas wrote:

-----Original Message-----
From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf Of Arun Siluvery
Sent: Monday, November 17, 2014 3:48 PM
To: intel-gfx@lists.freedesktop.org
Subject: [Intel-gfx] [PATCH] drm/i915: Free resources correctly if we cannot map status page during ctx create

We are not freeing the memory allocated for ringbuf and ctx if we fail to
map the status page, so release all resources correctly.

Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
 drivers/gpu/drm/i915/intel_lrc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f3efdbd..a84d24b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1777,8 +1777,10 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 		ring->status_page.gfx_addr =
 			i915_gem_obj_ggtt_offset(ctx_obj);
 		ring->status_page.page_addr =
 			kmap(sg_page(ctx_obj->pages->sgl));
-		if (ring->status_page.page_addr == NULL)
-			return -ENOMEM;
+		if (ring->status_page.page_addr == NULL) {
+			ret = -ENOMEM;
+			goto error;
+		}
 		ring->status_page.obj = ctx_obj;
 	}

Hi Arun,

I think your tree is out of date.  See this patch:
http://patchwork.freedesktop.org/patch/35828/

Cheers,
Thomas.



You are right, I don't have the latest changes.
This patch can be ignored.

regards
Arun



Re: [Intel-gfx] [PATCH v2 1/2] drm/i915: Initialize bdw workarounds in logical ring mode too

2014-11-05 Thread Siluvery, Arun

On 04/11/2014 19:23, Rodrigo Vivi wrote:

These patches were listed for -collector but hit a huge conflict. If
they are still relevant, please rebase them.

This patch is no longer relevant; a rebased version has already been sent
to the list for review.


https://patchwork.kernel.org/patch/5178771/

regards
Arun


Also, my bikeshed would be to find better names, at least to help
differentiate them.

On Wed, Sep 24, 2014 at 5:02 AM, Michel Thierry
michel.thie...@intel.com wrote:

Following the legacy ring submission example, update the
ring->init_context() hook to support the execlist submission mode.

Workarounds are defined in bdw_emit_workarounds(), but the emit
now depends on the ring submission mode.

v2: Updated after Cleanup pre prod workarounds

For: VIZ-4092
Signed-off-by: Michel Thierry michel.thie...@intel.com
---
  drivers/gpu/drm/i915/i915_gem_context.c |  2 +-
  drivers/gpu/drm/i915/intel_lrc.c| 66 +
  drivers/gpu/drm/i915/intel_ringbuffer.c | 75 +++--
  drivers/gpu/drm/i915/intel_ringbuffer.h | 11 -
  4 files changed, 120 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 7b73b36..d1ed21a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -657,7 +657,7 @@ done:

 if (uninitialized) {
 if (ring->init_context) {
-   ret = ring->init_context(ring);
+   ret = ring->init_context(ring->buffer);
 if (ret)
 DRM_ERROR("ring init context: %d\n", ret);
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d64d518..a0aa3f0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1020,6 +1020,62 @@ int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords)
 return 0;
  }

+static inline void intel_logical_ring_emit_wa(struct intel_ringbuffer *ringbuf,
+  u32 addr, u32 value)
+{
+   struct intel_engine_cs *ring = ringbuf->ring;
+   struct drm_device *dev = ring->dev;
+   struct drm_i915_private *dev_priv = dev->dev_private;
+
+   if (WARN_ON(dev_priv->num_wa_regs >= I915_MAX_WA_REGS))
+   return;
+
+   intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
+   intel_logical_ring_emit(ringbuf, addr);
+   intel_logical_ring_emit(ringbuf, value);
+
+   dev_priv->intel_wa_regs[dev_priv->num_wa_regs].addr = addr;
+   dev_priv-intel_wa_regs[dev_priv-num_wa_regs].mask = value  0x;
+   /* value is updated with the status of remaining bits of this
+* register when it is read from debugfs file
+*/
+   dev_priv->intel_wa_regs[dev_priv->num_wa_regs].value = value;
+   dev_priv->num_wa_regs++;
+}
+
+static int bdw_init_logical_workarounds(struct intel_ringbuffer *ringbuf)
+{
+   int ret;
+   struct intel_engine_cs *ring = ringbuf->ring;
+   struct drm_device *dev = ring->dev;
+   struct drm_i915_private *dev_priv = dev->dev_private;
+
+   /*
+* Workarounds applied in this function are part of the register state
+* context, so they need to be re-initialized after a GPU reset,
+* suspend/resume or module reload.
+*/
+   dev_priv->num_wa_regs = 0;
+   memset(dev_priv->intel_wa_regs, 0, sizeof(dev_priv->intel_wa_regs));
+
+   /*
+* Update the number of dwords required based on the
+* actual number of workarounds applied.
+*/
+   ret = intel_logical_ring_begin(ringbuf, BDW_WA_DWORDS_SIZE);
+   if (ret)
+   return ret;
+
+   bdw_emit_workarounds(ringbuf);
+
+   intel_logical_ring_advance(ringbuf);
+
+   DRM_DEBUG_DRIVER("Number of Workarounds applied: %d\n",
+dev_priv->num_wa_regs);
+
+   return 0;
+}
+
  static int gen8_init_common_ring(struct intel_engine_cs *ring)
  {
 struct drm_device *dev = ring-dev;
@@ -1315,6 +1371,10 @@ static int logical_render_ring_init(struct drm_device *dev)
 if (HAS_L3_DPF(dev))
 ring->irq_keep_mask |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;

+   if (IS_BROADWELL(dev))
+   ring->init_context = bdw_init_logical_workarounds;
+   ring->emit_wa = intel_logical_ring_emit_wa;
+
 ring->init = gen8_init_render_ring;
 ring->cleanup = intel_fini_pipe_control;
 ring->get_seqno = gen8_get_seqno;
@@ -1802,6 +1862,12 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 }

 if (ring->id == RCS && !ctx->rcs_initialized) {
+   if (ring->init_context) {
+   ret = ring->init_context(ringbuf);
+   if (ret)
+   DRM_ERROR("ring init context: %d\n", ret);
+   }
+
 ret = 

Re: [Intel-gfx] [PATCH 0/3] drm/i915/chv: Add new WA and remove pre-production ones

2014-10-30 Thread Siluvery, Arun

On 28/10/2014 18:33, Arun Siluvery wrote:

The patches in this series add two new workarounds for CHV and
remove the pre-production ones.

Based on review comments from Ville, the additions and removals are split
into separate patches, which helps in reverting them if required.

The initial patch can be found at,
https://patchwork.kernel.org/patch/5178021/


Hi Ville,

The patches are split up as you suggested.
Please let me know if further changes are required.

regards
Arun


Arun Siluvery (3):
   drm/i915/chv: Remove pre-production workarounds
   drm/i915/chv: Combine GEN8_ROW_CHICKEN w/a
   drm/i915/chv: Add new workarounds for chv

  drivers/gpu/drm/i915/i915_reg.h |  1 +
  drivers/gpu/drm/i915/intel_pm.c | 12 
  drivers/gpu/drm/i915/intel_ringbuffer.c | 22 +++---
  3 files changed, 12 insertions(+), 23 deletions(-)





Re: [Intel-gfx] [PATCH 1/2] drm/i915/chv: Add few more CHV workarounds

2014-10-28 Thread Siluvery, Arun

On 28/10/2014 12:23, Ville Syrjälä wrote:

On Tue, Oct 28, 2014 at 11:57:50AM +, Arun Siluvery wrote:

WaDisableInstructionShootdown:chv
WaForceEnableNonCoherent:chv
WaHdcDisableFetchWhenMasked:chv
WaDisableFenceDestinationToSLM:chv (pre-production)

s/WaDisableDopClockGating/WaDisableRowChickenDopClockGating, because another
CHV WA is defined with the same name in intel_pm.c for a different reg.

For: VIZ-4090
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_reg.h |  2 ++
  drivers/gpu/drm/i915/intel_ringbuffer.c | 20 ++--
  2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 77fce96..840e5d9 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -5024,6 +5024,7 @@ enum punit_power_well {
  /* GEN8 chicken */
  #define HDC_CHICKEN0  0x7300
  #define  HDC_FORCE_NON_COHERENT   (1<<4)
+#define  HDC_DONOT_FETCH_MEM_WHEN_MASKED   (1<<11)
  #define  HDC_FENCE_DEST_SLM_DISABLE   (1<<14)

  /* WaCatErrorRejectionIssue */
@@ -5941,6 +5942,7 @@ enum punit_power_well {
  #define   GEN9_DG_MIRROR_FIX_ENABLE   (1<<5)

  #define GEN8_ROW_CHICKEN  0xe4f0
+#define   INSTRUCTION_SHOOTDOWN_DISABLE (1<<9)
  #define   PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE   (1<<8)
  #define   STALL_DOP_GATING_DISABLE    (1<<5)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a8f72e8..2c07a02 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -788,14 +788,30 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
struct drm_i915_private *dev_priv = dev->dev_private;

/* WaDisablePartialInstShootdown:chv */
+   /* WaDisableInstructionShootdown:chv */
WA_SET_BIT_MASKED(GEN8_ROW_CHICKEN,
- PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE);
+ PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE |
+ (dev->pdev->revision < 0x06 ?
+  INSTRUCTION_SHOOTDOWN_DISABLE : 0));


I think we should just drop the current early pre-prod workarounds, and
not add more of them.


OK, I will drop this.
Is there a guideline on the BDW/CHV revision below which
we should drop a workaround?




/* WaDisableThreadStallDopClockGating:chv */
WA_SET_BIT_MASKED(GEN8_ROW_CHICKEN,
  STALL_DOP_GATING_DISABLE);

-   /* WaDisableDopClockGating:chv (pre-production hw) */
+   /* Use Force Non-Coherent whenever executing a 3D context. This is a
+* workaround for a possible hang in the unlikely event a TLB
+* invalidation occurs during a PSD flush.
+*/


We haven't generally documented the w/as in any great detail. Does it
help someone if we start doing that?


This was already documented for BDW, hence I included it for CHV as well.

regards
Arun




+   /* WaForceEnableNonCoherent:chv */
+   /* WaHdcDisableFetchWhenMasked:chv */
+   /* WaDisableFenceDestinationToSLM:chv (pre-production) */
+   WA_SET_BIT_MASKED(HDC_CHICKEN0,
+ HDC_FORCE_NON_COHERENT |
+ HDC_DONOT_FETCH_MEM_WHEN_MASKED |
+ (dev->pdev->revision < 0x06 ?
+  HDC_FENCE_DEST_SLM_DISABLE : 0));
+
+   /* WaDisableRowChickenDopClockGating:chv (pre-production hw) */
WA_SET_BIT_MASKED(GEN7_ROW_CHICKEN2,
  DOP_CLOCK_GATING_DISABLE);

--
2.1.2







Re: [Intel-gfx] [PATCH] drm/i915/chv: Add new WA and remove pre-production ones

2014-10-28 Thread Siluvery, Arun

On 28/10/2014 17:06, Ville Syrjälä wrote:

On Tue, Oct 28, 2014 at 03:48:24PM +, Arun Siluvery wrote:

+WaForceEnableNonCoherent:chv
+WaHdcDisableFetchWhenMasked:chv
-WaDisableDopClockGating:chv
-WaDisableSamplerPowerBypass:chv
-WaDisableGunitClockGating:chv
-WaDisableFfDopClockGating:chv
-WaDisableDopClockGating:chv

WaDisablePartialInstShootdown:chv and
WaDisableThreadStallDopClockGating:chv are related to the
same register so combine them.


Please split into at least two patches (one to add new w/as and another to
remove old ones). Otherwise reverting is a pita in case we find that one
of the dropped w/as was actually still needed.



I thought of doing that, but combined them because these are early
pre-production workarounds that we may not need in future. I agree, though,
that splitting them helps in reverting them if required.




v2: Remove pre-production WA instead of restricting them
based on revision id (Ville)

For: VIZ-4090
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_reg.h |  1 +
  drivers/gpu/drm/i915/intel_pm.c | 12 
  drivers/gpu/drm/i915/intel_ringbuffer.c | 22 +++---
  3 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 77fce96..9d39700 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -5024,6 +5024,7 @@ enum punit_power_well {
  /* GEN8 chicken */
  #define HDC_CHICKEN0  0x7300
  #define  HDC_FORCE_NON_COHERENT   (1<<4)
+#define  HDC_DONOT_FETCH_MEM_WHEN_MASKED   (1<<11)
  #define  HDC_FENCE_DEST_SLM_DISABLE   (1<<14)

  /* WaCatErrorRejectionIssue */
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 7a69eba..93db25f 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5944,18 +5944,6 @@ static void cherryview_init_clock_gating(struct drm_device *dev)
/* WaDisableSDEUnitClockGating:chv */
I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) |
   GEN8_SDEUNIT_CLOCK_GATE_DISABLE);
-
-   /* WaDisableGunitClockGating:chv (pre-production hw) */
-   I915_WRITE(VLV_GUNIT_CLOCK_GATE, I915_READ(VLV_GUNIT_CLOCK_GATE) |
-  GINT_DIS);


OK


-
-   /* WaDisableFfDopClockGating:chv (pre-production hw) */
-   I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL,
-  _MASKED_BIT_ENABLE(GEN8_FF_DOP_CLOCK_GATE_DISABLE));


OK


-
-   /* WaDisableDopClockGating:chv (pre-production hw) */
-   I915_WRITE(GEN6_UCGCTL1, I915_READ(GEN6_UCGCTL1) |
-  GEN6_EU_TCUNIT_CLOCK_GATE_DISABLE);


OK, I think. This was the weird w/a where it seemed hard to figure out
what it needed. Nothing in BSpec about needing this bit on chv.


  }

  static void g4x_init_clock_gating(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a8f72e8..368b20a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -788,20 +788,20 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
struct drm_i915_private *dev_priv = dev->dev_private;

/* WaDisablePartialInstShootdown:chv */
-   WA_SET_BIT_MASKED(GEN8_ROW_CHICKEN,
- PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE);
-
/* WaDisableThreadStallDopClockGating:chv */
WA_SET_BIT_MASKED(GEN8_ROW_CHICKEN,
- STALL_DOP_GATING_DISABLE);
-
-   /* WaDisableDopClockGating:chv (pre-production hw) */
-   WA_SET_BIT_MASKED(GEN7_ROW_CHICKEN2,
- DOP_CLOCK_GATING_DISABLE);


OK, again the weird w/a but Bspec seems to agree at least.


+ PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE |
+ STALL_DOP_GATING_DISABLE);


Bspec says bit 5 is MBZ now, and yet the w/a database says it's
forever. And the hardware accepts 1 there, so it's not like many other
MBZ bits that you can't set even if you try. Also Bspec has three
different definitions for this bit on gen8; all disagree with each other,
and one definition even manages to disagree with itself. And reading the
hsd stuff, I'm not the only one who has been confused by this, and yet I see
no conclusion there as to how this bit should be configured.

Oh well, I guess we can leave it set for now and maybe eventually
someone will figure out what we're supposed to do.

I am using the w/a database as the reference and it says forever; the spec
seems to disagree, but it has probably not been updated yet.


regards
Arun



-   /* WaDisableSamplerPowerBypass:chv (pre-production hw) */
-   WA_SET_BIT_MASKED(HALF_SLICE_CHICKEN3,
- GEN8_SAMPLER_POWER_BYPASS_DIS);


OK


+   /* Use Force Non-Coherent whenever executing a 3D context. This is a
+* workaround for a possible hang in the unlikely event a TLB
+* 

Re: [Intel-gfx] [PATCH] drm/i915: Emit even number of dwords when emitting LRIs

2014-10-23 Thread Siluvery, Arun

On 23/10/2014 14:41, Ville Syrjälä wrote:

On Thu, Oct 23, 2014 at 01:50:23PM +0100, Chris Wilson wrote:

On Thu, Oct 23, 2014 at 01:42:38PM +0100, Damien Lespiau wrote:

On Thu, Oct 23, 2014 at 02:21:02PM +0200, Daniel Vetter wrote:

On Wed, Oct 22, 2014 at 06:59:52PM +0100, Arun Siluvery wrote:

The number of DWords should be even when doing ring emits as
command sequences require QWord alignment.

v2: use the LRI variant that can write multiple regs in one go (Damien).
We can simply insert one NOP at the end instead of one per register write.

Cc: Mika Kuoppala mika.kuopp...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 497b836..a8f72e8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -680,15 +680,16 @@ static int intel_ring_workarounds_emit(struct intel_engine_cs *ring)
if (ret)
return ret;

-   ret = intel_ring_begin(ring, w->count * 3);
+   ret = intel_ring_begin(ring, (w->count * 2 + 2));
if (ret)
return ret;

+   intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(w->count));


Afaik there's a limit to the size of an MI_LRI. Where's the check for
that (probably with a WARN_ON for now to avoid unnecessary complexity)?


I guess there's always the size of the length field; I don't see any
other indication. Note that I can't find the documentation of the
multi-register version of LRI either. So, well, we should probably
double-check that it does work.


It does work. The max is around 60 iirc (the max length of the
command).


The maximum length seems to be 0xff on gen6+ and 0x3f before that,
which would mean at most 128 or 32 registers.

Also the context image is full of these multi register LRIs. Based on a
quick glance the longest LRI in there is 0x5f on IVB, 0xcf on HSW, and
0xdf on BDW, which translate to 48, 104, and 108 registers per LRI. So
we know at least those must work or context restore would not work.
Before gen7 the context doesn't seem to resemble a batch, so I can't
tell anything about those platforms based on the context image.



w->count is already checked against the maximum number of workarounds,
which is 16 now, so we are well within the limit. I think an additional check
would be redundant here; it is unlikely that we will have more than 128 workarounds.


regards
Arun



Re: [Intel-gfx] [PATCH] drm/i915: Add means to apply WA conditionally

2014-10-23 Thread Siluvery, Arun

On 23/10/2014 16:51, Daniel Vetter wrote:

On Thu, Oct 23, 2014 at 04:29:30PM +0100, Arun Siluvery wrote:

We want to apply some of the workarounds only when a condition holds for a
particular platform or Gen, but we may not know all the possible controlling
parameters in advance, so allow open-ended conditions: a WA makes it into
the list only if its condition is true.

With the appropriate conditions we can combine all of the workarounds
and apply them from a single place irrespective of platform instead of
having them in separate functions.

For: VIZ-4090
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com


Imo we should just pull the condition out into proper control flow. Hiding
it like that in the macro doesn't seem to buy us anything at all, but
obfuscates the code.


No, we are not hiding the condition; I thought it would be easier to read
this way, e.g.,


WA_SET_BIT_MASKED_IF(IS_BDW_GT3(dev), WA_REG, WA_MASK);

Do you prefer adding if (cond) to each WA?

regards
Arun


-Daniel


---
  drivers/gpu/drm/i915/intel_ringbuffer.c | 35 +
  1 file changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 497b836..0525a5d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -736,6 +736,41 @@ static int wa_add(struct drm_i915_private *dev_priv,

  #define WA_WRITE(addr, val) WA_REG(addr, val, 0x)

+#define WA_SET_BIT_MASKED_IF(cond, addr, mask) \
+   do {\
+   if (cond) { \
+   WA_SET_BIT_MASKED(addr, mask);  \
+   }   \
+   } while(0)
+
+#define WA_CLR_BIT_MASKED_IF(cond, addr, mask) \
+   do {\
+   if (cond) { \
+   WA_CLR_BIT_MASKED(addr, mask);  \
+   }   \
+   } while(0)
+
+#define WA_SET_BIT_IF(cond, addr, mask)\
+   do {\
+   if (cond) { \
+   WA_SET_BIT(addr, mask); \
+   }   \
+   } while(0)
+
+#define WA_CLR_BIT_IF(cond, addr, mask)\
+   do {\
+   if (cond) { \
+   WA_CLR_BIT(addr, mask); \
+   }   \
+   } while(0)
+
+#define WA_WRITE_IF(cond, addr, val)   \
+   do {\
+   if (cond) { \
+   WA_WRITE(addr, val);\
+   }   \
+   } while(0)
+
  static int bdw_init_workarounds(struct intel_engine_cs *ring)
  {
struct drm_device *dev = ring-dev;
--
2.1.2







Re: [Intel-gfx] [PATCH] drm/i915: add missing forcewake put on i915_wa_registers()

2014-10-22 Thread Siluvery, Arun

On 22/10/2014 08:35, Ville Syrjälä wrote:

On Tue, Oct 21, 2014 at 07:40:35PM +0200, Daniel Vetter wrote:

On Tue, Oct 21, 2014 at 02:58:08PM -0200, Paulo Zanoni wrote:

From: Paulo Zanoni paulo.r.zan...@intel.com

Otherwise, a simple cat to the debugfs file can make the machine use
much more power than needed, and prevent it from runtime suspending.

Related commit:

 commit 8452e1d173a16d9812422a2272c4ab0f0ba81057
 Author: Mika Kuoppala mika.kuopp...@linux.intel.com
 Date:   Tue Oct 7 17:21:26 2014 +0300
 drm/i915: Build workaround list in ring initialization

Cc: Mika Kuoppala mika.kuopp...@linux.intel.com
Cc: Arun Siluvery arun.siluv...@linux.intel.com
Testcase: igt/pm_rpm/debugfs-read
Signed-off-by: Paulo Zanoni paulo.r.zan...@intel.com


tbh I'm not even sure we want to do the manual forcewake get here -
I915_READ will do it for us, and this is a debug interface. So no one
should care about perf. Mika, is that right? If so I'd like to merge the
inverse patch which drops the fw_get.


Don't we still need the idle msg disable+poll CSPWRFSM trick here on
gen8? That also needs forcewake around it.



I had a chat with Mika on this yesterday and he seems to agree that
forcewake is probably not required here. I couldn't send the patch
yesterday, but as per Ville's comments it looks like we need forcewake here?


regards
Arun


-Daniel


---
  drivers/gpu/drm/i915/i915_debugfs.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 9600285..36a4baa 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2671,6 +2671,7 @@ static int i915_wa_registers(struct seq_file *m, void *unused)
   addr, value, mask, read, ok ? "OK" : "FAIL");
}

+   gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
intel_runtime_pm_put(dev_priv);
mutex_unlock(&dev->struct_mutex);

--
2.1.1



--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch






Re: [Intel-gfx] [PATCH 3/4] drm/i915: Build workaround list in ring initialization

2014-10-20 Thread Siluvery, Arun

On 07/10/2014 15:21, Mika Kuoppala wrote:

If we build the workaround list in ring initialization
and decouple it from the actual writing of values, we
gain the ability to decide where and how we want to apply
the values.

The advantage of this will become more clear when
we need to initialize workarounds on older gens where
it is not possible to write all the registers through ring
LRIs.
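To illustrate the decoupling, here is a rough userspace sketch that builds the workaround list first and only later emits it as a single MI_LOAD_REGISTER_IMM packet. The opcode encoding mirrors i915's `MI_INSTR()` and `MI_LOAD_REGISTER_IMM()` macros as we understand them; the trailing MI_NOOP that keeps the dword count even, `struct wa_reg` and `emit_workarounds()` are assumptions for illustration, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
#define MI_NOOP                 MI_INSTR(0, 0)
/* One LRI header covers n (address, value) pairs. */
#define MI_LOAD_REGISTER_IMM(n) MI_INSTR(0x22, 2 * (n) - 1)

struct wa_reg {
	uint32_t addr;
	uint32_t value;
};

/* Emit every workaround in one LRI packet into cs[]. Returns the number
 * of dwords written, or 0 if the list is empty or cs[] is too small. */
static size_t emit_workarounds(const struct wa_reg *wa, size_t count,
			       uint32_t *cs, size_t cs_len)
{
	size_t n = 0, i;

	if (count == 0 || cs_len < 2 * count + 2)
		return 0;

	cs[n++] = (uint32_t)MI_LOAD_REGISTER_IMM(count);
	for (i = 0; i < count; i++) {
		cs[n++] = wa[i].addr;
		cs[n++] = wa[i].value;
	}
	cs[n++] = MI_NOOP; /* pad to an even number of dwords */
	return n;
}
```

Because the list already exists before anything is emitted, the same array could equally be written via MMIO on gens where LRIs cannot reach every register.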

v2: rebase on newest bdw workarounds

Cc: Arun Siluvery arun.siluv...@linux.intel.com
Cc: Damien Lespiau damien.lesp...@intel.com
Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com
---
  drivers/gpu/drm/i915/i915_debugfs.c |  20 ++--
  drivers/gpu/drm/i915/i915_drv.h |  28 ++---
  drivers/gpu/drm/i915/intel_ringbuffer.c | 185 ++--
  3 files changed, 130 insertions(+), 103 deletions(-)


Hi Daniel,

Patches 3, 4 in this series are independent of the first two.
Could you please pull in these patches?

regards
Arun



diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index da4036d..87482f8 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2655,18 +2655,20 @@ static int i915_wa_registers(struct seq_file *m, void *unused)

intel_runtime_pm_get(dev_priv);

-   seq_printf(m, "Workarounds applied: %d\n", dev_priv->num_wa_regs);
-   for (i = 0; i < dev_priv->num_wa_regs; ++i) {
+   gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
+
+   seq_printf(m, "Workarounds applied: %d\n", dev_priv->workarounds.count);
+   for (i = 0; i < dev_priv->workarounds.count; ++i) {
u32 addr, mask;

-   addr = dev_priv->intel_wa_regs[i].addr;
-   mask = dev_priv->intel_wa_regs[i].mask;
-   dev_priv->intel_wa_regs[i].value = I915_READ(addr) | mask;
-   if (dev_priv->intel_wa_regs[i].addr)
+   addr = dev_priv->workarounds.reg[i].addr;
+   mask = dev_priv->workarounds.reg[i].mask;
+   dev_priv->workarounds.reg[i].value = I915_READ(addr) | mask;
+   if (dev_priv->workarounds.reg[i].addr)
seq_printf(m, "0x%X: 0x%08X, mask: 0x%08X\n",
-  dev_priv->intel_wa_regs[i].addr,
-  dev_priv->intel_wa_regs[i].value,
-  dev_priv->intel_wa_regs[i].mask);
+  dev_priv->workarounds.reg[i].addr,
+  dev_priv->workarounds.reg[i].value,
+  dev_priv->workarounds.reg[i].mask);
}

intel_runtime_pm_put(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1e476b5..f7265bf 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1448,6 +1448,20 @@ struct i915_frontbuffer_tracking {
unsigned flip_bits;
  };

+struct i915_wa_reg {
+   u32 addr;
+   u32 value;
+   /* bitmask representing WA bits */
+   u32 mask;
+};
+
+#define I915_MAX_WA_REGS 16
+
+struct i915_workarounds {
+   struct i915_wa_reg reg[I915_MAX_WA_REGS];
+   u32 count;
+};
+
  struct drm_i915_private {
struct drm_device *dev;
struct kmem_cache *slab;
@@ -1590,19 +1604,7 @@ struct drm_i915_private {
struct intel_shared_dpll shared_dplls[I915_NUM_PLLS];
int dpio_phy_iosf_port[I915_NUM_PHYS_VLV];

-   /*
-* workarounds are currently applied at different places and
-* changes are being done to consolidate them so exact count is
-* not clear at this point, use a max value for now.
-*/
-#define I915_MAX_WA_REGS  16
-   struct {
-   u32 addr;
-   u32 value;
-   /* bitmask representing WA bits */
-   u32 mask;
-   } intel_wa_regs[I915_MAX_WA_REGS];
-   u32 num_wa_regs;
+   struct i915_workarounds workarounds;

/* Reclocking support */
bool render_reclock_avail;
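With a fixed-size `reg[I915_MAX_WA_REGS]` array, whatever fills the list has to bound-check every insertion, much as the old `intel_ring_emit_wa()` did with its `WARN_ON`. A minimal sketch reusing the structures from this hunk with plain C types; `wa_add()` and its `-ENOSPC` return value are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define I915_MAX_WA_REGS 16

struct i915_wa_reg {
	uint32_t addr;
	uint32_t value;
	/* bitmask representing WA bits */
	uint32_t mask;
};

struct i915_workarounds {
	struct i915_wa_reg reg[I915_MAX_WA_REGS];
	uint32_t count;
};

/* Hypothetical helper: append one entry, refusing to overflow reg[]. */
static int wa_add(struct i915_workarounds *w, uint32_t addr,
		  uint32_t value, uint32_t mask)
{
	if (w->count >= I915_MAX_WA_REGS)
		return -ENOSPC;

	w->reg[w->count].addr = addr;
	w->reg[w->count].value = value;
	w->reg[w->count].mask = mask;
	w->count++;
	return 0;
}
```

Keeping `count` next to the array lets the emit path and the debugfs reader agree on how many entries are valid without a separate `num_wa_regs` field.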
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 816a692..12a546f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -665,80 +665,107 @@ err:
return ret;
  }

-static inline void intel_ring_emit_wa(struct intel_engine_cs *ring,
-  u32 addr, u32 value)
+static int intel_ring_workarounds_emit(struct intel_engine_cs *ring)
  {
+   int ret, i;
struct drm_device *dev = ring->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
+   struct i915_workarounds *w = &dev_priv->workarounds;

-   if (WARN_ON(dev_priv->num_wa_regs >= I915_MAX_WA_REGS))
-   return;
+   if (WARN_ON(w->count == 0))
+   return 0;

-   intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-   intel_ring_emit(ring, addr);
-   intel_ring_emit(ring, value);
+   ring->gpu_caches_dirty = 

Re: [Intel-gfx] [PATCH 3/4] drm/i915: Build workaround list in ring initialization

2014-10-13 Thread Siluvery, Arun

On 07/10/2014 15:21, Mika Kuoppala wrote:

If we build the workaround list in ring initialization
and decouple it from the actual writing of values, we
gain the ability to decide where and how we want to apply
the values.

The advantage of this will become more clear when
we need to initialize workarounds on older gens where
it is not possible to write all the registers through ring
LRIs.

v2: rebase on newest bdw workarounds

Cc: Arun Siluvery arun.siluv...@linux.intel.com
Cc: Damien Lespiau damien.lesp...@intel.com
Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com
---
  drivers/gpu/drm/i915/i915_debugfs.c |  20 ++--
  drivers/gpu/drm/i915/i915_drv.h |  28 ++---
  drivers/gpu/drm/i915/intel_ringbuffer.c | 185 ++--
  3 files changed, 130 insertions(+), 103 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index da4036d..87482f8 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2655,18 +2655,20 @@ static int i915_wa_registers(struct seq_file *m, void *unused)

intel_runtime_pm_get(dev_priv);

-   seq_printf(m, "Workarounds applied: %d\n", dev_priv->num_wa_regs);
-   for (i = 0; i < dev_priv->num_wa_regs; ++i) {
+   gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
+
+   seq_printf(m, "Workarounds applied: %d\n", dev_priv->workarounds.count);
+   for (i = 0; i < dev_priv->workarounds.count; ++i) {
u32 addr, mask;

-   addr = dev_priv->intel_wa_regs[i].addr;
-   mask = dev_priv->intel_wa_regs[i].mask;
-   dev_priv->intel_wa_regs[i].value = I915_READ(addr) | mask;
-   if (dev_priv->intel_wa_regs[i].addr)
+   addr = dev_priv->workarounds.reg[i].addr;
+   mask = dev_priv->workarounds.reg[i].mask;
+   dev_priv->workarounds.reg[i].value = I915_READ(addr) | mask;
+   if (dev_priv->workarounds.reg[i].addr)
seq_printf(m, "0x%X: 0x%08X, mask: 0x%08X\n",
-  dev_priv->intel_wa_regs[i].addr,
-  dev_priv->intel_wa_regs[i].value,
-  dev_priv->intel_wa_regs[i].mask);
+  dev_priv->workarounds.reg[i].addr,
+  dev_priv->workarounds.reg[i].value,
+  dev_priv->workarounds.reg[i].mask);
}

intel_runtime_pm_put(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1e476b5..f7265bf 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1448,6 +1448,20 @@ struct i915_frontbuffer_tracking {
unsigned flip_bits;
  };

+struct i915_wa_reg {
+   u32 addr;
+   u32 value;
+   /* bitmask representing WA bits */
+   u32 mask;
+};
+
+#define I915_MAX_WA_REGS 16
+
+struct i915_workarounds {
+   struct i915_wa_reg reg[I915_MAX_WA_REGS];
+   u32 count;
+};
+
  struct drm_i915_private {
struct drm_device *dev;
struct kmem_cache *slab;
@@ -1590,19 +1604,7 @@ struct drm_i915_private {
struct intel_shared_dpll shared_dplls[I915_NUM_PLLS];
int dpio_phy_iosf_port[I915_NUM_PHYS_VLV];

-   /*
-* workarounds are currently applied at different places and
-* changes are being done to consolidate them so exact count is
-* not clear at this point, use a max value for now.
-*/
-#define I915_MAX_WA_REGS  16
-   struct {
-   u32 addr;
-   u32 value;
-   /* bitmask representing WA bits */
-   u32 mask;
-   } intel_wa_regs[I915_MAX_WA_REGS];
-   u32 num_wa_regs;
+   struct i915_workarounds workarounds;

/* Reclocking support */
bool render_reclock_avail;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 816a692..12a546f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -665,80 +665,107 @@ err:
return ret;
  }

-static inline void intel_ring_emit_wa(struct intel_engine_cs *ring,
-  u32 addr, u32 value)
+static int intel_ring_workarounds_emit(struct intel_engine_cs *ring)
  {
+   int ret, i;
struct drm_device *dev = ring->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
+   struct i915_workarounds *w = &dev_priv->workarounds;

-   if (WARN_ON(dev_priv->num_wa_regs >= I915_MAX_WA_REGS))
-   return;
+   if (WARN_ON(w->count == 0))
+   return 0;

-   intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-   intel_ring_emit(ring, addr);
-   intel_ring_emit(ring, value);
+   ring->gpu_caches_dirty = true;
+   ret = intel_ring_flush_all_caches(ring);
+   if (ret)
+   return ret;

-   

Re: [Intel-gfx] [PATCH 4/4] drm/i915: Check workaround status on dfs read time

2014-10-13 Thread Siluvery, Arun

On 07/10/2014 15:21, Mika Kuoppala wrote:

As the workaround list holds the value as an initialization-time
constant, we can do the simple check on the fly without
neglecting igt.

Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com
---
  drivers/gpu/drm/i915/i915_debugfs.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 87482f8..dbd5dc5 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2659,16 +2659,16 @@ static int i915_wa_registers(struct seq_file *m, void *unused)

seq_printf(m, "Workarounds applied: %d\n", dev_priv->workarounds.count);
for (i = 0; i < dev_priv->workarounds.count; ++i) {
-   u32 addr, mask;
+   u32 addr, mask, value, read;
+   bool ok;

addr = dev_priv->workarounds.reg[i].addr;
mask = dev_priv->workarounds.reg[i].mask;
-   dev_priv->workarounds.reg[i].value = I915_READ(addr) | mask;
-   if (dev_priv->workarounds.reg[i].addr)
-   seq_printf(m, "0x%X: 0x%08X, mask: 0x%08X\n",
-  dev_priv->workarounds.reg[i].addr,
-  dev_priv->workarounds.reg[i].value,
-  dev_priv->workarounds.reg[i].mask);
+   value = dev_priv->workarounds.reg[i].value;
+   read = I915_READ(addr);
+   ok = (value & mask) == (read & mask);
+   seq_printf(m, "0x%X: 0x%08X, mask: 0x%08X, read: 0x%08x, status: %s\n",
+  addr, value, mask, read, ok ? "OK" : "FAIL");
}

intel_runtime_pm_put(dev_priv);
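The new status column reduces to comparing only the bits the workaround owns; bits outside the mask are free to change. A standalone restatement of that check, with `wa_ok()` as a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A workaround holds if every bit covered by mask has its reference
 * value in the register read back; other bits are ignored. */
static bool wa_ok(uint32_t value, uint32_t mask, uint32_t read)
{
	return (value & mask) == (read & mask);
}
```

Because the reference value now comes from initialization time rather than from a fresh read, a bit lost across a reset or suspend shows up as FAIL instead of being silently copied into the reference.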



Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

regards
Arun



Re: [Intel-gfx] [PATCH igt] gem_workarounds: intel_wa_registers is now prefixed with i915

2014-09-03 Thread Siluvery, Arun

On 30/08/2014 22:46, Damien Lespiau wrote:

Signed-off-by: Damien Lespiau damien.lesp...@intel.com
---
  tests/gem_workarounds.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/gem_workarounds.c b/tests/gem_workarounds.c
index 6826562..32156d2 100644
--- a/tests/gem_workarounds.c
+++ b/tests/gem_workarounds.c
@@ -184,7 +184,7 @@ igt_main
devid = intel_get_drm_devid(drm_fd);
batch = intel_batchbuffer_alloc(bufmgr, devid);

-   fd = igt_debugfs_open("intel_wa_registers", O_RDONLY);
+   fd = igt_debugfs_open("i915_wa_registers", O_RDONLY);
igt_assert(fd >= 0);

file = fdopen(fd, "r");


Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

regards
Arun
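On the igt side, the test reads these lines back from the renamed debugfs file. A rough sketch of parsing one entry, assuming the `0x%X: 0x%08X, mask: 0x%08X` line format printed by the debugfs code in this series; `struct wa_entry` and `parse_wa_line()` are hypothetical names, not igt's own code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct wa_entry {
	unsigned int addr, value, mask;
};

/* Parse one "0x%X: 0x%08X, mask: 0x%08X" line. Returns false for lines
 * that don't match, such as the "Workarounds applied: N" header. */
static bool parse_wa_line(const char *line, struct wa_entry *e)
{
	return sscanf(line, "0x%X: 0x%08X, mask: 0x%08X",
		      &e->addr, &e->value, &e->mask) == 3;
}
```

Skipping non-matching lines keeps the parser tolerant of the header line without having to special-case it.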



Re: [Intel-gfx] [PATCH 0/5] A few fixes on top of the wa_regs patches

2014-09-03 Thread Siluvery, Arun

On 30/08/2014 16:50, Damien Lespiau wrote:

Hi Arun,

I've compiled a few patches that I think solve some small-ish issues around
your wa_regs series. Could you please have a look at them and comment/give your
r-b tag if you judge appropriate?

On top of those patches, I'd love some comments on the issues I raised in the
other mail and possible follow up patches to address them.

   http://lists.freedesktop.org/archives/intel-gfx/2014-August/051514.html

At some point, we'll also need a bit of coherence with what Mika has been doing:

   http://lists.freedesktop.org/archives/intel-gfx/2014-August/05.html


Hi Daniel,

Since the new workaround design/implementation will take time, could you
please pull the patches in this series to fix these issues, and also the
igt patch that changes the filename?


for the series,
Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

regards
Arun



Re: [Intel-gfx] [PATCH 0/5] A few fixes on top of the wa_regs patches

2014-09-01 Thread Siluvery, Arun

On 01/09/2014 10:08, Daniel Vetter wrote:

On Sun, Aug 31, 2014 at 08:32:55PM +0100, Siluvery, Arun wrote:

On 30/08/2014 16:50, Damien Lespiau wrote:

Hi Arun,

I've compiled a few patches that I think solve some small-ish issues around
your wa_regs series. Could you please have a look at them and comment/give your
r-b tag if you judge appropriate?

On top of those patches, I'd love some comments on the issues I raised in the
other mail and possible follow up patches to address them.

   http://lists.freedesktop.org/archives/intel-gfx/2014-August/051514.html


Hi Damien,

I really appreciate you taking the time not just to give review comments
but also to send patches fixing those issues.

Chris suggested a way of emitting all LRIs using a simple function and I
really wanted to rework everything based on that suggestion.

The LRIs are now organized in an array instead of being sent
individually, and the debugfs patch can make use of the same array. I have
removed the temporary array included in the driver private structure.
I think it looks clean now, and we can easily add new w/a with minimal
changes.

Since all of the patches are modified, I think it is better to squash them
into the merged ones rather than updating them with new patches, so I have
folded your patches in during the rework and will send them after testing.
Please review them and give your comments.


Please don't squash fixup patches when I've merged your patch already -
usually I only drop patches when they're terminally broken, so if you send
me a new version I have to fiddle things to make it all apply. But
squashing in a fixup patch is simpler. And imo also easier to review.


In this case all of the code is newly added, so most of it should apply
cleanly, but if you prefer not to squash them I can send fix-up
patches instead.


regards
Arun



And once we deal in fixup patches it's ok to have a bunch of them imo,
too.
-Daniel





Re: [Intel-gfx] [PATCH 1/5] drm/i915: Rename intel_wa_registers with a i915_ prefix

2014-09-01 Thread Siluvery, Arun

On 30/08/2014 16:50, Damien Lespiau wrote:

Those debugfs files are prefixed by i915, the name of the kernel module,
presumably to distinguish them from files exposed by core DRM.

Also, add a ',' at the end of the last entry. This is to ease the
conflict resolution when rebasing internal patches that add a member at
the end of the array. Without it, wiggle can't do its job as we need to
modify an existing line (appending the ',').

Signed-off-by: Damien Lespiau damien.lesp...@intel.com
---
  drivers/gpu/drm/i915/i915_debugfs.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 1467cc1..fc3d582a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2628,7 +2628,7 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused)
return 0;
  }

-static int intel_wa_registers(struct seq_file *m, void *unused)
+static int i915_wa_registers(struct seq_file *m, void *unused)
  {
int i;
int ret;
@@ -4198,7 +4198,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
{"i915_semaphore_status", i915_semaphore_status, 0},
{"i915_shared_dplls_info", i915_shared_dplls_info, 0},
{"i915_dp_mst_info", i915_dp_mst_info, 0},
-   {"intel_wa_registers", intel_wa_registers, 0}
+   {"i915_wa_registers", i915_wa_registers, 0},
  };
  #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)



Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com
This is for this patch only; the remaining patches are not required after
the rework.


regards
Arun



Re: [Intel-gfx] [PATCH 5/5] drm/i915: Don't restrict i915_wa_registers to BDW

2014-09-01 Thread Siluvery, Arun

On 30/08/2014 16:51, Damien Lespiau wrote:

We have CHV code that already makes the test obsolete. Besides, when
num_wa_regs is 0 (platforms not gathering that W/A data), we expose
something sensible already.

Signed-off-by: Damien Lespiau damien.lesp...@intel.com
---
  drivers/gpu/drm/i915/i915_debugfs.c | 5 -
  1 file changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index fc3d582a..cd4f045 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2636,11 +2636,6 @@ static int i915_wa_registers(struct seq_file *m, void *unused)
struct drm_device *dev = node->minor->dev;
struct drm_i915_private *dev_priv = dev->dev_private;

-   if (!IS_BROADWELL(dev)) {
-   DRM_DEBUG_DRIVER("Workaround table not available !!\n");
-   return -EINVAL;
-   }
-
ret = mutex_lock_interruptible(&dev->struct_mutex);
if (ret)
return ret;


This one can also be taken, so that makes patches 1 and 5 in this series.
Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

regards
Arun



Re: [Intel-gfx] [PATCH 4/4] drm/i915: Rework workaround data exporting to debugfs

2014-09-01 Thread Siluvery, Arun

On 01/09/2014 15:06, Damien Lespiau wrote:

On Mon, Sep 01, 2014 at 02:28:53PM +0100, Arun Siluvery wrote:

Now w/a are organized in an array so we know exactly how many of them
are applied; use the same array while exporting data to debugfs and
remove the temporary array we currently have in driver priv structure.

Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_debugfs.c | 41 +++--
  drivers/gpu/drm/i915/i915_drv.h | 14 ---
  drivers/gpu/drm/i915/intel_ringbuffer.c | 15 
  drivers/gpu/drm/i915/intel_ringbuffer.h |  8 +++
  4 files changed, 52 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2727bda..bab0408 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2465,6 +2465,14 @@ static int i915_wa_registers(struct seq_file *m, void *unused)
struct drm_info_node *node = (struct drm_info_node *) m->private;
struct drm_device *dev = node->minor->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
+   struct intel_ring_context_rodata ro_data;
+
+   ret = ring_context_rodata(dev, &ro_data);
+   if (ret) {
+   seq_printf(m, "Workarounds applied: 0\n");


seq_puts()


+   DRM_DEBUG_DRIVER("Workaround table not available !!\n");
+   return -EINVAL;
+   }

ret = mutex_lock_interruptible(&dev->struct_mutex);
if (ret)
@@ -2472,18 +2480,27 @@ static int i915_wa_registers(struct seq_file *m, void *unused)

intel_runtime_pm_get(dev_priv);

-   seq_printf(m, "Workarounds applied: %d\n", dev_priv->num_wa_regs);
-   for (i = 0; i < dev_priv->num_wa_regs; ++i) {
-   u32 addr, mask;
-
-   addr = dev_priv->intel_wa_regs[i].addr;
-   mask = dev_priv->intel_wa_regs[i].mask;
-   dev_priv->intel_wa_regs[i].value = I915_READ(addr) | mask;
-   if (dev_priv->intel_wa_regs[i].addr)
-   seq_printf(m, "0x%X: 0x%08X, mask: 0x%08X\n",
-  dev_priv->intel_wa_regs[i].addr,
-  dev_priv->intel_wa_regs[i].value,
-  dev_priv->intel_wa_regs[i].mask);
+   seq_printf(m, "Workarounds applied: %d\n", ro_data.num_items/2);
+   for (i = 0; i < ro_data.num_items; i += 2) {
+   u32 addr, mask, value;
+
+   addr = ro_data.init_context[i];
+   /*
+* Most of workarounds are  masked registers;
+* to set a bit in lower 16-bits we set a mask bit in
+* upper 16-bits so we can take either of them as mask but
+* it doesn't work if the w/a is about clearing a bit so
+* use upper 16-bits to cover both cases.
+*/
+   mask = ro_data.init_context[i+1] >> 16;


'Most' doesn't seem good here. Either it's all and we're happy, or we
need a generic way to describe the W/A (masked register or not). value +
mask is generic enough to code for both cases.


It seems some of them could be unmasked registers.
We can use 'mask' itself to determine whether a register is masked or
unmasked: mask == 0 for an unmasked register.

+
+   /*
+* value represents the status of other bits in the
+* register besides w/a bits
+*/
+   value  = I915_READ(addr) | mask;
+   seq_printf(m, "0x%X: 0x%08X, mask: 0x%08X\n",
+  addr, value, mask);
}


I still don't get it. 'value' is supposed to be the reference value for
the W/A, but you're or'ing the mask here, so you treat the mask as if it
were the reference value. This won't work if the W/A is about setting
multi-bit fields or about clearing a bit.

The comment is still not clear enough. You're saying other bits besides
the w/a bits, but or'ing the mask doesn't do that.

Why do we care about the other bits in the reference value? They don't
matter. Why use anything other than (ro_data.init_context[i+1] & 0xffff)
for the value here (as long as we're talking about masked registers)?

I have always considered value as the register value (the remaining bits
of the register plus the w/a bits), and now I see your point.
Yes, the lower 16 bits can be used as the reference value; depending on
whether the register is masked or unmasked, we can use or skip the mask
together with the value in the test.


regards
Arun
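A sketch of the decoding agreed on above, assuming every entry in the flat `{addr, val, addr, val, ...}` table targets a masked register: the upper 16 bits give the mask and the lower 16 bits the reference value, which covers both setting and clearing a bit. Unmasked registers (the `mask == 0` case) are not handled here; `struct wa_ref`, `decode_wa_table()` and `wa_holds()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct wa_ref {
	uint32_t addr;
	uint32_t mask;  /* bits the W/A owns */
	uint32_t value; /* their expected state */
};

/* Decode a flat {addr, val, ...} table of masked-register writes into
 * out[]; returns the number of entries decoded. */
static size_t decode_wa_table(const uint32_t *items, size_t num_items,
			      struct wa_ref *out)
{
	size_t i, n = 0;

	for (i = 0; i + 1 < num_items; i += 2, n++) {
		out[n].addr = items[i];
		out[n].mask = items[i + 1] >> 16;
		out[n].value = items[i + 1] & 0xffff;
	}
	return n;
}

/* Compare only the owned bits of a register read against the reference. */
static bool wa_holds(const struct wa_ref *wa, uint32_t read)
{
	return (read & wa->mask) == (wa->value & wa->mask);
}
```

With this split a "clear bit 5" entry such as `0x00200000` decodes to mask 0x20 and value 0, so a read-back with bit 5 set is correctly reported as a failure.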




Re: [Intel-gfx] [PATCH 0/5] A few fixes on top of the wa_regs patches

2014-08-31 Thread Siluvery, Arun

On 30/08/2014 16:50, Damien Lespiau wrote:

Hi Arun,

I've compiled a few patches that I think solve some small-ish issues around
your wa_regs series. Could you please have a look at them and comment/give your
r-b tag if you judge appropriate?

On top of those patches, I'd love some comments on the issues I raised in the
other mail and possible follow up patches to address them.

   http://lists.freedesktop.org/archives/intel-gfx/2014-August/051514.html


Hi Damien,

I really appreciate you taking the time not just to give review comments
but also to send patches fixing those issues.


Chris suggested a way of emitting all LRIs using a simple function and I
really wanted to rework everything based on that suggestion.


The LRIs are now organized in an array instead of being sent
individually, and the debugfs patch can make use of the same array. I have
removed the temporary array included in the driver private structure.
I think it looks clean now, and we can easily add new w/a with minimal
changes.


Since all of the patches are modified, I think it is better to squash
them into the merged ones rather than updating them with new patches, so
I have folded your patches in during the rework and will send them after
testing. Please review them and give your comments.


regards
Arun



At some point, we'll also need a bit of coherence with what Mika has been doing:

   http://lists.freedesktop.org/archives/intel-gfx/2014-August/05.html





Re: [Intel-gfx] [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs

2014-08-31 Thread Siluvery, Arun

On 30/08/2014 16:10, Damien Lespiau wrote:

On Tue, Aug 26, 2014 at 02:44:51PM +0100, Arun Siluvery wrote:

The workarounds that are applied are exported to a debugfs file;
this is used to verify their state after the test case (reset or
suspend/resume etc). This patch is only required to support i-g-t.


I'm really, really confused. Please bear with me.
I have reworked all the patches; hopefully things will be clearer this
time :)




1. We only deal with masked registers AFAICS. Those registers have the
high 16 bits masking the writes.

2. The values given to intel_ring_emit_wa() are the actual values we're going
to write in the register, so they include those mask bits. say:

 intel_ring_emit_wa(ring, GEN7_ROW_CHICKEN2,
_MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));

3. We then record in intel_wa_regs the reg address and two fields named mask
and value.

3. a) mask

   intel_wa_regs[dev_priv->num_wa_regs].mask = value & 0xffff;

You're selecting the low 16 bits and putting them in mask. But the masked bits are the
upper 16 bits? This may work when the W/A is about setting bits, but we have a
bug if we ever have a W/A that is about clearing a bit. It would seem better to
me to grab the upper bits which are, after all, the bitmask we're interested in.


It is a valid issue; changed to use the upper 16 bits as the mask.


3. b) value

/* value is updated with the status of remaining bits of this
 * register when it is read from debugfs file
 */
   dev_priv->intel_wa_regs[dev_priv->num_wa_regs].value = value;

I don't understand what the comment explains. The *why* we need to do that is
missing and, frankly, having to update the reference values we capture at
intel_ring_emit_wa() time sounds like a bug to me.

I also note that, at this point, intel_wa_regs.value contains both the
value and the mask. Weird.


I agree the "why" part is missing.
The idea is that value represents the status of the other bits in this
register besides the w/a bit; this is actually redundant here. I think I
added it because I wanted to initialize all members.

This change is not applicable in the new patches.


4. Time to expose that intel_wa_regs array to user space

4. a) mask

mask = dev_priv->intel_wa_regs[i].mask

Straightforward enough, except that these are still the lower 16 bits, so
really the value.

4. b) value

   dev_priv->intel_wa_regs[i].value = I915_READ(addr) | mask;

Hum? This really started my journey to dig further. So we:

   - override the reference value from intel_ring_emit_wa() with whatever we
 have in the register at that moment

   - Or it with a mask that's not really a mask (but the reference value)

5. igt test

   So you grab those mask and value fields from the debugfs file and read the
register through mapped MMIO, and then

status = (current_wa[i].value & current_wa[i].mask) !=
 (wa_regs[i].value & wa_regs[i].mask);

So that's where I'm starting to put things back together and understand what
the intention is. I still think that's not quite right, especially how we get
the mask and why we read back the register in the debugfs file.

We read the register value after the test case (e.g. reset) and compare it
with a known value that is exported to the debugfs file.


regards
Arun


Or am I just missing something? In any case, having to spend that much time
trying to understand what's going on is a maintainability problem; we need code
that at least looks straightforward.

HTH,
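Points 1 to 3 a) above turn on how masked registers work. A small userspace model, assuming the usual i915 encoding where the high 16 bits of a write select which of the low 16 bits it may change; `MASKED_BIT_ENABLE()`/`MASKED_BIT_DISABLE()` mirror i915's `_MASKED_BIT_ENABLE()`/`_MASKED_BIT_DISABLE()` as we understand them, while `masked_write()` and the two mask helpers are hypothetical. It shows why taking the low half as the mask breaks for a W/A that clears a bit.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror i915's _MASKED_BIT_ENABLE()/_MASKED_BIT_DISABLE(): the high
 * half selects the bits, the low half carries their new state. */
#define MASKED_BIT_ENABLE(a)  (((a) << 16) | (a))
#define MASKED_BIT_DISABLE(a) ((a) << 16)

/* Model of a masked-register write, tracking only the low 16 data bits:
 * a low bit changes only if its partner in the high half is set. */
static uint32_t masked_write(uint32_t old, uint32_t val)
{
	uint32_t sel = val >> 16;

	return (old & ~sel & 0xffff) | (val & sel);
}

/* Point 3 a): the low half is the new bit state, not a mask. */
static uint32_t mask_from_low(uint32_t val)
{
	return val & 0xffff;
}

/* The fix agreed in the thread: the high half is the real mask. */
static uint32_t mask_from_high(uint32_t val)
{
	return val >> 16;
}
```

For `MASKED_BIT_DISABLE(bit)` the low half is zero, so `mask_from_low()` yields 0 and any check built on it passes trivially, while `mask_from_high()` still selects the bit that must read back as clear.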






Re: [Intel-gfx] [RFC] libdrm_intel: Rework BO allocs to avoid rounding up to bucket size

2014-08-29 Thread Siluvery, Arun

On 29/08/2014 11:16, Chris Wilson wrote:

On Fri, Aug 29, 2014 at 11:02:01AM +0100, Arun Siluvery wrote:

From: Garry Lancaster garry.lancas...@intel.com

libdrm includes a scheme where freed buffer objects (BOs)
are held in a cache. This allows incoming allocation requests to be
serviced by re-using an old BO, instead of requiring a new
object to be allocated. This is a performance enhancement.
The cache is divided into buckets. Each bucket holds unused
BOs of a pre-determined size. When a BO allocation request is seen,
the bucket for BOs of this size or larger is selected. Any BO
currently in the bucket will be re-used for the allocation. If the
bucket was empty, a new BO is created. However, the BO is created
with the size determined by the selected bucket (i.e. the size is
rounded up to the bucket size), rather than being created with the
originally requested size. This is so that when the BO is freed,
it can be released into the bucket and re-used by any other allocation
which selects the same bucket.

Depending upon the size of the allocation, this rounding up can
result in a significant wastage of memory when allocating a BO. For
example, a BO request just over 132K allocated during GLES context
creation was rounded up to the next bucket size of 160K. Such wastage
can be critical on devices with low memory.

This commit reworks the BO allocation code. On a BO allocation request,
if the selected bucket contains any BOs, each of them is checked to
see if any is large enough to fulfill the allocation request. If not,
a new BO is created, but (due to the new check) it is no longer
necessary to round up its size to match the size determined by the
selected bucket.

So, previously, buckets contained BOs that were all the same size. But now
the BOs in a bucket can be different sizes: in the range from the size of the
next smaller, nominal, bucket size to the current, nominal, bucket size.
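A rough sketch of the reworked allocation path described above, assuming the bucket has already been selected for the request: reuse any cached BO in it that is large enough, otherwise allocate exactly the requested size. `struct bo`, `struct bucket` and `bo_alloc()` are simplified, hypothetical stand-ins for libdrm's internals; page alignment and tiling are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct bo {
	size_t size;
	struct bo *next;
};

struct bucket {
	size_t size;          /* nominal (upper bound) size of this bucket */
	struct bo *free_list; /* cached BOs, sizes up to the nominal size */
};

/* Return a BO of at least 'size' bytes: reuse a large-enough cached BO
 * from bucket b if there is one, else allocate one of exactly 'size'. */
static struct bo *bo_alloc(struct bucket *b, size_t size)
{
	struct bo **pp, *bo;

	for (pp = &b->free_list; *pp; pp = &(*pp)->next) {
		if ((*pp)->size >= size) {
			bo = *pp;
			*pp = bo->next; /* unlink from the cache */
			bo->next = NULL;
			return bo;
		}
	}

	bo = malloc(sizeof(*bo));
	if (bo) {
		bo->size = size; /* no rounding up to b->size any more */
		bo->next = NULL;
	}
	return bo;
}
```

Because cached BOs in one bucket can now differ in size, reuse needs a short walk of the list rather than just taking the first entry.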

On a 1GB system, the following reductions in BO memory usage were seen:

BaseMark X 1.0:                 324.4MB -> 306.0MB (-18.4MB;  5.7% saving)
BaseMark X 1.1 Medium Quality:  206.9MB -> 201.2MB (- 5.7MB;  2.8% saving)
GFXBench 3.0 TRex:              216.6MB -> 200.0MB (-16.6MB;  8.3% saving)
GFXBench 3.0 Manhattan:         281.4MB -> 246.8MB (-34.6MB; 12.3% saving)

No performance change was seen on BaseMarkX. GFXBench 3.0 showed small
performance increases (~0.5fps on Manhattan, ~1-2fps on TRex) which may be
due to reduced activity of the OOM killer.


The principle for rounding up was to increase the cache hit rate and
thereby reduce allocations. Might be interesting to know whether the
number of bo allocated also changes. If not, the argument is that the
working set is pretty stable and has a natural set of sizes which it
reuses. A counter example might then be uxa, glamor, compositors which
off-the-top-of-my-head would have more variable object sizes.

Reducing the impact of thrashing should itself be measurable, and a
useful statistic to track.

As a corollary to exact allocations, you can then reduce the number of
buckets again (the number was increased to allow finer-grained
allocations). Again, it is hard to judge whether handing back larger
objects will lead to memory wastage. So yet another statistic to track
is requested versus allocated memory sizes.


Reducing the number of buckets would lead to more memory wastage, right?

The current bucket sizes are:
Bucket[0]: 4K
Bucket[1]: 8K
Bucket[2]: 12K
Bucket[3]: 16K
Bucket[4]: 20K
Bucket[5]: 24K
Bucket[6]: 28K
Bucket[7]: 32K
Bucket[8]: 40K
Bucket[9]: 48K
Bucket[10]: 56K
Bucket[11]: 64K
Bucket[12]: 80K
Bucket[13]: 96K
Bucket[14]: 112K
Bucket[15]: 128K
Bucket[16]: 160K
Bucket[17]: 192K
Bucket[18]: 224K
Bucket[19]: 256K
...
...

If there are many objects of size 132K, we would end up allocating 160K
for each of them. We can track requested vs allocated sizes, but that
depends on the application and its usage. What would be the best way to
measure this: over a given period of time, or by some other criterion?
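The bucket list above follows a simple pattern: 4K, 8K and 12K, then four evenly spaced sizes in each power-of-two range from 16K upwards. A sketch under that reading (it is not libdrm's code) regenerates the list, reproduces the 132K to 160K rounding, and keeps the requested-versus-allocated running totals suggested earlier in the thread; `make_buckets()`, `bucket_round()`, `struct alloc_stats` and `account()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define KB 1024u

/* Fill sizes[] with bucket sizes up to max_size; return the count. */
static size_t make_buckets(size_t *sizes, size_t cap, size_t max_size)
{
	size_t n = 0, s, q;

	for (s = 4 * KB; s <= 12 * KB && n < cap; s += 4 * KB)
		sizes[n++] = s;
	for (s = 16 * KB; s <= max_size; s *= 2)
		for (q = 0; q < 4 && n < cap && s + q * s / 4 <= max_size; q++)
			sizes[n++] = s + q * s / 4;
	return n;
}

/* Old behaviour: round a request up to the first bucket that fits. */
static size_t bucket_round(const size_t *sizes, size_t n, size_t req)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (sizes[i] >= req)
			return sizes[i];
	return req; /* larger than every bucket: not cached */
}

/* Running totals of requested versus allocated bytes. */
struct alloc_stats {
	size_t requested, allocated;
};

static void account(struct alloc_stats *st, size_t req, size_t got)
{
	st->requested += req;
	st->allocated += got;
}
```

Sampling `allocated - requested` per allocation, rather than per time window, makes the waste figure independent of how long an application runs.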



Also it is important to state what type of system you are measuring the
impact of allocations for -- the behaviour of a cache miss is
dramatically different between LLC and non-LLC systems.


The current data is from a non-LLC system.

regards
Arun


-Chris





Re: [Intel-gfx] [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs

2014-08-27 Thread Siluvery, Arun

On 27/08/2014 16:44, Daniel Vetter wrote:

On Tue, Aug 26, 2014 at 02:44:51PM +0100, Arun Siluvery wrote:

The workarounds that are applied are exported to a debugfs file;
this is used to verify their state after the test case (reset or
suspend/resume etc). This patch is only required to support i-g-t.

Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_debugfs.c | 40 +
  drivers/gpu/drm/i915/i915_drv.h | 14 
  drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++
  3 files changed, 77 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index d42db6b..f0d63f6 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2451,20 +2451,59 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused)
seq_printf(m, " dpll_md: 0x%08x\n", pll->hw_state.dpll_md);
seq_printf(m, " fp0: 0x%08x\n", pll->hw_state.fp0);
seq_printf(m, " fp1: 0x%08x\n", pll->hw_state.fp1);
seq_printf(m, " wrpll:   0x%08x\n", pll->hw_state.wrpll);
}
drm_modeset_unlock_all(dev);

return 0;
  }

+static int intel_wa_registers(struct seq_file *m, void *unused)
+{
+   int i;
+   int ret;
+   struct drm_info_node *node = (struct drm_info_node *) m->private;
+   struct drm_device *dev = node->minor->dev;
+   struct drm_i915_private *dev_priv = dev->dev_private;
+
+   if (!IS_BROADWELL(dev)) {
+   DRM_DEBUG_DRIVER("Workaround table not available !!\n");
+   return -EINVAL;
+   }
+
+   ret = mutex_lock_interruptible(&dev->struct_mutex);
+   if (ret)
+   return ret;
+
+   intel_runtime_pm_get(dev_priv);
+
+   seq_printf(m, "Workarounds applied: %d\n", dev_priv->num_wa_regs);
+   for (i = 0; i < dev_priv->num_wa_regs; ++i) {
+   u32 addr, mask;
+
+   addr = dev_priv->intel_wa_regs[i].addr;
+   mask = dev_priv->intel_wa_regs[i].mask;
+   dev_priv->intel_wa_regs[i].value = I915_READ(addr) | mask;
+   if (dev_priv->intel_wa_regs[i].addr)
+   seq_printf(m, "0x%X: 0x%08X, mask: 0x%08X\n",
+  dev_priv->intel_wa_regs[i].addr,
+  dev_priv->intel_wa_regs[i].value,
+  dev_priv->intel_wa_regs[i].mask);
+   }
+
+   intel_runtime_pm_put(dev_priv);
+   mutex_unlock(&dev->struct_mutex);
+
+   return 0;
+}
+
  struct pipe_crc_info {
const char *name;
struct drm_device *dev;
enum pipe pipe;
  };

  static int i915_dp_mst_info(struct seq_file *m, void *unused)
  {
struct drm_info_node *node = (struct drm_info_node *) m->private;
struct drm_device *dev = node->minor->dev;
@@ -3980,20 +4019,21 @@ static const struct drm_info_list i915_debugfs_list[] = {
{"i915_llc", i915_llc, 0},
{"i915_edp_psr_status", i915_edp_psr_status, 0},
{"i915_sink_crc_eDP1", i915_sink_crc, 0},
{"i915_energy_uJ", i915_energy_uJ, 0},
{"i915_pc8_status", i915_pc8_status, 0},
{"i915_power_domain_info", i915_power_domain_info, 0},
{"i915_display_info", i915_display_info, 0},
{"i915_semaphore_status", i915_semaphore_status, 0},
{"i915_shared_dplls_info", i915_shared_dplls_info, 0},
{"i915_dp_mst_info", i915_dp_mst_info, 0},
+   {"intel_wa_registers", intel_wa_registers, 0}
  };
  #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)

  static const struct i915_debugfs_files {
const char *name;
const struct file_operations *fops;
  } i915_debugfs_files[] = {
{"i915_wedged", &i915_wedged_fops},
{"i915_max_freq", &i915_max_freq_fops},
{"i915_min_freq", &i915_min_freq_fops},
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bcf79f0..49b7be7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1546,20 +1546,34 @@ struct drm_i915_private {
wait_queue_head_t pending_flip_queue;

  #ifdef CONFIG_DEBUG_FS
struct intel_pipe_crc pipe_crc[I915_MAX_PIPES];
  #endif

int num_shared_dpll;
struct intel_shared_dpll shared_dplls[I915_NUM_PLLS];
int dpio_phy_iosf_port[I915_NUM_PHYS_VLV];

+   /*
+* workarounds are currently applied at different places and
+* changes are being done to consolidate them so exact count is
+* not clear at this point, use a max value for now.
+*/
+#define I915_MAX_WA_REGS  16
+   struct {
+   u32 addr;
+   u32 value;
+   /* bitmask representing WA bits */
+   u32 mask;
+   } intel_wa_regs[I915_MAX_WA_REGS];
+   u32 num_wa_regs;
+
/* Reclocking support */
bool render_reclock_avail;
bool 

Re: [Intel-gfx] [PATCH] igt/gem_workarounds: igt to test workaround registers

2014-08-27 Thread Siluvery, Arun

On 27/08/2014 16:59, Chris Wilson wrote:

On Wed, Aug 27, 2014 at 05:50:16PM +0200, Daniel Vetter wrote:

On Tue, Aug 26, 2014 at 02:50:28PM +0100, Arun Siluvery wrote:

Some of the workarounds are lost following a GPU reset or suspend/resume;
this patch adds a test which compares register state before and after
the test scenario.

This test currently verifies only bdw workarounds.


The existing tool didn't need kernel help (other than forcewake). Why
was that not used as a starting point?
-Chris


Do you mean intel_reg_checker()?
This new test uses kernel help to get the initial state of workarounds,
which are exported to debugfs. We could add this known state to the test
itself, but Daniel is not OK with that. The debugfs part is only added to
support the test.
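The test side can be sketched in two steps. First, parse each line that intel_wa_registers() prints, using the format "0x%X: 0x%08X, mask: 0x%08X" from the debugfs patch. Then, after the reset or suspend/resume, compare only the masked bits. The struct and function names are hypothetical, and this is not the actual igt code:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

struct wa_entry {
	uint32_t addr;
	uint32_t value;
	uint32_t mask;
};

/*
 * Parse one line in the format printed by intel_wa_registers():
 * "0x%X: 0x%08X, mask: 0x%08X". Returns false for any other line,
 * such as the leading "Workarounds applied: %d" header.
 */
static bool parse_wa_line(const char *line, struct wa_entry *wa)
{
	assert(line != NULL && wa != NULL);
	return sscanf(line, "0x%" SCNx32 ": 0x%" SCNx32 ", mask: 0x%" SCNx32,
		      &wa->addr, &wa->value, &wa->mask) == 3;
}

/* A workaround survived if every bit it owns (its mask) still matches. */
static bool wa_still_applied(const struct wa_entry *wa, uint32_t reg_now)
{
	return (reg_now & wa->mask) == (wa->value & wa->mask);
}
```

Comparing only the masked bits lets other fields in the same register change across the reset without causing a false failure.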


regards
Arun




Re: [Intel-gfx] [PATCH] igt/gem_workarounds: igt to test workaround registers

2014-08-27 Thread Siluvery, Arun

On 27/08/2014 17:23, Chris Wilson wrote:

On Wed, Aug 27, 2014 at 05:17:11PM +0100, Siluvery, Arun wrote:

On 27/08/2014 16:59, Chris Wilson wrote:

On Wed, Aug 27, 2014 at 05:50:16PM +0200, Daniel Vetter wrote:

On Tue, Aug 26, 2014 at 02:50:28PM +0100, Arun Siluvery wrote:

Some of the workarounds are lost following a GPU reset or suspend/resume;
this patch adds a test which compares register state before and after
the test scenario.

This test currently verifies only bdw workarounds.


The existing tool didn't need kernel help (other than forcewake). Why
was that not used as a starting point?
-Chris


Do you mean intel_reg_checker()?
This new test uses kernel help to get the initial state of
workarounds, which are exported to debugfs. We could add this known
state to the test itself, but Daniel is not OK with that. The debugfs
part is only added to support the test.


I disagree vehemently with Daniel here then. The kernel lies.
-Chris

Just to clarify: he objected because a list maintained in the test
can get out of sync with the workarounds we apply in the driver, which
can be avoided if the list is generated by the kernel itself.


It may be OK to maintain the list in the test in this case, considering
the list is fairly small, but it is not my call.


regards
Arun




Re: [Intel-gfx] [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function

2014-08-26 Thread Siluvery, Arun

On 25/08/2014 13:18, Ville Syrjälä wrote:

On Fri, Aug 22, 2014 at 08:39:11PM +0100, Arun Siluvery wrote:

For BDW, workarounds are currently initialized in init_clock_gating(), but
they are lost during reset, suspend/resume, etc. This patch moves the WAs
that are part of the register state context to the render ring init
function; otherwise the default context ends up with incorrect values, as
they don't get initialized until init_clock_gating() runs.

v2: Add workarounds to golden render state
This method has its own issues: first of all, the state is different for
each gen and is generated using a tool, so adding new workarounds
and maintaining them across gens is not a straightforward process.

v3: Use LRIs to emit these workarounds (Ville)
Instead of modifying the golden render state the same LRIs are
emitted from within the driver.
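Emitting the workarounds as LRIs amounts to building one MI_LOAD_REGISTER_IMM packet of (register, value) pairs. For masked registers, each value carries its write-enable bits in the upper 16 bits. The minimal sketch below re-declares, under local names, the MI_INSTR, MI_LOAD_REGISTER_IMM and _MASKED_BIT_ENABLE encodings as I understand them from i915_reg.h. The struct, the function name and the register offsets in the example are hypothetical. The driver itself would write into the ring through its own emit helpers, not into a plain array:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command encodings as defined in i915_reg.h (opcode in bits 28:23). */
#define MI_INSTR(opcode, flags) \
	(((uint32_t)(opcode) << 23) | (uint32_t)(flags))
#define MI_LOAD_REGISTER_IMM(n) MI_INSTR(0x22, 2 * (n) - 1)
#define MI_NOOP                 MI_INSTR(0, 0)
/* Masked registers: the upper 16 bits select which lower bits change. */
#define MASKED_BIT_ENABLE(a)    (((uint32_t)(a) << 16) | (uint32_t)(a))
#define MASKED_BIT_DISABLE(a)   ((uint32_t)(a) << 16)

struct wa_write {
	uint32_t reg;
	uint32_t val;
};

/*
 * Emit a single LRI covering all 'n' writes, padded with MI_NOOP to an
 * even dword count. Returns the dwords written, or 0 if 'cap' is too small.
 */
static size_t emit_wa_lri(uint32_t *cs, size_t cap,
			  const struct wa_write *wa, size_t n)
{
	size_t len = 1 + 2 * n, i = 0;

	assert(cs != NULL && wa != NULL);
	if (n == 0 || len + (len & 1) > cap)
		return 0;

	cs[i++] = MI_LOAD_REGISTER_IMM(n);
	for (size_t k = 0; k < n; k++) {
		cs[i++] = wa[k].reg;
		cs[i++] = wa[k].val;
	}
	if (i & 1)
		cs[i++] = MI_NOOP;
	return i;
}
```

The MI_NOOP padding follows the usual i915 practice, as I understand it, of keeping the ring tail qword aligned.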

For: VIZ-4092
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_gem_context.c |  6 +++
  drivers/gpu/drm/i915/intel_pm.c | 48 --
  drivers/gpu/drm/i915/intel_ringbuffer.c | 70 +
  drivers/gpu/drm/i915/intel_ringbuffer.h |  7 
  4 files changed, 83 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 9683e62..2debce4 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring,
}

uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
to->legacy_hw_ctx.initialized = true;

  done:
i915_gem_context_reference(to);
ring->last_context = to;

if (uninitialized) {
+   if (IS_BROADWELL(ring->dev)) {
+   ret = bdw_init_workarounds(ring);
+   if (ret)
+   DRM_ERROR("init workarounds: %d\n", ret);
+   }
+
ret = i915_gem_render_state_init(ring);
if (ret)
DRM_ERROR("init render state: %d\n", ret);
}

return 0;

  unpin_out:
if (ring->id == RCS)
i915_gem_object_ggtt_unpin(to->legacy_hw_ctx.rcs_state);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index c8f744c..668acd9 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5507,101 +5507,53 @@ static void gen8_init_clock_gating(struct drm_device *dev)
struct drm_i915_private *dev_priv = dev->dev_private;
enum pipe pipe;

I915_WRITE(WM3_LP_ILK, 0);
I915_WRITE(WM2_LP_ILK, 0);
I915_WRITE(WM1_LP_ILK, 0);

/* FIXME(BDW): Check all the w/a, some might only apply to
 * pre-production hw. */

-   /* WaDisablePartialInstShootdown:bdw */
-   I915_WRITE(GEN8_ROW_CHICKEN,
-  _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE));
-
-   /* WaDisableThreadStallDopClockGating:bdw */
-   /* FIXME: Unclear whether we really need this on production bdw. */
-   I915_WRITE(GEN8_ROW_CHICKEN,
-  _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));

-   /*
-* This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for
-* pre-production hardware
-*/
-   I915_WRITE(HALF_SLICE_CHICKEN3,
-  _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS));
-   I915_WRITE(HALF_SLICE_CHICKEN3,
-  _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS));
I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE));

I915_WRITE(_3D_CHICKEN3,
   
_MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2)));

-   I915_WRITE(COMMON_SLICE_CHICKEN2,
-  _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE));
-
-   I915_WRITE(GEN7_HALF_SLICE_CHICKEN1,
-  _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE));
-
-   /* WaDisableDopClockGating:bdw May not be needed for production */
-   I915_WRITE(GEN7_ROW_CHICKEN2,
-  _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));

/* WaSwitchSolVfFArbitrationPriority:bdw */
I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);

/* WaPsrDPAMaskVBlankInSRD:bdw */
I915_WRITE(CHICKEN_PAR1_1,
   I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD);

/* WaPsrDPRSUnmaskVBlankInSRD:bdw */
for_each_pipe(pipe) {
I915_WRITE(CHICKEN_PIPESL_1(pipe),
   I915_READ(CHICKEN_PIPESL_1(pipe)) |
   BDW_DPRS_MASK_VBLANK_SRD);
}

-   /* Use Force Non-Coherent whenever executing a 3D context. This is a
-* workaround for for a possible hang in the unlikely event a TLB
-* invalidation occurs during a PSD flush.
-*/
-   I915_WRITE(HDC_CHICKEN0,
-  

Re: [Intel-gfx] [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function

2014-08-26 Thread Siluvery, Arun

On 26/08/2014 11:09, Chris Wilson wrote:

On Tue, Aug 26, 2014 at 10:33:16AM +0100, Arun Siluvery wrote:

For BDW, workarounds are currently initialized in init_clock_gating(), but
they are lost during reset, suspend/resume, etc. This patch moves the WAs
that are part of the register state context to the render ring init
function; otherwise the default context ends up with incorrect values, as
they don't get initialized until init_clock_gating() runs.

v2: Add workarounds to golden render state
This method has its own issues: first of all, the state is different for
each gen and is generated using a tool, so adding new workarounds
and maintaining them across gens is not a straightforward process.

v3: Use LRIs to emit these workarounds (Ville)
Instead of modifying the golden render state the same LRIs are
emitted from within the driver.

For: VIZ-4092
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_gem_context.c |  6 +++
  drivers/gpu/drm/i915/intel_pm.c | 48 
  drivers/gpu/drm/i915/intel_ringbuffer.c | 78 +
  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
  4 files changed, 85 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 9683e62..2debce4 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring,
}

uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
to->legacy_hw_ctx.initialized = true;

  done:
i915_gem_context_reference(to);
ring->last_context = to;

if (uninitialized) {
+   if (IS_BROADWELL(ring->dev)) {
+   ret = bdw_init_workarounds(ring);
+   if (ret)
+   DRM_ERROR("init workarounds: %d\n", ret);


A good rule of thumb is that if you are exporting gen specific routines,
the layering and abstraction is fishy.
-Chris

OK, so would something like i915_init_workarounds() be fine, with the check
for bdw/gen8 done inside that function?


regards
Arun




Re: [Intel-gfx] [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function

2014-08-26 Thread Siluvery, Arun

On 26/08/2014 11:34, Chris Wilson wrote:

On Tue, Aug 26, 2014 at 11:16:29AM +0100, Siluvery, Arun wrote:

On 26/08/2014 11:09, Chris Wilson wrote:

On Tue, Aug 26, 2014 at 10:33:16AM +0100, Arun Siluvery wrote:

For BDW, workarounds are currently initialized in init_clock_gating(), but
they are lost during reset, suspend/resume, etc. This patch moves the WAs
that are part of the register state context to the render ring init
function; otherwise the default context ends up with incorrect values, as
they don't get initialized until init_clock_gating() runs.

v2: Add workarounds to golden render state
This method has its own issues: first of all, the state is different for
each gen and is generated using a tool, so adding new workarounds
and maintaining them across gens is not a straightforward process.

v3: Use LRIs to emit these workarounds (Ville)
Instead of modifying the golden render state the same LRIs are
emitted from within the driver.

For: VIZ-4092
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_gem_context.c |  6 +++
  drivers/gpu/drm/i915/intel_pm.c | 48 
  drivers/gpu/drm/i915/intel_ringbuffer.c | 78 +
  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
  4 files changed, 85 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 9683e62..2debce4 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring,
}

uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
to->legacy_hw_ctx.initialized = true;

  done:
i915_gem_context_reference(to);
ring->last_context = to;

if (uninitialized) {
+   if (IS_BROADWELL(ring->dev)) {
+   ret = bdw_init_workarounds(ring);
+   if (ret)
+   DRM_ERROR("init workarounds: %d\n", ret);


A good rule of thumb is that if you are exporting gen specific routines,
the layering and abstraction is fishy.
-Chris


OK, so would something like i915_init_workarounds() be fine, with the check
for bdw/gen8 done inside that function?


Except that init_workarounds is quite a useless function name, and we
already have a structure that is customised per-engine and
per-gen that you could hook into:
engine->ring_init_context()?
-Chris



Ok thanks, I can create a new fn ring_init_context().
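The layering Chris describes can be sketched as a per-engine hook. The hook is filled in once at engine setup, so the context-switch path only tests for NULL and never names a gen. All names below are hypothetical stand-ins; the later v4 patch calls the hook ring->init_context:

```c
#include <assert.h>
#include <stddef.h>

struct engine;
typedef int (*init_context_fn)(struct engine *engine);

/* Per-engine vtable entry, filled in once when the engine is set up. */
struct engine {
	const char *name;
	int gen;
	init_context_fn init_context;	/* NULL: nothing to do */
	int wa_applied;			/* demo counter only */
};

static int bdw_render_init_context(struct engine *engine)
{
	engine->wa_applied++;	/* the workaround LRIs would be emitted here */
	return 0;
}

static void engine_setup(struct engine *engine, const char *name, int gen,
			 int is_render)
{
	engine->name = name;
	engine->gen = gen;
	engine->wa_applied = 0;
	/* The only place that knows which gen/engine needs the hook. */
	engine->init_context = (gen == 8 && is_render) ?
			       bdw_render_init_context : NULL;
}

/* Generic context-switch path: no gen-specific knowledge. */
static int first_context_switch(struct engine *engine)
{
	assert(engine != NULL);
	if (engine->init_context)
		return engine->init_context(engine);
	return 0;
}
```

Adding another platform then only touches engine setup, not the context-switch code.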

regards
Arun



Re: [Intel-gfx] [PATCH 1/2] drm/i915/bdw: Apply workarounds using the golden render state

2014-08-26 Thread Siluvery, Arun

On 26/08/2014 13:53, Daniel Vetter wrote:

On Fri, Aug 22, 2014 at 01:10:26PM +0100, Siluvery, Arun wrote:

On 22/08/2014 12:06, Mika Kuoppala wrote:

Ville Syrjälä ville.syrj...@linux.intel.com writes:


On Wed, Aug 20, 2014 at 03:19:17PM +0100, Arun Siluvery wrote:

Workarounds for bdw are currently applied in init_clock_gating(), but they
are lost following a GPU reset. Some of the WA registers are part of the register
state context and are restored with every context switch, so initializing
them in the golden render state ensures that they are applied even when we start
with an uninitialized context or during hw initialization followed by a reset.

v2: Add comments corresponding to WAs in golden render state (Chris).

The generation of render state is not a straightforward process; it would
be ideal to add the WA values during render state setup, as opposed to
using a tool, but that would be a follow-up patch.


I'd still prefer just emitting the LRIs from code rather than mucking
about with the null batch. Fewer hoops to jump through when adding a new w/a.


I agree with this. We should aim to keep one null state per
gen. The workaround set differs between gtX variants within a particular
gen, so we would then need multiple null states per gen.

After a brief chat with Ville, I think that the correct
spot to init the context-specific workarounds is after MI_SET_CONTEXT
to the default context and right before the null batch is run. If we
apply them by emitting LRIs to the ring, we should be safe, as they are
then saved with the default ctx.

The default ctx is then used as a 'parent' for newly created
contexts. Of course, if registers get clobbered, then we inherit
crap.

If we have the per-gen null state and the ring is initializing
workarounds for the default context, then in the future we can
save this state as a 'read only golden context' and use it as the
initial state for all newly created contexts.

Then the full init plan would look like this:

#1 reset the gpu (on driver load, on resume or on hang recovery)
#2 if we have 'read only golden context', copy it to default ctx
#3 switch to default context
#4 if we had 'read only golden context' we are done with the init.

---

#5 if this is driver load, and thus there is no 'read only golden context' yet:
#6 init workarounds through ring LRIs
#7 run null/golden state batch
#8 save this state as a 'read only golden context'

---

#9 for each new context, initialize ctx obj with 'read only golden
  context' (either by memcpy or restoring from it when switching to new)
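The control flow of steps #1 to #9 can be sketched in C as follows. This is only a model of the proposal. The context image is a plain array, the reset and the workaround/null-batch steps are stand-ins, and every name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CTX_DWORDS 16	/* toy context image size */

struct ctx {
	unsigned int image[CTX_DWORDS];
};

struct gpu {
	bool have_golden;	/* step #8 has run at least once */
	struct ctx golden;	/* the 'read only golden context' */
	struct ctx default_ctx;
	int wa_inits;		/* how often #6/#7 ran, for demonstration */
};

/* Stand-in for #6 (workaround LRIs) and #7 (null/golden state batch). */
static void init_workarounds_and_null_state(struct gpu *gpu)
{
	gpu->default_ctx.image[0] = 0xcafe;	/* pretend register state */
	gpu->wa_inits++;
}

/* #1-#8: run after every GPU reset (driver load, resume, hang recovery). */
static void gpu_init_contexts(struct gpu *gpu)
{
	/* #1: a reset is modelled as losing the default context contents. */
	memset(&gpu->default_ctx, 0, sizeof(gpu->default_ctx));

	if (gpu->have_golden) {
		/* #2: restore the default context from the golden copy;
		 * #3 (switch to it) is implicit here, and #4: done. */
		gpu->default_ctx = gpu->golden;
		return;
	}

	/* #5: driver load, no golden context yet; then #6 and #7. */
	init_workarounds_and_null_state(gpu);

	/* #8: keep the result as the read-only golden context. */
	gpu->golden = gpu->default_ctx;
	gpu->have_golden = true;
}

/* #9: every new context starts as a copy of the golden context. */
static void ctx_create(const struct gpu *gpu, struct ctx *out)
{
	assert(gpu->have_golden);
	*out = gpu->golden;
}
```

The point of the model is that workarounds are emitted once, at driver load. Every later reset and every new context reuses the saved image.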


I understand that applying WAs using the null batch has its issues, but as I
mentioned in the commit message, I will fix this in a follow-up patch.
It is going to take some time, though, to rework the patch for the new
sequence.
The patch in its current state fixes WA issues after reset; can it
only be accepted once it is updated to follow the new sequence?


We already have a lot of "let's fix it later" experiments running, so I
don't want to overload the ship. I strongly prefer to merge the revised
version directly.
-Daniel

Understood. A revised version that emits the LRIs from the driver has
already been submitted and is being reviewed.


regards
Arun




Re: [Intel-gfx] [PATCH] drm/i915/bdw: Render moot context reset and switch with Execlists

2014-08-26 Thread Siluvery, Arun

On 26/08/2014 06:59, Chris Wilson wrote:

On Mon, Aug 25, 2014 at 10:39:39PM +0200, Daniel Vetter wrote:

On Wed, Aug 20, 2014 at 04:36:05PM +0100, Chris Wilson wrote:

On Wed, Aug 20, 2014 at 04:29:24PM +0100, Thomas Daniel wrote:

These two functions make no sense in a Logical Ring Context & Execlists
world.

v2: We got rid of lrc_enabled and centralized everything in the sanitized
i915.enable_execlists instead.

Signed-off-by: Oscar Mateo oscar.ma...@intel.com

v3: Rebased.  Corrected a typo in comment for i915_switch_context and
added a comment that it should not be called in execlist mode. Added
WARN_ON if i915_switch_context is called in execlist mode. Moved check
for execlist mode out of i915_switch_context and into callers. Added
comment in context_reset explaining why nothing is done in execlist
mode.


No, this is not the way. The requirement is to reduce the number of
special cases not increase them. These should be evaluated to be no-ops
when execlists is used.


I think it's ok-ish for now. Maybe we need to reconsider when we wire up
lrc reclaim - which is the real user of the switch_context in gpu_idle.
The problem I have though is that I can't parse the subject of the patch,
someone please translate that to simplified English for me. I can do the
replacement while applying.


No, it is not. execlists is badly designed and this is a further symptom
of that.
-Chris


Thomas is not available and I am replying on his behalf.
Is the following subject good for this patch?

Don't execute context reset and switch when using Execlists

regards
Arun




Re: [Intel-gfx] [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function

2014-08-26 Thread Siluvery, Arun

On 26/08/2014 15:37, Ville Syrjälä wrote:

On Tue, Aug 26, 2014 at 02:44:50PM +0100, Arun Siluvery wrote:

For BDW, workarounds are currently initialized in init_clock_gating(), but
they are lost during reset, suspend/resume, etc. This patch moves the WAs
that are part of the register state context to the render ring init
function; otherwise the default context ends up with incorrect values, as
they don't get initialized until init_clock_gating() runs.

v2: Add workarounds to golden render state
This method has its own issues: first of all, the state is different for
each gen and is generated using a tool, so adding new workarounds
and maintaining them across gens is not a straightforward process.

v3: Use LRIs to emit these workarounds (Ville)
Instead of modifying the golden render state the same LRIs are
emitted from within the driver.

v4: Use abstract name when exporting gen specific routines (Chris)

For: VIZ-4092
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com


This one looks good as far as I'm concerned.
Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com

Do you plan to give other platforms the same treatment? We need at least
CHV converted ASAP. But if you don't have a test machine I can take care
of that myself.

I don't have CHV hardware. I could borrow a machine and try, but since it
is required as soon as possible, could you please modify it for CHV?


regards
Arun


---
  drivers/gpu/drm/i915/i915_gem_context.c |  6 +++
  drivers/gpu/drm/i915/intel_pm.c | 48 
  drivers/gpu/drm/i915/intel_ringbuffer.c | 79 +
  drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +
  4 files changed, 87 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 9683e62..0a9bb0e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring,
}

uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
to->legacy_hw_ctx.initialized = true;

  done:
i915_gem_context_reference(to);
ring->last_context = to;

if (uninitialized) {
+   if (ring->init_context) {
+   ret = ring->init_context(ring);
+   if (ret)
+   DRM_ERROR("ring init context: %d\n", ret);
+   }
+
ret = i915_gem_render_state_init(ring);
if (ret)
DRM_ERROR("init render state: %d\n", ret);
}

return 0;

  unpin_out:
if (ring->id == RCS)
i915_gem_object_ggtt_unpin(to->legacy_hw_ctx.rcs_state);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index c8f744c..668acd9 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5507,101 +5507,53 @@ static void gen8_init_clock_gating(struct drm_device *dev)
struct drm_i915_private *dev_priv = dev->dev_private;
enum pipe pipe;

I915_WRITE(WM3_LP_ILK, 0);
I915_WRITE(WM2_LP_ILK, 0);
I915_WRITE(WM1_LP_ILK, 0);

/* FIXME(BDW): Check all the w/a, some might only apply to
 * pre-production hw. */

-   /* WaDisablePartialInstShootdown:bdw */
-   I915_WRITE(GEN8_ROW_CHICKEN,
-  _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE));
-
-   /* WaDisableThreadStallDopClockGating:bdw */
-   /* FIXME: Unclear whether we really need this on production bdw. */
-   I915_WRITE(GEN8_ROW_CHICKEN,
-  _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));

-   /*
-* This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for
-* pre-production hardware
-*/
-   I915_WRITE(HALF_SLICE_CHICKEN3,
-  _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS));
-   I915_WRITE(HALF_SLICE_CHICKEN3,
-  _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS));
I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE));

I915_WRITE(_3D_CHICKEN3,
   
_MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2)));

-   I915_WRITE(COMMON_SLICE_CHICKEN2,
-  _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE));
-
-   I915_WRITE(GEN7_HALF_SLICE_CHICKEN1,
-  _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE));
-
-   /* WaDisableDopClockGating:bdw May not be needed for production */
-   I915_WRITE(GEN7_ROW_CHICKEN2,
-  _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));

/* WaSwitchSolVfFArbitrationPriority:bdw */
I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);

/* WaPsrDPAMaskVBlankInSRD:bdw */
I915_WRITE(CHICKEN_PAR1_1,
   I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD);

/* WaPsrDPRSUnmaskVBlankInSRD:bdw */

Re: [Intel-gfx] [PATCH 1/2] drm/i915/bdw: Apply workarounds using the golden render state

2014-08-22 Thread Siluvery, Arun

On 22/08/2014 12:06, Mika Kuoppala wrote:

Ville Syrjälä ville.syrj...@linux.intel.com writes:


On Wed, Aug 20, 2014 at 03:19:17PM +0100, Arun Siluvery wrote:

Workarounds for bdw are currently applied in init_clock_gating(), but they
are lost following a GPU reset. Some of the WA registers are part of the register
state context and are restored with every context switch, so initializing
them in the golden render state ensures that they are applied even when we start
with an uninitialized context or during hw initialization followed by a reset.

v2: Add comments corresponding to WAs in golden render state (Chris).

The generation of render state is not a straightforward process; it would
be ideal to add the WA values during render state setup, as opposed to
using a tool, but that would be a follow-up patch.


I'd still prefer just emitting the LRIs from code rather than mucking
about with the null batch. Fewer hoops to jump through when adding a new w/a.


I agree with this. We should aim to keep one null state per
gen. The workaround set differs between gtX variants within a particular
gen, so we would then need multiple null states per gen.

After a brief chat with Ville, I think that the correct
spot to init the context-specific workarounds is after MI_SET_CONTEXT
to the default context and right before the null batch is run. If we
apply them by emitting LRIs to the ring, we should be safe, as they are
then saved with the default ctx.

The default ctx is then used as a 'parent' for newly created
contexts. Of course, if registers get clobbered, then we inherit
crap.

If we have the per-gen null state and the ring is initializing
workarounds for the default context, then in the future we can
save this state as a 'read only golden context' and use it as the
initial state for all newly created contexts.

Then the full init plan would look like this:

#1 reset the gpu (on driver load, on resume or on hang recovery)
#2 if we have 'read only golden context', copy it to default ctx
#3 switch to default context
#4 if we had 'read only golden context' we are done with the init.

---

#5 if this is driver load, and thus there is no 'read only golden context' yet:
#6 init workarounds through ring LRIs
#7 run null/golden state batch
#8 save this state as a 'read only golden context'

---

#9 for each new context, initialize ctx obj with 'read only golden
  context' (either by memcpy or restoring from it when switching to new)

I understand that applying WAs using the null batch has its issues, but as I
mentioned in the commit message, I will fix this in a follow-up patch.
It is going to take some time, though, to rework the patch for the new
sequence.
The patch in its current state fixes WA issues after reset; can it
only be accepted once it is updated to follow the new sequence?


regards
Arun



-Mika






Re: [Intel-gfx] [PATCH 2/2] igt/gem_workarounds: igt to test workaround registers

2014-08-20 Thread Siluvery, Arun

On 20/08/2014 16:37, Thomas Wood wrote:

On 20 August 2014 15:52, Arun Siluvery arun.siluv...@linux.intel.com wrote:

Some of the workarounds are lost following a GPU reset or suspend/resume;
this patch adds a test which compares register state before and after
the test scenario.

This test currently verifies only bdw workarounds.


Just a few points from an igt perspective: could you add the binary to
tests/.gitignore and perhaps consider using igt_debug or igt_info
instead of printf? There are also some debugfs helpers in igt_debugfs


Thank you for the comments. I have corrected the patch locally and will send the
updated version along with fixes for any other comments.


regards
Arun


to open/fopen debugfs files. There are also a few other tests that
implement GPU hangs, so it would be good to share code to do this
between them, but not essential for this patch.
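The retry loop in this patch's test_hang_gpu() polls igt_get_stop_rings() until it reads 0, at most 30 times. That is the kind of code the GPU-hang tests could share. Below is a sketch of such a helper, with hypothetical names and a toy predicate standing in for the stop-rings check:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Call 'done' up to 'retries' times, sleeping 'delay_us' microseconds
 * between attempts. Returns true as soon as 'done' reports true.
 */
static bool poll_until(bool (*done)(void *data), void *data,
		       int retries, unsigned int delay_us)
{
	assert(done != NULL && retries > 0);
	while (retries--) {
		if (done(data))
			return true;
		if (retries)
			usleep(delay_us);
	}
	return false;
}

/* Toy predicate standing in for "igt_get_stop_rings() == 0". */
static bool countdown_done(void *data)
{
	int *left = data;

	return (*left)-- <= 0;
}
```

A hang test would pass a predicate wrapping igt_get_stop_rings() and fail the test if poll_until() returns false.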







Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  tests/Makefile.sources  |   1 +
  tests/gem_workarounds.c | 238 
  2 files changed, 239 insertions(+)
  create mode 100644 tests/gem_workarounds.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 0eb9369..a17acd1 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -127,20 +127,21 @@ TESTS_progs = \
 gem_storedw_loop_vebox \
 gem_threaded_access_tiled \
 gem_tiled_fence_blits \
 gem_tiled_pread \
 gem_tiled_pread_pwrite \
 gem_tiled_swapping \
 gem_tiling_max_stride \
 gem_unfence_active_buffers \
 gem_unref_active_buffers \
 gem_wait_render_timeout \
+   gem_workarounds \
 gen3_mixed_blits \
 gen3_render_linear_blits \
 gen3_render_mixed_blits \
 gen3_render_tiledx_blits \
 gen3_render_tiledy_blits \
 gen7_forcewake_mt \
 kms_force_connector \
 kms_sink_crc_basic \
 kms_fence_pin_leak \
 pm_psr \
diff --git a/tests/gem_workarounds.c b/tests/gem_workarounds.c
new file mode 100644
index 000..56bf4b1
--- /dev/null
+++ b/tests/gem_workarounds.c
@@ -0,0 +1,238 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *  Arun Siluvery arun.siluv...@linux.intel.com
+ *
+ */
+
+#define _GNU_SOURCE
+#include <stdbool.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <time.h>
+#include <signal.h>
+
+#include "ioctl_wrappers.h"
+#include "drmtest.h"
+#include "igt_debugfs.h"
+#include "igt_aux.h"
+#include "intel_chipset.h"
+#include "intel_io.h"
+
+enum operation {
+   GPU_RESET = 0x01,
+   SUSPEND_RESUME = 0x02,
+};
+
+struct intel_wa_reg {
+   uint32_t addr;
+   uint32_t value;
+   uint32_t mask;
+};
+
+int drm_fd;
+uint32_t devid;
+static drm_intel_bufmgr *bufmgr;
+struct intel_batchbuffer *batch;
+int num_wa;
+struct intel_wa_reg *wa_regs;
+
+
+static void test_hang_gpu(void)
+{
+   int retry_count = 30;
+   enum stop_ring_flags flags;
+   struct drm_i915_gem_execbuffer2 execbuf;
+   struct drm_i915_gem_exec_object2 gem_exec;
+   uint32_t b[2] = {MI_BATCH_BUFFER_END};
+
+   igt_assert(retry_count);
+   igt_set_stop_rings(STOP_RING_DEFAULTS);
+
+   memset(&gem_exec, 0, sizeof(gem_exec));
+   gem_exec.handle = gem_create(drm_fd, 4096);
+   gem_write(drm_fd, gem_exec.handle, 0, b, sizeof(b));
+
+   memset(&execbuf, 0, sizeof(execbuf));
+   execbuf.buffers_ptr = (uintptr_t)&gem_exec;
+   execbuf.buffer_count = 1;
+   execbuf.batch_len = sizeof(b);
+
+   drmIoctl(drm_fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
+
+   while(retry_count--) {
+   flags = igt_get_stop_rings();
+   if (flags == 0)
+ 
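The test's struct intel_wa_reg carries a mask alongside the expected value, which suggests comparing only the bits each workaround owns. A minimal sketch of that comparison, assuming the register values have already been read back into an array (the MMIO read itself is left out, and count_wa_mismatches() is a hypothetical helper, not an igt API):

```c
#include <stdint.h>
#include <stdio.h>

/* Same layout as struct intel_wa_reg in the test above. */
struct intel_wa_reg {
	uint32_t addr;
	uint32_t value;
	uint32_t mask;
};

/*
 * Hypothetical helper: compare read-back register values against the
 * expected workaround values, looking only at the bits selected by each
 * entry's mask. Returns the number of registers that do not match.
 */
static int count_wa_mismatches(const struct intel_wa_reg *wa,
			       const uint32_t *readback, int num_wa)
{
	int fail = 0;

	for (int i = 0; i < num_wa; i++) {
		uint32_t expected = wa[i].value & wa[i].mask;
		uint32_t actual = readback[i] & wa[i].mask;

		if (actual != expected) {
			printf("0x%05x: expected 0x%08x, got 0x%08x (mask 0x%08x)\n",
			       (unsigned)wa[i].addr, (unsigned)expected,
			       (unsigned)actual, (unsigned)wa[i].mask);
			fail++;
		}
	}
	return fail;
}
```

Calling it once before and once after the reset (or suspend/resume), and requiring zero mismatches both times, matches the before/after scheme described for this test.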

Re: [Intel-gfx] [RFC] drm/i915/bdw: Apply workarounds to the golden render state

2014-08-08 Thread Siluvery, Arun

On 08/08/2014 10:57, Chris Wilson wrote:

On Fri, Aug 08, 2014 at 10:52:57AM +0100, arun.siluv...@linux.intel.com wrote:

From: Arun Siluvery arun.siluv...@linux.intel.com

Workarounds for bdw are currently applied in init_clock_gating() but they
are lost following a GPU reset. Some of the registers are part of the register
state context and are restored with every context switch, so initializing
WAs in the golden render state ensures that they are applied even when we start
with an uninitialized context or during HW initialization following a reset.


Interesting, but let's try to keep the opaque blobs minimal. The
comments for w/a are even more valuable than the code.

I agree, I will add comments to each workaround.
We are looking at adding the workarounds to the null batch in the render
state setup function itself. Do you have any comments on that approach?


regards
Arun


-Chris



___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC] drm/i915/bdw: Apply workarounds to the golden render state

2014-08-08 Thread Siluvery, Arun

On 08/08/2014 13:20, Ville Syrjälä wrote:

On Fri, Aug 08, 2014 at 10:52:57AM +0100, arun.siluv...@linux.intel.com wrote:

From: Arun Siluvery arun.siluv...@linux.intel.com

Workarounds for bdw are currently applied in init_clock_gating() but they
are lost following a GPU reset. Some of the registers are part of the register
state context and are restored with every context switch, so initializing
WAs in the golden render state ensures that they are applied even when we start
with an uninitialized context or during HW initialization following a reset.


This approach might require separate null states for BDW vs. CHV and IVB
vs. HSW vs. VLV, which seems a bit unfortunate. Might be better to just
issue the w/a register writes via LRIs from the code as part of the null
state load.
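Issuing the w/a writes via LRIs amounts to emitting one MI_LOAD_REGISTER_IMM packet with (register, value) pairs before the batch end. A minimal sketch of building such a batch follows; the opcode encodings are assumed to mirror the i915 MI_INSTR() definitions, and struct wa_write and emit_wa_lri() are hypothetical names. For masked registers the value written would be the masked form (e.g. _MASKED_BIT_ENABLE(bit)).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed encodings, modelled on the i915 MI_INSTR() definitions. */
#define MI_INSTR(opcode, flags)  (((uint32_t)(opcode) << 23) | (uint32_t)(flags))
#define MI_BATCH_BUFFER_END      MI_INSTR(0x0a, 0)
#define MI_LOAD_REGISTER_IMM(n)  MI_INSTR(0x22, 2 * (n) - 1)

/* Hypothetical (register offset, value) pair for one workaround write. */
struct wa_write {
	uint32_t reg;
	uint32_t value;
};

/*
 * Emit a single LRI covering all writes, followed by MI_BATCH_BUFFER_END.
 * Returns the number of dwords written, or 0 if there is nothing to write
 * or the batch is too small.
 */
static size_t emit_wa_lri(uint32_t *batch, size_t max_dwords,
			  const struct wa_write *wa, size_t count)
{
	size_t n = 0;

	if (count == 0 || 2 * count + 2 > max_dwords)
		return 0;

	batch[n++] = MI_LOAD_REGISTER_IMM(count);
	for (size_t i = 0; i < count; i++) {
		batch[n++] = wa[i].reg;
		batch[n++] = wa[i].value;
	}
	batch[n++] = MI_BATCH_BUFFER_END;
	return n;
}
```

The header, the pairs and MI_BATCH_BUFFER_END always total 2n + 2 dwords, an even count, so no MI_NOOP padding is needed if the consumer wants a qword-aligned length.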

Yes, this is a better approach. I am currently changing the code to
do this, though I am not sure how easy it will be.



Although I don't actually understand how this improves things as opposed
to just applying the w/as via mmio writes. Does it?

I observed random behaviour with CACHE_MODE_1: it would lose the
applied workaround on the first context switch even though the context is
loaded with inhibit==1. The register value is not supposed to change, but it did.


I think it is better to add them to the null batch to ensure the hardware
starts with the WAs applied.


regards
Arun



Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/intel_pm.c   | 50 -
  drivers/gpu/drm/i915/intel_renderstate_gen8.c | 62 +--
  2 files changed, 39 insertions(+), 73 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 1ddd4df..ab64b64 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5402,38 +5402,11 @@ static void gen8_init_clock_gating(struct drm_device *dev)
/* FIXME(BDW): Check all the w/a, some might only apply to
 * pre-production hw. */

-   /* WaDisablePartialInstShootdown:bdw */
-   I915_WRITE(GEN8_ROW_CHICKEN,
-  _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE));
-
-   /* WaDisableThreadStallDopClockGating:bdw */
-   /* FIXME: Unclear whether we really need this on production bdw. */
-   I915_WRITE(GEN8_ROW_CHICKEN,
-  _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));
-
-   /*
-* This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for
-* pre-production hardware
-*/
-   I915_WRITE(HALF_SLICE_CHICKEN3,
-  _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS));
-   I915_WRITE(HALF_SLICE_CHICKEN3,
-  _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS));
I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE));

I915_WRITE(_3D_CHICKEN3,
   _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2)));

-   I915_WRITE(COMMON_SLICE_CHICKEN2,
-  _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE));
-
-   I915_WRITE(GEN7_HALF_SLICE_CHICKEN1,
-  _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE));
-
-   /* WaDisableDopClockGating:bdw May not be needed for production */
-   I915_WRITE(GEN7_ROW_CHICKEN2,
-  _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));
-
/* WaSwitchSolVfFArbitrationPriority:bdw */
I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);

@@ -5448,41 +5421,18 @@ static void gen8_init_clock_gating(struct drm_device *dev)
   BDW_DPRS_MASK_VBLANK_SRD);
}

-   /* Use Force Non-Coherent whenever executing a 3D context. This is a
-* workaround for for a possible hang in the unlikely event a TLB
-* invalidation occurs during a PSD flush.
-*/
-   I915_WRITE(HDC_CHICKEN0,
-  I915_READ(HDC_CHICKEN0) |
-  _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT));
-
/* WaVSRefCountFullforceMissDisable:bdw */
/* WaDSRefCountFullforceMissDisable:bdw */
I915_WRITE(GEN7_FF_THREAD_MODE,
   I915_READ(GEN7_FF_THREAD_MODE) &
   ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME));

-   /*
-* BSpec recommends 8x4 when MSAA is used,
-* however in practice 16x4 seems fastest.
-*
-* Note that PS/WM thread counts depend on the WIZ hashing
-* disable bit, which we don't touch here, but it's good
-* to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
-*/
-   I915_WRITE(GEN7_GT_MODE,
-  GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
-
I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL,
   _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE));

/* WaDisableSDEUnitClockGating:bdw */
I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) |
   GEN8_SDEUNIT_CLOCK_GATE_DISABLE);
-
-   

Re: [Intel-gfx] [RFC 2/2] igt/gem_workarounds: igt to test workaround registers

2014-08-08 Thread Siluvery, Arun

On 08/08/2014 15:12, Daniel Vetter wrote:

On Fri, Aug 08, 2014 at 10:54:56AM +0100, arun.siluv...@linux.intel.com wrote:

From: Arun Siluvery arun.siluv...@linux.intel.com

Some of the workarounds are lost following a GPU reset or suspend/resume;
this patch adds a test which captures register state before and after
the test scenario.

This test currently verifies only bdw workarounds.

Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com


Some comments below.


---
  lib/intel_reg.h |   8 ++
  tests/Makefile.sources  |   1 +
  tests/gem_workarounds.c | 211 
  3 files changed, 220 insertions(+)
  create mode 100644 tests/gem_workarounds.c

diff --git a/lib/intel_reg.h b/lib/intel_reg.h
index 86175bb..d015c36 100644
--- a/lib/intel_reg.h
+++ b/lib/intel_reg.h
@@ -3628,4 +3628,12 @@ typedef enum {
  #define   GEN6_WIZ_HASHING_16x4   GEN6_WIZ_HASHING(1, 0)
  #define   GEN6_WIZ_HASHING_MASK   (GEN6_WIZ_HASHING(1, 1) << 16)

+#define GAMTARBMODE                0x04a08
+#define _3D_CHICKEN3               0x02090
+#define GAM_ECOCHK                 0x4090
+#define CHICKEN_PAR1_1             0x42080
+#define GEN7_FF_THREAD_MODE        0x20a0
+#define GEN6_RC_SLEEP_PSMI_CONTROL 0x2050
+#define GEN8_UCGCTL6               0x9430
+
  #endif /* _I810_REG_H */
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 0eb9369..a17acd1 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -134,6 +134,7 @@ TESTS_progs = \
gem_unfence_active_buffers \
gem_unref_active_buffers \
gem_wait_render_timeout \
+   gem_workarounds \
gen3_mixed_blits \
gen3_render_linear_blits \
gen3_render_mixed_blits \
diff --git a/tests/gem_workarounds.c b/tests/gem_workarounds.c
new file mode 100644
index 000..35d1aa7
--- /dev/null
+++ b/tests/gem_workarounds.c
@@ -0,0 +1,211 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *  Arun Siluvery arun.siluv...@linux.intel.com
+ *
+ */
+
+#define _GNU_SOURCE
+#include <stdbool.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <time.h>
+#include <signal.h>
+
+#include "ioctl_wrappers.h"
+#include "drmtest.h"
+#include "igt_debugfs.h"
+#include "igt_aux.h"
+#include "intel_chipset.h"
+#include "intel_io.h"
+
+int drm_fd;
+static drm_intel_bufmgr *bufmgr;
+struct intel_batchbuffer *batch;
+uint32_t devid;
+
+enum operation {
+   GPU_RESET,
+   SUSPEND_RESUME,


The suspend test doesn't seem to be wired up ...

Also I think it would be worth having a module-reload version here too.

Suspend/resume is not working: the device does not resume even after the
timer has elapsed. Do we know whether suspend/resume works correctly on nightly?



+};
+
+struct workaround {
+   const char *reg_name;
+   uint32_t address;
+};
+
+static struct workaround bdw_workarounds[] =
+{
+   { "GEN8_ROW_CHICKEN", GEN8_ROW_CHICKEN },
+   { "GEN7_ROW_CHICKEN2", GEN7_ROW_CHICKEN2 },
+   { "HALF_SLICE_CHICKEN3", HALF_SLICE_CHICKEN3 },
+   { "GEN7_HALF_SLICE_CHICKEN1", GEN7_HALF_SLICE_CHICKEN1 },
+   { "COMMON_SLICE_CHICKEN2", COMMON_SLICE_CHICKEN2 },
+   { "HDC_CHICKEN0", HDC_CHICKEN0 },
+   { "GEN7_CACHE_MODE_1", GEN7_CACHE_MODE_1 },
+   { "GEN7_GT_MODE", GEN7_GT_MODE },
+   { "GAMTARBMODE", GAMTARBMODE },
+   { "_3D_CHICKEN3", _3D_CHICKEN3 },
+   { "GAM_ECOCHK", GAM_ECOCHK },
+   { "CHICKEN_PAR1_1", CHICKEN_PAR1_1 },
+   { "GEN7_FF_THREAD_MODE", GEN7_FF_THREAD_MODE },
+   { "GEN6_RC_SLEEP_PSMI_CONTROL", GEN6_RC_SLEEP_PSMI_CONTROL },
+   { "GEN8_UCGCTL6", GEN8_UCGCTL6 },
+   { NULL, 0x },
+};


Crazy idea I've just had to 

Re: [Intel-gfx] [RFC] Move BDW workarounds to ring init fn

2014-07-29 Thread Siluvery, Arun

On 28/07/2014 18:26, Ville Syrjälä wrote:

On Mon, Jul 28, 2014 at 05:31:45PM +0100, arun.siluv...@linux.intel.com wrote:

From: Arun Siluvery arun.siluv...@linux.intel.com

This patch moves BDW workarounds from init_clock_gating() to the render ring
init fn; otherwise they are lost when the GPU is reset.
In the case of execlists, some of the workarounds modify registers that are
part of the register state context, which doesn't get initialized until
init_clock_gating(); this results in a default context with incorrect values,
as it is restored and saved before the workarounds update it.


I don't think it has to do with execlists. Many of the registers are
part of the context image even in ring buffer mode AFAIK.



Open issue:
For Wa4x4STCOptimizationDisable, we set CACHE_MODE_1[6:6] = 1.
When HW contexts are enabled, after the rings are initialized with the
default context, this workaround is in effect, but after a subsequent context
switch it gets reset; please see the log snippet below.
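CACHE_MODE_1 is a masked register: the upper 16 bits of a write select which of the lower 16 bits that write may change. The sketch below models this in software (an illustration, not driver code) and shows that 0x0180 and 0x01C0, the two CACHE_MODE_1 values in the log, differ only in bit 6, the Wa4x4STCOptimizationDisable bit. It also shows that a write without the mask bits leaves the register untouched. STC_OPT_DISABLE_BIT is a hypothetical name.

```c
#include <stdint.h>

/* Same form as the i915 masked-bit helpers of this era. */
#define _MASKED_BIT_ENABLE(a)  ((((uint32_t)(a)) << 16) | (uint32_t)(a))
#define _MASKED_BIT_DISABLE(a) (((uint32_t)(a)) << 16)

/* Bit 6 of CACHE_MODE_1, per Wa4x4STCOptimizationDisable above. */
#define STC_OPT_DISABLE_BIT (1u << 6)

/*
 * Software model of a masked register: only bits whose mask bit (upper
 * half of the write) is set take the new value; all other bits keep their
 * current value. In this model reads return only the lower 16 bits.
 */
static uint32_t apply_masked_write(uint32_t cur, uint32_t write)
{
	uint32_t mask = write >> 16;

	return (cur & ~mask & 0xffffu) | (write & mask);
}
```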


This is a bit weird. The default context should have restore inhibit==1
so it shouldn't clobber the CACHE_MODE_1 register. There was a specific magic
dance you're supposed to do when accessing such registers with mmio, but here
we do the write even before the first context switch.

Apparently there was some kind of problem with CACHE_MODE_0 on snb too:
  commit 3a69ddd6f872180b6f61fda87152b37202118fbc
  Author: Kenneth Graunke kenn...@whitecape.org
  Date:   Fri Apr 27 12:44:41 2012 -0700

 drm/i915: Set the Stencil Cache eviction policy to non-LRA mode.

but IIRC I wasn't able to reproduce it when I tried.


Similar to this register I am also applying this in render ring init fn.



Maybe we need to delay these register writes until we've switched to the default
context?

In its current state (WAs applied in init_clock_gating()) we are writing 
these registers after switching to default context.


When a new HW context is created, do all the registers that are part of the
context start with default values, or do they sample the current state? And at
what point does this sampling take place?


As a test, I updated CACHE_MODE_1 after mi_set_context(), and the
workaround then stayed valid across every context switch. But I think this
may not be the right way; otherwise we will also have to update the other WA
registers at this point on every context switch.


regards
Arun



...
[5.978209] [drm:i915_pages_create_for_stolen] offset=0x0, size=8294400
[5.978213] [drm:intel_alloc_plane_obj] plane fb obj 8801472e
[5.978215] [drm:i915_gem_setup_global_gtt] reserving preallocated space: 0 + 7e9000
[5.978216] [drm:i915_gem_setup_global_gtt] clearing unused GTT space: [7e9000, f000]
[5.979613] [drm:i915_gem_init] CACHE_MODE_1: 0x0180
[5.981372] [drm:gen8_ppgtt_init] Allocated 4 pages for page directories (0 wasted)
[5.981373] [drm:gen8_ppgtt_init] Allocated 2048 pages for page tables (0 wasted)
[5.981376] [drm:i915_gem_context_init] HW context support initialized
[5.981462] [drm:i915_gem_init_hw] CACHE_MODE_1: 0x0180
[5.981467] [drm:i915_gem_init_rings] CACHE_MODE_1: 0x0180
[5.981704] [drm:bdw_init_workarounds] CACHE_MODE_1: 0x01C0
[5.981716] [drm:init_status_page] bsd ring hws offset: 0x0081e000
[5.981792] [drm:init_status_page] blitter ring hws offset: 0x0083f000
[5.981910] [drm:init_status_page] video enhancement ring hws offset: 0x0086
[5.982001] [drm:i915_gem_init_hw] CACHE_MODE_1: 0x01C0
[5.982104] [drm:i915_gem_context_enable] Switch render ring to default_context
[5.982106] [drm:i915_gem_render_state_init] render ring: Render state init
[5.982120] [drm:do_switch] render ring, CACHE_MODE_1: 0x01C0, uninitialized: 1
[5.982121] [drm:i915_gem_context_enable] Switch bsd ring to default_context
[5.982122] [drm:do_switch] bsd ring, CACHE_MODE_1: 0x01C0, uninitialized: 0
[5.982123] [drm:i915_gem_context_enable] Switch blitter ring to default_context
[5.982126] [drm:do_switch] blitter ring, CACHE_MODE_1: 0x01C0, uninitialized: 0
[5.982126] [drm:i915_gem_context_enable] Switch video enhancement ring to default_context
[5.982128] [drm:do_switch] video enhancement ring, CACHE_MODE_1: 0x01C0, uninitialized: 0
[5.982133] [drm:i915_gem_init] CACHE_MODE_1: 0x01C0
[5.982258] [drm:intel_init_clock_gating]
...
[   10.037019] [drm:do_switch] blitter ring, CACHE_MODE_1: 0x0180, uninitialized: 0
...
[   10.488145] [drm:do_switch] render ring, CACHE_MODE_1: 0x0180, uninitialized: 0
...

I am currently testing this with an igt which triggers a GPU reset and compares
WA register contents before and after the reset, but the test fails because of
this register, so I am not sending it yet.
Please let me know how to keep this WA valid after a context switch.


Arun Siluvery (1):
   drm/i915/bdw: Initialize BDW workarounds in render ring init fn

  drivers/gpu/drm/i915/i915_debugfs.c | 46 ++
  

Re: [Intel-gfx] [RFC] drm/i915/bdw: Initialize BDW workarounds in render ring init fn

2014-07-28 Thread Siluvery, Arun

On 28/07/2014 20:22, Daniel Vetter wrote:

On Mon, Jul 28, 2014 at 08:00:39PM +0300, Ville Syrjälä wrote:

On Mon, Jul 28, 2014 at 05:31:46PM +0100, arun.siluv...@linux.intel.com wrote:

From: Arun Siluvery arun.siluv...@linux.intel.com

The workarounds are currently initialized in init_clock_gating(), but
they are lost during a reset. In the case of execlists, some workarounds modify
registers that are part of the register state context; since these are not
initialized until init_clock_gating(), the default context ends up with
incorrect values, because the render context is restored and saved before the
workarounds update it. Hence, move them to the render ring init fn. This should
be OK as these workarounds are not related to display clock gating.

Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_debugfs.c | 46 ++
  drivers/gpu/drm/i915/intel_pm.c | 59 
  drivers/gpu/drm/i915/intel_ringbuffer.c | 68 +
  3 files changed, 114 insertions(+), 59 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 083683c..cf7da30 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2397,20 +2397,65 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused)
seq_printf(m, " dpll_md: 0x%08x\n", pll->hw_state.dpll_md);
seq_printf(m, " fp0: 0x%08x\n", pll->hw_state.fp0);
seq_printf(m, " fp1: 0x%08x\n", pll->hw_state.fp1);
seq_printf(m, " wrpll:   0x%08x\n", pll->hw_state.wrpll);
}
drm_modeset_unlock_all(dev);

return 0;
  }

+static int i915_workaround_info(struct seq_file *m, void *unused)
+{
+   struct drm_info_node *node = (struct drm_info_node *) m->private;
+   struct drm_device *dev = node->minor->dev;
+   struct drm_i915_private *dev_priv = dev->dev_private;
+   int ret;
+
+   ret = mutex_lock_interruptible(&dev->struct_mutex);
+   if (ret)
+   return ret;
+
+   if (IS_BROADWELL(dev)) {
+   seq_printf(m, "GEN8_ROW_CHICKEN:\t0x%08x\n",
+  I915_READ(GEN8_ROW_CHICKEN));
+   seq_printf(m, "HALF_SLICE_CHICKEN3:\t0x%08x\n",
+  I915_READ(HALF_SLICE_CHICKEN3));
+   seq_printf(m, "GAMTARBMODE:\t0x%08x\n", I915_READ(GAMTARBMODE));
+   seq_printf(m, "_3D_CHICKEN3:\t0x%08x\n",
+  I915_READ(_3D_CHICKEN3));
+   seq_printf(m, "COMMON_SLICE_CHICKEN2:\t0x%08x\n",
+  I915_READ(COMMON_SLICE_CHICKEN2));
+   seq_printf(m, "GEN7_HALF_SLICE_CHICKEN1:\t0x%08x\n",
+  I915_READ(GEN7_HALF_SLICE_CHICKEN1));
+   seq_printf(m, "GEN7_ROW_CHICKEN2:\t0x%08x\n",
+  I915_READ(GEN7_ROW_CHICKEN2));
+   seq_printf(m, "GAM_ECOCHK:\t0x%08x\n",
+  I915_READ(GAM_ECOCHK));
+   seq_printf(m, "HDC_CHICKEN0:\t0x%08x\n",
+  I915_READ(HDC_CHICKEN0));
+   seq_printf(m, "GEN7_FF_THREAD_MODE:\t0x%08x\n",
+  I915_READ(GEN7_FF_THREAD_MODE));
+   seq_printf(m, "GEN8_UCGCTL6:\t0x%08x\n",
+  I915_READ(GEN8_UCGCTL6));
+   seq_printf(m, "GEN6_RC_SLEEP_PSMI_CONTROL:\t0x%08x\n",
+  I915_READ(GEN6_RC_SLEEP_PSMI_CONTROL));
+   seq_printf(m, "CACHE_MODE_1:\t0x%08x\n",
+  I915_READ(CACHE_MODE_1));
+   } else
+   DRM_DEBUG_DRIVER("Not available for Gen%d\n",
+INTEL_INFO(dev)->gen);
+
+   mutex_unlock(&dev->struct_mutex);
+   return 0;
+}
+


This smells like a separate patch. But I'm not sure we want at all since
intel_reg_read will provide the same information.


Yeah, debugfs files that just do what intel_reg_read does are just an
additional maintenance burden. I know that we have a few that dump lots of
registers, but most of them dump a lot of other information, too.
-Daniel



I added this mainly for testing workarounds; it can be extended
as we move WAs for other chipsets, but I agree it can be done
with intel_reg_read.


regards
Arun




Re: [Intel-gfx] [RFC] drm/i915/bdw: Initialize BDW workarounds in render ring init fn

2014-07-28 Thread Siluvery, Arun

On 28/07/2014 18:00, Ville Syrjälä wrote:

On Mon, Jul 28, 2014 at 05:31:46PM +0100, arun.siluv...@linux.intel.com wrote:

From: Arun Siluvery arun.siluv...@linux.intel.com

The workarounds are currently initialized in init_clock_gating(), but
they are lost during a reset. In the case of execlists, some workarounds modify
registers that are part of the register state context; since these are not
initialized until init_clock_gating(), the default context ends up with
incorrect values, because the render context is restored and saved before the
workarounds update it. Hence, move them to the render ring init fn. This should
be OK as these workarounds are not related to display clock gating.

Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
  drivers/gpu/drm/i915/i915_debugfs.c | 46 ++
  drivers/gpu/drm/i915/intel_pm.c | 59 
  drivers/gpu/drm/i915/intel_ringbuffer.c | 68 +
  3 files changed, 114 insertions(+), 59 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 083683c..cf7da30 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2397,20 +2397,65 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused)
seq_printf(m, " dpll_md: 0x%08x\n", pll->hw_state.dpll_md);
seq_printf(m, " fp0: 0x%08x\n", pll->hw_state.fp0);
seq_printf(m, " fp1: 0x%08x\n", pll->hw_state.fp1);
seq_printf(m, " wrpll:   0x%08x\n", pll->hw_state.wrpll);
}
drm_modeset_unlock_all(dev);

return 0;
  }

+static int i915_workaround_info(struct seq_file *m, void *unused)
+{
+   struct drm_info_node *node = (struct drm_info_node *) m->private;
+   struct drm_device *dev = node->minor->dev;
+   struct drm_i915_private *dev_priv = dev->dev_private;
+   int ret;
+
+   ret = mutex_lock_interruptible(&dev->struct_mutex);
+   if (ret)
+   return ret;
+
+   if (IS_BROADWELL(dev)) {
+   seq_printf(m, "GEN8_ROW_CHICKEN:\t0x%08x\n",
+  I915_READ(GEN8_ROW_CHICKEN));
+   seq_printf(m, "HALF_SLICE_CHICKEN3:\t0x%08x\n",
+  I915_READ(HALF_SLICE_CHICKEN3));
+   seq_printf(m, "GAMTARBMODE:\t0x%08x\n", I915_READ(GAMTARBMODE));
+   seq_printf(m, "_3D_CHICKEN3:\t0x%08x\n",
+  I915_READ(_3D_CHICKEN3));
+   seq_printf(m, "COMMON_SLICE_CHICKEN2:\t0x%08x\n",
+  I915_READ(COMMON_SLICE_CHICKEN2));
+   seq_printf(m, "GEN7_HALF_SLICE_CHICKEN1:\t0x%08x\n",
+  I915_READ(GEN7_HALF_SLICE_CHICKEN1));
+   seq_printf(m, "GEN7_ROW_CHICKEN2:\t0x%08x\n",
+  I915_READ(GEN7_ROW_CHICKEN2));
+   seq_printf(m, "GAM_ECOCHK:\t0x%08x\n",
+  I915_READ(GAM_ECOCHK));
+   seq_printf(m, "HDC_CHICKEN0:\t0x%08x\n",
+  I915_READ(HDC_CHICKEN0));
+   seq_printf(m, "GEN7_FF_THREAD_MODE:\t0x%08x\n",
+  I915_READ(GEN7_FF_THREAD_MODE));
+   seq_printf(m, "GEN8_UCGCTL6:\t0x%08x\n",
+  I915_READ(GEN8_UCGCTL6));
+   seq_printf(m, "GEN6_RC_SLEEP_PSMI_CONTROL:\t0x%08x\n",
+  I915_READ(GEN6_RC_SLEEP_PSMI_CONTROL));
+   seq_printf(m, "CACHE_MODE_1:\t0x%08x\n",
+  I915_READ(CACHE_MODE_1));
+   } else
+   DRM_DEBUG_DRIVER("Not available for Gen%d\n",
+INTEL_INFO(dev)->gen);
+
+   mutex_unlock(&dev->struct_mutex);
+   return 0;
+}
+


This smells like a separate patch. But I'm not sure we want at all since
intel_reg_read will provide the same information.


  struct pipe_crc_info {
const char *name;
struct drm_device *dev;
enum pipe pipe;
  };

  static int i915_pipe_crc_open(struct inode *inode, struct file *filep)
  {
struct pipe_crc_info *info = inode->i_private;
struct drm_i915_private *dev_priv = info->dev->dev_private;
@@ -3904,20 +3949,21 @@ static const struct drm_info_list i915_debugfs_list[] = {
{"i915_ppgtt_info", i915_ppgtt_info, 0},
{"i915_llc", i915_llc, 0},
{"i915_edp_psr_status", i915_edp_psr_status, 0},
{"i915_sink_crc_eDP1", i915_sink_crc, 0},
{"i915_energy_uJ", i915_energy_uJ, 0},
{"i915_pc8_status", i915_pc8_status, 0},
{"i915_power_domain_info", i915_power_domain_info, 0},
{"i915_display_info", i915_display_info, 0},
{"i915_semaphore_status", i915_semaphore_status, 0},
{"i915_shared_dplls_info", i915_shared_dplls_info, 0},
+   {"i915_workaround_info", i915_workaround_info, 0},
  };
  #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)

  static const struct i915_debugfs_files {
const char *name;
const struct 

Re: [Intel-gfx] WAs in init_clock_gating?

2014-07-24 Thread Siluvery, Arun

On 07/07/2014 22:24, Daniel Vetter wrote:

On Mon, Jul 7, 2014 at 11:16 PM, Jesse Barnes jbar...@virtuousgeek.org wrote:

I don't think it's unreasonable to use a macro that checks a global
list for whether to apply a given WA.  They'll be scattered all over,
but at least it'll be easy to see:
   1) whether we implement a given workaround
and
   2) which platforms & steppings it applies to based on the table.
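A table-driven check along these lines could look like the sketch below. Everything here is hypothetical: the platform bits, the stepping ranges and the NEEDS_WA() macro are illustrative, and WaExamplePreProduction is a made-up entry; only the WaDisablePartialInstShootdown:bdw name comes from the code in this thread.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical platform bits. */
enum wa_platform {
	WA_PLATFORM_BDW = 1 << 0,
	WA_PLATFORM_CHV = 1 << 1,
};

struct wa_entry {
	const char *name;
	unsigned int platforms;	/* bitmask of enum wa_platform */
	int first_rev;		/* first affected stepping (inclusive) */
	int last_rev;		/* last affected stepping (inclusive) */
};

/* One row per workaround: which platforms and steppings it applies to. */
static const struct wa_entry wa_table[] = {
	{ "WaDisablePartialInstShootdown", WA_PLATFORM_BDW, 0, INT_MAX },
	{ "WaExamplePreProduction",        WA_PLATFORM_BDW, 0, 1 }, /* made up */
};

static bool wa_applies(const char *name, enum wa_platform platform, int rev)
{
	for (size_t i = 0; i < sizeof(wa_table) / sizeof(wa_table[0]); i++) {
		const struct wa_entry *e = &wa_table[i];

		if (strcmp(e->name, name) != 0)
			continue;
		return (e->platforms & platform) &&
		       rev >= e->first_rev && rev <= e->last_rev;
	}
	return false;	/* not in the table: not implemented */
}

/* Call sites spell out the workaround name, so it stays greppable. */
#define NEEDS_WA(wa, platform, rev) wa_applies(#wa, (platform), (rev))
```

Keeping the table as the single source of truth answers both questions above: whether a workaround is implemented, and for which platforms and steppings.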


Oh, I agree it's not unreasonable. But I've kinda been begging for the
simple solution for months (years?) and haven't gotten it, while
still getting a steady stream of bug reports and issues. So I've
readjusted my expectations ;-)

If someone delivers the real deal I certainly won't reject it.
-Daniel



I am moving bdw workarounds from clock_gating fn to render ring init fn 
and testing this before and after gpu reset.
One of the workarounds disables STC optimization by setting bit 6 of
CACHE_MODE_1 to 1. I observed that sometimes after boot this bit gets reset
to 0 (the default value) even after the workarounds are applied; apart from
the workarounds, nothing else seems to write to this register.

Any ideas about this behaviour?

regards
Arun




Re: [Intel-gfx] WAs in init_clock_gating?

2014-07-24 Thread Siluvery, Arun

On 24/07/2014 13:33, Daniel Vetter wrote:

On Thu, Jul 24, 2014 at 11:43:11AM +0100, Siluvery, Arun wrote:

On 07/07/2014 22:24, Daniel Vetter wrote:

On Mon, Jul 7, 2014 at 11:16 PM, Jesse Barnes jbar...@virtuousgeek.org wrote:

I don't think it's unreasonable to use a macro that checks a global
list for whether to apply a given WA.  They'll be scattered all over,
but at least it'll be easy to see:
   1) whether we implement a given workaround
and
   2) which platforms & steppings it applies to based on the table.


Oh, I agree it's not unreasonable. But I've kinda been begging for the
simple solution for months (years?) and haven't gotten it, while
still getting a steady stream of bug reports and issues. So I've
readjusted my expectations ;-)

If someone delivers the real deal I certainly won't reject it.
-Daniel



I am moving bdw workarounds from clock_gating fn to render ring init fn and
testing this before and after gpu reset.


Testing = with an igt? Because I'll ask for this ;-)


Yes, triggering a GPU reset with an igt; at the moment the test fails
because of this register.



One of the workarounds disables STC optimization by setting bit 6 of
CACHE_MODE_1 to 1. I observed that sometimes after boot this bit gets reset
to 0 (the default value) even after the workarounds are applied; apart from
the workarounds, nothing else seems to write to this register.
Any ideas about this behaviour?


gpu init tends to do this, since clock_gating is run before that.


thanks, I will take a look.

regards
Arun


-Daniel





Re: [Intel-gfx] [RFC] drm/i915: Add variable gem object size support to i915

2014-06-25 Thread Siluvery, Arun

On 25/06/2014 12:14, Damien Lespiau wrote:

On Wed, Jun 25, 2014 at 11:51:33AM +0100, Damien Lespiau wrote:

(This is not necessarily things one would need to take into account for
this work, just a few thoughts).

One thing I'm wondering is how fitting the size parameter really is
when talking about inherently 2D buffers.

For instance, let's take a Y-tiled texture with MIPLAYOUT_RIGHT, if we
want to allocate mip map levels 0 and 1, and use the ioctl naively to
reserve the LOD1 region in one go, we'll end up over allocating the
space below LOD1 (if I'm not mistaken that is).

This can be mitigated by several calls to this fallocate ioctl, to
reserve columns of pages (in the case above, columns for the LOD1
region).

So, how about trying to reduce this ioctl overhead by providing a list
of (start, length) in the ioctl structure?
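One possible shape for the list-of-ranges variant, as a sketch only: struct fallocate_range, struct fallocate_ranges_arg and ranges_valid() are hypothetical and not part of i915_drm.h, and 4 KiB pages are assumed. The kernel side would copy the array from ranges_ptr and validate every entry before changing any backing pages, so a bad entry cannot leave the object half-modified.

```c
#include <stdbool.h>
#include <stdint.h>

#define RANGE_PAGE_SIZE 4096ull	/* assumed page size */

/* Hypothetical uapi: one byte range, page aligned. */
struct fallocate_range {
	uint64_t start;		/* byte offset into the object */
	uint64_t length;	/* bytes, non-zero */
};

/* Hypothetical ioctl argument carrying many ranges in one call. */
struct fallocate_ranges_arg {
	uint32_t handle;
	uint32_t num_ranges;
	uint64_t ranges_ptr;	/* user pointer to struct fallocate_range[] */
};

/* Check every range before touching any pages (overflow-safe bounds). */
static bool ranges_valid(const struct fallocate_range *r, uint32_t n,
			 uint64_t obj_size)
{
	for (uint32_t i = 0; i < n; i++) {
		if (r[i].length == 0)
			return false;
		if ((r[i].start | r[i].length) & (RANGE_PAGE_SIZE - 1))
			return false;
		if (r[i].start > obj_size ||
		    r[i].length > obj_size - r[i].start)
			return false;
	}
	return true;
}
```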


One more thing to factor in (let's assume some future hardware will
support it):
https://www.opengl.org/registry/specs/ARB/sparse_texture.txt

So maybe what we really want is to be able to specify a region of pages
as (x, y, width, height, stride)? (The idea came up when talking to
Neil Roberts; I now have someone working on Mesa in the office.)
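To see how an (x, y, width, height, stride) region maps to pages, here is a minimal sketch that assumes a linear (untiled) surface, with x and width measured in bytes and 4 KiB pages; Y-tiling, as in the mip-map example above, changes the mapping completely and is not modelled. rect_to_pages() and struct page_range are hypothetical. Each row's byte span becomes a page range, and ranges that touch or overlap are merged.

```c
#include <stddef.h>
#include <stdint.h>

#define RECT_PAGE_SIZE 4096u	/* assumed page size */

struct page_range {
	uint64_t first;	/* first page index, inclusive */
	uint64_t last;	/* last page index, inclusive */
};

/*
 * Hypothetical helper: list the page ranges touched by a rectangle of
 * width_bytes x height rows starting at byte x of row y in a linear
 * surface with the given stride. Returns the number of ranges written,
 * or SIZE_MAX if out[] is too small.
 */
static size_t rect_to_pages(uint64_t x, uint64_t y, uint64_t width_bytes,
			    uint64_t height, uint64_t stride,
			    struct page_range *out, size_t max_out)
{
	size_t n = 0;

	if (width_bytes == 0 || height == 0)
		return 0;

	for (uint64_t row = y; row < y + height; row++) {
		uint64_t start = row * stride + x;
		uint64_t first = start / RECT_PAGE_SIZE;
		uint64_t last = (start + width_bytes - 1) / RECT_PAGE_SIZE;

		/* Merge with the previous range if the pages touch. */
		if (n > 0 && first <= out[n - 1].last + 1) {
			if (last > out[n - 1].last)
				out[n - 1].last = last;
			continue;
		}
		if (n == max_out)
			return SIZE_MAX;
		out[n].first = first;
		out[n].last = last;
		n++;
	}
	return n;
}
```

With a 16 KiB stride and a one-page-wide rectangle, consecutive rows land four pages apart, which is the "columns of pages" case; a full-width rectangle collapses into one contiguous range.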



Hi Damien,

Thank you for your comments and the idea to improve this ioctl.
At the moment the start and end of a region are expected to be page-aligned;
the ioctl can be modified to accept multiple ranges and modify them in one
go to reduce the overhead of the ioctl.


We can define how we want to specify multiple ranges. If userspace can
provide the list as (start, end) pairs, the kernel can use them directly, but
what would be the preferred way from the user's point of view?


regards
Arun



Re: [Intel-gfx] [RFC] manage multiple entries of scratch page with scatterlist

2014-06-12 Thread Siluvery, Arun

On 12/06/2014 08:26, Daniel Vetter wrote:

On Thu, Jun 12, 2014 at 12:49:47AM +0100, Siluvery, Arun wrote:

Hi,

I am working on a feature to implement support for gem objects to have
variable size and realized a problem with the current implementation.
Please advise me on how to handle this situation efficiently.

In this implementation the backing store of the object is replaced with
scratch pages according to input range; Initially I store table entries in
an array, replace relevant entries with scratch pages and I am using
sg_alloc_table_from_pages() to create new sg_table which is assigned to the
object. This implementation works as expected, but I realized it is wasting
memory as the scratch page count increases.

Consider the worst case scenario where all pages are replaced with scratch
pages.

The fn sg_alloc_table_from_pages() first computes the number of chunks based
on the page frame numbers. PFNs that are consecutive form a chunk and it
allocates scatterlists for each chunk which form the sg_table.

In the case of scratch pages, every page has the same pfn, so
sg_alloc_table_from_pages() does not treat them as part of a chunk and
allocates a scatterlist structure for each scratch page, which takes a lot of
memory as the object size increases.
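The chunk counting described above can be modelled in a few lines. This is a simplified sketch of the counting rule in sg_alloc_table_from_pages() as described here (a new chunk starts whenever a pfn is not the previous pfn + 1), working on raw pfns instead of struct page pointers; count_sg_chunks() is a hypothetical name. It shows why eight scratch-page entries with the same pfn cost eight scatterlist entries while eight contiguous pages cost one.

```c
#include <stddef.h>

/*
 * Count chunks the way described above: consecutive pfns share a chunk,
 * any other transition (including a repeated pfn) starts a new one.
 */
static size_t count_sg_chunks(const unsigned long *pfns, size_t n)
{
	size_t chunks;

	if (n == 0)
		return 0;

	chunks = 1;
	for (size_t i = 1; i < n; i++)
		if (pfns[i] != pfns[i - 1] + 1)
			chunks++;
	return chunks;
}
```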

I have tried to modify the sg_alloc_table_from_pages() implementation to
check for the scratch pfn and treat those pages as a single chunk, but after the
update, when iterating through for_each_sg_page(), I am seeing different page
addresses instead of all pointing to the scratch page.

E.g. in an object of 8 pages, with scratch_page = ea000112 and pfn
0x00044800, the result I get is:

page[0]: ea000112, pfn: 0x00044800,
page[1]: ea0001120040, pfn: 0x00044801,
page[2]: ea0001120080, pfn: 0x00044802,
page[3]: ea00011200c0, pfn: 0x00044803,
page[4]: ea0001120100, pfn: 0x00044804,
page[5]: ea0001120140, pfn: 0x00044805,
page[6]: ea0001120180, pfn: 0x00044806,
page[7]: ea00011201c0, pfn: 0x00044807,

How can I manage multiple pages that have the same pfn with a single
scatterlist and still have its length equal to (PAGE_SIZE*chunk_size)?

I would really appreciate any suggestions to improve this implementation.


sg tables don't have the idea of repeating a given page, since it doesn't
make a lot of sense. Is the memory overhead really a big problem?

One other use case where it can be useful is the creation of a
blanking buffer. Considering a frame buffer size of 8MB = 2K pages, with each
scatterlist entry being 32 bytes, the overhead is 64K for an 8MB object.
I think this overhead is acceptable, and it also simplifies the
implementation.
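
The arithmetic behind the 64K figure, assuming 4 KiB pages and the 32-byte
entry size quoted here: 8 MiB / 4 KiB = 2048 entries, and 2048 x 32 bytes =
65536 bytes. The hypothetical helper below makes the scaling explicit; note
that sizeof(struct scatterlist) actually depends on the architecture and
kernel configuration, so 32 is taken from this thread, not guaranteed.

```c
#include <stdint.h>

/*
 * Worst-case scatterlist overhead when every page of an object is the
 * scratch page, so each page needs its own entry. Hypothetical helper;
 * entry_size is passed in because the real size varies by config.
 */
static uint64_t sg_overhead_bytes(uint64_t obj_size, uint64_t page_size,
				  uint64_t entry_size)
{
	/* Round up so a partial trailing page still needs an entry. */
	uint64_t n_pages = (obj_size + page_size - 1) / page_size;

	return n_pages * entry_size;
}
```

The overhead grows linearly at entry_size / page_size, roughly 0.8% of the
object size with these numbers, which is the basis for calling it acceptable.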



Extending the sg implementation with a flag somewhere to repeat a given
page instead of incrementing might be possible, but it will take a bit of
effort to push that through the process, since we'd touch code outside of
drm.
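
A rough model of the "repeat a given page" idea: an entry carries a flag, and
when it is set every page covered by the entry resolves to the same PFN. No
such flag exists in the kernel's struct scatterlist; struct repeat_sg and the
helpers below are purely hypothetical, meant only to show the semantics
being discussed.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* Hypothetical scatterlist-like entry with a "repeat" flag. */
struct repeat_sg {
	unsigned long pfn;    /* first (or only) page of the entry */
	unsigned long length; /* bytes covered by the entry */
	bool repeat;          /* if set, every page in the entry is pfn */
};

/* Number of pages the entry covers. */
static size_t repeat_sg_npages(const struct repeat_sg *sg)
{
	return sg->length / MODEL_PAGE_SIZE;
}

/* PFN of the idx-th page: fixed when repeating, incrementing otherwise. */
static unsigned long repeat_sg_nth_pfn(const struct repeat_sg *sg, size_t idx)
{
	return sg->repeat ? sg->pfn : sg->pfn + idx;
}
```

With this, the 8-page scratch range from the example above collapses to one
entry. The catch is that every consumer that walks sg tables (page iterators,
DMA mapping code and so on) would presumably have to honour the flag, which is
why the change reaches outside drm.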

I will explore this option if we see any issues with the overhead.
Thank you for your comments.

regards
Arun


-Daniel



___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [RFC] manage multiple entries of scratch page with scatterlist

2014-06-11 Thread Siluvery, Arun

Hi,

I am working on a feature to support variable-size gem objects and have
run into a problem with the current implementation.

Please advise me on how to handle this situation efficiently.

In this implementation, the backing store of the object is replaced with
scratch pages according to the input range. Initially I store the table
entries in an array, replace the relevant entries with scratch pages, and use
sg_alloc_table_from_pages() to create a new sg_table, which is assigned to
the object. This implementation works as expected, but I realized it
wastes memory as the scratch page count increases.


Consider the worst case scenario where all pages are replaced with 
scratch pages.


The function sg_alloc_table_from_pages() first computes the number of chunks
based on the page frame numbers (PFNs): consecutive PFNs form a chunk, and it
allocates one scatterlist entry per chunk to build the sg_table.


Scratch pages all have the same PFN, so sg_alloc_table_from_pages() does not
treat them as consecutive; it allocates a separate scatterlist entry for each
scratch page, which takes a lot of memory as the object size increases.


I have tried to modify the sg_alloc_table_from_pages() implementation to
check for the scratch PFN and treat those pages as a single chunk, but after
the update, when iterating with for_each_sg_page(), I see different page
addresses instead of all of them pointing to the scratch page.


E.g., in an object of size 8 pages, with scratch_page = ea000112 and
pfn: 0x00044800, the result I get is:


page[0]: ea000112, pfn: 0x00044800,
page[1]: ea0001120040, pfn: 0x00044801,
page[2]: ea0001120080, pfn: 0x00044802,
page[3]: ea00011200c0, pfn: 0x00044803,
page[4]: ea0001120100, pfn: 0x00044804,
page[5]: ea0001120140, pfn: 0x00044805,
page[6]: ea0001120180, pfn: 0x00044806,
page[7]: ea00011201c0, pfn: 0x00044807,

How can I manage multiple pages that have the same PFN with a single
scatterlist entry and still have its length equal to (PAGE_SIZE * chunk_size)?


I would really appreciate any suggestions to improve this implementation.

regards
Arun


Re: [Intel-gfx] [RFC] drm/i915: Add variable gem object size support to i915

2014-05-23 Thread Siluvery, Arun

On 12/05/2014 18:02, Eric Anholt wrote:

arun.siluv...@linux.intel.com writes:


From: Siluvery, Arun arun.siluv...@intel.com

This patch adds support for gem objects of variable size.
The size of the gem object, obj->size, is always constant and this fact
is tightly coupled in the driver; this implementation allows its
effective size to be varied using an interface similar to fallocate().

A new ioctl() is introduced to mark a range as scratch/usable.
Once a range is marked as scratch, its backing store is released and the
region is filled with scratch pages. The region can also be unmarked
at a later point, in which case new backing pages are created.
The range can be anywhere within the object space, and an object can have
multiple ranges, possibly overlapping to form a large contiguous range.
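
A user-space sketch of the marking step, assuming the object is tracked as
one PFN per page and that SCRATCH_PFN stands in for the single scratch page.
The helper names, the page-alignment requirement and the -EINVAL convention
are assumptions for illustration, not the patch's actual ioctl. Start and
length are 64-bit, following the v2 note below, and the bounds check is
written so that start + length cannot overflow.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */
#define SCRATCH_PFN 0x44800UL   /* hypothetical PFN of the scratch page */

/* Convert a byte range to a page index range, rejecting bad input. */
static int range_to_pages(uint64_t obj_size, uint64_t start, uint64_t length,
			  size_t *first, size_t *count)
{
	if (length == 0 || start % MODEL_PAGE_SIZE || length % MODEL_PAGE_SIZE)
		return -EINVAL;
	/* Equivalent to start + length > obj_size, without overflow. */
	if (start > obj_size || length > obj_size - start)
		return -EINVAL;
	*first = start / MODEL_PAGE_SIZE;
	*count = length / MODEL_PAGE_SIZE;
	return 0;
}

/* Mark a range as scratch; returns pages released, or -EINVAL. */
static long mark_scratch(unsigned long *pages, uint64_t obj_size,
			 uint64_t start, uint64_t length)
{
	size_t first, count, released = 0;
	int ret = range_to_pages(obj_size, start, length, &first, &count);

	if (ret)
		return ret;
	for (size_t i = first; i < first + count; i++) {
		/* Overlapping ranges: pages already scratch are skipped. */
		if (pages[i] != SCRATCH_PFN) {
			/* A real driver would drop the backing page here. */
			pages[i] = SCRATCH_PFN;
			released++;
		}
	}
	return (long)released;
}
```

Marking pages 2..4 of an 8-page object releases 3 pages; marking the
overlapping range 3..6 afterwards releases only pages 5 and 6.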

There is only a single scratch page and the kernel allows writes to this
page; userspace needs to keep track of scratch page ranges, otherwise any
subsequent writes to these pages will overwrite previous content.
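
The aliasing consequence in miniature: if every scratch-marked page points at
one shared buffer, a write through one scratch offset is visible through every
other. This is a hypothetical 4-page model (pages 1 and 3 marked scratch), not
driver code; obj_write() and obj_read() are illustrative names.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* One shared buffer stands in for the kernel's single scratch page. */
static unsigned char scratch_page[MODEL_PAGE_SIZE];

/* Pages 1 and 3 of this 4-page object have been marked as scratch. */
static unsigned char page0[MODEL_PAGE_SIZE], page2[MODEL_PAGE_SIZE];
static unsigned char *obj_pages[4] = { page0, scratch_page, page2, scratch_page };

/* Write one byte at an object offset through the page table. */
static void obj_write(size_t offset, unsigned char val)
{
	obj_pages[offset / MODEL_PAGE_SIZE][offset % MODEL_PAGE_SIZE] = val;
}

/* Read one byte at an object offset through the page table. */
static unsigned char obj_read(size_t offset)
{
	return obj_pages[offset / MODEL_PAGE_SIZE][offset % MODEL_PAGE_SIZE];
}
```

Writing 0xAB at offset 4096 + 10 makes the same value appear at
3 * 4096 + 10, while the real pages are unaffected, which is why userspace
must not treat scratch ranges as storage.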

This feature is useful where the exact size of the object is not known
at the time of its creation. In such cases we usually create an object
larger than required but end up using it only partially.
On devices with tight memory constraints it would be useful
to release the additional space that is currently unused. Using this
interface, the region can simply be marked as scratch, which releases
its backing store and thus reduces memory pressure on the kernel.

Many thanks to Daniel, ChrisW, Tvrtko, Bob for the idea and feedback
on this implementation.

v2: fix holes in error handling and use consistent data types (Tvrtko)
  - If page allocation fails simply return error; do not try to invoke
shrinker to free backing store.
  - Release new pages created by us in case of error during page allocation
or sg_table update.
  - Use 64-bit data types for start and length values to avoid truncation.


The idea sounds nice to have for Mesa.  We've got this ugly code right
now for guessing how many levels a miptree is going to have, and then do
copies if we find out we were wrong about how many the app was going to
use.  This will let us allocate for a maximum-depth miptree, and mark
the smaller levels as unused until an image gets put there.
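
Roughly how that would look in numbers: allocate the full mip chain up front,
then mark the bytes of the not-yet-used levels as scratch. The sketch assumes
a deliberately simplified linear layout with levels packed back to back and no
tiling or alignment, which is not how i965 actually lays out miptrees;
max_levels() and level_offset() are hypothetical helpers.

```c
#include <stdint.h>

/* Full mip chain length: floor(log2(max(w, h))) + 1. */
static unsigned max_levels(uint32_t w, uint32_t h)
{
	uint32_t m = w > h ? w : h;
	unsigned levels = 1;

	while (m > 1) {
		m >>= 1;
		levels++;
	}
	return levels;
}

/* Byte offset where `level` starts, levels packed back to back. */
static uint64_t level_offset(uint32_t w, uint32_t h, uint32_t cpp,
			     unsigned level)
{
	uint64_t off = 0;

	for (unsigned l = 0; l < level; l++) {
		/* Each dimension halves per level but never drops below 1. */
		uint64_t lw = w >> l ? w >> l : 1;
		uint64_t lh = h >> l ? h >> l : 1;

		off += lw * lh * cpp;
	}
	return off;
}
```

For a 256x256 texture at 4 bytes per texel, the full chain is 9 levels and
349524 bytes, level 0 alone is 262144 bytes, and the 87380-byte tail (about
21 whole 4 KiB pages) could stay scratch until the smaller levels are filled.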

The problem I see with this plan is if the page table twiddling ends up
being too expensive in our BO reallocation path (right now, if we make
the same guess on every allocation, we'll reuse cached BOs with the same
size and no mapping cost).

It would be nice to see some performance data from real applications, if
possible.  But then, I don't think I've seen any real applications hit
the copy path.

The way I am planning to test is to measure the time it takes to
falloc a big object. Could you suggest the best way to test the
performance of this change?


regards
Arun


