Re: [Intel-gfx] [PATCH 1/3] drm/i915: Support for pre-populating the object with system pages
On 24/08/2015 12:58, ankitprasad.r.sha...@intel.com wrote: From: Ankitprasad Sharma ankitprasad.r.sha...@intel.com This patch provides support for the User to populate the object with system pages at its creation time. Since this can be safely performed without holding the 'struct_mutex', it would help to reduce the time 'struct_mutex' is kept locked especially during the exec-buffer path, where it is generally held for the longest time. Signed-off-by: Ankitprasad Sharma ankitprasad.r.sha...@intel.com --- drivers/gpu/drm/i915/i915_dma.c | 2 +- drivers/gpu/drm/i915/i915_gem.c | 51 +++-- include/uapi/drm/i915_drm.h | 11 - 3 files changed, 45 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 8319e07..955aa16 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -171,7 +171,7 @@ static int i915_getparam(struct drm_device *dev, void *data, value = HAS_RESOURCE_STREAMER(dev); break; case I915_PARAM_CREATE_VERSION: - value = 2; + value = 3; break; default: DRM_DEBUG(Unknown parameter %d\n, param-param); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index c44bd05..3904feb 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -46,6 +46,7 @@ static void i915_gem_object_retire__write(struct drm_i915_gem_object *obj); static void i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring); +static int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj); static bool cpu_cache_is_coherent(struct drm_device *dev, enum i915_cache_level level) @@ -414,6 +415,18 @@ i915_gem_create(struct drm_file *file, if (obj == NULL) return -ENOMEM; + if (flags I915_CREATE_POPULATE) { + struct drm_i915_private *dev_priv = dev-dev_private; + + ret = __i915_gem_object_get_pages(obj); + if (ret) + return ret; + + mutex_lock(dev-struct_mutex); + list_add_tail(obj-global_list, dev_priv-mm.unbound_list); + 
mutex_unlock(dev-struct_mutex); + } + ret = drm_gem_handle_create(file, obj-base, handle); If I915_CREATE_POPULATE is set, don't we have to release the pages when this call fails? regards Arun /* drop reference from allocate - handle holds it now */ drm_gem_object_unreference_unlocked(obj-base); @@ -2328,6 +2341,31 @@ err_pages: return ret; } +static int +__i915_gem_object_get_pages(struct drm_i915_gem_object *obj) +{ + const struct drm_i915_gem_object_ops *ops = obj-ops; + int ret; + + WARN_ON(obj-pages); + + if (obj-madv != I915_MADV_WILLNEED) { + DRM_DEBUG(Attempting to obtain a purgeable object\n); + return -EFAULT; + } + + BUG_ON(obj-pages_pin_count); + + ret = ops-get_pages(obj); + if (ret) + return ret; + + obj-get_page.sg = obj-pages-sgl; + obj-get_page.last = 0; + + return 0; +} + /* Ensure that the associated pages are gathered from the backing storage * and pinned into our object. i915_gem_object_get_pages() may be called * multiple times before they are released by a single call to @@ -2339,28 +2377,17 @@ int i915_gem_object_get_pages(struct drm_i915_gem_object *obj) { struct drm_i915_private *dev_priv = obj-base.dev-dev_private; - const struct drm_i915_gem_object_ops *ops = obj-ops; int ret; if (obj-pages) return 0; - if (obj-madv != I915_MADV_WILLNEED) { - DRM_DEBUG(Attempting to obtain a purgeable object\n); - return -EFAULT; - } - - BUG_ON(obj-pages_pin_count); - - ret = ops-get_pages(obj); + ret = __i915_gem_object_get_pages(obj); if (ret) return ret; list_add_tail(obj-global_list, dev_priv-mm.unbound_list); - obj-get_page.sg = obj-pages-sgl; - obj-get_page.last = 0; - return 0; } diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index f71f75c..26ea715 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -457,20 +457,19 @@ struct drm_i915_gem_create { __u32 handle; __u32 pad; /** -* Requested flags (currently used for placement -* (which memory domain)) +* Requested flags * * You can request that the 
object be created from special memory * rather than regular system pages using this parameter. Such * irregular objects may have certain restrictions (such as CPU * access to a stolen object is verboten). -* -* This can be
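On the error-path question raised in this review: the general pattern is to undo each setup step in reverse order when a later step fails. The sketch below uses mock types and hypothetical names (mock_obj, mock_create and friends are not i915 functions). Whether the real driver needs an explicit release here depends on what the final object unreference already frees, which this thread does not settle.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GEM object; not the real i915 structure. */
struct mock_obj {
	bool has_pages;	/* backing pages acquired */
	bool on_list;	/* linked into the unbound list */
	int refcount;	/* reference taken at allocation */
};

static int mock_get_pages(struct mock_obj *obj)
{
	obj->has_pages = true;
	return 0;
}

static void mock_put_pages(struct mock_obj *obj)
{
	obj->has_pages = false;
}

/* Simulated handle creation; fails when 'fail' is set. */
static int mock_handle_create(struct mock_obj *obj, bool fail)
{
	(void)obj;
	return fail ? -ENOMEM : 0;
}

/* Create with optional pre-population, unwinding in reverse order on error. */
static int mock_create(struct mock_obj *obj, bool populate, bool fail_handle)
{
	int ret;

	assert(obj != NULL);
	obj->refcount = 1;

	if (populate) {
		ret = mock_get_pages(obj);
		if (ret)
			goto err_unref;
		obj->on_list = true;
	}

	ret = mock_handle_create(obj, fail_handle);
	if (ret)
		goto err_pages;

	return 0;

err_pages:
	if (obj->on_list) {
		/* Undo the list insertion first, then drop the pages. */
		obj->on_list = false;
		mock_put_pages(obj);
	}
err_unref:
	obj->refcount = 0;
	return ret;
}
```

Calling mock_create(&obj, true, true) leaves the mock object with no pages, off the list and with no reference held, which is the state the review comment is asking about.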
Re: [Intel-gfx] [PATCH v1 2/2] drm/i915/gen9: Disable gather at set shader bit
On 12/08/2015 16:41, Dave Gordon wrote: On 11/08/15 15:44, Arun Siluvery wrote: From Gen9, Push constant instruction parsing behaviour varies according to whether set shader is enabled or not. If we want legacy behaviour then it can be achieved by disabling set shader. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89959 Cc: Ben Widawsky benjamin.widaw...@intel.com Cc: Joonas Lahtinen joonas.lahti...@linux.intel.com Cc: Mika Kuoppala mika.kuopp...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_reg.h | 5 + drivers/gpu/drm/i915/intel_ringbuffer.c | 10 ++ 2 files changed, 15 insertions(+) [snip] diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index cf61262..7d284ed 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -983,6 +983,16 @@ static int gen9_init_workarounds(struct intel_engine_cs *ring) tmp |= HDC_FORCE_CSR_NON_COHERENT_OVR_DISABLE; WA_SET_BIT_MASKED(HDC_CHICKEN0, tmp); +/* Chicken bits to disable set shader is in multiple places, + * set bits in all required registers to disable it correctly + */ +WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2, GEN9_DISABLE_GATHER_SET_SHADER_SLICE); +if ((IS_SKYLAKE(dev) INTEL_REVID(dev) = SKL_REVID_D0) || +(IS_BROXTON(dev) INTEL_REVID(dev) == BXT_REVID_A0)) +WA_SET_BIT_MASKED(RS_CHICKEN, RS_CHICKEN_DISABLE_GATHER_AT_SHADER); +else +WA_SET_BIT_MASKED(CS_RCS_BE, CS_RCS_DISABLE_GATHER_AT_SHADER); + return 0; } This workaround isn't tagged with a specific /* WaXyz:chip */ comment. Also, the style isn't consistent with the other paragraphs earlier in this function: those have braces round the body part even when there's only one line of code, possibly to make it clear where the WA comment applies (of course, this is why the buggy WA_REG() macro wasn't spotted earlier). So, maybe prettify this a bit, if possible? The code actually looks correct, just ugly. 
> Oh, and keep patch 1 even if you decide to abandon this one!

Hi Dave,

This patch can be ignored if we use below patch,

[Intel-gfx] [PATCH] lib/rendercopy_gen9: Setup Push constant pointer before sending BTP commands
http://lists.freedesktop.org/archives/intel-gfx/2015-August/073483.html

I think the correct option would be to ignore this patch.

regards
Arun

> .Dave.

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] lib/rendercopy_gen9: WaBindlessSurfaceStateModifyEnable
On 11/08/2015 13:25, Mika Kuoppala wrote:
> Don't set the size of bindless surface state on rendercopy.
> And as of doing so, take into account the workaround for
> setting the command size.
>
> This was tried during hunting for
> https://bugs.freedesktop.org/show_bug.cgi?id=89959.
> But no impact was found.
>
> Cc: Arun Siluvery <arun.siluv...@linux.intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuopp...@intel.com>
> ---
>  lib/rendercopy_gen9.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/lib/rendercopy_gen9.c b/lib/rendercopy_gen9.c
> index 0766192..4a4a604 100644
> --- a/lib/rendercopy_gen9.c
> +++ b/lib/rendercopy_gen9.c
> @@ -511,7 +511,11 @@ gen7_emit_push_constants(struct intel_batchbuffer *batch) {
>  
>  static void
>  gen9_emit_state_base_address(struct intel_batchbuffer *batch) {
> -	OUT_BATCH(GEN6_STATE_BASE_ADDRESS | (19 - 2));
> +
> +	/* WaBindlessSurfaceStateModifyEnable:skl,bxt */
> +	/* The length has to be one less if we dont modify
> +	   bindless state */
> +	OUT_BATCH(GEN6_STATE_BASE_ADDRESS | (19 - 1 - 2));
>  
>  	/* general */
>  	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
> @@ -544,9 +548,9 @@ gen9_emit_state_base_address(struct intel_batchbuffer *batch) {
>  	OUT_BATCH(1 << 12 | 1);
>  
>  	/* Bindless surface state base address */
> -	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
>  	OUT_BATCH(0);
> -	OUT_BATCH(0xf000);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
>  }
>  
>  static void

Agrees with spec and looks good to me.
No impact observed with gem_concurrent_blit subtests.

Reviewed-by: Arun Siluvery <arun.siluv...@linux.intel.com>

regards
Arun
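The change from (19 - 2) to (19 - 1 - 2) in this patch reads as "total packet size in dwords, minus a fixed bias of 2". A minimal sketch of that arithmetic follows, assuming the bias-of-2 convention implied by the patch. The helper names and the opcode value used in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the length field holds the total dword count minus 2. */
#define CMD_LENGTH_BIAS 2u

static uint32_t cmd_length_field(uint32_t total_dwords)
{
	assert(total_dwords >= CMD_LENGTH_BIAS);
	return total_dwords - CMD_LENGTH_BIAS;
}

/* Combine an opcode word with the length field for a packet of the given size. */
static uint32_t cmd_header(uint32_t opcode, uint32_t total_dwords)
{
	return opcode | cmd_length_field(total_dwords);
}
```

With a made-up opcode word of 0xABCD0000, cmd_header(0xABCD0000u, 19) carries a length field of 17, and dropping one dword from the packet gives 16, matching the two expressions in the hunk.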
Re: [Intel-gfx] [PATCH v1 0/2] Enable legacy behaviour for Push constants
On 11/08/2015 21:58, Timo Aaltonen wrote:
> On 11.08.2015 17:44, Arun Siluvery wrote:
>> Patch1 fixes a simple compile error in Patch2
>> Patch2 fixes gpu hang observed with a subtest of gem_concurrent_blit.
>>
>> Arun Siluvery (1):
>>   drm/i915/gen9: Disable gather at set shader bit
>>
>> Mika Kuoppala (1):
>>   drm/i915: Contain the WA_REG macro
>>
>>  drivers/gpu/drm/i915/i915_reg.h         |  5 +
>>  drivers/gpu/drm/i915/intel_ringbuffer.c | 14 --
>>  2 files changed, 17 insertions(+), 2 deletions(-)
>
> prw-blt-overwrite-source-read-rcs-forked runs fine with these, tested on
> SKL-Y & -H
>
> Tested-by: Timo Aaltonen <timo.aalto...@canonical.com>

This patch can be ignored if the following patch is applied,

[Intel-gfx] [PATCH] lib/rendercopy_gen9: Setup Push constant pointer before sending BTP commands
http://lists.freedesktop.org/archives/intel-gfx/2015-August/073483.html

regards
Arun
Re: [Intel-gfx] [PATCH] drm/i915:gen9: Add WA for disable gather at set shader bit
On 08/08/2015 06:35, Ben Widawsky wrote: On Fri, Aug 07, 2015 at 06:33:37PM +0100, Arun Siluvery wrote: This WA doesn't have a name. According to the spec, driver need to reset disable gather at set shader bit in per ctx WA batch. It is to be noted that the default value is already '0' for this bit but we still need to apply this WA because userspace may set it. If userspace really need it to be set then they need to do in every batch. Cc: Ben Widawsky benjamin.widaw...@intel.com Cc: Mika Kuoppala mika.kuopp...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_reg.h | 1 + drivers/gpu/drm/i915/intel_lrc.c | 9 + 2 files changed, 10 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index ea46d68..838537f 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -5834,6 +5834,7 @@ enum skl_disp_power_wells { # define GEN7_CSC1_RHWO_OPT_DISABLE_IN_RCC((110) | (126)) # define GEN9_RHWO_OPTIMIZATION_DISABLE (114) #define COMMON_SLICE_CHICKEN2 0x7014 +#define GEN9_DISABLE_GATHER_SET_SHADER_SLICE (112) # define GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE (10) #define HIZ_CHICKEN 0x7018 diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 4c40614..df3bb98 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1302,6 +1302,15 @@ static int gen9_init_perctx_bb(struct intel_engine_cs *ring, struct drm_device *dev = ring-dev; uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS); + /* WaNoName:skl,bxt +* This WA has no name, according to the spec driver needs to reset +* disable gather at set shader slice bit in per ctx batch +*/ + wa_ctx_emit(batch, index, MI_LOAD_REGISTER_IMM(1)); + wa_ctx_emit(batch, index, COMMON_SLICE_CHICKEN2); + wa_ctx_emit(batch, index, + _MASKED_BIT_DISABLE(GEN9_DISABLE_GATHER_SET_SHADER_SLICE)); + /* WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken:skl,bxt */ 
if ((IS_SKYLAKE(dev) (INTEL_REVID(dev) = SKL_REVID_B0)) || (IS_BROXTON(dev) (INTEL_REVID(dev) == BXT_REVID_A0))) { Hmm. I thought we needed this, but looking at the User Mode Privileged Commands of the spec, it seems like this register is not allowed to be written. So unless this register is put in a whitelist somewhere in the future, I think it's safe to drop this patch. We need to whitelist few registers for preemption related WA, this can be added to whitelist if userspace really needs to write to it. regards Arun As a preventative measure, I don't see this as harmful - but I don't feel I have any authority to suggest whether we keep this in or not. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
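The per-context batch in this thread clears the bit with _MASKED_BIT_DISABLE, which relies on the masked-register convention: the upper 16 bits of the written value select which of the lower 16 bits actually change. A small model of that behaviour is sketched below. The macro names mirror the driver's but are redefined locally, and masked_write is a hypothetical simulation of the hardware side, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Masked-register convention: bits 31:16 select which of bits 15:0 are written. */
#define MASKED_BIT_ENABLE(b)  ((uint32_t)(b) << 16 | (uint32_t)(b))
#define MASKED_BIT_DISABLE(b) ((uint32_t)(b) << 16)

/* Single-bit helper restricted to the writable lower half. */
static uint32_t bit(unsigned int n)
{
	assert(n < 16);
	return 1u << n;
}

/* Hypothetical model of how the hardware applies a masked write. */
static uint16_t masked_write(uint16_t current, uint32_t value)
{
	uint16_t mask = (uint16_t)(value >> 16);
	uint16_t bits = (uint16_t)(value & 0xffffu);

	/* Only bits named in the mask change; all others keep their old value. */
	return (uint16_t)((current & ~mask) | (bits & mask));
}
```

Because unselected bits are left alone, a single write is enough to clear bit 12 without first reading the register, which is why the batch only needs an MI_LOAD_REGISTER_IMM with one value.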
Re: [Intel-gfx] [PATCH] drm/i915: Check idle to active before processing CSQ
On 07/08/2015 12:52, Daniel Vetter wrote:
> On Fri, Aug 07, 2015 at 11:15:56AM +0300, Mika Kuoppala wrote:
>> Daniel Vetter <dan...@ffwll.ch> writes:
>>
>>> On Thu, Aug 06, 2015 at 05:09:17PM +0300, Mika Kuoppala wrote:
>>>> If idle to active bit is set, the rest of the fields
>>>> in CSQ are not valid. Bail out early if this is the case
>>>> in order to prevent rest of the loop inspecting stale values.
>>>>
>>>> Signed-off-by: Mika Kuoppala <mika.kuopp...@intel.com>

looks good to me, didn't observe any impact with this patch.

Reviewed-by: Arun Siluvery <arun.siluv...@linux.intel.com>

regards
Arun

>>> Same questions here too, what's the impact. E.g. if you only found this
>>> by bspec/code inspection then it's for -next, but if it's to fix some
>>> known breakage then it's for -fixes + cc: stable.
>>
>> To this and the masked write one:
>>
>> Both of these were found when I was trying to find out
>> root cause for skl hangs. They are both for -next.
>>
>> Both are in the correctness department wrt bspec and
>> I haven't observed any other impact.
>>
>> Point taken on being more verbose.
>>
>> Thanks
>
> I added a note about this to the first patch and merged it. This one here
> still seems to miss an r-b.
> -Daniel
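The patch description in this thread says that when the idle-to-active bit is set, the remaining fields of the entry are not valid. A sketch of a processing loop that honours this is given below. The bit positions, the struct layout and the function name are all hypothetical, and the sketch skips the stale entry rather than reproducing the exact control flow of the patch, which is not quoted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bits; the real layout is defined by the hardware spec. */
#define STATUS_IDLE_ACTIVE (1u << 0)
#define STATUS_COMPLETE    (1u << 4)

struct status_entry {
	uint32_t status;
	uint32_t context_id;	/* only meaningful when IDLE_ACTIVE is clear */
};

/* Count completed contexts, ignoring entries whose other fields are stale. */
static unsigned int count_completed(const struct status_entry *e, size_t n)
{
	unsigned int done = 0;

	assert(e != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (e[i].status & STATUS_IDLE_ACTIVE)
			continue;	/* remaining fields are not valid */
		if (e[i].status & STATUS_COMPLETE)
			done++;
	}
	return done;
}
```

Checking the idle-to-active bit before anything else means a COMPLETE flag left over from an earlier event is never acted on.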
Re: [Intel-gfx] [PATCH] drm/i915/skl WaDisableSbeCacheDispatchPortSharing
On 06/08/2015 14:51, Mika Kuoppala wrote:
> Add WaDisableSbeCacheDispatchPortSharing:skl
>
> Cc: Arun Siluvery <arun.siluv...@linux.intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuopp...@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 1c14233..1a10358 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1059,6 +1059,13 @@ static int skl_init_workarounds(struct intel_engine_cs *ring)
>  		HDC_FENCE_DEST_SLM_DISABLE |
>  		HDC_BARRIER_PERFORMANCE_DISABLE);
>  
> +	/* WaDisableSbeCacheDispatchPortSharing:skl */
> +	if (INTEL_REVID(dev) <= SKL_REVID_F0) {
> +		WA_SET_BIT_MASKED(
> +			GEN7_HALF_SLICE_CHICKEN1,
> +			GEN7_SBE_SS_CACHE_DISPATCH_PORT_SHARING_DISABLE);
> +	}
> +

seems to be applicable for BXT also until B0.

regards
Arun

>  	return skl_tune_iz_hashing(ring);
>  }
Re: [Intel-gfx] [PATCH] drm/i915/skl WaDisableSbeCacheDispatchPortSharing
On 06/08/2015 15:45, Mika Kuoppala wrote:
> "Siluvery, Arun" <arun.siluv...@linux.intel.com> writes:
>
>> On 06/08/2015 14:51, Mika Kuoppala wrote:
>>> Add WaDisableSbeCacheDispatchPortSharing:skl
>>>
>>> Cc: Arun Siluvery <arun.siluv...@linux.intel.com>
>>> Signed-off-by: Mika Kuoppala <mika.kuopp...@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/intel_ringbuffer.c | 7 +++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> index 1c14233..1a10358 100644
>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> @@ -1059,6 +1059,13 @@ static int skl_init_workarounds(struct intel_engine_cs *ring)
>>>  		HDC_FENCE_DEST_SLM_DISABLE |
>>>  		HDC_BARRIER_PERFORMANCE_DISABLE);
>>>  
>>> +	/* WaDisableSbeCacheDispatchPortSharing:skl */
>>> +	if (INTEL_REVID(dev) <= SKL_REVID_F0) {
>>> +		WA_SET_BIT_MASKED(
>>> +			GEN7_HALF_SLICE_CHICKEN1,
>>> +			GEN7_SBE_SS_CACHE_DISPATCH_PORT_SHARING_DISABLE);
>>> +	}
>>> +
>>
>> seems to be applicable for BXT also until B0.
>
> Yes, we have that in bxt_init_workarounds. I pondered
> it is more clean to have rev check for each in their
> respective setup functions.

fine with this.

Reviewed-by: Arun Siluvery <arun.siluv...@linux.intel.com>

regards
Arun

> -Mika
>
>> regards
>> Arun
>>
>>>  	return skl_tune_iz_hashing(ring);
>>>  }
Re: [Intel-gfx] [PATCH v2 2/2] drm/i915:gen9: Add disable gather at set shader w/a
On 05/08/2015 15:45, Mika Kuoppala wrote: Arun Siluvery arun.siluv...@linux.intel.com writes: This WA is implemented in init_context as well as WA batch init. There are also some dependent bits need to be set in other registers for this to be complete. v2: behaviour of disable gather at set shader bit can be specified by two different registers, use a better option (Ben). For me it looks like there are 2 orthogonal goals for this patch. I think the actual workaround should be one patch, the resetting of the set shader bit and the patch named accordingly. Then the set shader initialization in a different patch, if there is justification for it (that I have not managed yet to find). I agree it needs to be split into two patches. But lets concentrate on the workaround itself... Cc: Ben Widawsky benjamin.widaw...@intel.com Cc: Joonas Lahtinen joonas.lahti...@linux.intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_reg.h | 5 + drivers/gpu/drm/i915/intel_lrc.c| 8 drivers/gpu/drm/i915/intel_ringbuffer.c | 18 ++ 3 files changed, 31 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 8991cd5..8719a5a 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -1721,6 +1721,10 @@ enum skl_disp_power_wells { #define MEM_DISPLAY_TRICKLE_FEED_DISABLE (12) /* 85x only */ #define FW_BLC0x020d8 #define FW_BLC2 0x020dc +#define GEN7_RS_CHICKEN 0x20DC +#define GEN9_RS_CHICKEN_DISABLE_GATHER_AT_SHADER (12) +#define GEN7_FF_SLICE_CHICKEN10x20E0 +#define GEN9_PER_CTX_DISABLE_GATHER_CONTROL (115) #define FW_BLC_SELF 0x020e0 /* 915+ only */ #define FW_BLC_SELF_EN_MASK (131) #define FW_BLC_SELF_FIFO_MASK(116) /* 945 only */ @@ -5836,6 +5840,7 @@ enum skl_disp_power_wells { # define GEN7_CSC1_RHWO_OPT_DISABLE_IN_RCC((110) | (126)) # define GEN9_RHWO_OPTIMIZATION_DISABLE (114) #define COMMON_SLICE_CHICKEN2 0x7014 +#define GEN9_DISABLE_GATHER_SET_SHADER_SLICE (112) # 
define GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE (10) #define HIZ_CHICKEN 0x7018 diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 9faad82..d3a03f3 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1292,6 +1292,14 @@ static int gen9_init_perctx_bb(struct intel_engine_cs *ring, struct drm_device *dev = ring-dev; uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS); + /* WA to reset disable gather at set shader slice bit */ I am thinking how we could also alert the reader that this workaround needs to be revisited when it has been given a name. By adding WaNoName:skl,bxt along with the comment above? + if (IS_SKYLAKE(dev)) { As Ben noted, documentation is rather sparse. But the reference to previous problems with this bit being save/restored in wrong order, we can conclude that this should be for BXT also. + wa_ctx_emit(batch, index, MI_LOAD_REGISTER_IMM(1)); + wa_ctx_emit(batch, index, COMMON_SLICE_CHICKEN2); + wa_ctx_emit(batch, index, + _MASKED_BIT_DISABLE(GEN9_DISABLE_GATHER_SET_SHADER_SLICE)); + } + The actual value of 'disable set shader' is orthogonal and beyond scope of this Workaround so the rest should be strip out to different patch. As you mentioned on irc we first need to know whether we need to disable the set shader or not (set ox7014[12:12] to 1), because the WA only talks about resetting this bit in per ctx batch. The following bits can be ignore if there is not need to set that bit in the first place. with reference to gem_ringfill, on my system it only completes without any hang if I add this patch completely but on some system this patch doesn't seem to be necessary. 
regards Arun -Mika /* WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken:skl,bxt */ if ((IS_SKYLAKE(dev) (INTEL_REVID(dev) = SKL_REVID_B0)) || (IS_BROXTON(dev) (INTEL_REVID(dev) == BXT_REVID_A0))) { diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index dcd1b8f..5e8e5f9 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -985,6 +985,17 @@ static int gen9_init_workarounds(struct intel_engine_cs *ring) tmp |= HDC_FORCE_CSR_NON_COHERENT_OVR_DISABLE; WA_SET_BIT_MASKED(HDC_CHICKEN0, tmp); + /* WA to gather at set shader - skl,bxt +* These are dependent bits need to be set for the WA. +*/ + if ((IS_SKYLAKE(dev) INTEL_REVID(dev) SKL_REVID_D0) || + (IS_BROXTON(dev) INTEL_REVID(dev) BXT_REVID_A0)) { + WA_SET_BIT_MASKED(GEN7_FF_SLICE_CHICKEN1,
Re: [Intel-gfx] [PATCH v1 2/2] drm/i915:gen9: Add disable gather at set shader w/a
On 04/08/2015 00:21, Ben Widawsky wrote: On Mon, Aug 03, 2015 at 08:24:57PM +0100, Arun Siluvery wrote: This WA is implemented in init_context as well as WA batch init. There are also some dependent bits need to be set in other registers for this to be complete. Cc: Ben Widawsky benjamin.widaw...@intel.com Cc: Joonas Lahtinen joonas.lahti...@linux.intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_reg.h | 3 +++ drivers/gpu/drm/i915/intel_lrc.c| 8 drivers/gpu/drm/i915/intel_ringbuffer.c | 16 3 files changed, 27 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 8991cd5..24b8bb9 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -1720,7 +1720,9 @@ enum skl_disp_power_wells { #define MEM_DISPLAY_A_TRICKLE_FEED_DISABLE (12) /* 830/845 only */ #define MEM_DISPLAY_TRICKLE_FEED_DISABLE (12) /* 85x only */ #define FW_BLC0x020d8 +#define GEN9_DISABLE_GATHER_AT_SET_SHADER(17) #define FW_BLC2 0x020dc +#define GEN9_RS_CHICKEN_DISABLE_GATHER_AT_SHADER (12) Neither of these belong here. BLC is for backlight. Create a new define if we don't have one. #define RS_CHICKEN 0x20dc I thought of reusing existing define but created a new one as you suggested. 
#define FW_BLC_SELF 0x020e0 /* 915+ only */ #define FW_BLC_SELF_EN_MASK (131) #define FW_BLC_SELF_FIFO_MASK(116) /* 945 only */ @@ -5836,6 +5838,7 @@ enum skl_disp_power_wells { # define GEN7_CSC1_RHWO_OPT_DISABLE_IN_RCC((110) | (126)) # define GEN9_RHWO_OPTIMIZATION_DISABLE (114) #define COMMON_SLICE_CHICKEN2 0x7014 +#define GEN9_DISABLE_GATHER_SET_SHADER_SLICE (112) # define GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE (10) #define HIZ_CHICKEN 0x7018 diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 9faad82..d3a03f3 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1292,6 +1292,14 @@ static int gen9_init_perctx_bb(struct intel_engine_cs *ring, struct drm_device *dev = ring-dev; uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS); + /* WA to reset disable gather at set shader slice bit */ + if (IS_SKYLAKE(dev)) { + wa_ctx_emit(batch, index, MI_LOAD_REGISTER_IMM(1)); + wa_ctx_emit(batch, index, COMMON_SLICE_CHICKEN2); + wa_ctx_emit(batch, index, + _MASKED_BIT_DISABLE(GEN9_DISABLE_GATHER_SET_SHADER_SLICE)); + } + Shouldn't this be for BXT as well? Also, why bother with the revid check below and not here? spec says only SKL+ /* WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken:skl,bxt */ if ((IS_SKYLAKE(dev) (INTEL_REVID(dev) = SKL_REVID_B0)) || (IS_BROXTON(dev) (INTEL_REVID(dev) == BXT_REVID_A0))) { diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index dcd1b8f..4fc4b5e 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -985,6 +985,15 @@ static int gen9_init_workarounds(struct intel_engine_cs *ring) tmp |= HDC_FORCE_CSR_NON_COHERENT_OVR_DISABLE; WA_SET_BIT_MASKED(HDC_CHICKEN0, tmp); + /* WA to gather at set shader - skl,bxt +* These are dependent bits need to be set for the WA. 
+*/ + if (IS_SKYLAKE(dev) (INTEL_REVID(dev) SKL_REVID_D0) || + (IS_BROXTON(dev) INTEL_REVID(dev) BXT_REVID_A0)) { + WA_SET_BIT_MASKED(FW_BLC, GEN9_DISABLE_GATHER_AT_SET_SHADER); + WA_SET_BIT_MASKED(FW_BLC2, GEN9_RS_CHICKEN_DISABLE_GATHER_AT_SHADER); + } + return 0; } @@ -1068,6 +1077,13 @@ static int skl_init_workarounds(struct intel_engine_cs *ring) HDC_FENCE_DEST_SLM_DISABLE | HDC_BARRIER_PERFORMANCE_DISABLE); + /* WA to Disable gather at set shader - skl +* This bit needs to be reset in Per ctx WA batch and it is also +* dependent on other bits in different register, all of them need +* be set for the WA to be complete. +*/ + WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2, GEN9_DISABLE_GATHER_SET_SHADER_SLICE); + return skl_tune_iz_hashing(ring); } I wouldn't set both 20dc, and 20d8, I am not sure what implication it has. Instead, set or read bit 15 of 0x20e0 and then just set one. To me, it seems like the best way to do this is to set 115 of 0x20e0, and then use bit 2 of 0x20dc for the workaround. We don't need per context controls of something we have to disable always anyway. changed it to use 0x20e0 regards Arun ___ Intel-gfx mailing list
Re: [Intel-gfx] [PATCH v1 1/2] drm/i915:skl: Add WaEnableGapsTsvCreditFix
On 04/08/2015 09:58, Mika Kuoppala wrote: Ben Widawsky benjamin.widaw...@intel.com writes: On Mon, Aug 03, 2015 at 08:24:56PM +0100, Arun Siluvery wrote: Cc: Ben Widawsky benjamin.widaw...@intel.com Cc: Joonas Lahtinen joonas.lahti...@linux.intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_reg.h | 3 +++ drivers/gpu/drm/i915/intel_pm.c | 6 ++ 2 files changed, 9 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 77967ca..8991cd5 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -6849,6 +6849,9 @@ enum skl_disp_power_wells { #define GEN7_MISCCPCTL(0x9424) #define GEN7_DOP_CLOCK_GATE_ENABLE (10) +#define GEN8_GARBCNTL 0xB004 +#define GEN9_GAPS_TSV_CREDIT_DISABLE (17) + /* IVYBRIDGE DPF */ #define GEN7_L3CDERRST1 0xB008 /* L3CD Error Status 1 */ #define HSW_L3CDERRST11 0xB208 /* L3CD Error Status register 1 slice 1 */ diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index c23cab6..9152113 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -106,6 +106,12 @@ static void skl_init_clock_gating(struct drm_device *dev) /* WaDisableLSQCROPERFforOCL:skl */ I915_WRITE(GEN8_L3SQCREG4, I915_READ(GEN8_L3SQCREG4) | GEN8_LQSC_RO_PERF_DIS); + + /* WaEnableGapsTsvCreditFix:skl */ + if (IS_SKYLAKE(dev) (INTEL_REVID(dev) = SKL_REVID_C0)) { + I915_WRITE(GEN8_GARBCNTL, (I915_READ(GEN8_GARBCNTL) | + GEN9_GAPS_TSV_CREDIT_DISABLE)); + } } static void bxt_init_clock_gating(struct drm_device *dev) FWIW, the docs make it sound like BIOS should be doing this. Did you verify we actually don't have the bit set with more recent BKC? I have pretty recent BIOS and the bit was not set. I checked about this, it should be done in driver. 
regards
Arun

>> Tested-by: Ben Widawsky <b...@bwidawsk.net>
>> Reviewed-by: Ben Widawsky <b...@bwidawsk.net>
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90854
> Tested-by: Mika Kuoppala <mika.kuopp...@intel.com>
>
>> --
>> Ben Widawsky, Intel Open Source Technology Center
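The WaEnableGapsTsvCreditFix hunk in this thread sets one bit with a read-modify-write, since the register is written as a whole and the other bits have to be preserved by the driver. A minimal sketch with a mock register follows. mock_reg, mock_read, mock_write and reg_set_bits are hypothetical stand-ins for the driver's MMIO accessors, and bit 7 is assumed from the patch's define with the shift operator restored.

```c
#include <assert.h>
#include <stdint.h>

#define CREDIT_DISABLE_BIT (1u << 7)	/* assumed position; name is hypothetical */

static uint32_t mock_reg;	/* stands in for one MMIO register */

static uint32_t mock_read(void)
{
	return mock_reg;
}

static void mock_write(uint32_t v)
{
	mock_reg = v;
}

/* Set 'bits' without disturbing the rest of the register. */
static void reg_set_bits(uint32_t bits)
{
	assert(bits != 0);
	mock_write(mock_read() | bits);
}
```

If the BIOS has already set the bit, as Ben's question in this thread considers, the write is a no-op, so applying it unconditionally in the driver is harmless.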
Re: [Intel-gfx] [PATCH v2 2/4] drm/i915: Add provision to extend Golden context batch
On 17/07/2015 21:03, Chris Wilson wrote: On Fri, Jul 17, 2015 at 07:13:32PM +0100, Arun Siluvery wrote: The Golden batch carries 3D state at the beginning so that HW starts with a known state. It is carried as a binary blob which is auto-generated from source. The idea was it would be easier to maintain and keep the complexity out of the kernel which makes sense as we don't really touch it. However if you really need to update it then you need to update generator source and keep the binary blob in sync with it. There is a need to patch this in bxt to send one additional command to enable a feature. A solution was to patch the binary data with some additional data structures (included as part of auto-generator source) but it was unnecessarily complicated. Chris suggested the idea of having a secondary batch and execute two batch buffers. It has clear advantages as we needn't touch the base golden batch, can customize secondary/auxiliary batch depending on Gen and can be carried in the driver with no dependencies. This patch adds support for this auxiliary batch which is inserted at the end of golden batch and is completely independent from it. Thanks to Mika for the preliminary review. v2: Strictly conform to the batch size requirements to cover Gen2 and add comments to clarify overflow check in macro (Chris, Mika). 
Cc: Mika Kuoppala mika.kuopp...@intel.com Cc: Chris Wilson ch...@chris-wilson.co.uk Cc: Armin Reese armin.c.re...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_gem_render_state.c | 45 drivers/gpu/drm/i915/i915_gem_render_state.h | 2 ++ drivers/gpu/drm/i915/intel_lrc.c | 6 3 files changed, 53 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c index b6492fe..5026a62 100644 --- a/drivers/gpu/drm/i915/i915_gem_render_state.c +++ b/drivers/gpu/drm/i915/i915_gem_render_state.c @@ -73,6 +73,24 @@ free_gem: return ret; } +/* + * Macro to add commands to auxiliary batch. + * This macro only checks for page overflow before inserting the commands, + * this is sufficient as the null state generator makes the final batch + * with two passes to build command and state separately. At this point + * the size of both are known and it compacts them by relocating the state + * right after the commands taking care of aligment so we should sufficient + * space below them for adding new commands. + */ +#define OUT_BATCH(batch, i, val) \ + do {\ + if (WARN_ON((i) = PAGE_SIZE / sizeof(u32))) { \ + ret = -ENOSPC; \ + goto err_out; \ + } \ + (batch)[(i)++] = (val); \ + } while(0) + static int render_state_setup(struct render_state *so) { const struct intel_renderstate_rodata *rodata = so-rodata; @@ -110,6 +128,21 @@ static int render_state_setup(struct render_state *so) d[i++] = s; } + + while (i % CACHELINE_DWORDS) + OUT_BATCH(d, i, MI_NOOP); + + so-aux_batch_offset = i * sizeof(u32); + + OUT_BATCH(d, i, MI_BATCH_BUFFER_END); + so-aux_batch_size = (i * sizeof(u32)) - so-aux_batch_offset; + + /* +* Since we are sending length, we need to strictly conform to +* all requirements. For Gen2 this must be a multiple of 8. 
+*/ + so-aux_batch_size = ALIGN(so-aux_batch_size, 8); + kunmap(page); ret = i915_gem_object_set_to_gtt_domain(so-obj, false); @@ -128,6 +161,8 @@ err_out: return ret; } +#undef OUT_BATCH + void i915_gem_render_state_fini(struct render_state *so) { i915_gem_object_ggtt_unpin(so-obj); @@ -176,6 +211,16 @@ int i915_gem_render_state_init(struct drm_i915_gem_request *req) if (ret) goto out; + if (so.aux_batch_size 8) { + ret = req-ring-dispatch_execbuffer(req, +(so.ggtt_offset + + so.aux_batch_offset), +so.aux_batch_size, +I915_DISPATCH_SECURE); + if (ret) + goto out; + } + i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), req); out: diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.h b/drivers/gpu/drm/i915/i915_gem_render_state.h index 7aa7372..79de101 100644 --- a/drivers/gpu/drm/i915/i915_gem_render_state.h +++ b/drivers/gpu/drm/i915/i915_gem_render_state.h @@ -37,6 +37,8 @@ struct render_state { struct drm_i915_gem_object *obj; u64
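The auxiliary-batch logic described in this patch has three steps: a bounds-checked emit, padding to a cacheline boundary with no-ops, and rounding the recorded size up to a multiple of 8. The sketch below reproduces those steps on a plain array. The page size, cacheline size and opcode encodings are placeholder assumptions, and emit and append_aux are hypothetical names rather than the driver's OUT_BATCH macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_DWORDS     1024u	/* one 4 KiB page of dwords (assumed page size) */
#define CACHELINE_DWORDS 16u	/* assumed 64-byte cacheline */
#define NOOP             0x0u	/* placeholder encodings, not real opcodes */
#define BATCH_END        0x1u
#define ALIGN_UP(x, a)   (((x) + (a) - 1) / (a) * (a))

/* Append one dword; returns -1 instead of writing past the page. */
static int emit(uint32_t *batch, size_t *i, uint32_t val)
{
	assert(batch != NULL && i != NULL);
	if (*i >= BATCH_DWORDS)
		return -1;
	batch[(*i)++] = val;
	return 0;
}

/* Pad to a cacheline, append the auxiliary batch, report offset and size in bytes. */
static int append_aux(uint32_t *batch, size_t *i, size_t *offset, size_t *size)
{
	while (*i % CACHELINE_DWORDS)
		if (emit(batch, i, NOOP))
			return -1;

	*offset = *i * sizeof(uint32_t);
	if (emit(batch, i, BATCH_END))
		return -1;

	/* The size is passed on as a length, so round it up to a multiple of 8. */
	*size = ALIGN_UP(*i * sizeof(uint32_t) - *offset, (size_t)8);
	return 0;
}
```

An auxiliary batch holding only the end marker rounds up to exactly 8 bytes, which is consistent with the patch comparing aux_batch_size against 8 before dispatching the second batch.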
Re: [Intel-gfx] [PATCH] drm/i915: Change SRM, LRM instructions to use correct length
On 16/07/2015 16:19, Arun Siluvery wrote: MI_STORE_REGISTER_MEM, MI_LOAD_REGISTER_MEM instructions are not really variable length instructions unlike MI_LOAD_REGISTER_IMM where it expects (reg, addr) pairs so use fixed length for these instructions. Cc: Dave Gordon david.s.gor...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- ping? to any reviewers? regards Arun drivers/gpu/drm/i915/i915_cmd_parser.c | 8 drivers/gpu/drm/i915/i915_reg.h| 8 drivers/gpu/drm/i915/intel_display.c | 4 ++-- drivers/gpu/drm/i915/intel_lrc.c | 4 ++-- 4 files changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c index 430571b..3771922 100644 --- a/drivers/gpu/drm/i915/i915_cmd_parser.c +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c @@ -124,14 +124,14 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = { CMD( MI_STORE_DWORD_INDEX, SMI, !F, 0xFF, R ), CMD( MI_LOAD_REGISTER_IMM(1), SMI, !F, 0xFF, W, .reg = { .offset = 1, .mask = 0x007C, .step = 2 }), - CMD( MI_STORE_REGISTER_MEM(1), SMI, !F, 0xFF, W | B, + CMD( MI_STORE_REGISTER_MEM,SMI,F, 1, W | B, .reg = { .offset = 1, .mask = 0x007C }, .bits = {{ .offset = 0, .mask = MI_GLOBAL_GTT, .expected = 0, }}, ), - CMD( MI_LOAD_REGISTER_MEM(1), SMI, !F, 0xFF, W | B, + CMD( MI_LOAD_REGISTER_MEM, SMI,F, 1, W | B, .reg = { .offset = 1, .mask = 0x007C }, .bits = {{ .offset = 0, @@ -1021,7 +1021,7 @@ static bool check_cmd(const struct intel_engine_cs *ring, * only MI_LOAD_REGISTER_IMM commands. */ if (reg_addr == OACONTROL) { - if (desc-cmd.value == MI_LOAD_REGISTER_MEM(1)) { + if (desc-cmd.value == MI_LOAD_REGISTER_MEM) { DRM_DEBUG_DRIVER(CMD: Rejected LRM to OACONTROL\n); return false; } @@ -1035,7 +1035,7 @@ static bool check_cmd(const struct intel_engine_cs *ring, * allowed mask/value pair given in the whitelist entry. 
*/ if (reg-mask) { - if (desc-cmd.value == MI_LOAD_REGISTER_MEM(1)) { + if (desc-cmd.value == MI_LOAD_REGISTER_MEM) { DRM_DEBUG_DRIVER(CMD: Rejected LRM to masked register 0x%08X\n, reg_addr); return false; diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index bd13494..cc3cb3e 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -342,8 +342,8 @@ */ #define MI_LOAD_REGISTER_IMM(x) MI_INSTR(0x22, 2*(x)-1) #define MI_LRI_FORCE_POSTED (112) -#define MI_STORE_REGISTER_MEM(x) MI_INSTR(0x24, 2*(x)-1) -#define MI_STORE_REGISTER_MEM_GEN8(x) MI_INSTR(0x24, 3*(x)-1) +#define MI_STORE_REGISTER_MEMMI_INSTR(0x24, 1) +#define MI_STORE_REGISTER_MEM_GEN8 MI_INSTR(0x24, 2) #define MI_SRM_LRM_GLOBAL_GTT (122) #define MI_FLUSH_DW MI_INSTR(0x26, 1) /* for GEN6 */ #define MI_FLUSH_DW_STORE_INDEX (121) @@ -354,8 +354,8 @@ #define MI_INVALIDATE_BSD (17) #define MI_FLUSH_DW_USE_GTT (12) #define MI_FLUSH_DW_USE_PPGTT (02) -#define MI_LOAD_REGISTER_MEM(x) MI_INSTR(0x29, 2*(x)-1) -#define MI_LOAD_REGISTER_MEM_GEN8(x) MI_INSTR(0x29, 3*(x)-1) +#define MI_LOAD_REGISTER_MEM MI_INSTR(0x29, 1) +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2) #define MI_BATCH_BUFFER MI_INSTR(0x30, 1) #define MI_BATCH_NON_SECURE (1) /* for snb/ivb/vlv this also means batch in ppgtt when ppgtt is enabled. */ diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 472c544..a78c823 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -11053,10 +11053,10 @@ static int intel_gen7_queue_flip(struct drm_device *dev, DERRMR_PIPEB_PRI_FLIP_DONE | DERRMR_PIPEC_PRI_FLIP_DONE)); if (IS_GEN8(dev)) - intel_ring_emit(ring, MI_STORE_REGISTER_MEM_GEN8(1) | + intel_ring_emit(ring, MI_STORE_REGISTER_MEM_GEN8 |
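To make the length change concrete, here is a small sketch of how the header dword is formed. It assumes the usual MI command convention that the length field holds the total dword count minus two, and that `MI_INSTR` places the opcode at bit 23; `mi_fixed_header` and `mi_pairs_header` are hypothetical helpers, not driver functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the driver's MI_INSTR(opcode, flags) layout. */
#define MI_INSTR(opcode, flags) (((uint32_t)(opcode) << 23) | (uint32_t)(flags))

#define MI_OPCODE_SRM 0x24u /* MI_STORE_REGISTER_MEM */
#define MI_OPCODE_LRM 0x29u /* MI_LOAD_REGISTER_MEM */

/* Header for a fixed-size command: length field = total dwords - 2. */
uint32_t mi_fixed_header(uint32_t opcode, unsigned int total_dwords)
{
	assert(total_dwords >= 2);
	return MI_INSTR(opcode, total_dwords - 2);
}

/* The variable-length form the patch removes: 2*x - 1 for x pairs. */
uint32_t mi_pairs_header(uint32_t opcode, unsigned int x)
{
	assert(x >= 1);
	return MI_INSTR(opcode, 2 * x - 1);
}
```

Under these assumptions a three-dword LRM (header, register, 32-bit address) gets length 1, and the four-dword Gen8 form (64-bit address) gets length 2, which matches the new `MI_INSTR(0x29, 1)` and `MI_INSTR(0x29, 2)` definitions in the diff.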
Re: [Intel-gfx] [PATCH v1 3/4] drm/i915:bxt: Enable Pooled EU support
On 17/07/2015 17:27, Chris Wilson wrote:
> On Fri, Jul 17, 2015 at 05:08:53PM +0100, Arun Siluvery wrote:
>> This mode allows EUs to be assigned to pools.
>> The command to enable this mode is sent in the auxiliary golden context
>> batch as this is only issued once with each context initialization.
>> Thanks to Mika for the preliminary review.
>
> A quick explanation for why this has to be in the kernel would be nice.
> Privileged instruction?

The purpose of the auxiliary batch is explained in patch 2, but I can add some explanation about this one also.

> Not fond of the split between this and patch 4. Patch 4 introduces one
> feature flag that looks different to the one we use here to enable
> support.

I will keep patch 4 separate as it deals with libdrm changes, but use the feature flag in this one.

regards
Arun

> -Chris

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v2 3/4] drm/i915/bxt: Add get_param to query Pooled EU availability
On 17/07/2015 19:13, Arun Siluvery wrote: User space clients need to know when the pooled EU feature is present and enabled on the hardware so that they can adapt work submissions. Create a new device info flag for this purpose, and create a new GETPARAM entry to allow user space to query its setting. Set has_pooled_eu to true in the Broxton static device info - Broxton supports the feature in hardware and the driver will enable it by default. Signed-off-by: Jeff McGee jeff.mc...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- Please ignore this patch, this is squashed with Patch4 drm/i915:bxt: Enable Pooled EU support to keep all enabling changes in the same place otherwise we would've announced support to userspace before enabling it in kernel. regards Arun drivers/gpu/drm/i915/i915_dma.c | 3 +++ drivers/gpu/drm/i915/i915_drv.c | 1 + drivers/gpu/drm/i915/i915_drv.h | 5 - include/uapi/drm/i915_drm.h | 1 + 4 files changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 5e63076..6c31beb 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -170,6 +170,9 @@ static int i915_getparam(struct drm_device *dev, void *data, case I915_PARAM_HAS_RESOURCE_STREAMER: value = HAS_RESOURCE_STREAMER(dev); break; + case I915_PARAM_HAS_POOLED_EU: + value = HAS_POOLED_EU(dev); + break; default: DRM_DEBUG(Unknown parameter %d\n, param-param); return -EINVAL; diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index e44dc0d..213f74d 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -389,6 +389,7 @@ static const struct intel_device_info intel_broxton_info = { .num_pipes = 3, .has_ddi = 1, .has_fbc = 1, + .has_pooled_eu = 1, GEN_DEFAULT_PIPEOFFSETS, IVB_CURSOR_OFFSETS, }; diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 768d1db..32850a8 100644 --- 
a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -775,7 +775,8 @@ struct intel_csr { func(supports_tv) sep \ func(has_llc) sep \ func(has_ddi) sep \ - func(has_fpga_dbg) + func(has_fpga_dbg) sep \ + func(has_pooled_eu) #define DEFINE_FLAG(name) u8 name:1 #define SEP_SEMICOLON ; @@ -2549,6 +2550,8 @@ struct drm_i915_cmd_table { #define HAS_RESOURCE_STREAMER(dev) (IS_HASWELL(dev) || \ INTEL_INFO(dev)-gen = 8) +#define HAS_POOLED_EU(dev) (INTEL_INFO(dev)-has_pooled_eu) + #define INTEL_PCH_DEVICE_ID_MASK 0xff00 #define INTEL_PCH_IBX_DEVICE_ID_TYPE 0x3b00 #define INTEL_PCH_CPT_DEVICE_ID_TYPE 0x1c00 diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e7c29f1..9649577 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -356,6 +356,7 @@ typedef struct drm_i915_irq_wait { #define I915_PARAM_EU_TOTAL34 #define I915_PARAM_HAS_GPU_RESET 35 #define I915_PARAM_HAS_RESOURCE_STREAMER 36 +#define I915_PARAM_HAS_POOLED_EU 37 typedef struct drm_i915_getparam { int param; ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
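The flag list in `i915_drv.h` is an X-macro: one list of names is expanded into one-bit fields, so adding `has_pooled_eu` is a one-line change. Below is a reduced userspace sketch of that pattern plus a getparam-style lookup. The `struct device_info`, `getparam` and `PARAM_HAS_POOLED_EU` names are illustrative stand-ins, and the flag list is shortened.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Shortened flag list; each entry becomes a one-bit field. */
#define DEV_INFO_FOR_EACH_FLAG(func, sep) \
	func(has_llc) sep \
	func(has_ddi) sep \
	func(has_fpga_dbg) sep \
	func(has_pooled_eu)

#define DEFINE_FLAG(name) uint8_t name:1
#define SEP_SEMICOLON ;

struct device_info {
	uint8_t gen;
	DEV_INFO_FOR_EACH_FLAG(DEFINE_FLAG, SEP_SEMICOLON);
};

#define HAS_POOLED_EU(info) ((info)->has_pooled_eu)

enum { PARAM_HAS_POOLED_EU = 37 };

/* Report a feature flag to the caller, rejecting unknown parameters. */
int getparam(const struct device_info *info, int param, int *value)
{
	assert(info != NULL && value != NULL);
	switch (param) {
	case PARAM_HAS_POOLED_EU:
		*value = HAS_POOLED_EU(info);
		return 0;
	default:
		return -EINVAL;
	}
}
```

A device description that does not mention the flag leaves it at zero, so only platforms that set `.has_pooled_eu = 1` report the feature.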
Re: [Intel-gfx] [PATCH] drm/i915: Do kunmap if renderstate parsing fails
On 16/07/2015 15:36, Mika Kuoppala wrote:
> Kunmap the renderstate page on the error path.
>
> Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com
> ---
>  drivers/gpu/drm/i915/i915_gem_render_state.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
> index a0201fc..b6492fe 100644
> --- a/drivers/gpu/drm/i915/i915_gem_render_state.c
> +++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
> @@ -96,8 +96,10 @@ static int render_state_setup(struct render_state *so)
>  		s = lower_32_bits(r);
>  		if (so->gen >= 8) {
>  			if (i + 1 >= rodata->batch_items ||
> -			    rodata->batch[i + 1] != 0)
> -				return -EINVAL;
> +			    rodata->batch[i + 1] != 0) {
> +				ret = -EINVAL;
> +				goto err_out;
> +			}
>  
>  			d[i++] = s;
>  			s = upper_32_bits(r);
> @@ -120,6 +122,10 @@ static int render_state_setup(struct render_state *so)
>  	}
>  
>  	return 0;
> +
> +err_out:
> +	kunmap(page);
> +	return ret;
>  }
>  
>  void i915_gem_render_state_fini(struct render_state *so)

Looks good to me,
Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

regards
Arun
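The fix is an instance of the common "single exit label" cleanup pattern: once a page is mapped, every failure path has to pass through the unmap. A minimal userspace sketch of that pattern follows. `fake_kmap`, `fake_kunmap`, `mapped_pages` and `copy_batch` are invented for illustration and only mimic the shape of `render_state_setup()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for kmap()/kunmap(); the counter lets a caller check balance. */
int mapped_pages;

uint32_t *fake_kmap(uint32_t *page)
{
	mapped_pages++;
	return page;
}

void fake_kunmap(uint32_t *page)
{
	(void)page;
	mapped_pages--;
}

/* Copy n dwords into a mapped page that can hold cap dwords. */
int copy_batch(uint32_t *page, const uint32_t *src, size_t n, size_t cap)
{
	uint32_t *d;
	size_t i;
	int ret;

	assert(page != NULL && src != NULL);
	d = fake_kmap(page);

	for (i = 0; i < n; i++) {
		if (i >= cap) {
			/* Do not return directly: the page is still mapped. */
			ret = -EINVAL;
			goto err_out;
		}
		d[i] = src[i];
	}

	fake_kunmap(d);
	return 0;

err_out:
	fake_kunmap(d);
	return ret;
}
```

After either outcome `mapped_pages` is back at zero, which is the property the patch restores for the real mapping.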
Re: [Intel-gfx] [RFC 0/2] Add Pooled EU support
On 11/07/2015 20:09, Chris Wilson wrote:
> On Sat, Jul 11, 2015 at 08:05:05PM +0100, Chris Wilson wrote:
>> On Fri, Jul 10, 2015 at 06:35:18PM +0100, Arun Siluvery wrote:
>>> These patches enable Pooled EU support for BXT, they are implemented
>>> by Armin Reese. I am sending these patches in their current form for
>>> comments.
>>>
>>> These patches modify the Golden batch to have a set of modification
>>> values where we can change the commands based on Gen. The commands to
>>> enable Pooled EU are inserted after MI_BATCH_BUFFER_END. If the given
>>> Gen supports this feature, modification values are used to replace
>>> MI_BATCH_BUFFER_END so we send commands to enable Pooled EU. These
>>> commands need to be part of this batch because they are to be
>>> initialized only once. Userspace will have the option to query the
>>> availability of this feature, those changes are not included in this
>>> series.
>>
>> Would it not just be simpler to execute 2 batches? First holding the
>> basic and common state for the gen, the second using subgen. That we
>> have a chunk of binary data is nasty, but at least we can point to the
>> generator and be able to decipher it and recreate it as required. Doing
>> binary patching on top, on that path lies madness.

I like this idea of sending 2 batches if that is acceptable. In this case we don't have to touch the golden batch and hence the generator tool, and also not worry about using the correct binary blob as header.

The setup in this case would be:
1. send golden batch
2. prepare and send batch to configure pooled EU as per subslice and EU count

Why do we have a separate tool in the first place? Is it not possible to carry all of them in code, or are there any restrictions in doing so?

> What is the minimum instruction sequence required to be able to setup
> the default EU state? Is it small enough that carrying it as code in the
> kernel is viable (and readable)?

Setting up the pooled EU configuration is only a few instructions, it can be added to the driver.

> (That actually is critical here as currently we have to juggle multiple
> sources and look very carefully at what is being patched - I am not
> confident that we will not introduce mistakes in a week's time, let
> alone a year or two.)
>
> The alternative is to just say that the patch table is also
> autogenerated and for that to be simple and clear, and far more
> documented as it relies on a strict protocol.

The patch table is also auto generated using the intel_null_state_gen tool but it is patched based on Gen.

regards
Arun

> -Chris
Re: [Intel-gfx] drm/i915: Update WaFlushCoherentL3CacheLinesAtContextSwitch
On 10/07/2015 09:25, Dan Carpenter wrote:
> Hello Arun Siluvery,
>
> The patch 9e00084750c0: "drm/i915: Update
> WaFlushCoherentL3CacheLinesAtContextSwitch" from Jul 3, 2015, leads to
> the following static checker warning:
>
> 	drivers/gpu/drm/i915/intel_lrc.c:1188 gen8_init_indirectctx_bb()
> 	warn: unsigned 'index' is never less than zero.
>
> drivers/gpu/drm/i915/intel_lrc.c
>   1174  static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
>   1175                                      struct i915_wa_ctx_bb *wa_ctx,
>   1176                                      uint32_t *const batch,
>   1177                                      uint32_t *offset)
>   1178  {
>   1179          uint32_t scratch_addr;
>   1180          uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);
>   1181
>   1182          /* WaDisableCtxRestoreArbitration:bdw,chv */
>   1183          wa_ctx_emit(batch, index, MI_ARB_ON_OFF | MI_ARB_DISABLE);
>   1184
>   1185          /* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw */
>   1186          if (IS_BROADWELL(ring->dev)) {
>   1187                  index = gen8_emit_flush_coherentl3_wa(ring, batch, index);
>   1188                  if (index < 0)
>                             ^^^^^^^^^
> Never true.

Thank you for reporting this, I will change it as below.

	int ret = gen8_emit_flush_coherentl3_wa(ring, batch, index);
	if (ret < 0)
		return ret;
	index = ret;

regards
Arun

>   1189                          return index;
>   1190          }
>   1191
>
> regards,
> dan carpenter
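The proposed fix keeps the helper's return value in a signed variable until it has been checked, and only then stores it in the unsigned index. A self-contained sketch of that pattern is shown below; `emit_flush_wa` and `init_bb` are hypothetical stand-ins for `gen8_emit_flush_coherentl3_wa()` and its caller, and the helper body is made up.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the new index, or a negative errno. */
int emit_flush_wa(uint32_t *batch, uint32_t index, uint32_t capacity)
{
	assert(batch != NULL);
	if (index + 2 > capacity)
		return -ENOSPC;
	batch[index++] = 0;
	batch[index++] = 0;
	return (int)index;
}

/* Fixed pattern: test the signed return value before storing it. */
int init_bb(uint32_t *batch, uint32_t capacity, uint32_t *offset)
{
	uint32_t index = *offset;
	int ret;

	ret = emit_flush_wa(batch, index, capacity);
	if (ret < 0)
		return ret;
	index = (uint32_t)ret;

	*offset = index;
	return 0;
}
```

Had the result been assigned straight to `index`, a negative errno would have wrapped to a large positive value and the error would have been silently used as a batch offset.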
Re: [Intel-gfx] [PATCH v2 1/4] drm/i915: Enable WA batch buffers for Gen9
On 10/07/2015 16:52, Mika Kuoppala wrote: Arun Siluvery arun.siluv...@linux.intel.com writes: This patch only enables support for Gen9, the actual WA will be initialized in subsequent patches. The WARN that we use to warn user if WA batch support is not available for a particular Gen is replaced with DRM_ERROR as warning here doesn't really add much value. v2: include all infrastructure bits in this patch so that subsequent changes only correspond the WA added (Chris) Cc: Imre Deak imre.d...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com The wa_ctx_emits need index as second param now. With those, Reviewed-by: Mika Kuoppala mika.kuopp...@intel.com Thanks Mika, I will send the updated patches. Your r-b tag is for all patches correct? Should I include the tag while sending the patches? regards Arun --- drivers/gpu/drm/i915/intel_lrc.c | 50 +--- 1 file changed, 47 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 23ff018..1e88b3b 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1269,6 +1269,35 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring, return wa_ctx_end(wa_ctx, *offset = index, 1); } +static int gen9_init_indirectctx_bb(struct intel_engine_cs *ring, + struct i915_wa_ctx_bb *wa_ctx, + uint32_t *const batch, + uint32_t *offset) +{ + uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS); + + /* FIXME: Replace me with WA */ + wa_ctx_emit(batch, MI_NOOP); + + /* Pad to end of cacheline */ + while (index % CACHELINE_DWORDS) + wa_ctx_emit(batch, MI_NOOP); + + return wa_ctx_end(wa_ctx, *offset = index, CACHELINE_DWORDS); +} + +static int gen9_init_perctx_bb(struct intel_engine_cs *ring, + struct i915_wa_ctx_bb *wa_ctx, + uint32_t *const batch, + uint32_t *offset) +{ + uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS); + + wa_ctx_emit(batch, MI_BATCH_BUFFER_END); + + return wa_ctx_end(wa_ctx, *offset = 
index, 1); +} + static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size) { int ret; @@ -1310,10 +1339,11 @@ static int intel_init_workaround_bb(struct intel_engine_cs *ring) WARN_ON(ring-id != RCS); /* update this when WA for higher Gen are added */ - if (WARN(INTEL_INFO(ring-dev)-gen 8, -WA batch buffer is not initialized for Gen%d\n, -INTEL_INFO(ring-dev)-gen)) + if (INTEL_INFO(ring-dev)-gen 9) { + DRM_ERROR(WA batch buffer is not initialized for Gen%d\n, + INTEL_INFO(ring-dev)-gen); return 0; + } /* some WA perform writes to scratch page, ensure it is valid */ if (ring-scratch.obj == NULL) { @@ -1345,6 +1375,20 @@ static int intel_init_workaround_bb(struct intel_engine_cs *ring) offset); if (ret) goto out; + } else if (INTEL_INFO(ring-dev)-gen == 9) { + ret = gen9_init_indirectctx_bb(ring, + wa_ctx-indirect_ctx, + batch, + offset); + if (ret) + goto out; + + ret = gen9_init_perctx_bb(ring, + wa_ctx-per_ctx, + batch, + offset); + if (ret) + goto out; } out: -- 1.9.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
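Mika's review comment is that the emit helper now takes the running index. The sketch below shows one way the start/emit/end bookkeeping could fit together in plain C. It is not the driver's implementation: `wa_start`, `wa_emit`, `wa_end` and `struct wa_ctx_bb` are simplified stand-ins, and offsets are counted in dwords by assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_DWORDS 16u
#define PAGE_DWORDS      1024u
#define NOOP_CMD         0u /* stand-in for MI_NOOP */

struct wa_ctx_bb {
	uint32_t offset; /* start of this WA batch, in dwords */
	uint32_t size;   /* length of this WA batch, in dwords */
};

/* Align the starting offset and remember it. */
uint32_t wa_start(struct wa_ctx_bb *wa, uint32_t offset, uint32_t align)
{
	assert(wa != NULL && align != 0 && (align & (align - 1)) == 0);
	wa->offset = (offset + align - 1) & ~(align - 1);
	return wa->offset;
}

/* Emit one dword at *index and advance it. */
int wa_emit(uint32_t *batch, uint32_t *index, uint32_t cmd)
{
	if (*index >= PAGE_DWORDS)
		return -ENOSPC;
	batch[(*index)++] = cmd;
	return 0;
}

/* Record the size and check that it honours the required alignment. */
int wa_end(struct wa_ctx_bb *wa, uint32_t offset, uint32_t align)
{
	wa->size = offset - wa->offset;
	return (wa->size % align) ? -EINVAL : 0;
}

int init_indirectctx_bb(struct wa_ctx_bb *wa, uint32_t *batch, uint32_t *offset)
{
	uint32_t index = wa_start(wa, *offset, CACHELINE_DWORDS);
	int ret;

	ret = wa_emit(batch, &index, NOOP_CMD); /* placeholder workaround */
	if (ret)
		return ret;

	/* Pad to the end of the cacheline. */
	while (index % CACHELINE_DWORDS) {
		ret = wa_emit(batch, &index, NOOP_CMD);
		if (ret)
			return ret;
	}

	*offset = index;
	return wa_end(wa, index, CACHELINE_DWORDS);
}
```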
Re: [Intel-gfx] [PATCHv7] drm/i915: Added Programming of the MOCS
On 07/07/2015 20:13, Francisco Jerez wrote: From: Peter Antoine peter.anto...@intel.com This change adds the programming of the MOCS registers to the gen 9+ platforms. This change set programs the MOCS register values to a set of values that are defined to be optimal. It creates a fixed register set that is programmed across the different engines so that all engines have the same table. This is done as the main RCS context only holds the registers for itself and the shared L3 values. By trying to keep the registers consistent across the different engines it should make the programming for the registers consistent. v2: -'static const' for private data structures and style changes.(Matt Turner) v3: - Make the tables slightly more readable. (Damien Lespiau) - Updated tables fix performance regression. v4: - Code formatting. (Chris Wilson) - re-privatised mocs code. (Daniel Vetter) v5: - Changed the name of a function. (Chris Wilson) v6: - re-based - Added Mesa table entry (skylake broxton) (Francisco Jerez) - Tidied up the readability defines (Francisco Jerez) - NUMBER of entries defines wrong. (Jim Bish) - Added comments to clear up the meaning of the tables (Jim Bish) Signed-off-by: Peter Antoine peter.anto...@intel.com v7 (Francisco Jerez): - Don't write L3-specific MOCS_ESC/SCC values into the e/LLC control tables. Prefix L3-specific defines consistently with L3_ and e/LLC-specific defines with LE_ to avoid this kind of confusion in the future. - Change L3CC WT define back to RESERVED (matches my hardware documentation and the original patch, probably a misunderstanding of my own previous comment). - Drop Android tables, define new minimal tables more suitable for the open source stack. - Add comment that the MOCS tables are part of the kernel ABI. - Move intel_logical_ring_begin() and _advance() calls one level down (Chris Wilson). - Minor formatting and style fixes. 
Signed-off-by: Francisco Jerez curroje...@riseup.net --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/i915_reg.h | 9 ++ drivers/gpu/drm/i915/intel_lrc.c | 11 +- drivers/gpu/drm/i915/intel_lrc.h | 1 + drivers/gpu/drm/i915/intel_mocs.c | 324 ++ drivers/gpu/drm/i915/intel_mocs.h | 57 +++ 6 files changed, 401 insertions(+), 2 deletions(-) create mode 100644 drivers/gpu/drm/i915/intel_mocs.c create mode 100644 drivers/gpu/drm/i915/intel_mocs.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index de21965..e52e012 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -36,6 +36,7 @@ i915-y += i915_cmd_parser.o \ i915_trace_points.o \ intel_hotplug.o \ intel_lrc.o \ + intel_mocs.o \ intel_ringbuffer.o \ intel_uncore.o diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 2a29bcc..9b17260 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -7906,4 +7906,13 @@ enum skl_disp_power_wells { #define _PALETTE_A (dev_priv-info.display_mmio_offset + 0xa000) #define _PALETTE_B (dev_priv-info.display_mmio_offset + 0xa800) +/* MOCS (Memory Object Control State) registers */ +#define GEN9_LNCFCMOCS0(0xB020)/* L3 Cache Control base */ + +#define GEN9_GFX_MOCS_0(0xc800)/* Graphics MOCS base register*/ +#define GEN9_MFX0_MOCS_0 (0xc900)/* Media 0 MOCS base register*/ +#define GEN9_MFX1_MOCS_0 (0xcA00)/* Media 1 MOCS base register*/ +#define GEN9_VEBOX_MOCS_0 (0xcB00)/* Video MOCS base register*/ +#define GEN9_BLT_MOCS_0(0xcc00)/* Blitter MOCS base register*/ + #endif /* _I915_REG_H_ */ diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index d4f8b43..466d17c 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -135,6 +135,7 @@ #include drm/drmP.h #include drm/i915_drm.h #include i915_drv.h +#include intel_mocs.h #define GEN9_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE) #define 
GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE) @@ -772,8 +773,7 @@ static int logical_ring_prepare(struct drm_i915_gem_request *req, int bytes) * * Return: non-zero if the ringbuffer is not ready to be written to. */ -static int intel_logical_ring_begin(struct drm_i915_gem_request *req, - int num_dwords) +int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords) { struct drm_i915_private *dev_priv; int ret; @@ -1675,6 +1675,13 @@ static int gen8_init_rcs_context(struct drm_i915_gem_request *req) if (ret) return ret; + /* +* Failing to program the MOCS is non-fatal.The system will not +* run at peak performance. So generate a warning and carry on. +*/ + if
Re: [Intel-gfx] [PATCH] drm/i915: Update WaFlushCoherentL3CacheLinesAtContextSwitch
On 06/07/2015 12:52, Dave Gordon wrote: On 03/07/15 16:42, Chris Wilson wrote: On Fri, Jul 03, 2015 at 02:27:31PM +0100, Arun Siluvery wrote: In this WA we need to set GEN8_L3SQCREG4[21:21] and reset it after PIPE_CONTROL instruction but there is a slight complication as this is applied in WA batch where the values are only initialized once. Dave identified an issue with the current implementation where the register value is read once at the beginning and it is reused; this patch corrects this by saving the register value to memory, update register with the bit of our interest and restore it back with original value. This implementation uses MI_LOAD_REGISTER_MEM which is currently only used by command parser and was using a default length of 0. This is now updated with correct length and moved to appropriate place. Cc: Chris Wilson ch...@chris-wilson.co.uk Cc: Dave Gordon david.s.gor...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_cmd_parser.c | 6 +-- drivers/gpu/drm/i915/i915_reg.h| 3 +- drivers/gpu/drm/i915/intel_lrc.c | 72 +- 3 files changed, 58 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c index 306d9e4..430571b 100644 --- a/drivers/gpu/drm/i915/i915_cmd_parser.c +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c @@ -131,7 +131,7 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = { .mask = MI_GLOBAL_GTT, .expected = 0, }}, ), - CMD( MI_LOAD_REGISTER_MEM, SMI, !F, 0xFF, W | B, + CMD( MI_LOAD_REGISTER_MEM(1), SMI, !F, 0xFF, W | B, .reg = { .offset = 1, .mask = 0x007C }, .bits = {{ .offset = 0, @@ -1021,7 +1021,7 @@ static bool check_cmd(const struct intel_engine_cs *ring, * only MI_LOAD_REGISTER_IMM commands. */ if (reg_addr == OACONTROL) { - if (desc-cmd.value == MI_LOAD_REGISTER_MEM) { + if (desc-cmd.value == MI_LOAD_REGISTER_MEM(1)) { I had a double take here, but it all comes out in the wash. 
>> For one moment, I thought the cmd matching had changed, but that has
>> the length masked out.
>>
>> Reviewed-by: Chris Wilson ch...@cris-wilson.co.uk
>>
>> Who will start to complain about all the extra frequent register
>> writes, probably into common power wells
>> -Chris
>
> Hmm ... that is quite confusing, especially as the actual opcode in the
> instruction stream will be MI_LOAD_REGISTER_MEM(2) on GEN8+. It might

true, but cmd parser is only up to GEN7.

regards
Arun

> almost be better to use MI_LOAD_REGISTER_MEM(0) to emphasise that the
> length field is a wildcard and not something that will be matched
> exactly.
>
> .Dave.
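Chris's remark that the command matching "has the length masked out" can be illustrated with a short sketch. The mask value `0xFF800000` and the `cmd_descriptor` layout here are assumptions made for the example, not a copy of the parser's tables. The point is only that a descriptor written with any length value matches headers of every length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MI_INSTR(opcode, flags) (((uint32_t)(opcode) << 23) | (uint32_t)(flags))
#define MI_LOAD_REGISTER_MEM(x) MI_INSTR(0x29, 2 * (x) - 1)

/* Assumed mask: command type and opcode bits only, length bits excluded. */
#define MI_OPCODE_MASK 0xFF800000u

struct cmd_descriptor {
	uint32_t value; /* header as written in the table */
	uint32_t mask;  /* bits that must match */
};

/* A header matches when it agrees with the descriptor on the masked bits. */
bool cmd_matches(const struct cmd_descriptor *desc, uint32_t header)
{
	assert(desc != NULL);
	return (header & desc->mask) == (desc->value & desc->mask);
}
```

Under that assumption the table entry could be written with a length of 0, 1 or 2 without changing which commands it matches, which is the basis for Dave's suggestion to treat the length as a wildcard.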
Re: [Intel-gfx] [PATCH 1/4] drm/i915: Enable WA batch buffers for Gen9
On 03/07/2015 17:57, Chris Wilson wrote:
> On Fri, Jul 03, 2015 at 05:53:38PM +0100, Arun Siluvery wrote:
>> This patch only enables support for Gen9, the actual WA will be
>> initialized in subsequent patches.
>>
>> The WARN that we use to warn user if WA batch support is not available
>> for a particular Gen is replaced with DRM_ERROR as warning here doesn't
>> really add much value.
>>
>> Cc: Imre Deak imre.d...@intel.com
>> Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
>> ---
>>  drivers/gpu/drm/i915/intel_lrc.c | 41 +---
>>  1 file changed, 38 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 23ff018..927f395 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1269,6 +1269,26 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
>>  	return wa_ctx_end(wa_ctx, *offset = index, 1);
>>  }
>>  
>> +static int gen9_init_indirectctx_bb(struct intel_engine_cs *ring,
>> +				    struct i915_wa_ctx_bb *wa_ctx,
>> +				    uint32_t *const batch,
>> +				    uint32_t *offset)
>> +{
>> +	/* FIXME: Replace me with WA */
>
> Do the same
>
> 	int index = wa_ctx_begin();
> 	wa_ctx_emit(MI_BATCH_BUFFER_END) (and MI_NOOP for perctx)
> 	return wa_ctx_end()
>
> you did for gen8. That way the series doesn't suddenly break halfway
> through (or just after the first patch) and we can check the
> infrastructure in situ, and the actual wa separately later.

(forgot to reply-all)
right, will update it along with other review comments, thanks.

regards
Arun

> -Chris
Re: [Intel-gfx] [PATCH resend 3/5] drm/i915: Enable resource streamer on Execlists
On 16/06/2015 11:39, Abdiel Janulgue wrote: GEN8 and above uses Execlists by default instead of the legacy ringbuffer for batch execution. This patch enables the resource streamer bits when required. Patch is based on the initial work by Minu Mathai minu.mat...@intel.com This version also adds the required bits to enable GEN8 Resource Streamer context save and restore for Execlists. Cc: ville.syrj...@linux.intel.com Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com --- drivers/gpu/drm/i915/intel_lrc.c | 8 ++-- drivers/gpu/drm/i915/intel_lrc.h | 1 + 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index fcb074b..b015e96 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1172,7 +1172,10 @@ static int gen8_emit_bb_start(struct intel_ringbuffer *ringbuf, return ret; /* FIXME(BDW): Address space and security selectors. */ - intel_logical_ring_emit(ringbuf, MI_BATCH_BUFFER_START_GEN8 | (ppgtt8)); + intel_logical_ring_emit(ringbuf, MI_BATCH_BUFFER_START_GEN8 | + (ppgtt8) | + (dispatch_flags I915_DISPATCH_RS ? 
+MI_BATCH_RESOURCE_STREAMER : 0)); intel_logical_ring_emit(ringbuf, lower_32_bits(offset)); intel_logical_ring_emit(ringbuf, upper_32_bits(offset)); intel_logical_ring_emit(ringbuf, MI_NOOP); @@ -1726,7 +1729,8 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o reg_state[CTX_CONTEXT_CONTROL] = RING_CONTEXT_CONTROL(ring); reg_state[CTX_CONTEXT_CONTROL+1] = _MASKED_BIT_ENABLE(CTX_CTRL_INHIBIT_SYN_CTX_SWITCH | - CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT); + CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT | + CTX_CTRL_RS_CTX_ENABLE); reg_state[CTX_RING_HEAD] = RING_HEAD(ring-mmio_base); reg_state[CTX_RING_HEAD+1] = 0; reg_state[CTX_RING_TAIL] = RING_TAIL(ring-mmio_base); diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h index adb731e4..de6087a 100644 --- a/drivers/gpu/drm/i915/intel_lrc.h +++ b/drivers/gpu/drm/i915/intel_lrc.h @@ -32,6 +32,7 @@ #define RING_CONTEXT_CONTROL(ring)((ring)-mmio_base+0x244) #define CTX_CTRL_INHIBIT_SYN_CTX_SWITCH (1 3) #define CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT (1 0) +#define CTX_CTRL_RS_CTX_ENABLE(1 1) #define RING_CONTEXT_STATUS_BUF(ring) ((ring)-mmio_base+0x370) #define RING_CONTEXT_STATUS_PTR(ring) ((ring)-mmio_base+0x3a0) looks good to me, Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com regards Arun ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
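`CTX_CONTEXT_CONTROL` is written through `_MASKED_BIT_ENABLE`, so adding `CTX_CTRL_RS_CTX_ENABLE` sets both the value bit and its write-enable bit. The sketch below models that masked-write convention in userspace. It assumes the shift operators dropped from the archived defines are `1 << n`, and `masked_bit_enable` and `apply_masked_write` are illustrative helpers rather than driver functions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as listed in the patch, assuming "1 << n" was intended. */
#define CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT (1u << 0)
#define CTX_CTRL_RS_CTX_ENABLE              (1u << 1)
#define CTX_CTRL_INHIBIT_SYN_CTX_SWITCH     (1u << 3)

/* Upper half selects which bits to change, lower half gives the new values. */
uint32_t masked_bit_enable(uint32_t bits)
{
	assert((bits >> 16) == 0);
	return (bits << 16) | bits;
}

/* Model of how a masked write updates a 16-bit register value. */
uint16_t apply_masked_write(uint16_t reg, uint32_t write)
{
	uint16_t mask = (uint16_t)(write >> 16);
	uint16_t val = (uint16_t)(write & 0xFFFFu);

	return (uint16_t)((reg & ~mask) | (val & mask));
}
```

Bits whose mask bit is clear keep their previous value, so enabling the resource streamer context bit this way does not disturb unrelated bits of the register.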
Re: [Intel-gfx] [PATCH v6 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
On 22/06/2015 17:59, Siluvery, Arun wrote: On 22/06/2015 17:21, Ville Syrjälä wrote: On Fri, Jun 19, 2015 at 06:37:15PM +0100, Arun Siluvery wrote: In Per context w/a batch buffer, WaRsRestoreWithPerCtxtBb This WA performs writes to scratch page so it must be valid, this check is performed before initializing the batch with this WA. v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions so as to not break any future users of existing definitions (Michel) v3: Length defined in current definitions of LRM, LRR instructions was specified as 0. It seems it is common convention for instructions whose length vary between platforms. This is not an issue so far because they are not used anywhere except command parser; now that we use in this patch update them with correct length and also move them out of command parser placeholder to appropriate place. remove unnecessary padding and follow the WA programming sequence exactly as mentioned in spec which is essential for this WA (Dave). 
Cc: Chris Wilson ch...@chris-wilson.co.uk Cc: Dave Gordon david.s.gor...@intel.com Signed-off-by: Rafael Barbalho rafael.barba...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_reg.h | 29 +++-- drivers/gpu/drm/i915/intel_lrc.c | 54 2 files changed, 81 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 7637e64..208620d 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -347,6 +347,31 @@ #define MI_INVALIDATE_BSD (17) #define MI_FLUSH_DW_USE_GTT(12) #define MI_FLUSH_DW_USE_PPGTT (02) +#define MI_LOAD_REGISTER_MEMMI_INSTR(0x29, 1) +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2) +#define MI_LRM_USE_GLOBAL_GTT (122) +#define MI_LRM_ASYNC_MODE_ENABLE (121) +#define MI_LOAD_REGISTER_REGMI_INSTR(0x2A, 1) +#define MI_ATOMIC(len) MI_INSTR(0x2F, (len-2)) +#define MI_ATOMIC_MEMORY_TYPE_GGTT (122) +#define MI_ATOMIC_INLINE_DATA(118) +#define MI_ATOMIC_CS_STALL (117) +#define MI_ATOMIC_RETURN_DATA_CTL(116) +#define MI_ATOMIC_OP_MASK(op) ((op) 8) +#define MI_ATOMIC_AND MI_ATOMIC_OP_MASK(0x01) +#define MI_ATOMIC_OR MI_ATOMIC_OP_MASK(0x02) +#define MI_ATOMIC_XOR MI_ATOMIC_OP_MASK(0x03) +#define MI_ATOMIC_MOVE MI_ATOMIC_OP_MASK(0x04) +#define MI_ATOMIC_INC MI_ATOMIC_OP_MASK(0x05) +#define MI_ATOMIC_DEC MI_ATOMIC_OP_MASK(0x06) +#define MI_ATOMIC_ADD MI_ATOMIC_OP_MASK(0x07) +#define MI_ATOMIC_SUB MI_ATOMIC_OP_MASK(0x08) +#define MI_ATOMIC_RSUB MI_ATOMIC_OP_MASK(0x09) +#define MI_ATOMIC_IMAX MI_ATOMIC_OP_MASK(0x0A) +#define MI_ATOMIC_IMIN MI_ATOMIC_OP_MASK(0x0B) +#define MI_ATOMIC_UMAX MI_ATOMIC_OP_MASK(0x0C) +#define MI_ATOMIC_UMIN MI_ATOMIC_OP_MASK(0x0D) + #define MI_BATCH_BUFFER MI_INSTR(0x30, 1) #define MI_BATCH_NON_SECURE(1) /* for snb/ivb/vlv this also means batch in ppgtt when ppgtt is enabled. 
*/ @@ -451,8 +476,6 @@ #define MI_CLFLUSH MI_INSTR(0x27, 0) #define MI_REPORT_PERF_COUNTMI_INSTR(0x28, 0) #define MI_REPORT_PERF_COUNT_GGTT (10) -#define MI_LOAD_REGISTER_MEMMI_INSTR(0x29, 0) -#define MI_LOAD_REGISTER_REGMI_INSTR(0x2A, 0) #define MI_RS_STORE_DATA_IMMMI_INSTR(0x2B, 0) #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0) #define MI_STORE_URB_MEMMI_INSTR(0x2D, 0) @@ -1799,6 +1822,8 @@ enum skl_disp_power_wells { #define GEN8_RC_SEMA_IDLE_MSG_DISABLE (1 12) #define GEN8_FF_DOP_CLOCK_GATE_DISABLE (110) +#define GEN8_RS_PREEMPT_STATUS 0x215C + /* Fuse readout registers for GT */ #define CHV_FUSE_GT (VLV_DISPLAY_BASE + 0x2168) #define CHV_FGT_DISABLE_SS0(1 10) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 664455c..28198c4 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1215,11 +1215,65 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring, uint32_t *const batch, uint32_t *offset) { + uint32_t scratch_addr; uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS); + /* Actual scratch location is at 128 bytes offset */ + scratch_addr = ring-scratch.gtt_offset + 2*CACHELINE_BYTES; + scratch_addr |= PIPE_CONTROL_GLOBAL_GTT; + /* WaDisableCtxRestoreArbitration:bdw,chv */ wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE); + /* +* As per Bspec, to workaround a known HW issue, SW must perform the +* below programming sequence prior to programming MI_BATCH_BUFFER_END. +* +* This is only applicable for Gen8
Re: [Intel-gfx] [PATCH] drm/i915/gen9: fix error path in intel_init_workaround_bb
On 23/06/2015 15:36, Imre Deak wrote:
> On ti, 2015-06-23 at 15:31 +0100, Chris Wilson wrote:
>> On Tue, Jun 23, 2015 at 05:26:13PM +0300, Imre Deak wrote:
>>> On the GEN!=8 error path we call kmap_atomic() which returns in atomic
>>> context and then lrc_destroy_wa_ctx_obj() which can be called only in
>>> process context. Fix this by preserving the correct cleanup order on
>>> this error path.
>>>
>>> Also convert the WARN to DRM_ERROR, the stack trace isn't really useful.
>>>
>>> Signed-off-by: Imre Deak <imre.d...@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/intel_lrc.c | 10 +++---
>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index 1b50dd7..8bff1a2 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -1289,10 +1289,14 @@ static int intel_init_workaround_bb(struct intel_engine_cs *ring)
>>>  		if (ret)
>>>  			goto out;
>>>  	} else {
>>> -		WARN(INTEL_INFO(ring->dev)->gen >= 8,
>>> -		     "WA batch buffer is not initialized for Gen%d\n",
>>> -		     INTEL_INFO(ring->dev)->gen);
>>> +		if (INTEL_INFO(ring->dev)->gen >= 8)
>>> +			DRM_ERROR("WA batch buffer is not initialized for Gen%d\n",
>>> +				  INTEL_INFO(ring->dev)->gen);
>>> +
>>
>> Do this test upfront, then we don't have multiple error paths.
>> http://paste.debian.net/255769
>
> I didn't bother moving it, I suppose GEN9 support will be added soon
> anyway and we get a bit more test coverage on GEN9 meanwhile. But if you
> insist I can move it.

Hi Imre,

I sent the following patch with the changes suggested by Chris.
https://patchwork.kernel.org/patch/6661891/

Since you sent it first, my patch can be ignored if your patch is updated.

regards
Arun

>> -Chris

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915/gen9: fix error path in intel_init_workaround_bb
On 23/06/2015 17:01, Chris Wilson wrote:
> On Tue, Jun 23, 2015 at 06:58:42PM +0300, Imre Deak wrote:
>> On ti, 2015-06-23 at 16:44 +0100, Chris Wilson wrote:
>>> On Tue, Jun 23, 2015 at 06:18:21PM +0300, Imre Deak wrote:
>>>> On ti, 2015-06-23 at 16:13 +0100, Siluvery, Arun wrote:
>>>>> On 23/06/2015 15:36, Imre Deak wrote:
>>>>>> On ti, 2015-06-23 at 15:31 +0100, Chris Wilson wrote:
>>>>>>> On Tue, Jun 23, 2015 at 05:26:13PM +0300, Imre Deak wrote:
>>>>>>>> On the GEN!=8 error path we call kmap_atomic() which returns in
>>>>>>>> atomic context and then lrc_destroy_wa_ctx_obj() which can be
>>>>>>>> called only in process context. Fix this by preserving the
>>>>>>>> correct cleanup order on this error path.
>>>>>>>>
>>>>>>>> Also convert the WARN to DRM_ERROR, the stack trace isn't really
>>>>>>>> useful.
>>>>>>>>
>>>>>>>> Signed-off-by: Imre Deak <imre.d...@intel.com>
>>>>>>>> ---
>>>>>>>>  drivers/gpu/drm/i915/intel_lrc.c | 10 +++---
>>>>>>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>>>>>>> index 1b50dd7..8bff1a2 100644
>>>>>>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>>>>>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>>>>>>> @@ -1289,10 +1289,14 @@ static int intel_init_workaround_bb(struct intel_engine_cs *ring)
>>>>>>>>  		if (ret)
>>>>>>>>  			goto out;
>>>>>>>>  	} else {
>>>>>>>> -		WARN(INTEL_INFO(ring->dev)->gen >= 8,
>>>>>>>> -		     "WA batch buffer is not initialized for Gen%d\n",
>>>>>>>> -		     INTEL_INFO(ring->dev)->gen);
>>>>>>>> +		if (INTEL_INFO(ring->dev)->gen >= 8)
>>>>>>>> +			DRM_ERROR("WA batch buffer is not initialized for Gen%d\n",
>>>>>>>> +				  INTEL_INFO(ring->dev)->gen);
>>>>>>>> +
>>>>>>>
>>>>>>> Do this test upfront, then we don't have multiple error paths.
>>>>>>> http://paste.debian.net/255769
>>>>>>
>>>>>> I didn't bother moving it, I suppose GEN9 support will be added soon
>>>>>> anyway and we get a bit more test coverage on GEN9 meanwhile. But if
>>>>>> you insist I can move it.
>>>>>
>>>>> Hi Imre,
>>>>>
>>>>> I sent the following patch with the changes suggested by Chris.
>>>>> https://patchwork.kernel.org/patch/6661891/
>>>>>
>>>>> Since you sent it first, my patch can be ignored if your patch is
>>>>> updated.
>>>>
>>>> I'm fine applying your patch, but I would ask to convert the WARN to
>>>> DRM_ERROR. The stack trace doesn't add much to the error message and
>>>> the WARN is needlessly verbose now on BXT,SKL..
>>>
>>> I presumed Arun choose WARN because we are missing w/a and wanted
>>> someone to step forward and prove the fixes?
>>
>> Imo it's unnecessarily verbose, during development when loading the
>> driver I know that things are mostly ok if I can't see any such
>> backtraces. But no strong opinion, I can also change this locally.

Error message can easily get lost and also it is not an error to not
apply these WA which is why we also continue. I thought WARN will
probably get more attention and help in adding missing WA quickly.

regards
Arun

> An alternative would be to provide the stub wa_bb emission functions so
> that future wa need only start with a plain copy'n'paste.
> -Chris
Re: [Intel-gfx] [PATCH v6 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
On 22/06/2015 16:36, Daniel Vetter wrote:
> On Fri, Jun 19, 2015 at 06:50:36PM +0100, Chris Wilson wrote:
>> On Fri, Jun 19, 2015 at 06:37:10PM +0100, Arun Siluvery wrote:
>>> Some of the WA are to be applied during context save but before restore
>>> and some at the end of context save/restore but before executing the
>>> instructions in the ring, WA batch buffers are created for this purpose
>>> and these WA cannot be applied using normal means. Each context has two
>>> registers to load the offsets of these batch buffers. If they are
>>> non-zero, HW understands that it need to execute these batches.
>>>
>>> v1: In this version two separate ring_buffer objects were used to load
>>> WA instructions for indirect and per context batch buffers and they
>>> were part of every context.
>>>
>>> v2: Chris suggested to include additional page in context and use it to
>>> load these WA instead of creating separate objects. This will simplify
>>> lot of things as we need not explicitly pin/unpin them. Thomas Daniel
>>> further pointed that GuC is planning to use a similar setup to share
>>> data between GuC and driver and WA batch buffers can probably share
>>> that page. However after discussions with Dave who is implementing GuC
>>> changes, he suggested to use an independent page for the reasons - GuC
>>> area might grow and these WA are initialized only once and are not
>>> changed afterwards so we can share them across all contexts.
>>>
>>> The page is updated with WA during render ring init. This has an
>>> advantage of not adding more special cases to default_context.
>>>
>>> We don't know upfront the number of WA we will applying using these
>>> batch buffers. For this reason the size was fixed earlier but it is not
>>> a good idea. To fix this, the functions that load instructions are
>>> modified to report the no of commands inserted and the size is now
>>> calculated after the batch is updated. A macro is introduced to add
>>> commands to these batch buffers which also checks for overflow and
>>> returns error.
>>>
>>> We have a full page dedicated for these WA so that should be sufficient
>>> for good number of WA, anything more means we have major issues. The
>>> list for Gen8 is small, same for Gen9 also, maybe few more gets added
>>> going forward but not close to filling entire page. Chris suggested a
>>> two-pass approach but we agreed to go with single page setup as it is a
>>> one-off routine and simpler code wins.
>>>
>>> One additional option is offset field which is helpful if we would like
>>> to have multiple batches at different offsets within the page and
>>> select them based on some criteria. This is not a requirement at this
>>> point but could help in future (Dave).
>>>
>>> Chris provided some helpful macros and suggestions which further
>>> simplified the code, they will also help in reducing code duplication
>>> when WA for other Gen are added. Add detailed comments explaining
>>> restrictions.
>>>
>>> (Many thanks to Chris, Dave and Thomas for their reviews and inputs)
>>>
>>> Cc: Chris Wilson <ch...@chris-wilson.co.uk>
>>> Cc: Dave Gordon <david.s.gor...@intel.com>
>>> Signed-off-by: Rafael Barbalho <rafael.barba...@intel.com>
>>> Signed-off-by: Arun Siluvery <arun.siluv...@linux.intel.com>
>>
>> Sigh, after all that, I found one minor thing, but nevertheless
>> Reviewed-by: Chris Wilson <ch...@chris-wilson.co.uk>
>>
>>> +#define wa_ctx_emit(batch, cmd) {					\
>>> +		if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) {	\
>>> +			return -ENOSPC;					\
>>> +		}							\
>>> +		batch[index++] = (cmd);					\
>>> +	}
>>
>> We should have wrapped this in do { } while(0) - think of all those
>> trailing semicolons we have in the code! Fortunately we haven't used
>> this in a
>> 	if (foo) wa_ctx_emit(bar); else wa_ctx_emit(baz);
>> yet.
>
> Uh yes, this is a critical one. Arun, can you please do a follow-up
> patch to wrap your macro in a do {} while(0) like Chris suggested? I'll
> apply the patches meanwhile.

Hi Daniel,

Already sent the updated patch. I think I got the message-id wrong, the
updated patch that I sent is showing up as the last message in this
series.

regards
Arun

> Thanks, Daniel
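The macro-hygiene point in this exchange can be shown outside the kernel. The following is a minimal user-space sketch, not the driver code: `wa_emit`, `BATCH_DWORDS` and `emit_one` are hypothetical names, and a plain size check stands in for the kernel's `WARN_ON()`. It assumes, as the patch does, that a local variable named `index` is in scope and that the macro returns from the calling function on overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed capacity: one 4 KiB page of 32-bit command dwords. */
#define BATCH_DWORDS (4096u / sizeof(uint32_t))

/*
 * Bounds-checked emit. It relies on a local variable named `index` and
 * returns from the *calling* function on overflow. The do { } while (0)
 * wrapper makes "wa_emit(...);" a single statement, so it can be used in
 * an unbraced if/else.
 */
#define wa_emit(batch, cmd)                 \
	do {                                \
		if (index >= BATCH_DWORDS)  \
			return -ENOSPC;     \
		(batch)[index++] = (cmd);   \
	} while (0)

/*
 * With a bare { ... } macro body, the ';' after the first wa_emit() below
 * would terminate the if statement and leave the else without a matching
 * if, which does not compile.
 */
static int emit_one(uint32_t *batch, uint32_t *pos, int use_first,
		    uint32_t first, uint32_t second)
{
	uint32_t index;

	assert(batch != NULL && pos != NULL);
	index = *pos;

	if (use_first)
		wa_emit(batch, first);
	else
		wa_emit(batch, second);

	*pos = index;
	return 0;
}
```

Because the macro hides a `return`, the write position is only stored back to `*pos` on success; a caller that sees `-ENOSPC` can rely on its position being unchanged.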
Re: [Intel-gfx] [PATCH v6 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
On 22/06/2015 17:21, Ville Syrjälä wrote:
> On Fri, Jun 19, 2015 at 06:37:15PM +0100, Arun Siluvery wrote:
>> In Per context w/a batch buffer,
>> WaRsRestoreWithPerCtxtBb
>>
>> This WA performs writes to scratch page so it must be valid, this check
>> is performed before initializing the batch with this WA.
>>
>> v2: This patch modifies definitions of MI_LOAD_REGISTER_MEM and
>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>> so as to not break any future users of existing definitions (Michel)
>>
>> v3: Length defined in current definitions of LRM, LRR instructions was
>> specified as 0. It seems it is common convention for instructions whose
>> length vary between platforms. This is not an issue so far because they
>> are not used anywhere except command parser; now that we use in this
>> patch update them with correct length and also move them out of command
>> parser placeholder to appropriate place. remove unnecessary padding and
>> follow the WA programming sequence exactly as mentioned in spec which
>> is essential for this WA (Dave).
>>
>> Cc: Chris Wilson <ch...@chris-wilson.co.uk>
>> Cc: Dave Gordon <david.s.gor...@intel.com>
>> Signed-off-by: Rafael Barbalho <rafael.barba...@intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluv...@linux.intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_reg.h  | 29 +++--
>>  drivers/gpu/drm/i915/intel_lrc.c | 54
>>  2 files changed, 81 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>> index 7637e64..208620d 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -347,6 +347,31 @@
>>  #define   MI_INVALIDATE_BSD (1<<7)
>>  #define   MI_FLUSH_DW_USE_GTT (1<<2)
>>  #define   MI_FLUSH_DW_USE_PPGTT (0<<2)
>> +#define MI_LOAD_REGISTER_MEM MI_INSTR(0x29, 1)
>> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
>> +#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
>> +#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
>> +#define MI_LOAD_REGISTER_REG MI_INSTR(0x2A, 1)
>> +#define MI_ATOMIC(len) MI_INSTR(0x2F, (len-2))
>> +#define   MI_ATOMIC_MEMORY_TYPE_GGTT (1<<22)
>> +#define   MI_ATOMIC_INLINE_DATA (1<<18)
>> +#define   MI_ATOMIC_CS_STALL (1<<17)
>> +#define   MI_ATOMIC_RETURN_DATA_CTL (1<<16)
>> +#define MI_ATOMIC_OP_MASK(op) ((op) << 8)
>> +#define MI_ATOMIC_AND MI_ATOMIC_OP_MASK(0x01)
>> +#define MI_ATOMIC_OR MI_ATOMIC_OP_MASK(0x02)
>> +#define MI_ATOMIC_XOR MI_ATOMIC_OP_MASK(0x03)
>> +#define MI_ATOMIC_MOVE MI_ATOMIC_OP_MASK(0x04)
>> +#define MI_ATOMIC_INC MI_ATOMIC_OP_MASK(0x05)
>> +#define MI_ATOMIC_DEC MI_ATOMIC_OP_MASK(0x06)
>> +#define MI_ATOMIC_ADD MI_ATOMIC_OP_MASK(0x07)
>> +#define MI_ATOMIC_SUB MI_ATOMIC_OP_MASK(0x08)
>> +#define MI_ATOMIC_RSUB MI_ATOMIC_OP_MASK(0x09)
>> +#define MI_ATOMIC_IMAX MI_ATOMIC_OP_MASK(0x0A)
>> +#define MI_ATOMIC_IMIN MI_ATOMIC_OP_MASK(0x0B)
>> +#define MI_ATOMIC_UMAX MI_ATOMIC_OP_MASK(0x0C)
>> +#define MI_ATOMIC_UMIN MI_ATOMIC_OP_MASK(0x0D)
>> +
>>  #define MI_BATCH_BUFFER MI_INSTR(0x30, 1)
>>  #define   MI_BATCH_NON_SECURE (1)
>>  /* for snb/ivb/vlv this also means batch in ppgtt when ppgtt is enabled. */
>> @@ -451,8 +476,6 @@
>>  #define MI_CLFLUSH MI_INSTR(0x27, 0)
>>  #define MI_REPORT_PERF_COUNT MI_INSTR(0x28, 0)
>>  #define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
>> -#define MI_LOAD_REGISTER_MEM MI_INSTR(0x29, 0)
>> -#define MI_LOAD_REGISTER_REG MI_INSTR(0x2A, 0)
>>  #define MI_RS_STORE_DATA_IMM MI_INSTR(0x2B, 0)
>>  #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0)
>>  #define MI_STORE_URB_MEM MI_INSTR(0x2D, 0)
>> @@ -1799,6 +1822,8 @@ enum skl_disp_power_wells {
>>  #define   GEN8_RC_SEMA_IDLE_MSG_DISABLE (1 << 12)
>>  #define   GEN8_FF_DOP_CLOCK_GATE_DISABLE (1<<10)
>>
>> +#define GEN8_RS_PREEMPT_STATUS 0x215C
>> +
>>  /* Fuse readout registers for GT */
>>  #define CHV_FUSE_GT (VLV_DISPLAY_BASE + 0x2168)
>>  #define   CHV_FGT_DISABLE_SS0 (1 << 10)
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 664455c..28198c4 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1215,11 +1215,65 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
>>  			       uint32_t *const batch,
>>  			       uint32_t *offset)
>>  {
>> +	uint32_t scratch_addr;
>>  	uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);
>>
>> +	/* Actual scratch location is at 128 bytes offset */
>> +	scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
>> +	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
>> +
>>  	/* WaDisableCtxRestoreArbitration:bdw,chv */
>>  	wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE);
>>
>> +	/*
>> +	 * As per Bspec, to workaround a known HW issue, SW must perform the
>> +	 * below programming sequence prior to programming MI_BATCH_BUFFER_END.
>> +	 *
>> +	 * This is only applicable for Gen8.
>> +	 */
>> +
>> +	/* WaRsRestoreWithPerCtxtBb:bdw,chv */
>
> This w/a doesn't seem to be
Re: [Intel-gfx] [PATCH v6 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
On 22/06/2015 16:41, Daniel Vetter wrote:
> On Fri, Jun 19, 2015 at 07:07:01PM +0100, Arun Siluvery wrote:
>> Some of the WA are to be applied during context save but before restore
>> and some at the end of context save/restore but before executing the
>> instructions in the ring, WA batch buffers are created for this purpose
>> and these WA cannot be applied using normal means. Each context has two
>> registers to load the offsets of these batch buffers. If they are
>> non-zero, HW understands that it need to execute these batches.
>>
>> v1: In this version two separate ring_buffer objects were used to load
>> WA instructions for indirect and per context batch buffers and they
>> were part of every context.
>>
>> v2: Chris suggested to include additional page in context and use it to
>> load these WA instead of creating separate objects. This will simplify
>> lot of things as we need not explicitly pin/unpin them. Thomas Daniel
>> further pointed that GuC is planning to use a similar setup to share
>> data between GuC and driver and WA batch buffers can probably share
>> that page. However after discussions with Dave who is implementing GuC
>> changes, he suggested to use an independent page for the reasons - GuC
>> area might grow and these WA are initialized only once and are not
>> changed afterwards so we can share them across all contexts.
>>
>> The page is updated with WA during render ring init. This has an
>> advantage of not adding more special cases to default_context.
>>
>> We don't know upfront the number of WA we will applying using these
>> batch buffers. For this reason the size was fixed earlier but it is not
>> a good idea. To fix this, the functions that load instructions are
>> modified to report the no of commands inserted and the size is now
>> calculated after the batch is updated. A macro is introduced to add
>> commands to these batch buffers which also checks for overflow and
>> returns error.
>>
>> We have a full page dedicated for these WA so that should be sufficient
>> for good number of WA, anything more means we have major issues. The
>> list for Gen8 is small, same for Gen9 also, maybe few more gets added
>> going forward but not close to filling entire page. Chris suggested a
>> two-pass approach but we agreed to go with single page setup as it is a
>> one-off routine and simpler code wins.
>>
>> One additional option is offset field which is helpful if we would like
>> to have multiple batches at different offsets within the page and
>> select them based on some criteria. This is not a requirement at this
>> point but could help in future (Dave).
>>
>> Chris provided some helpful macros and suggestions which further
>> simplified the code, they will also help in reducing code duplication
>> when WA for other Gen are added. Add detailed comments explaining
>> restrictions.
>>
>> Use do {} while(0) for wa_ctx_emit() macro.
>>
>> (Many thanks to Chris, Dave and Thomas for their reviews and inputs)
>>
>> Cc: Chris Wilson <ch...@chris-wilson.co.uk>
>> Cc: Dave Gordon <david.s.gor...@intel.com>
>> Signed-off-by: Rafael Barbalho <rafael.barba...@intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluv...@linux.intel.com>
>> Reviewed-by: Chris Wilson <ch...@chris-wilson.co.uk>
>
> Why did you resend this one - I don't spot any updates in the commit
> message? Also when resending please in-reply to the corresponding
> previous version of that patch, not the cover letter of the series.

Hi Daniel,

This is the updated patch with do {} while (0).
I picked a different message-id of cover letter by mistake which is why
it is replied to cover letter instead of the corresponding patch.

regards
Arun

> -Daniel
>
>> ---
>>  drivers/gpu/drm/i915/intel_lrc.c        | 223 +++-
>>  drivers/gpu/drm/i915/intel_ringbuffer.h |  21 +++
>>  2 files changed, 240 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 0413b8f..0585298 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -211,6 +211,7 @@ enum {
>>  	FAULT_AND_CONTINUE /* Unsupported */
>>  };
>>  #define GEN8_CTX_ID_SHIFT 32
>> +#define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT 0x17
>>
>>  static int intel_lr_context_pin(struct intel_engine_cs *ring,
>>  		struct intel_context *ctx);
>> @@ -1077,6 +1078,191 @@ static int intel_logical_ring_workarounds_emit(struct intel_engine_cs *ring,
>>  	return 0;
>>  }
>>
>> +#define wa_ctx_emit(batch, cmd)						\
>> +	do {								\
>> +		if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) {	\
>> +			return -ENOSPC;					\
>> +		}							\
>> +		batch[index++] = (cmd);					\
>> +	} while (0)
>> +
>> +static inline uint32_t wa_ctx_start(struct i915_wa_ctx_bb *wa_ctx,
>> +				    uint32_t offset,
>> +				    uint32_t start_alignment)
>> +{
>> +	return wa_ctx->offset =
Re: [Intel-gfx] [PATCH v6 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround
On 19/06/2015 19:09, Chris Wilson wrote:
> On Fri, Jun 19, 2015 at 06:37:14PM +0100, Arun Siluvery wrote:
>> In Indirect context w/a batch buffer,
>> WaClearSlmSpaceAtContextSwitch
>>
>> This WA performs writes to scratch page so it must be valid, this check
>> is performed before initializing the batch with this WA.
>>
>> v2: s/PIPE_CONTROL_FLUSH_RO_CACHES/PIPE_CONTROL_FLUSH_L3 (Ville)
>>
>> Cc: Chris Wilson <ch...@chris-wilson.co.uk>
>> Cc: Dave Gordon <david.s.gor...@intel.com>
>> Signed-off-by: Rafael Barbalho <rafael.barba...@intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluv...@linux.intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_reg.h  |  1 +
>>  drivers/gpu/drm/i915/intel_lrc.c | 16
>>  2 files changed, 17 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>> index d14ad20..7637e64 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -410,6 +410,7 @@
>>  #define   DISPLAY_PLANE_A (0<<20)
>>  #define   DISPLAY_PLANE_B (1<<20)
>>  #define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
>> +#define   PIPE_CONTROL_FLUSH_L3 (1<<27)
>>  #define   PIPE_CONTROL_GLOBAL_GTT_IVB (1<<24) /* gen7+ */
>>  #define   PIPE_CONTROL_MMIO_WRITE (1<<23)
>>  #define   PIPE_CONTROL_STORE_DATA_INDEX (1<<21)
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 3e7aaa9..664455c 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1137,6 +1137,7 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
>>  				    uint32_t *const batch,
>>  				    uint32_t *offset)
>>  {
>> +	uint32_t scratch_addr;
>>  	uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);
>>
>>  	/* WaDisableCtxRestoreArbitration:bdw,chv */
>> @@ -1165,6 +1166,21 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
>>  		wa_ctx_emit(batch, l3sqc4_flush & ~GEN8_LQSC_FLUSH_COHERENT_LINES);
>>  	}
>>
>> +	/* WaClearSlmSpaceAtContextSwitch:bdw,chv */
>> +	/* Actual scratch location is at 128 bytes offset */
>> +	scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
>> +	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
>
> I thought this bit was now mbz - that's how we treat it elsewhere e.g.
> gen8_emit_flush_render, and that the address has to be naturally aligned
> for the target write.
>
> (Similar bit in patch 6 fwiw.)

you are correct, this bit is mbz.
Daniel, could you please remove this line when applying patches? sorry
for additional work.

+	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;

regards
Arun

> -Chris
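The review comment about the "must be zero" bit and natural alignment can be checked with a little arithmetic. The sketch below is a user-space illustration under stated assumptions, not the driver code: `wa_scratch_addr` and `is_naturally_aligned` are hypothetical helpers, the cacheline size is assumed to be 64 bytes (so that `2*CACHELINE_BYTES` gives the 128-byte offset mentioned in the patch comment), the flag is assumed to occupy bit 2 of the address dword, and the target write is assumed to be 8 bytes wide.

```c
#include <assert.h>
#include <stdint.h>

#define CACHELINE_BYTES 64u
/* Assumption: the legacy "global GTT" flag sits in bit 2 of the address dword. */
#define ASSUMED_GLOBAL_GTT_FLAG (1u << 2)

/* Scratch slot used by the workaround: 128 bytes into the scratch page. */
static uint32_t wa_scratch_addr(uint32_t scratch_gtt_offset)
{
	return scratch_gtt_offset + 2 * CACHELINE_BYTES;
}

/* True if addr is a multiple of write_bytes, which must be a power of two. */
static int is_naturally_aligned(uint32_t addr, uint32_t write_bytes)
{
	assert(write_bytes != 0 && (write_bytes & (write_bytes - 1)) == 0);
	return (addr & (write_bytes - 1)) == 0;
}
```

Under these assumptions, a scratch page at GTT offset 0x1000 gives a target of 0x1080, which is aligned for an 8-byte write. OR-ing the flag into the address yields 0x1084, which is not, so where the hardware treats the bit as part of the address the extra line would break the alignment requirement.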
Re: [Intel-gfx] [PATCH v6 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
On 19/06/2015 18:37, Arun Siluvery wrote:
> In Per context w/a batch buffer,
> WaRsRestoreWithPerCtxtBb
>
> This WA performs writes to scratch page so it must be valid, this check
> is performed before initializing the batch with this WA.
>
> v2: This patch modifies definitions of MI_LOAD_REGISTER_MEM and
> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
> so as to not break any future users of existing definitions (Michel)
>
> v3: Length defined in current definitions of LRM, LRR instructions was
> specified as 0. It seems it is common convention for instructions whose
> length vary between platforms. This is not an issue so far because they
> are not used anywhere except command parser; now that we use in this
> patch update them with correct length and also move them out of command
> parser placeholder to appropriate place. remove unnecessary padding and
> follow the WA programming sequence exactly as mentioned in spec which is
> essential for this WA (Dave).
>
> Cc: Chris Wilson <ch...@chris-wilson.co.uk>
> Cc: Dave Gordon <david.s.gor...@intel.com>
> Signed-off-by: Rafael Barbalho <rafael.barba...@intel.com>
> Signed-off-by: Arun Siluvery <arun.siluv...@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_reg.h  | 29 +++--
>  drivers/gpu/drm/i915/intel_lrc.c | 54
>  2 files changed, 81 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 7637e64..208620d 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -347,6 +347,31 @@
>  #define   MI_INVALIDATE_BSD (1<<7)
>  #define   MI_FLUSH_DW_USE_GTT (1<<2)
>  #define   MI_FLUSH_DW_USE_PPGTT (0<<2)
> +#define MI_LOAD_REGISTER_MEM MI_INSTR(0x29, 1)
> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
> +#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
> +#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
> +#define MI_LOAD_REGISTER_REG MI_INSTR(0x2A, 1)
> +#define MI_ATOMIC(len) MI_INSTR(0x2F, (len-2))
> +#define   MI_ATOMIC_MEMORY_TYPE_GGTT (1<<22)
> +#define   MI_ATOMIC_INLINE_DATA (1<<18)
> +#define   MI_ATOMIC_CS_STALL (1<<17)
> +#define   MI_ATOMIC_RETURN_DATA_CTL (1<<16)
> +#define MI_ATOMIC_OP_MASK(op) ((op) << 8)
> +#define MI_ATOMIC_AND MI_ATOMIC_OP_MASK(0x01)
> +#define MI_ATOMIC_OR MI_ATOMIC_OP_MASK(0x02)
> +#define MI_ATOMIC_XOR MI_ATOMIC_OP_MASK(0x03)
> +#define MI_ATOMIC_MOVE MI_ATOMIC_OP_MASK(0x04)
> +#define MI_ATOMIC_INC MI_ATOMIC_OP_MASK(0x05)
> +#define MI_ATOMIC_DEC MI_ATOMIC_OP_MASK(0x06)
> +#define MI_ATOMIC_ADD MI_ATOMIC_OP_MASK(0x07)
> +#define MI_ATOMIC_SUB MI_ATOMIC_OP_MASK(0x08)
> +#define MI_ATOMIC_RSUB MI_ATOMIC_OP_MASK(0x09)
> +#define MI_ATOMIC_IMAX MI_ATOMIC_OP_MASK(0x0A)
> +#define MI_ATOMIC_IMIN MI_ATOMIC_OP_MASK(0x0B)
> +#define MI_ATOMIC_UMAX MI_ATOMIC_OP_MASK(0x0C)
> +#define MI_ATOMIC_UMIN MI_ATOMIC_OP_MASK(0x0D)
> +
>  #define MI_BATCH_BUFFER MI_INSTR(0x30, 1)
>  #define   MI_BATCH_NON_SECURE (1)
>  /* for snb/ivb/vlv this also means batch in ppgtt when ppgtt is enabled. */
> @@ -451,8 +476,6 @@
>  #define MI_CLFLUSH MI_INSTR(0x27, 0)
>  #define MI_REPORT_PERF_COUNT MI_INSTR(0x28, 0)
>  #define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
> -#define MI_LOAD_REGISTER_MEM MI_INSTR(0x29, 0)
> -#define MI_LOAD_REGISTER_REG MI_INSTR(0x2A, 0)
>  #define MI_RS_STORE_DATA_IMM MI_INSTR(0x2B, 0)
>  #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0)
>  #define MI_STORE_URB_MEM MI_INSTR(0x2D, 0)
> @@ -1799,6 +1822,8 @@ enum skl_disp_power_wells {
>  #define   GEN8_RC_SEMA_IDLE_MSG_DISABLE (1 << 12)
>  #define   GEN8_FF_DOP_CLOCK_GATE_DISABLE (1<<10)
>
> +#define GEN8_RS_PREEMPT_STATUS 0x215C
> +
>  /* Fuse readout registers for GT */
>  #define CHV_FUSE_GT (VLV_DISPLAY_BASE + 0x2168)
>  #define   CHV_FGT_DISABLE_SS0 (1 << 10)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 664455c..28198c4 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1215,11 +1215,65 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
>  			       uint32_t *const batch,
>  			       uint32_t *offset)
>  {
> +	uint32_t scratch_addr;
>  	uint32_t index = wa_ctx_start(wa_ctx, *offset, CACHELINE_DWORDS);
>
> +	/* Actual scratch location is at 128 bytes offset */
> +	scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
> +	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
> +

Daniel, could you please remove this line when applying this patch?
sorry for additional work.

+	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;

regards
Arun

>  	/* WaDisableCtxRestoreArbitration:bdw,chv */
>  	wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE);
>
> +	/*
> +	 * As per Bspec, to workaround a known HW issue, SW must perform the
> +	 * below programming sequence prior to programming MI_BATCH_BUFFER_END.
> +	 *
> +	 * This is only applicable for Gen8.
Re: [Intel-gfx] [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
On 19/06/2015 10:27, Chris Wilson wrote:
> On Thu, Jun 18, 2015 at 06:33:24PM +0100, Arun Siluvery wrote:
>
> Totally minor worries now.
>
>> +/**
>> + * gen8_init_indirectctx_bb() - initialize indirect ctx batch with WA
>> + *
>> + * @ring: only applicable for RCS
>> + * @wa_ctx_batch: page in which WA are loaded
>> + * @offset: This is for future use in case if we would like to have multiple
>> + *  batches at different offsets and select them based on a criteria.
>> + * @num_dwords: The number of WA applied are known at the beginning, it returns
>> + * the no of DWORDS written. This batch does not contain MI_BATCH_BUFFER_END
>> + * so it adds padding to make it cacheline aligned. MI_BATCH_BUFFER_END will be
>> + * added to perctx batch and both of them together makes a complete batch buffer.
>> + *
>> + * Return: non-zero if we exceed the PAGE_SIZE limit.
>> + */
>> +
>> +static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
>> +				    uint32_t **wa_ctx_batch,
>> +				    uint32_t offset,
>> +				    uint32_t *num_dwords)
>> +{
>> +	uint32_t index;
>> +	uint32_t *batch = *wa_ctx_batch;
>> +
>> +	index = offset;
>
> I worry that offset need not be cacheline aligned on entry (for example
> if indirectctx and perctx were switched, or someone else stuffed more
> controls into the per-ring object). Like perctx, there is no mention of
> any alignment worries for the starting location, but here you tell us
> that the INDIRECT_CTX length is specified in cacheline, so I also
> presume the start needs to be aligned.

offset need to be cachealigned, I will update the comments.

>> +static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
>> +{
>> +	int ret;
>> +
>> +	WARN_ON(ring->id != RCS);
>> +
>> +	ring->wa_ctx.obj = i915_gem_alloc_object(ring->dev, PAGE_ALIGN(size));
>> +	if (!ring->wa_ctx.obj) {
>> +		DRM_DEBUG_DRIVER("alloc LRC WA ctx backing obj failed.\n");
>> +		return -ENOMEM;
>> +	}
>> +
>> +	ret = i915_gem_obj_ggtt_pin(ring->wa_ctx.obj, PAGE_SIZE, 0);
>
> One day _pin() will return the vma being pinned and I will rejoice as it
> makes reviewing pinning much easier! Not a problem for you right now.
>
>> +static int intel_init_workaround_bb(struct intel_engine_cs *ring)
>> +{
>> +	int ret = 0;
>> +	uint32_t *batch;
>> +	uint32_t num_dwords;
>> +	struct page *page;
>> +	struct i915_ctx_workarounds *wa_ctx = &ring->wa_ctx;
>> +
>> +	WARN_ON(ring->id != RCS);
>> +
>> +	if (ring->scratch.obj == NULL) {
>> +		DRM_ERROR("scratch page not allocated for %s\n", ring->name);
>> +		return -EINVAL;
>> +	}
>
> I haven't found the dependence upon scratch.obj, could you explain it?
> Does it appear later?

yes it does in patch 2 which rearranges init_pipe_control().
I will move this check to that patch as per your comment.

>> @@ -1754,15 +1934,26 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
>>  	reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
>>  	reg_state[CTX_SECOND_BB_STATE+1] = 0;
>>  	if (ring->id == RCS) {
>> -		/* TODO: according to BSpec, the register state context
>> -		 * for CHV does not have these. OTOH, these registers do
>> -		 * exist in CHV. I'm waiting for a clarification */
>>  		reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
>>  		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
>>  		reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
>>  		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
>>  		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 0x1c8;
>>  		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
>> +		if (ring->wa_ctx.obj) {
>> +			reg_state[CTX_RCS_INDIRECT_CTX+1] =
>> +				(i915_gem_obj_ggtt_offset(ring->wa_ctx.obj) +
>> +				 ring->wa_ctx.indctx_batch_offset * sizeof(uint32_t)) |
>> +				(ring->wa_ctx.indctx_batch_size / CACHELINE_DWORDS);
>
> Ok, this really does impose alignment conditions on indctx_batch_offset
>
> Oh, if I do get a chance to complain, spell out indirect_ctx, make it a
> struct or even just precalculate the reg value, just indctx's only value
> is that is the same length as perctx, but otherwise quite obtuse.

variable names were getting too long and caused difficulties in
indentation so tried to shorten them, will change this part.

regards
Arun

> Other than that, I couldn't punch any holes in its robustness, and the
> series is starting to look quite good and very neat.
> -Chris
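The alignment constraint discussed in this review follows from how the quoted hunk packs the register value: the batch address and the batch length in cachelines share one dword. The sketch below is a user-space illustration of that packing, not the driver code. `align_to_cacheline_dw` and `indirect_ctx_value` are hypothetical names, a 64-byte cacheline (16 dwords) is assumed, and the requirement that the cacheline count fit in the six low bits freed by alignment is this sketch's own sanity check.

```c
#include <assert.h>
#include <stdint.h>

#define CACHELINE_BYTES  64u
#define CACHELINE_DWORDS (CACHELINE_BYTES / 4u)

/* Round a dword index up to the next cacheline boundary. */
static uint32_t align_to_cacheline_dw(uint32_t dw_index)
{
	return (dw_index + CACHELINE_DWORDS - 1) & ~(CACHELINE_DWORDS - 1);
}

/*
 * Pack "address | size in cachelines" as in the quoted hunk. The low bits
 * of the address are only free to carry the size if the address itself is
 * cacheline aligned, which is why the batch offset must be aligned too.
 */
static uint32_t indirect_ctx_value(uint32_t ggtt_offset,
				   uint32_t batch_offset_dw,
				   uint32_t batch_size_dw)
{
	uint32_t addr = ggtt_offset + batch_offset_dw * 4u;

	assert(addr % CACHELINE_BYTES == 0);
	assert(batch_size_dw % CACHELINE_DWORDS == 0);
	/* Sketch-only check: the count must fit below the address bits. */
	assert(batch_size_dw / CACHELINE_DWORDS < CACHELINE_BYTES);

	return addr | (batch_size_dw / CACHELINE_DWORDS);
}
```

For example, an object at GGTT offset 0x10000 with the batch starting 16 dwords in and spanning 32 dwords packs to 0x10042: address 0x10040 with a length of 2 cachelines in the low bits. An unaligned starting offset would set low address bits that the hardware would read as part of the length, which is the concern raised above.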
Re: [Intel-gfx] [PATCH] drm/i915: Initialize HWS page address after GPU reset
On 15/06/2015 06:20, Daniel Vetter wrote:
> On Wed, Jun 3, 2015 at 6:14 PM, Ville Syrjälä <ville.syrj...@linux.intel.com> wrote:
>> I was going to suggest removing the same thing from the
>> lrc_setup_hardware_status_page(), but after another look it seems we
>> sometimes call .init_hw() before the context setup. Would be nice to
>> have a more consistent sequence for init and reset. But anyway the
>> patch looks OK to me. I verified that we indeed lose this register on
>> GPU reset.
>
> Yep, this is a mess. And historically _any_ difference between driver
> load and gpu reset (or resume fwiw) has led to hilarious bugs, so this
> difference is really troubling to me.
>
> Arun, can you please work on a patch to unify the setup sequence here, so
> that both driver load & gpu resets work the same way? By the time we're
> calling gem_init_hw the default context should have been created already,
> and hence we should be able to write to HWS_PGA in ring->init_hw only.

Hi Daniel,

I think the problem in this case was the code to init HWS page after
reset was missing for Gen8+. For Gen7 we are doing this as part of
ring->init_hw.

Gen7:
i915_reset()
+-- i915_gem_init_hw()
    +-- ring->init_hw() which is init_render_ring()
        +-- init_ring_common()
            + intel_ring_setup_status_page()

Gen8:
i915_reset()
+-- i915_gem_init_hw()
    +-- ring->init_hw() which is gen8_init_render_ring()
        + gen8_init_common_ring() - I added changes in this function.

We could probably use intel_ring_setup_status_page() for both cases, does
it have to be Gen7 specific?

> Also I wonder about resume, where's the HWS_PGA restore for that case?

It is covered.
i915_drm_resume()
+-- i915_gem_init_hw

regards
Arun

> -Daniel
Re: [Intel-gfx] [PATCH v4 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
On 17/06/2015 19:48, Siluvery, Arun wrote:
> On 16/06/2015 21:25, Chris Wilson wrote:
>> On Tue, Jun 16, 2015 at 08:25:20PM +0100, Arun Siluvery wrote:
>>> +static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
>>> +				    uint32_t offset,
>>> +				    uint32_t *num_dwords)
>>> +{
>>> +	uint32_t index;
>>> +	struct page *page;
>>> +	uint32_t *cmd;
>>> +
>>> +	page = i915_gem_object_get_page(ring->wa_ctx.obj, 0);
>>> +	cmd = kmap_atomic(page);
>>> +
>>> +	index = offset;
>>> +
>>> +	/* FIXME: fill one cacheline with NOOPs.
>>> +	 * Replace these instructions with WA
>>> +	 */
>>> +	while (index < (offset + 16))
>>> +		cmd[index++] = MI_NOOP;
>>> +
>>> +	/*
>>> +	 * MI_BATCH_BUFFER_END is not required in Indirect ctx BB because
>>> +	 * execution depends on the length specified in terms of cache lines
>>> +	 * in the register CTX_RCS_INDIRECT_CTX
>>> +	 */
>>> +
>>> +	kunmap_atomic(cmd);
>>> +
>>> +	if (index > (PAGE_SIZE / sizeof(uint32_t)))
>>> +		return -EINVAL;
>>
>> Check before you GPF! You just overran the buffer and corrupted memory,
>> if you didn't succeed in trapping a segfault. To be generic, align to
>> the cacheline then check you have enough room for your own data.
>> -Chris
>
> Hi Chris,
>
> The placement of condition is not correct. I don't completely follow
> your suggestion, could you please elaborate; here we don't know upfront
> how much more data to be written. I have made below changes to check
> after writing every command and return error as soon as we reach the
> end.
>
> #define wa_ctx_emit(batch, cmd) {					\
> 		if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) {	\
> 			kunmap_atomic(batch);				\
> 			return -ENOSPC;					\
> 		}							\
> 		batch[index++] = (cmd);					\
> 	}
>
> is this acceptable?
> I think this is the only one issue, all other comments are addressed.

one other improvement is possible - mapping/unmapping page can be kept in
common path, will update the patch accordingly.

regards
Arun

> regards
> Arun
Re: [Intel-gfx] [PATCH v4 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
On 16/06/2015 21:25, Chris Wilson wrote:
On Tue, Jun 16, 2015 at 08:25:20PM +0100, Arun Siluvery wrote:
+static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
+                                    uint32_t offset,
+                                    uint32_t *num_dwords)
+{
+        uint32_t index;
+        struct page *page;
+        uint32_t *cmd;
+
+        page = i915_gem_object_get_page(ring->wa_ctx.obj, 0);
+        cmd = kmap_atomic(page);
+
+        index = offset;
+
+        /* FIXME: fill one cacheline with NOOPs.
+         * Replace these instructions with WA
+         */
+        while (index < (offset + 16))
+                cmd[index++] = MI_NOOP;
+
+        /*
+         * MI_BATCH_BUFFER_END is not required in Indirect ctx BB because
+         * execution depends on the length specified in terms of cache lines
+         * in the register CTX_RCS_INDIRECT_CTX
+         */
+
+        kunmap_atomic(cmd);
+
+        if (index > (PAGE_SIZE / sizeof(uint32_t)))
+                return -EINVAL;

Check before you GPF! You just overran the buffer and corrupted memory, if you didn't succeed in trapping a segfault. To be generic, align to the cacheline then check you have enough room for your own data.
-Chris

Hi Chris,
The placement of the condition is not correct. I don't completely follow your suggestion, could you please elaborate; here we don't know upfront how much more data is to be written. I have made the changes below to check after writing every command and return an error as soon as we reach the end.

#define wa_ctx_emit(batch, cmd) {                                       \
        if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) {         \
                kunmap_atomic(batch);                                   \
                return -ENOSPC;                                         \
        }                                                               \
        batch[index++] = (cmd);                                         \
}

Is this acceptable? I think this is the only issue; all other comments are addressed.

regards
Arun
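The advice in this message (check before you write, align to the cacheline, then confirm there is room for your own data) can be sketched in plain userspace C. This is only an illustration, not the driver code: `struct wa_batch`, `wa_emit()`, `wa_align_to_cacheline()` and `wa_reserve()` are hypothetical names, and the 4096-byte page and 64-byte cacheline are assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define WA_PAGE_SIZE      4096u   /* assumed page size in bytes */
#define WA_PAGE_DWORDS    (WA_PAGE_SIZE / sizeof(uint32_t))
#define CACHELINE_DWORDS  16u     /* assumed 64-byte cacheline */
#define MI_NOOP           0u

/* Hypothetical wrapper around the mapped page; not a driver structure. */
struct wa_batch {
	uint32_t *cmd;    /* start of the mapped page */
	uint32_t index;   /* next free dword */
};

/* Emit one dword; the bound is checked before the store, never after. */
static int wa_emit(struct wa_batch *b, uint32_t dword)
{
	if (b->index >= WA_PAGE_DWORDS)
		return -ENOSPC;
	b->cmd[b->index++] = dword;
	return 0;
}

/* Pad with NOOPs up to the next cacheline boundary. */
static int wa_align_to_cacheline(struct wa_batch *b)
{
	while (b->index % CACHELINE_DWORDS) {
		int ret = wa_emit(b, MI_NOOP);

		if (ret)
			return ret;
	}
	return 0;
}

/* Check once that 'count' more dwords fit before emitting a sequence. */
static int wa_reserve(const struct wa_batch *b, uint32_t count)
{
	if (count > WA_PAGE_DWORDS - b->index)
		return -ENOSPC;
	return 0;
}
```

A caller would align first, then call `wa_reserve()` with the length of the sequence it is about to write, so that a sequence is either emitted whole or not at all.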
Re: [Intel-gfx] [PATCH v4 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
On 17/06/2015 21:21, Chris Wilson wrote:
On Wed, Jun 17, 2015 at 07:48:16PM +0100, Siluvery, Arun wrote:
On 16/06/2015 21:25, Chris Wilson wrote:
On Tue, Jun 16, 2015 at 08:25:20PM +0100, Arun Siluvery wrote:
+static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
+                                    uint32_t offset,
+                                    uint32_t *num_dwords)
+{
+        uint32_t index;
+        struct page *page;
+        uint32_t *cmd;
+
+        page = i915_gem_object_get_page(ring->wa_ctx.obj, 0);
+        cmd = kmap_atomic(page);
+
+        index = offset;
+
+        /* FIXME: fill one cacheline with NOOPs.
+         * Replace these instructions with WA
+         */
+        while (index < (offset + 16))
+                cmd[index++] = MI_NOOP;
+
+        /*
+         * MI_BATCH_BUFFER_END is not required in Indirect ctx BB because
+         * execution depends on the length specified in terms of cache lines
+         * in the register CTX_RCS_INDIRECT_CTX
+         */
+
+        kunmap_atomic(cmd);
+
+        if (index > (PAGE_SIZE / sizeof(uint32_t)))
+                return -EINVAL;

Check before you GPF! You just overran the buffer and corrupted memory, if you didn't succeed in trapping a segfault. To be generic, align to the cacheline then check you have enough room for your own data.
-Chris

Hi Chris,
The placement of the condition is not correct. I don't completely follow your suggestion, could you please elaborate; here we don't know upfront how much more data is to be written.

Hmm, are we anticipating an unbounded number of workarounds? At some point you have to have a rough upper bound in order to do the bo allocation. If we are really unsure, then we do need to split this into two passes, one to count the number of dwords and the second to allocate and actually fill the cmd[].

Since we have a full page dedicated to this, that should be sufficient for a good number of WA; if we need more than one page, it means we have major issues. The list for Gen8 is small, and the same goes for Gen9; maybe a few more get added going forward, but not close to filling the entire page. Some of them will even be restricted to specific steppings/revisions. For these reasons I think a single-page setup is sufficient. Do you anticipate any other use cases that require allocating more than one page? A two-pass approach can be implemented, but it adds complexity which may not be required in this case. Please let me know your thoughts.

I have made the changes below to check after writing every command and return an error as soon as we reach the end.

#define wa_ctx_emit(batch, cmd) {                                       \
        if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) {         \
                kunmap_atomic(batch);                                   \
                return -ENOSPC;                                         \
        }                                                               \
        batch[index++] = (cmd);                                         \
}

Is this acceptable? I think this is the only issue; all other comments are addressed.

It's the lesser of evils for sure. Still feel dubious that we don't know upfront how much data we need to allocate.

Yes, but with the single-pass approach do you see any way it can be improved?

regards
Arun

-Chris
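For comparison, the two-pass alternative raised in this message (count the dwords first, then allocate and fill) could look roughly like the userspace sketch below. The names `wa_build()` and `wa_build_two_pass()` are hypothetical, `malloc()` stands in for the real object allocation, and the 16 NOOPs are placeholder content rather than actual workarounds.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MI_NOOP 0u

/*
 * Hypothetical builder: writes the workaround dwords to 'cmd' when it is
 * non-NULL and always returns how many dwords the sequence needs.
 */
static size_t wa_build(uint32_t *cmd)
{
	size_t n = 0;
	size_t i;

	/* Placeholder content: one cacheline (16 dwords) of NOOPs. */
	for (i = 0; i < 16; i++) {
		if (cmd)
			cmd[n] = MI_NOOP;
		n++;
	}
	return n;
}

/* Pass 1 counts, pass 2 fills a buffer of exactly the counted size. */
static uint32_t *wa_build_two_pass(size_t *num_dwords)
{
	size_t needed = wa_build(NULL);
	uint32_t *cmd = malloc(needed * sizeof(*cmd));

	if (!cmd)
		return NULL;

	*num_dwords = wa_build(cmd);
	return cmd;
}
```

The cost is that every builder runs twice and must behave identically in both passes, which is the extra complexity being weighed in the thread against a fixed single page with a per-write bound check.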
Re: [Intel-gfx] [PATCH v4 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
On 16/06/2015 21:33, Chris Wilson wrote: On Tue, Jun 16, 2015 at 08:25:20PM +0100, Arun Siluvery wrote: +static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size) +{ + int ret; + struct drm_device *dev = ring-dev; You only use it once, keeping it as a local seems counter-intuitive. + WARN_ON(ring-id != RCS); + + size = roundup(size, PAGE_SIZE); Out of curiousity is gcc smart enough to turn this into an ALIGN()? replaced with PAGE_ALIGN(size) + ring-wa_ctx.obj = i915_gem_alloc_object(dev, size); + if (!ring-wa_ctx.obj) { + DRM_DEBUG_DRIVER(alloc LRC WA ctx backing obj failed.\n); + return -ENOMEM; + } + + ret = i915_gem_obj_ggtt_pin(ring-wa_ctx.obj, GEN8_LR_CONTEXT_ALIGN, 0); Strange choice of alignment since we pass around cacheline offsets. this is from the initial version where it was part of context, sorry missed this, replaced with PAGE_SIZE. + if (ret) { + DRM_DEBUG_DRIVER(pin LRC WA ctx backing obj failed: %d\n, +ret); + drm_gem_object_unreference(ring-wa_ctx.obj-base); + return ret; + } + + return 0; +} + +static void lrc_destroy_wa_ctx_obj(struct intel_engine_cs *ring) +{ + WARN_ON(ring-id != RCS); + + i915_gem_object_ggtt_unpin(ring-wa_ctx.obj); + drm_gem_object_unreference(ring-wa_ctx.obj-base); + ring-wa_ctx.obj = NULL; +} + /** * intel_logical_ring_cleanup() - deallocate the Engine Command Streamer * @@ -1474,7 +1612,29 @@ static int logical_render_ring_init(struct drm_device *dev) if (ret) return ret; - return intel_init_pipe_control(ring); + if (INTEL_INFO(ring-dev)-gen = 8) { + ret = lrc_setup_wa_ctx_obj(ring, PAGE_SIZE); + if (ret) { + DRM_DEBUG_DRIVER(Failed to setup context WA page: %d\n, +ret); + return ret; + } + + ret = intel_init_workaround_bb(ring); + if (ret) { + lrc_destroy_wa_ctx_obj(ring); + DRM_ERROR(WA batch buffers are not initialized: %d\n, + ret); + } + } + + ret = intel_init_pipe_control(ring); Did you consider stuffing it into the spare are of the pipe control scatch bo? 
:) Not exactly, but I think it is better to keep them separate. This is not because a single page would be insufficient if we add more WA in future, but for logical reasons: if there is any error while initializing these WA, we destroy the page and continue, which cannot be done with the scratch page.

regards
Arun

-Chris
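On the roundup() versus ALIGN() question in this message: the two forms give the same result whenever the alignment is a power of two, which a small userspace sketch can show. The helper names `roundup_div()` and `align_mask()` are hypothetical stand-ins written for this illustration, and the 4096-byte page size is an assumption.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed; must be a power of two */

/* Division-based rounding, valid for any non-zero 'y'. */
static uint32_t roundup_div(uint32_t x, uint32_t y)
{
	return ((x + (y - 1)) / y) * y;
}

/* Mask-based rounding, valid only when 'a' is a power of two. */
static uint32_t align_mask(uint32_t x, uint32_t a)
{
	return (x + (a - 1)) & ~(a - 1);
}
```

Using the mask form directly states the power-of-two assumption in the code instead of relying on the compiler to reduce the division.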
Re: [Intel-gfx] [PATCH v2 1/7] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
On 15/06/2015 11:41, Daniel Vetter wrote: On Thu, Jun 04, 2015 at 03:30:56PM +0100, Siluvery, Arun wrote: On 02/06/2015 19:47, Dave Gordon wrote: On 02/06/15 19:36, Siluvery, Arun wrote: On 01/06/2015 11:22, Daniel, Thomas wrote: Indeed, allocating an extra scratch page in the context would simplify vma/mm management. A trick might be to allocate the scratch page at the start, then offset the lrc regs etc - that would then be consistent amongst gen and be easy enough to extend if we need more per-context scratch space in future. -Chris Yes, I think we already have another use for more per-context space at the start. The GuC is planning to do this. Arun, you probably should work with Alex Dai and Dave Gordon to avoid conflicts here. Thomas. Thanks for the heads-up Thomas. I have discussed with Dave and agreed to share this page; GuC probably doesn't need whole page so first half is reserved for it's use and second half is used for WA. I have modified my patches to use context page for applying these WA and don't see any issues. During the discussions Dave proposed another approach. Even though these WA are called per context they are only initialized once and not changed afterwards, same set of WA are applied for each context so instead of adding them in each context, does it make sense to create a separate page and share across all contexts? but of course GuC will anyway add a new page to context so I might as well share that page. Chris/Dave, do you see any problems with sharing page with GuC or you prefer to allocate a separate page for these WA and share across all contexts? Please give your comments. regards Arun I think we have to consider which is more future-proof i.e. which is least likely: (1) the area shared with the GuC grows (definitely still in flux), or (2) workarounds need to be context-specific (possible, but unlikely) So I'd prefer a single area set up just once to contain the pre- and post-context restore workaround batches. 
If necessary, the one area could contain multiple batches at different offsets, so we could point different contexts at different (shared) batches as required. I think they're unlikely to actually need per-context customisation[*], but there might be a need for different workarounds according to workload type or privilege level or some other criterion ... ?

.Dave.

[*] unless they need per-context memory addresses coded into them?

Considering that these WA are initialized only once and not changed afterwards, and that the GuC area will probably grow in future and may run into the space used by the WA, an independent single-area setup makes sense. I also checked the spec and it is not clear whether any customization is going to be required for different contexts. I have modified the patches to set up a single page with the WA when default_context is initialized, and this is used by all contexts. I will send the patches, but please let me know if there are any other comments.

Yeah if the wa batches aren't ctx specific, then there's really no need to allocate one of them per ctx. One global buffer with all the wa combined should really be all we need.
-Daniel

Hi Daniel,
Agreed, this is already taken into account in the next revision, v3 (already sent to the mailing list). I can see you are still going through the list, but when you get there, please let me know if you have any other comments.

regards
Arun
Re: [Intel-gfx] [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
On 12/06/2015 18:03, Dave Gordon wrote: On 12/06/15 12:58, Siluvery, Arun wrote: On 09/06/2015 19:43, Dave Gordon wrote: On 05/06/15 14:57, Arun Siluvery wrote: In Per context w/a batch buffer, WaRsRestoreWithPerCtxtBb v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions so as to not break any future users of existing definitions (Michel) Signed-off-by: Rafael Barbalho rafael.barba...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_reg.h | 26 ++ drivers/gpu/drm/i915/intel_lrc.c | 59 2 files changed, 85 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 33b0ff1..6928162 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h [snip] #define MI_LOAD_REGISTER_MEMMI_INSTR(0x29, 0) #define MI_LOAD_REGISTER_REGMI_INSTR(0x2A, 0) +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2) +#define MI_LRM_USE_GLOBAL_GTT (122) +#define MI_LRM_ASYNC_MODE_ENABLE (121) +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1) Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's a two-operand instruction, each of which is a one-word MMIO register address, hence always 3 words total. The length bias is 2, so the so-called 'flags' field must be 1. The original definition (where the second argument of the MI_INSTR macro is 0) shouldn't work. So just correct the original definition of MI_LOAD_REGISTER_REG; this isn't something that's actually changed on GEN8. I did notice that the original instructions are odd but thought I might be wrong hence I created new ones to not disturb the original ones. ok I will just correct original one and reuse it. While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+. ok. 
#define MI_RS_STORE_DATA_IMMMI_INSTR(0x2B, 0) #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0) #define MI_STORE_URB_MEMMI_INSTR(0x2D, 0) And these are wrong too! In fact all of these instructions have been added under a comment which says Commands used only by the command parser. Looks like they were added as placeholders without the proper length fields, and then people have started using them as though they were complete definitions :( Time update them all, perhaps ... these are not related to this patch, so it can be taken up as a different patch. As a minimum, you should move these updated #defines out of the section under the comment Commands used only by the command parser and put them in the appropriate place in the regular list of MI_ commnds, preferably in numerical order. Then the ones that are genuinely only used by the command parser could be left for another patch ... [snip] +/* + * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and + * MI_BATCH_BUFFER_END instructions in this sequence need to be + * in the same cacheline. + */ +while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0) +cmd[index++] = MI_NOOP; + +cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 | +MI_LRM_USE_GLOBAL_GTT | +MI_LRM_ASYNC_MODE_ENABLE; +cmd[index++] = INSTPM; +cmd[index++] = scratch_addr; +cmd[index++] = 0; + +/* + * BSpec says there should not be any commands programmed + * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so + * do not add any new commands + */ +cmd[index++] = MI_LOAD_REGISTER_REG_GEN8; +cmd[index++] = GEN8_RS_PREEMPT_STATUS; +cmd[index++] = GEN8_RS_PREEMPT_STATUS; + /* padding */ while (index end) cmd[index++] = MI_NOOP; Where's the MI_BATCH_BUFFER_END referred to in the comment? MI_BATCH_BUFFER_END is just below while loop, it is in patch [v3 1/6]. Since the diff context is only few lines it didn't showup in the diff. 
The second comment above says no commands between LOAD_REG_REG and BB_END, so the point of my comment was that the BB_END is *NOT* immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!

True, but they are no-ops, so they shouldn't really affect anything. I guess the spec means no valid commands.

And therefore also, these instructions do *not* all end up in the same cacheline, thus contradicting the first comment above.

I don't understand why. As per the requirement, the commands from the first MI_LOAD_REGISTER_MEM_GEN8 (after the while loop) through BB_END will be part of the same cacheline (in this case the second line).

Padding *after* a BB_END would be redundant.

Yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of abruptly terminating the batch, which is why I am padding with no-ops. I can change this if that is preferred.

.Dave.
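The length arithmetic discussed in this message (a length bias of 2, so the length field is the total dword count minus 2) can be written out as a short sketch. The `MI_INSTR()` shape below is assumed to mirror the driver's macro, with the opcode shifted to bit 23; `mi_len()`, `mi_load_register_reg()` and `mi_load_register_mem()` are hypothetical helpers introduced only for illustration.

```c
#include <stdint.h>

/* Assumed to mirror the shape of the driver's MI_INSTR() macro. */
#define MI_INSTR(opcode, flags)  (((uint32_t)(opcode) << 23) | (flags))
#define MI_LENGTH_BIAS           2u

/* Length field for a command occupying 'total_dwords' dwords. */
static uint32_t mi_len(uint32_t total_dwords)
{
	return total_dwords - MI_LENGTH_BIAS;
}

/* MI_LOAD_REGISTER_REG: header plus two register operands = 3 dwords. */
static uint32_t mi_load_register_reg(void)
{
	return MI_INSTR(0x2A, mi_len(3));
}

/*
 * MI_LOAD_REGISTER_MEM: header, register, address. Per the thread the
 * length is 1 before gen8 (3 dwords) and 2 from gen8 on (4 dwords).
 */
static uint32_t mi_load_register_mem(int gen)
{
	return MI_INSTR(0x29, mi_len(gen >= 8 ? 4 : 3));
}
```

Deriving the field from the dword count makes it harder to leave a placeholder length of 0 in a definition that later gets used to emit real commands.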
Re: [Intel-gfx] [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
On 15/06/2015 16:22, Daniel Vetter wrote:
On Fri, Jun 05, 2015 at 12:00:54PM +0100, Chris Wilson wrote:
On Fri, Jun 05, 2015 at 11:34:01AM +0100, Arun Siluvery wrote:
+        /* FIXME: fill unused locations with NOOPs.
+         * Replace these instructions with WA
+         */
+        while (index < end)
+                reg_state[index++] = MI_NOOP;

I found calling it reg_state was very confusing. Maybe batch, bb, data or cmd?

Concurred, reg_state sounds like an mmio dump not a batchbuffer. wa_batch would be my naming bikeshed, but I'd go with either. If this is all that needs changing I can do that while applying patches.
-Daniel

I have already changed this to cmd. There are some more comments from Dave which I am addressing now, I will send them soon.

regards
Arun
Re: [Intel-gfx] [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
On 15/06/2015 18:29, Dave Gordon wrote: On 15/06/15 15:10, Siluvery, Arun wrote: On 12/06/2015 18:03, Dave Gordon wrote: On 12/06/15 12:58, Siluvery, Arun wrote: On 09/06/2015 19:43, Dave Gordon wrote: On 05/06/15 14:57, Arun Siluvery wrote: In Per context w/a batch buffer, WaRsRestoreWithPerCtxtBb v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions so as to not break any future users of existing definitions (Michel) Signed-off-by: Rafael Barbalho rafael.barba...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_reg.h | 26 ++ drivers/gpu/drm/i915/intel_lrc.c | 59 2 files changed, 85 insertions(+) [snip] +/* + * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and + * MI_BATCH_BUFFER_END instructions in this sequence need to be + * in the same cacheline. + */ +while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0) +cmd[index++] = MI_NOOP; + +cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 | +MI_LRM_USE_GLOBAL_GTT | +MI_LRM_ASYNC_MODE_ENABLE; +cmd[index++] = INSTPM; +cmd[index++] = scratch_addr; +cmd[index++] = 0; + +/* + * BSpec says there should not be any commands programmed + * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so + * do not add any new commands + */ +cmd[index++] = MI_LOAD_REGISTER_REG_GEN8; +cmd[index++] = GEN8_RS_PREEMPT_STATUS; +cmd[index++] = GEN8_RS_PREEMPT_STATUS; + /* padding */ while (index end) cmd[index++] = MI_NOOP; Where's the MI_BATCH_BUFFER_END referred to in the comment? MI_BATCH_BUFFER_END is just below while loop, it is in patch [v3 1/6]. Since the diff context is only few lines it didn't showup in the diff. The second comment above says no commands between LOAD_REG_REG and BB_END, so the point of my comment was that the BB_END is *NOT* immediately after the LOAD_REG_REG -- there are a bunch of no-ops there! true, but they are no-ops so they shouldn't really affect anything. 
I guess the spec means no valid commands.

I guess the spec means NO COMMANDS. NOOP is a perfectly valid command, and I've even seen cases where a workaround specifically requires a NOOP with the set-no-op-id-register bit set to fix some particular bug. The only special thing about NOOP is that it doesn't get captured in IPEHR.

I think the w/a requires this:

0%CLSIZE:  ...
           LRM (reg, addr, 0)
           LRR (reg, reg)
           BB_END
           ... (63%CLSIZE)

no gaps, no insertions, all together, all on one cacheline. Those instructions take up 8 DWords (32 bytes) so the sequence doesn't necessarily have to start on a cacheline boundary, as long as it's entirely within the same line. But it's simpler to start on a new line.

You seem to have:

0%CLSIZE:  LRM (reg, mem, 0)
           LRR (reg, reg)
           NOP
           NOP
           NOP
           BB_END

so the condition in the comment is not fulfilled. If this works, maybe the comment is wrong.

And therefore also, these instructions do *not* all end up in the same cacheline, thus contradicting the first comment above.

I don't understand why. As per the requirement, the commands from the first MI_LOAD_REGISTER_MEM_GEN8 (after the while loop) through BB_END will be part of the same cacheline (in this case the second line).

OK, they're all in the same line; I didn't look back at the full context enough and thought 'end' would point to the end of the buffer, rather than the end of the cacheline .. because it /does/ point to the end of the buffer, it just happens to be the end of the very same cacheline as well.

So I really don't like the way the sizes of the two workaround batches have been defined in terms of cache lines. Also I'm not keen on one bit of code allocating the object and defining the sizes of the sub-areas within it, and then separate functions filling in each of the sequences within these areas, knowing that the areas are /just the right size/.

It would be simpler to maintain if the "size in cachelines" values in lrc_setup_ctx_wa_obj() didn't have to be hand-edited to stay in sync with the number of instructions written by gen8_init_perctx_bb() and gen8_init_indirectctx_bb(). How about having each of these return the number of bytes they've appended to the (u32 *)buffer that they've been given, and let the caller manage mapping/unmapping, alignment, padding, etc, and fill in the size fields accordingly *after* the content has been defined?

This is an issue; hand-editing the size whenever more WA are added is not good. It can be changed as you suggested.

regards
Arun

.Dave.

Padding *after* a BB_END would be redundant.

Yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of abruptly terminating the batch, which is why I am padding with no-ops. I can change this if that is preferred.

.Dave.
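The layout Dave describes (LRM, LRR and BB_END back to back on one cacheline, with the builder reporting how much it wrote so the caller can derive sizes afterwards) might be sketched as follows. This is a userspace illustration under several assumptions: `emit_rs_restore()` and `dwords_to_cachelines()` are hypothetical names, the register and address arguments are placeholders, the buffer is assumed to start on a cacheline boundary, and the caller is assumed to have checked that enough room remains.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_DWORDS         16u  /* assumed 64-byte cacheline */
#define MI_INSTR(opcode, flags)  (((uint32_t)(opcode) << 23) | (flags))
#define MI_NOOP                  MI_INSTR(0x00, 0)
#define MI_BATCH_BUFFER_END      MI_INSTR(0x0A, 0)
#define MI_LRM_GEN8              MI_INSTR(0x29, 2)
#define MI_LRR                   MI_INSTR(0x2A, 1)

/*
 * Append the sequence starting at cmd[index] and return the new index.
 * reg_a, reg_b and addr are placeholders for the real register offsets
 * and scratch address.
 */
static size_t emit_rs_restore(uint32_t *cmd, size_t index,
			      uint32_t reg_a, uint32_t reg_b, uint32_t addr)
{
	/* Start on a fresh cacheline so all 8 dwords share one line. */
	while (index % CACHELINE_DWORDS)
		cmd[index++] = MI_NOOP;

	cmd[index++] = MI_LRM_GEN8;  /* 4 dwords: header, reg, addr lo, addr hi */
	cmd[index++] = reg_a;
	cmd[index++] = addr;
	cmd[index++] = 0;

	cmd[index++] = MI_LRR;       /* 3 dwords: header and two registers */
	cmd[index++] = reg_b;
	cmd[index++] = reg_b;

	/* Terminator follows immediately: no NOOPs in between. */
	cmd[index++] = MI_BATCH_BUFFER_END;
	return index;
}

/* Caller-side size in cachelines, derived from the returned index. */
static size_t dwords_to_cachelines(size_t dwords)
{
	return (dwords + CACHELINE_DWORDS - 1) / CACHELINE_DWORDS;
}
```

Because the size is computed from the returned index, adding another workaround to the builder would not require editing a separate hand-maintained cacheline count.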
Re: [Intel-gfx] [PATCH v3 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround
On 05/06/2015 15:48, Ville Syrjälä wrote:
On Fri, Jun 05, 2015 at 02:56:48PM +0100, Arun Siluvery wrote:
In Indirect context w/a batch buffer, +WaFlushCoherentL3CacheLinesAtContextSwitch

Signed-off-by: Rafael Barbalho <rafael.barba...@intel.com>
Signed-off-by: Arun Siluvery <arun.siluv...@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  | 1 +
 drivers/gpu/drm/i915/intel_lrc.c | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 84af255..5203c79 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -426,6 +426,7 @@
 #define PIPE_CONTROL_INDIRECT_STATE_DISABLE (1<<9)
 #define PIPE_CONTROL_NOTIFY (1<<8)
 #define PIPE_CONTROL_FLUSH_ENABLE (1<<7) /* gen7+ */
+#define PIPE_CONTROL_DC_FLUSH_ENABLE (1<<5)
 #define PIPE_CONTROL_VF_CACHE_INVALIDATE (1<<4)
 #define PIPE_CONTROL_CONST_CACHE_INVALIDATE (1<<3)
 #define PIPE_CONTROL_STATE_CACHE_INVALIDATE (1<<2)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a71eb81..9d8cf65c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1101,6 +1101,14 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
         /* WaDisableCtxRestoreArbitration:bdw,chv */
         cmd[index++] = MI_ARB_ON_OFF | MI_ARB_DISABLE;
+        /* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw,chv */
+        cmd[index++] = GFX_OP_PIPE_CONTROL(6);
+        cmd[index++] = PIPE_CONTROL_DC_FLUSH_ENABLE;
+        cmd[index++] = 0;
+        cmd[index++] = 0;
+        cmd[index++] = 0;
+        cmd[index++] = 0;
+

This looks incomplete. Seems like you should have LRIs around this guy to enable/disable the L3SQCREG4 coherent line flush bit. And chv shouldn't do coherent L3, so this might not be needed there.

I checked with the HW team and yes, I need to add LRIs to enable/disable the L3SQCREG4 coherent line flush bit. As you mentioned, it is not required for CHV.

Also do we need a CS stall here too?

DC Flush Enable (bit 5): Requires stall bit ([20] of DW) set for all GPGPU and Media Workloads.

I didn't check the restrictions of this bit; I will check again and correct this.

regards
Arun

Supposedly we should add the DC flush to the normal ring flush hooks too. But that's a separate issue.

         /* padding */
         while (index < end)
                 cmd[index++] = MI_NOOP;
--
2.3.0
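As an illustration of the CS stall point in this message, the sketch below builds a 6-dword PIPE_CONTROL with both the DC flush bit (5) and the stall bit (20) set. It is a userspace approximation only: the `GFX_OP_PIPE_CONTROL()` encoding is assumed to match the driver header, `emit_dc_flush()` is a hypothetical helper, and the LRIs that toggle the L3SQCREG4 coherent line flush bit are deliberately left out because the thread does not give the register details.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the driver's encoding of the PIPE_CONTROL header. */
#define GFX_OP_PIPE_CONTROL(len) \
	((0x3u << 29) | (0x3u << 27) | (0x2u << 24) | ((len) - 2))
#define PIPE_CONTROL_CS_STALL         (1u << 20)
#define PIPE_CONTROL_DC_FLUSH_ENABLE  (1u << 5)

/* Append a 6-dword PIPE_CONTROL at cmd[index]; returns the new index. */
static size_t emit_dc_flush(uint32_t *cmd, size_t index)
{
	cmd[index++] = GFX_OP_PIPE_CONTROL(6);
	cmd[index++] = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_DC_FLUSH_ENABLE;
	/* Remaining dwords (address and immediate data) are left zero. */
	cmd[index++] = 0;
	cmd[index++] = 0;
	cmd[index++] = 0;
	cmd[index++] = 0;
	return index;
}
```

Whether the stall is actually required in this batch depends on the restriction Arun says he still needs to check, so the flag combination here should be read as one possible outcome of that check.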
Re: [Intel-gfx] [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
On 09/06/2015 19:43, Dave Gordon wrote: On 05/06/15 14:57, Arun Siluvery wrote: In Per context w/a batch buffer, WaRsRestoreWithPerCtxtBb v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions so as to not break any future users of existing definitions (Michel) Signed-off-by: Rafael Barbalho rafael.barba...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_reg.h | 26 ++ drivers/gpu/drm/i915/intel_lrc.c | 59 2 files changed, 85 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 33b0ff1..6928162 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h [snip] #define MI_LOAD_REGISTER_MEMMI_INSTR(0x29, 0) #define MI_LOAD_REGISTER_REGMI_INSTR(0x2A, 0) +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2) +#define MI_LRM_USE_GLOBAL_GTT (122) +#define MI_LRM_ASYNC_MODE_ENABLE (121) +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1) Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's a two-operand instruction, each of which is a one-word MMIO register address, hence always 3 words total. The length bias is 2, so the so-called 'flags' field must be 1. The original definition (where the second argument of the MI_INSTR macro is 0) shouldn't work. So just correct the original definition of MI_LOAD_REGISTER_REG; this isn't something that's actually changed on GEN8. I did notice that the original instructions are odd but thought I might be wrong hence I created new ones to not disturb the original ones. ok I will just correct original one and reuse it. While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+. ok. #define MI_RS_STORE_DATA_IMMMI_INSTR(0x2B, 0) #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0) #define MI_STORE_URB_MEMMI_INSTR(0x2D, 0) And these are wrong too! 
In fact all of these instructions have been added under a comment which says Commands used only by the command parser. Looks like they were added as placeholders without the proper length fields, and then people have started using them as though they were complete definitions :( Time update them all, perhaps ... these are not related to this patch, so it can be taken up as a different patch. [snip] + /* +* BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and +* MI_BATCH_BUFFER_END instructions in this sequence need to be +* in the same cacheline. +*/ + while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0) + cmd[index++] = MI_NOOP; + + cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 | + MI_LRM_USE_GLOBAL_GTT | + MI_LRM_ASYNC_MODE_ENABLE; + cmd[index++] = INSTPM; + cmd[index++] = scratch_addr; + cmd[index++] = 0; + + /* +* BSpec says there should not be any commands programmed +* between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so +* do not add any new commands +*/ + cmd[index++] = MI_LOAD_REGISTER_REG_GEN8; + cmd[index++] = GEN8_RS_PREEMPT_STATUS; + cmd[index++] = GEN8_RS_PREEMPT_STATUS; + /* padding */ while (index end) cmd[index++] = MI_NOOP; Where's the MI_BATCH_BUFFER_END referred to in the comment? MI_BATCH_BUFFER_END is just below while loop, it is in patch [v3 1/6]. Since the diff context is only few lines it didn't showup in the diff. regards Arun .Dave. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v3 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode
On 09/06/2015 16:27, Dave Gordon wrote: On 05/06/15 11:34, Arun Siluvery wrote: Some of the WA applied using WA batch buffers perform writes to scratch page. In the current flow WA are initialized before scratch obj is allocated. This patch reorders intel_init_pipe_control() to have a valid scratch obj before we initialize WA. Signed-off-by: Michel Thierry michel.thie...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/intel_lrc.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 0b3422a..20c56e4 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1562,11 +1562,16 @@ static int logical_render_ring_init(struct drm_device *dev) ring-emit_bb_start = gen8_emit_bb_start; ring-dev = dev; + + ret = intel_init_pipe_control(ring); + if (ret) + return ret; + ret = logical_ring_init(dev, ring); if (ret) return ret; - return intel_init_pipe_control(ring); + return 0; } You could squash the last several lines down to just: return logical_ring_init(dev, ring); .Dave. yes but this is updated based on suggestion from Chris to allocate wa_ctx page during ring init itself; in the updated version it is not possible as we need to free that page also if ring_init fails. I sent the updated patches in the same series so as to collate all reviews instead of resending as a separate series. regards Arun ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
On 05/06/2015 11:56, Chris Wilson wrote:
On Fri, Jun 05, 2015 at 11:34:01AM +0100, Arun Siluvery wrote:
Some of the WA are to be applied during context save but before restore, and some at the end of context save/restore but before executing the instructions in the ring. WA batch buffers are created for this purpose, as these WA cannot be applied using normal means. Each context has two registers to load the offsets of these batch buffers. If they are non-zero, HW understands that it needs to execute these batches.

v1: In this version two separate ring_buffer objects were used to load WA instructions for indirect and per context batch buffers and they were part of every context.

v2: Chris suggested including an additional page in the context and using it to load these WA instead of creating separate objects. This will simplify a lot of things as we need not explicitly pin/unpin them. Thomas Daniel further pointed out that GuC is planning to use a similar setup to share data between GuC and driver, and WA batch buffers can probably share that page. However, after discussions with Dave, who is implementing the GuC changes, he suggested using an independent page for two reasons: the GuC area might grow, and these WA are initialized only once and are not changed afterwards, so we can share them across all contexts. This version uses this approach.

(Thanks to Chris, Dave and Thomas for their inputs)

Having moved to a shared wa_ctx for all lrc, I think it makes sense to then do the allocation during ring_init itself, next to the scratch/hws status pages. The advantage is that we don't then need to add more special cases to the default ctx on RCS, and its permanence is far more prominent. It will also be more consistent with calling it ring->wa_ctx.

Since you have it already plumbed into ring init/fini, why is it partly done during default ctx init? Maybe all that is required is a little bit of code and changelog explanation.
-Chris

OK, it is possible to do the allocation and setup in logical_ring_init() itself. I wanted to group it with the other WA which are set up in init_context(). I will also change s/reg_state/cmd.

regards
Arun
Re: [Intel-gfx] [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
On 05/06/2015 12:36, Chris Wilson wrote:
On Fri, Jun 05, 2015 at 12:24:58PM +0100, Siluvery, Arun wrote:
OK, it is possible to do the allocation and setup in logical_ring_init() itself. I wanted to group it with the other WA which are set up in init_context().

Phew, I had worried I had missed something. The issue the current split between ring init and default context init raises in my mind is that the content has context dependencies upon it - whereas the w/a so far can be reused globally. So imo the current split is more confusing than just creating the w/a buffer entirely during ring init.
-Chris

I am moving it to logical_render_ring_init() as these are only applicable for RCS.

regards
Arun
Re: [Intel-gfx] [PATCH v2 1/7] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
On 02/06/2015 19:47, Dave Gordon wrote: On 02/06/15 19:36, Siluvery, Arun wrote: On 01/06/2015 11:22, Daniel, Thomas wrote: Indeed, allocating an extra scratch page in the context would simplify vma/mm management. A trick might be to allocate the scratch page at the start, then offset the lrc regs etc - that would then be consistent amongst gen and be easy enough to extend if we need more per-context scratch space in future. -Chris Yes, I think we already have another use for more per-context space at the start. The GuC is planning to do this. Arun, you probably should work with Alex Dai and Dave Gordon to avoid conflicts here. Thomas. Thanks for the heads-up Thomas. I have discussed with Dave and agreed to share this page; GuC probably doesn't need whole page so first half is reserved for it's use and second half is used for WA. I have modified my patches to use context page for applying these WA and don't see any issues. During the discussions Dave proposed another approach. Even though these WA are called per context they are only initialized once and not changed afterwards, same set of WA are applied for each context so instead of adding them in each context, does it make sense to create a separate page and share across all contexts? but of course GuC will anyway add a new page to context so I might as well share that page. Chris/Dave, do you see any problems with sharing page with GuC or you prefer to allocate a separate page for these WA and share across all contexts? Please give your comments. regards Arun I think we have to consider which is more future-proof i.e. which is least likely: (1) the area shared with the GuC grows (definitely still in flux), or (2) workarounds need to be context-specific (possible, but unlikely) So I'd prefer a single area set up just once to contain the pre- and post-context restore workaround batches. 
If necessary, the one area could contain multiple batches at different offsets, so we could point different contexts at different (shared) batches as required. I think they're unlikely to actually need per-context customisation[*], but there might be a need for different workarounds according to workload type or privilege level or some other criterion ... ? .Dave. [*] unless they need per-context memory addresses coded into them?

Considering that these WA are initialized only once and not changed afterwards, and that the GuC area will probably grow in future and could run into the space used by the WA, an independent single-area setup makes sense. I also checked the spec and it is not clear whether any customization is going to be required for different contexts. I have modified the patches to set up a single page with the WA when default_context is initialized, and this page is used by all contexts. I will send the patches, but please let me know if there are any other comments.

regards
Arun
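The "one area, several batches at different offsets" idea discussed above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the 4096-byte page, the 64-byte alignment, and the names `wa_area`, `wa_batch` and `wa_area_add_batch` are hypothetical and do not come from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WA_PAGE_SIZE 4096u /* assumed size of the shared area */
#define WA_CACHELINE 64u   /* assumed alignment of each batch start */

/* Hypothetical descriptor: where one batch lives inside the shared page. */
struct wa_batch {
	uint32_t offset; /* byte offset from the start of the page */
	uint32_t size;   /* byte length of the batch */
};

/* Hypothetical shared area: filled once, then referenced by every context. */
struct wa_area {
	uint32_t page[WA_PAGE_SIZE / sizeof(uint32_t)];
	uint32_t used; /* bytes consumed so far */
};

/*
 * Copy 'count' command dwords to the next aligned offset in the page.
 * Returns 0 on success, -1 if the batch does not fit.
 */
static int wa_area_add_batch(struct wa_area *area, const uint32_t *cmds,
			     uint32_t count, struct wa_batch *out)
{
	uint32_t start, bytes;

	assert(area && cmds && out);

	/* Reject oversized requests before multiplying, to avoid overflow. */
	if (count > WA_PAGE_SIZE / sizeof(uint32_t))
		return -1;

	start = (area->used + WA_CACHELINE - 1) & ~(WA_CACHELINE - 1);
	bytes = count * (uint32_t)sizeof(uint32_t);
	if (start > WA_PAGE_SIZE || bytes > WA_PAGE_SIZE - start)
		return -1;

	memcpy((uint8_t *)area->page + start, cmds, bytes);
	area->used = start + bytes;
	out->offset = start;
	out->size = bytes;
	return 0;
}
```

In this model each context would store only the page address plus a `wa_batch` offset, so two contexts can point at the same batch or at different batches in the same page without owning a copy.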
Re: [Intel-gfx] [PATCH v2 1/7] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
On 01/06/2015 11:22, Daniel, Thomas wrote:
>> Indeed, allocating an extra scratch page in the context would simplify vma/mm management. A trick might be to allocate the scratch page at the start, then offset the lrc regs etc - that would then be consistent amongst gen and be easy enough to extend if we need more per-context scratch space in future.
>> -Chris
>
> Yes, I think we already have another use for more per-context space at the start. The GuC is planning to do this. Arun, you probably should work with Alex Dai and Dave Gordon to avoid conflicts here.
> Thomas.

Thanks for the heads-up, Thomas. I have discussed this with Dave and we agreed to share this page; the GuC probably doesn't need the whole page, so the first half is reserved for its use and the second half is used for WA. I have modified my patches to use the context page for applying these WA and don't see any issues.

During the discussions Dave proposed another approach. Even though these WA are called per-context, they are only initialized once and not changed afterwards, and the same set of WA is applied for each context. So instead of adding them to each context, does it make sense to create a separate page and share it across all contexts? But of course the GuC will add a new page to the context anyway, so I might as well share that page.

Chris/Dave, do you see any problems with sharing the page with the GuC, or would you prefer to allocate a separate page for these WA and share it across all contexts? Please give your comments.

regards
Arun
Re: [Intel-gfx] [PATCH v2 1/7] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
On 29/05/2015 19:16, Chris Wilson wrote: On Fri, May 29, 2015 at 07:03:19PM +0100, Arun Siluvery wrote: This patch adds functions to setup WA batch buffers but they are not yet enabled in this patch. Some of the WA are to be applied during context save but before restore and some at the end of context save/restore but before executing the instructions in the ring, WA batch buffers are created for this purpose and these WA cannot be applied using normal means. Signed-off-by: Namrta namrta.salo...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_drv.h | 3 ++ drivers/gpu/drm/i915/intel_lrc.c | 101 +++ 2 files changed, 104 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 731b5ce..dd4b31d 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -814,6 +814,9 @@ struct intel_context { /* Execlists */ bool rcs_initialized; + struct intel_ringbuffer *indirect_ctx_wa_bb; + struct intel_ringbuffer *per_ctx_wa_bb; Eh? They are only command sequences whose starting addresses you encode into the execlists context. Why have you allocated a ringbuf not an object? Why have you allocated 2 pages when you only need one, and could even find space elsewhere in the context ringbuf is only used so that I can use logical_ring_*(), object can also be used. Single page is enough but since we have two batch buffers and need to provide offsets in two different registers, two pages are used for simplifying things, I guess we can manage with single page, I will try this. Your idea of using space in context itself simplifies many things but the context size varies across Gens, is it safe to pick last page or increase the size by one more page and use that to load these instructions? I think using an additional page is safe to avoid the risk of HW overwriting that page or do you have any other recommendation? I will first try and see if it works. 
> And these should be pinned alongside the context *not permanently*.

Right, I will correct this, but it won't be required if we use the space in the context.

> I want a debug mode that limits us to say 16M of GGTT space so that these address space leaks are easier to demonstrate in practice.
> -Chris

regards
Arun
Re: [Intel-gfx] [PATCH v2 1/2] drm/i915/gen8: The WA BB framework is enabled.
On 02/03/2015 11:02, Arun Siluvery wrote: Please ignore this one. I used message id of cover letter instead of v1 of this patch. Latest patches are sent in reply to their initial revisions. regards Arun From: Namrta namrta.salo...@intel.com This can be used to enable WA BB infrastructure for features like RC6, SSEU and in between context save/restore etc. The patch which would need WA BB will have to declare the wa_bb obj utilizing the function here. Update the WA BB with required commands and update the address of the WA BB at appropriate place. v2: Move function to the right place to keeps diffs clearer in the patch that uses this function (Michel) Change-Id: I9cc49ae7426560215e7b6a6d10ba411caeb9321b Signed-off-by: Namrta namrta.salo...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com Reviewed-by: Michel Thierry michel.thie...@intel.com --- drivers/gpu/drm/i915/intel_lrc.c | 32 1 file changed, 32 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 9c851d8..ea37a56 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1107,6 +1107,38 @@ static int intel_logical_ring_workarounds_emit(struct intel_engine_cs *ring, return 0; } +static struct intel_ringbuffer * +create_wa_bb(struct intel_engine_cs *ring, uint32_t bb_size) +{ + struct drm_device *dev = ring-dev; + struct intel_ringbuffer *ringbuf; + int ret; + + ringbuf = kzalloc(sizeof(*ringbuf), GFP_KERNEL); + if (!ringbuf) + return NULL; + + ringbuf-ring = ring; + + ringbuf-size = roundup(bb_size, PAGE_SIZE); + ringbuf-effective_size = ringbuf-size; + ringbuf-head = 0; + ringbuf-tail = 0; + ringbuf-space = ringbuf-size; + ringbuf-last_retired_head = -1; + + ret = intel_alloc_ringbuffer_obj(dev, ringbuf); + if (ret) { + DRM_DEBUG_DRIVER( + Failed to allocate ringbuf obj for wa_bb%s: %d\n, + ring-name, ret); + kfree(ringbuf); + return NULL; + } + + return ringbuf; +} + static int gen8_init_common_ring(struct 
intel_engine_cs *ring) { struct drm_device *dev = ring->dev;
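The create_wa_bb() hunk above rounds the requested size up to a whole page and frees the partially built object if the backing allocation fails. A userspace sketch of that pattern follows. It is not the driver code: `struct wa_bb`, `wa_bb_create` and `wa_bb_destroy` are hypothetical names, `calloc` stands in for the GEM object allocation, and the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BB_PAGE_SIZE 4096u /* assumed page size */

/* Hypothetical stand-in for the ringbuffer fields the patch initialises. */
struct wa_bb {
	uint32_t size;  /* total size, always a multiple of the page size */
	uint32_t head;
	uint32_t tail;
	uint32_t space; /* free bytes, equal to size for an empty buffer */
	uint8_t *mem;   /* backing storage, stands in for the GEM object */
};

static uint32_t round_up_to_page(uint32_t bytes)
{
	return (bytes + BB_PAGE_SIZE - 1) / BB_PAGE_SIZE * BB_PAGE_SIZE;
}

/* Returns a zeroed, page-sized buffer descriptor, or NULL on failure. */
static struct wa_bb *wa_bb_create(uint32_t bb_size)
{
	struct wa_bb *bb;

	if (bb_size == 0 || bb_size > UINT32_MAX - BB_PAGE_SIZE)
		return NULL;

	bb = calloc(1, sizeof(*bb));
	if (!bb)
		return NULL;

	bb->size = round_up_to_page(bb_size);
	assert(bb->size % BB_PAGE_SIZE == 0);
	bb->space = bb->size;

	bb->mem = calloc(1, bb->size);
	if (!bb->mem) {
		/* Undo the first allocation so nothing leaks on failure. */
		free(bb);
		return NULL;
	}
	return bb;
}

static void wa_bb_destroy(struct wa_bb *bb)
{
	if (bb) {
		free(bb->mem);
		free(bb);
	}
}
```

A caller asking for 100 bytes gets one page here, and a caller asking for 4097 bytes gets two.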
Re: [Intel-gfx] [PATCH 2/2] drm/i915/gen8: Apply Per-context workarounds using W/A batch buffers
On 02/03/2015 10:10, Michel Thierry wrote: On 25/02/15 17:54, Arun Siluvery wrote: Some of the workarounds are to be applied during context save but before restore and some at the end of context save/restore but before executing the instructions in the ring. Workaround batch buffers are created for this purpose as they cannot be applied using normal means. HW executes them at specific stages during context save/restore. In this method we initialize batch buffer with w/a commands and its address is supplied using context offset pointers when a context is initialized. This patch introduces indirect and per-context batch buffers using which following workarounds are applied. These are required to fix issues observed with preemption related workloads. In Indirect context w/a batch buffer, +WaDisableCtxRestoreArbitration +WaFlushCoherentL3CacheLinesAtContextSwitch +WaClearSlmSpaceAtContextSwitch In Per context w/a batch buffer, +WaDisableCtxRestoreArbitration +WaRsRestoreWithPerCtxtBb v2: Use GTT address type for all privileged instructions, update as per dynamic pinning changes, minor simplifications, rename variables as follows to keep lines under 80 chars and rebase. s/indirect_ctx_wa_ringbuf/indirect_ctx_wa_bb s/per_ctx_wa_ringbuf/per_ctx_wa_bb v3: Modify WA BB initialization to Gen specific. 
Change-Id: I0cedb536b7f6d9f10ba9e81ba625848e7bab603c Signed-off-by: Rafael Barbalho rafael.barba...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_drv.h | 3 + drivers/gpu/drm/i915/i915_reg.h | 30 +++- drivers/gpu/drm/i915/intel_lrc.c| 302 +++- drivers/gpu/drm/i915/intel_ringbuffer.h | 3 + 4 files changed, 297 insertions(+), 41 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index d42040f..86cdb52 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -774,6 +774,9 @@ struct intel_context { /* Execlists */ bool rcs_initialized; + struct intel_ringbuffer *indirect_ctx_wa_bb; + struct intel_ringbuffer *per_ctx_wa_bb; + struct { struct drm_i915_gem_object *state; struct intel_ringbuffer *ringbuf; diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 55143cb..eb41d7f 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -347,6 +347,26 @@ #define MI_INVALIDATE_BSD (17) #define MI_FLUSH_DW_USE_GTT(12) #define MI_FLUSH_DW_USE_PPGTT (02) +#define MI_ATOMIC(len) MI_INSTR(0x2F, (len-2)) +#define MI_ATOMIC_MEMORY_TYPE_GGTT (122) +#define MI_ATOMIC_INLINE_DATA(118) +#define MI_ATOMIC_CS_STALL (117) +#define MI_ATOMIC_RETURN_DATA_CTL(116) +#define MI_ATOMIC_OP_MASK(op) ((op) 8) +#define MI_ATOMIC_AND MI_ATOMIC_OP_MASK(0x01) +#define MI_ATOMIC_OR MI_ATOMIC_OP_MASK(0x02) +#define MI_ATOMIC_XOR MI_ATOMIC_OP_MASK(0x03) +#define MI_ATOMIC_MOVE MI_ATOMIC_OP_MASK(0x04) +#define MI_ATOMIC_INC MI_ATOMIC_OP_MASK(0x05) +#define MI_ATOMIC_DEC MI_ATOMIC_OP_MASK(0x06) +#define MI_ATOMIC_ADD MI_ATOMIC_OP_MASK(0x07) +#define MI_ATOMIC_SUB MI_ATOMIC_OP_MASK(0x08) +#define MI_ATOMIC_RSUB MI_ATOMIC_OP_MASK(0x09) +#define MI_ATOMIC_IMAX MI_ATOMIC_OP_MASK(0x0A) +#define MI_ATOMIC_IMIN MI_ATOMIC_OP_MASK(0x0B) +#define MI_ATOMIC_UMAX MI_ATOMIC_OP_MASK(0x0C) +#define MI_ATOMIC_UMIN MI_ATOMIC_OP_MASK(0x0D) + #define 
MI_BATCH_BUFFER MI_INSTR(0x30, 1) #define MI_BATCH_NON_SECURE(1) /* for snb/ivb/vlv this also means batch in ppgtt when ppgtt is enabled. */ @@ -410,6 +430,7 @@ #define DISPLAY_PLANE_A (020) #define DISPLAY_PLANE_B (120) #define GFX_OP_PIPE_CONTROL(len) ((0x329)|(0x327)|(0x224)|(len-2)) +#define PIPE_CONTROL_FLUSH_RO_CACHES (127) I think the consensus is to rename this to PIPE_CONTROL_FLUSH_L3, isn't it? Yes, it will be renamed to PIPE_CONTROL_FLUSH_L3 in v2. #define PIPE_CONTROL_GLOBAL_GTT_IVB(124) /* gen7+ */ #define PIPE_CONTROL_MMIO_WRITE(123) #define PIPE_CONTROL_STORE_DATA_INDEX (121) @@ -426,6 +447,7 @@ #define PIPE_CONTROL_INDIRECT_STATE_DISABLE(19) #define PIPE_CONTROL_NOTIFY(18) #define PIPE_CONTROL_FLUSH_ENABLE (17) /* gen7+ */ +#define PIPE_CONTROL_DC_FLUSH_ENABLE (15) #define PIPE_CONTROL_VF_CACHE_INVALIDATE (14) #define PIPE_CONTROL_CONST_CACHE_INVALIDATE(13) #define PIPE_CONTROL_STATE_CACHE_INVALIDATE(12) @@ -449,8 +471,10 @@ #define MI_CLFLUSH MI_INSTR(0x27, 0) #define MI_REPORT_PERF_COUNTMI_INSTR(0x28, 0) #define
Re: [Intel-gfx] [PATCH v2 2/2] drm/i915/gen8: Apply Per-context workarounds using W/A batch buffers
On 02/03/2015 17:43, Daniel Vetter wrote: On Mon, Mar 02, 2015 at 11:07:20AM +, Arun Siluvery wrote: Some of the workarounds are to be applied during context save but before restore and some at the end of context save/restore but before executing the instructions in the ring. Workaround batch buffers are created for this purpose as they cannot be applied using normal means. HW executes them at specific stages during context save/restore. In this method we initialize batch buffer with w/a commands and its address is supplied using context offset pointers when a context is initialized. This patch introduces indirect and per-context batch buffers using which following workarounds are applied. These are required to fix issues observed with preemption related workloads. In Indirect context w/a batch buffer, +WaDisableCtxRestoreArbitration +WaFlushCoherentL3CacheLinesAtContextSwitch +WaClearSlmSpaceAtContextSwitch In Per context w/a batch buffer, +WaDisableCtxRestoreArbitration +WaRsRestoreWithPerCtxtBb v2: Use GTT address type for all privileged instructions, update as per dynamic pinning changes, minor simplifications, rename variables as follows to keep lines under 80 chars and rebase. s/indirect_ctx_wa_ringbuf/indirect_ctx_wa_bb s/per_ctx_wa_ringbuf/per_ctx_wa_bb v3: Modify WA BB initialization to Gen specific. 
v4: s/PIPE_CONTROL_FLUSH_RO_CACHES/PIPE_CONTROL_FLUSH_L3 (Ville) This patches modifies definitions of MI_LOAD_REGISTER_MEM and MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions so as to not break any future users of existing definitions (Michel) Change-Id: I0cedb536b7f6d9f10ba9e81ba625848e7bab603c Signed-off-by: Rafael Barbalho rafael.barba...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_drv.h | 3 + drivers/gpu/drm/i915/i915_reg.h | 28 drivers/gpu/drm/i915/intel_lrc.c| 231 +++- drivers/gpu/drm/i915/intel_ringbuffer.h | 3 + 4 files changed, 258 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index d42040f..86cdb52 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -774,6 +774,9 @@ struct intel_context { /* Execlists */ bool rcs_initialized; + struct intel_ringbuffer *indirect_ctx_wa_bb; + struct intel_ringbuffer *per_ctx_wa_bb; Why is this per-ctx and not per-engine? Also your patch splitting doesn't Since we apply them on a context basis I think intel_context is a better place, also they are only applicable for RCS. There is no reason why they cannot be added in engine structure; if you think that is a better place I can make the changes accordingly. make that much sense: Patch 1 only adds a static function without any users (resulting in gcc being unhappy). Imo a better split would be: - wire up wa batch/ring allocation/freing functions - wire up the changes to the lrc initial reg state code - one patch per w/a entry you add I am not the author of first patch and I tried to retain it as is but it has not resulted in a clean split up. I will split them as suggested. 
regards Arun Cheers, Daniel + struct { struct drm_i915_gem_object *state; struct intel_ringbuffer *ringbuf; diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 55143cb..3048494 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -347,6 +347,26 @@ #define MI_INVALIDATE_BSD (17) #define MI_FLUSH_DW_USE_GTT (12) #define MI_FLUSH_DW_USE_PPGTT (02) +#define MI_ATOMIC(len) MI_INSTR(0x2F, (len-2)) +#define MI_ATOMIC_MEMORY_TYPE_GGTT (122) +#define MI_ATOMIC_INLINE_DATA(118) +#define MI_ATOMIC_CS_STALL (117) +#define MI_ATOMIC_RETURN_DATA_CTL(116) +#define MI_ATOMIC_OP_MASK(op) ((op) 8) +#define MI_ATOMIC_AND MI_ATOMIC_OP_MASK(0x01) +#define MI_ATOMIC_OR MI_ATOMIC_OP_MASK(0x02) +#define MI_ATOMIC_XOR MI_ATOMIC_OP_MASK(0x03) +#define MI_ATOMIC_MOVE MI_ATOMIC_OP_MASK(0x04) +#define MI_ATOMIC_INC MI_ATOMIC_OP_MASK(0x05) +#define MI_ATOMIC_DEC MI_ATOMIC_OP_MASK(0x06) +#define MI_ATOMIC_ADD MI_ATOMIC_OP_MASK(0x07) +#define MI_ATOMIC_SUB MI_ATOMIC_OP_MASK(0x08) +#define MI_ATOMIC_RSUB MI_ATOMIC_OP_MASK(0x09) +#define MI_ATOMIC_IMAX MI_ATOMIC_OP_MASK(0x0A) +#define MI_ATOMIC_IMIN MI_ATOMIC_OP_MASK(0x0B) +#define MI_ATOMIC_UMAX MI_ATOMIC_OP_MASK(0x0C) +#define MI_ATOMIC_UMIN MI_ATOMIC_OP_MASK(0x0D) + #define MI_BATCH_BUFFER MI_INSTR(0x30, 1) #define MI_BATCH_NON_SECURE (1) /* for snb/ivb/vlv this also means batch in ppgtt when ppgtt is enabled. */ @@ -410,6 +430,7 @@ #define DISPLAY_PLANE_A (020) #define DISPLAY_PLANE_B (120) #define GFX_OP_PIPE_CONTROL(len) ((0x329)|(0x327)|(0x224)|(len-2))
Re: [Intel-gfx] [PATCH] drm/i915: Skip Stolen Memory first page.
On 01/08/2014 17:34, Jesse Barnes wrote:
> On Thu, 31 Jul 2014 12:08:20 -0700 Rodrigo Vivi rodrigo.v...@intel.com wrote:
>> WA to skip the first page of stolen memory due to sporadic HW write on *CS Idle
>>
>> v2: Improve variable names and fix allocated size.
>>
>> Reviewed-by: Ben Widawsky b...@bwidawsk.net
>> Signed-off-by: Rodrigo Vivi rodrigo.v...@intel.com
>> ---
>>  drivers/gpu/drm/i915/i915_gem_stolen.c | 15 ++-
>>  1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
>> index 21c025a..82035b0 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
>> @@ -289,7 +289,8 @@ void i915_gem_cleanup_stolen(struct drm_device *dev)
>>  int i915_gem_init_stolen(struct drm_device *dev)
>>  {
>>  	struct drm_i915_private *dev_priv = dev->dev_private;
>> -	int bios_reserved = 0;
>> +	int start_rsvd = 0;
>> +	int end_rsvd = 0;
>>
>>  #ifdef CONFIG_INTEL_IOMMU
>>  	if (intel_iommu_gfx_mapped && INTEL_INFO(dev)->gen < 8) {
>> @@ -308,15 +309,19 @@ int i915_gem_init_stolen(struct drm_device *dev)
>>  	DRM_DEBUG_KMS("found %zd bytes of stolen memory at %08lx\n",
>>  		      dev_priv->gtt.stolen_size, dev_priv->mm.stolen_base);
>>
>> +	/* WaSkipStolenMemoryFirstPage */
>> +	if (INTEL_INFO(dev)->gen >= 8)
>> +		start_rsvd = 4096;
>> +
>>  	if (IS_VALLEYVIEW(dev))
>> -		bios_reserved = 1024*1024; /* top 1M on VLV/BYT */
>> +		end_rsvd = 1024*1024; /* top 1M on VLV/BYT */
>>
>> -	if (WARN_ON(bios_reserved > dev_priv->gtt.stolen_size))
>> +	if (WARN_ON((start_rsvd + end_rsvd) > dev_priv->gtt.stolen_size))
>>  		return 0;
>>
>>  	/* Basic memrange allocator for stolen space */
>> -	drm_mm_init(&dev_priv->mm.stolen, 0, dev_priv->gtt.stolen_size -
>> -		    bios_reserved);
>> +	drm_mm_init(&dev_priv->mm.stolen, start_rsvd,
>> +		    dev_priv->gtt.stolen_size - start_rsvd - end_rsvd);
>>
>>  	return 0;
>>  }
>
> Beyond the fastboot stuff Ville has already mentioned, the early allocation of the existing fb from stolen will prevent us from clobbering the currently displayed buffer with the contents of the ringbuffers and whatever else we allocate out of stolen at early boot. We might be able to avoid that by doing stolen allocations top down, or by reserving the displayed fb even if we can't allocate an obj for it, only freeing it after our first mode set.
>
> Can you file a bug or JIRA for that to make sure we don't lose track of the fastboot boot corruption issues after this fix lands?

Reviving an old thread: is there any particular reason why this patch has not been merged to nightly? Is it known to cause any other regressions?

regards
Arun

> Thanks,
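The range arithmetic in the patch quoted above (reserve the first page on gen8+, reserve the top 1M on VLV/BYT, hand the rest to the allocator) can be checked in isolation. The sketch below is a userspace restatement, not driver code: `stolen_usable_range` and `struct stolen_range` are hypothetical names, and the comparison operators are assumed to be `>=` and `>`, since the archive dropped them from the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: the part of stolen memory given to the allocator. */
struct stolen_range {
	size_t start; /* first usable byte offset */
	size_t size;  /* number of usable bytes */
};

/*
 * Compute the usable stolen range.
 * Returns false when the reservations exceed the stolen size, which is
 * the case the patch guards with WARN_ON().
 */
static bool stolen_usable_range(size_t stolen_size, int gen, bool is_valleyview,
				struct stolen_range *out)
{
	size_t start_rsvd = 0;
	size_t end_rsvd = 0;

	assert(out);

	/* WaSkipStolenMemoryFirstPage: skip one 4096-byte page at the bottom. */
	if (gen >= 8)
		start_rsvd = 4096;

	/* Top 1M is reserved on VLV/BYT. */
	if (is_valleyview)
		end_rsvd = 1024 * 1024;

	if (start_rsvd + end_rsvd > stolen_size)
		return false;

	out->start = start_rsvd;
	out->size = stolen_size - start_rsvd - end_rsvd;
	return true;
}
```

Note that the size passed to the allocator has to shrink by both reservations; subtracting only the end reservation would let the range run 4096 bytes past the usable region, which is presumably what "fix allocated size" in the v2 note refers to.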
[Intel-gfx] Significance of Golden context
Hi,

Could someone explain the significance of the Null context/Golden state? I understand that we initialize 3D state in this batch and send it at the beginning to start the HW in a known state, but what are the implications of not doing this? What kind of issues can we expect if we don't do this? How is this golden state determined?

As a test I disabled this for Gen8 and I can boot Android without any issues.

regards
Arun
Re: [Intel-gfx] [PATCH 1/4] drm/i915: Implement Wa4x4STCOptimizationDisable:chv
On 21/01/2015 17:37, ville.syrj...@linux.intel.com wrote:
> From: Ville Syrjälä ville.syrj...@linux.intel.com
>
> Wa4x4STCOptimizationDisable got only implemented for BDW, but according to the w/a database CHV needs it too, so add it.
>
> Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index d7aa5c4..2a1a178 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -851,6 +851,10 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
>  	 */
>  	WA_CLR_BIT_MASKED(CACHE_MODE_0_GEN7, HIZ_RAW_STALL_OPT_DISABLE);
>
> +	/* Wa4x4STCOptimizationDisable:chv */
> +	WA_SET_BIT_MASKED(CACHE_MODE_1,
> +			  GEN8_4x4_STC_OPTIMIZATION_DISABLE);
> +
>  	/* Improve HiZ throughput on CHV. */
>  	WA_SET_BIT_MASKED(HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X);

Looks good to me. Only tested Wa4x4STCOptimizationDisable on Android, no issues observed.

For the whole series,
Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

regards
Arun
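The WA_SET_BIT_MASKED and WA_CLR_BIT_MASKED calls in the patch above target "masked" registers, where the upper 16 bits of a write select which of the lower 16 bits are updated. The sketch below models that convention in userspace so the effect of a set or clear can be seen without hardware. The helper names and the bit position used in the usage note are illustrative assumptions, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Build a write that sets 'bits': mask in the high half, value in the low half. */
static uint32_t masked_bit_enable(uint16_t bits)
{
	return ((uint32_t)bits << 16) | bits;
}

/* Build a write that clears 'bits': mask in the high half, zero value. */
static uint32_t masked_bit_disable(uint16_t bits)
{
	return (uint32_t)bits << 16;
}

/*
 * Model of how a masked register is assumed to apply a write: only bits
 * selected by the high-half mask change, all other bits keep their value.
 */
static uint16_t masked_reg_apply(uint16_t current, uint32_t write)
{
	uint16_t mask = (uint16_t)(write >> 16);
	uint16_t value = (uint16_t)(write & 0xffffu);
	uint16_t next = (uint16_t)((current & ~mask) | (value & mask));

	/* Bits outside the mask must be untouched. */
	assert((uint16_t)(next & ~mask) == (uint16_t)(current & ~mask));
	return next;
}
```

With a hypothetical workaround bit at position 6, `masked_bit_enable(1u << 6)` produces `0x00400040`, and applying it to a register holding `0x0001` yields `0x0041`. This is why such a workaround can be written without a read-modify-write cycle.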
Re: [Intel-gfx] [PATCH 04/10] drm/i915: Pixel Clock changes for DSI dual link
On 05/12/2014 16:33, Singh, Gaurav K wrote: On 12/4/2014 2:57 PM, Jani Nikula wrote: On Thu, 04 Dec 2014, Gaurav K Singh gaurav.k.si...@intel.com wrote: For dual link MIPI Panels, each port needs half of pixel clock. Pixel overlap can be enabled if needed by panel, then in that case, pixel clock will be increased for extra pixels. just a question, why do we need pixel overlap? I couldn't find more details from spec other than that when overlap is set some extra pixels are sent. regards Arun v2 : Address review comments by Jani - Removed the bit mask used for -dual_link - Used DSI instead of MIPI for #define variables Signed-off-by: Gaurav K Singh gaurav.k.si...@intel.com --- drivers/gpu/drm/i915/i915_reg.h|4 drivers/gpu/drm/i915/intel_bios.h |3 ++- drivers/gpu/drm/i915/intel_dsi.c |8 drivers/gpu/drm/i915/intel_dsi.h |6 ++ drivers/gpu/drm/i915/intel_dsi_panel_vbt.c | 21 + 5 files changed, 41 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index c981f5d..87149ba 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -6029,6 +6029,10 @@ enum punit_power_well { #define GEN8_PMINTR_REDIRECT_TO_NON_DISP (131) #define VLV_PWRDWNUPCTL 0xA294 +#define VLV_CHICKEN_3 0x7040C +#define PIXEL_OVERLAP_CNT_MASK(3 30) +#define PIXEL_OVERLAP_CNT_SHIFT 30 I didn't find this register, but does it not need + VLV_DISPLAY_BASE? Given that I can't find the register my review is pretty shallow, but I don't spot anything obviously wrong either. With these caveats, Reviewed-by: Jani Nikula jani.nik...@intel.com This reg is available in BSpec though the bit definitions have not been updated in the BSpec. Also, it was communicated by the BIOS team. 
+ #define GEN6_PMISR 0x44020 #define GEN6_PMIMR 0x44024 /* rps_lock */ #define GEN6_PMIIR 0x44028 diff --git a/drivers/gpu/drm/i915/intel_bios.h b/drivers/gpu/drm/i915/intel_bios.h index de01167..a6a8710 100644 --- a/drivers/gpu/drm/i915/intel_bios.h +++ b/drivers/gpu/drm/i915/intel_bios.h @@ -818,7 +818,8 @@ struct mipi_config { #define DUAL_LINK_PIXEL_ALT 2 u16 dual_link:2; u16 lane_cnt:2; - u16 rsvd3:12; + u16 pixel_overlap:3; + u16 rsvd3:9; u16 rsvd4; diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c index dbe52e9..4e18abd 100644 --- a/drivers/gpu/drm/i915/intel_dsi.c +++ b/drivers/gpu/drm/i915/intel_dsi.c @@ -111,6 +111,14 @@ static void intel_dsi_port_enable(struct intel_encoder *encoder) enum port port; u32 temp; + if (intel_dsi-dual_link == DSI_DUAL_LINK_FRONT_BACK) { + temp = I915_READ(VLV_CHICKEN_3); + temp = ~PIXEL_OVERLAP_CNT_MASK | + intel_dsi-pixel_overlap + PIXEL_OVERLAP_CNT_SHIFT; + I915_WRITE(VLV_CHICKEN_3, temp); + } + for_each_dsi_port(port, intel_dsi-ports) { temp = I915_READ(MIPI_PORT_CTRL(port)); diff --git a/drivers/gpu/drm/i915/intel_dsi.h b/drivers/gpu/drm/i915/intel_dsi.h index f2cc2fc..8fe2064 100644 --- a/drivers/gpu/drm/i915/intel_dsi.h +++ b/drivers/gpu/drm/i915/intel_dsi.h @@ -28,6 +28,11 @@ #include drm/drm_crtc.h #include intel_drv.h +/* Dual Link support */ +#define DSI_DUAL_LINK_NONE 0 +#define DSI_DUAL_LINK_FRONT_BACK 1 +#define DSI_DUAL_LINK_PIXEL_ALT2 + struct intel_dsi_device { unsigned int panel_id; const char *name; @@ -105,6 +110,7 @@ struct intel_dsi { u8 escape_clk_div; u8 dual_link; + u8 pixel_overlap; u32 port_bits; u32 bw_timer; u32 dphy_reg; diff --git a/drivers/gpu/drm/i915/intel_dsi_panel_vbt.c b/drivers/gpu/drm/i915/intel_dsi_panel_vbt.c index f60146f..f8c2269 100644 --- a/drivers/gpu/drm/i915/intel_dsi_panel_vbt.c +++ b/drivers/gpu/drm/i915/intel_dsi_panel_vbt.c @@ -288,6 +288,7 @@ static bool generic_init(struct intel_dsi_device *dsi) intel_dsi-lane_count = mipi_config-lane_cnt + 1; 
intel_dsi-pixel_format = mipi_config-videomode_color_format 7; intel_dsi-dual_link = mipi_config-dual_link; + intel_dsi-pixel_overlap = mipi_config-pixel_overlap; if (intel_dsi-dual_link) intel_dsi-ports = ((1 PORT_A) | (1 PORT_C)); @@ -310,6 +311,20 @@ static bool generic_init(struct intel_dsi_device *dsi) pclk = mode-clock; + /* In dual link mode each port needs half of pixel clock */ + if (intel_dsi-dual_link) { + pclk = pclk / 2; + + /* we can enable pixel_overlap if
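The intel_dsi_port_enable() hunk quoted above updates a two-bit pixel overlap field in a chicken register, and the generic_init() hunk halves the pixel clock for dual link. The archive has dropped the operators from the quoted code, so the exact expression in the patch cannot be recovered here. The sketch below shows the conventional clear-then-set field update and the halving step, with hypothetical helper names. The shift and mask mirror the PIXEL_OVERLAP_CNT_* values in the quoted diff.

```c
#include <assert.h>
#include <stdint.h>

#define OVERLAP_SHIFT 30u
#define OVERLAP_MASK  (3u << OVERLAP_SHIFT) /* two-bit field, as in the quoted defines */

/*
 * Replace the overlap field in 'reg' and leave every other bit alone.
 * Values that do not fit in two bits are truncated by the mask in this sketch.
 */
static uint32_t set_pixel_overlap(uint32_t reg, uint32_t overlap)
{
	uint32_t old = reg;

	reg &= ~OVERLAP_MASK;                             /* clear the field first */
	reg |= (overlap << OVERLAP_SHIFT) & OVERLAP_MASK; /* then set the new value */

	assert((reg & ~OVERLAP_MASK) == (old & ~OVERLAP_MASK));
	return reg;
}

/* In dual link mode each port carries half of the pixel clock. */
static uint32_t per_port_pclk(uint32_t pclk, int dual_link)
{
	return dual_link ? pclk / 2 : pclk;
}
```

The parentheses matter: writing the clear and the set as one expression such as `reg &= ~MASK | value` would AND the register with the combined term instead of setting the field. The quoted diff also declares the VBT pixel_overlap field as three bits wide while the register mask covers two, so a real implementation would need to decide how to handle values above 3. This sketch simply truncates them.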
Re: [Intel-gfx] [PATCH 04/10] drm/i915: Pixel Clock changes for DSI dual link
On 05/12/2014 17:36, Jani Nikula wrote: On Fri, 05 Dec 2014, Siluvery, Arun arun.siluv...@linux.intel.com wrote: On 05/12/2014 16:33, Singh, Gaurav K wrote: On 12/4/2014 2:57 PM, Jani Nikula wrote: On Thu, 04 Dec 2014, Gaurav K Singh gaurav.k.si...@intel.com wrote: For dual link MIPI Panels, each port needs half of pixel clock. Pixel overlap can be enabled if needed by panel, then in that case, pixel clock will be increased for extra pixels. just a question, why do we need pixel overlap? I couldn't find more details from spec other than that when overlap is set some extra pixels are sent. From the host perspective a dual link (or dual channel) DSI device is two independent peripheral devices. On the peripheral side the display has to combine the input from the two links (which may be two independent DSI blocks on the peripheral as well) into one contiguous display. I don't know the details, but I'm guessing pixel overlap just makes it easier for the peripheral implementation to get it all together. Thank you for the details. I am just wondering how few extra pixels help on the display side unless they are fixed values which act like some kind of markers to synchronize between two halves. regards Arun BR, Jani. 
regards Arun v2 : Address review comments by Jani - Removed the bit mask used for -dual_link - Used DSI instead of MIPI for #define variables Signed-off-by: Gaurav K Singh gaurav.k.si...@intel.com --- drivers/gpu/drm/i915/i915_reg.h|4 drivers/gpu/drm/i915/intel_bios.h |3 ++- drivers/gpu/drm/i915/intel_dsi.c |8 drivers/gpu/drm/i915/intel_dsi.h |6 ++ drivers/gpu/drm/i915/intel_dsi_panel_vbt.c | 21 + 5 files changed, 41 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index c981f5d..87149ba 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -6029,6 +6029,10 @@ enum punit_power_well { #define GEN8_PMINTR_REDIRECT_TO_NON_DISP(131) #define VLV_PWRDWNUPCTL 0xA294 +#define VLV_CHICKEN_3 0x7040C +#define PIXEL_OVERLAP_CNT_MASK(3 30) +#define PIXEL_OVERLAP_CNT_SHIFT 30 I didn't find this register, but does it not need + VLV_DISPLAY_BASE? Given that I can't find the register my review is pretty shallow, but I don't spot anything obviously wrong either. With these caveats, Reviewed-by: Jani Nikula jani.nik...@intel.com This reg is available in BSpec though the bit definitions have not been updated in the BSpec. Also, it was communicated by the BIOS team. 
+ #define GEN6_PMISR 0x44020 #define GEN6_PMIMR 0x44024 /* rps_lock */ #define GEN6_PMIIR 0x44028 diff --git a/drivers/gpu/drm/i915/intel_bios.h b/drivers/gpu/drm/i915/intel_bios.h index de01167..a6a8710 100644 --- a/drivers/gpu/drm/i915/intel_bios.h +++ b/drivers/gpu/drm/i915/intel_bios.h @@ -818,7 +818,8 @@ struct mipi_config { #define DUAL_LINK_PIXEL_ALT 2 u16 dual_link:2; u16 lane_cnt:2; - u16 rsvd3:12; + u16 pixel_overlap:3; + u16 rsvd3:9; u16 rsvd4; diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c index dbe52e9..4e18abd 100644 --- a/drivers/gpu/drm/i915/intel_dsi.c +++ b/drivers/gpu/drm/i915/intel_dsi.c @@ -111,6 +111,14 @@ static void intel_dsi_port_enable(struct intel_encoder *encoder) enum port port; u32 temp; + if (intel_dsi-dual_link == DSI_DUAL_LINK_FRONT_BACK) { + temp = I915_READ(VLV_CHICKEN_3); + temp = ~PIXEL_OVERLAP_CNT_MASK | + intel_dsi-pixel_overlap + PIXEL_OVERLAP_CNT_SHIFT; + I915_WRITE(VLV_CHICKEN_3, temp); + } + for_each_dsi_port(port, intel_dsi-ports) { temp = I915_READ(MIPI_PORT_CTRL(port)); diff --git a/drivers/gpu/drm/i915/intel_dsi.h b/drivers/gpu/drm/i915/intel_dsi.h index f2cc2fc..8fe2064 100644 --- a/drivers/gpu/drm/i915/intel_dsi.h +++ b/drivers/gpu/drm/i915/intel_dsi.h @@ -28,6 +28,11 @@ #include drm/drm_crtc.h #include intel_drv.h +/* Dual Link support */ +#define DSI_DUAL_LINK_NONE 0 +#define DSI_DUAL_LINK_FRONT_BACK 1 +#define DSI_DUAL_LINK_PIXEL_ALT2 + struct intel_dsi_device { unsigned int panel_id; const char *name; @@ -105,6 +110,7 @@ struct intel_dsi { u8 escape_clk_div; u8 dual_link; + u8 pixel_overlap; u32 port_bits; u32 bw_timer; u32 dphy_reg; diff --git a/drivers/gpu/drm/i915/intel_dsi_panel_vbt.c b/drivers/gpu/drm/i915/intel_dsi_panel_vbt.c index f60146f..f8c2269 100644 --- a/drivers/gpu/drm/i915
Re: [Intel-gfx] [PATCH] drm/i915: Free resources correctly if we cannot map status page during ctx create
On 17/11/2014 15:54, Daniel, Thomas wrote:
> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf Of Arun Siluvery
> Sent: Monday, November 17, 2014 3:48 PM
> To: intel-gfx@lists.freedesktop.org
> Subject: [Intel-gfx] [PATCH] drm/i915: Free resources correctly if we cannot map status page during ctx create
>
> We are not freeing memory allocated for ringbuf and ctx if we fail to map
> status page so release all resources correctly.
>
> Signed-off-by: Arun Siluvery <arun.siluv...@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index f3efdbd..a84d24b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1777,8 +1777,10 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>  		ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(ctx_obj);
>  		ring->status_page.page_addr = kmap(sg_page(ctx_obj->pages->sgl));
> -		if (ring->status_page.page_addr == NULL)
> -			return -ENOMEM;
> +		if (ring->status_page.page_addr == NULL) {
> +			ret = -ENOMEM;
> +			goto error;
> +		}
>  		ring->status_page.obj = ctx_obj;
>  	}
>
> Hi Arun,
>
> I think your tree is out of date. See this patch:
> http://patchwork.freedesktop.org/patch/35828/
>
> Cheers,
> Thomas.

You are right, I don't have latest changes. This patch can be ignored.

regards
Arun
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
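As an illustration of the error-handling pattern the patch above moves to, here is a small userspace sketch of goto-based unwinding. Everything in it is a stand-in: `ctx_sketch`, `ctx_create_sketch` and the `fail_map` flag are hypothetical names used only to simulate a mapping failure after an earlier allocation has succeeded.

```c
#include <errno.h>
#include <stdlib.h>

struct ctx_sketch {
	void *ringbuf;
	void *status_page;
};

/*
 * Hypothetical stand-in for context creation. 'fail_map' simulates the
 * status page mapping failing after ringbuf was already allocated.
 */
static int ctx_create_sketch(struct ctx_sketch *ctx, int fail_map)
{
	int ret;

	ctx->ringbuf = malloc(64);
	if (!ctx->ringbuf)
		return -ENOMEM;	/* nothing to undo yet */

	ctx->status_page = fail_map ? NULL : malloc(64);
	if (!ctx->status_page) {
		ret = -ENOMEM;
		goto error;	/* a direct return here would leak ringbuf */
	}

	return 0;

error:
	free(ctx->ringbuf);
	ctx->ringbuf = NULL;
	return ret;
}
```

The single `error:` label keeps the cleanup in one place, so any later failure point added to the function only needs to set `ret` and jump.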
Re: [Intel-gfx] [PATCH v2 1/2] drm/i915: Initialize bdw workarounds in logical ring mode too
On 04/11/2014 19:23, Rodrigo Vivi wrote:
> These patches got listed to -collector but got a huge conflict. If it is
> still relevant please rebase it.

This patch is currently not relevant, rebased version is already sent to the list for review.
https://patchwork.kernel.org/patch/5178771/

regards
Arun

> Also my bikeshed is to find better names to help differentiate them
> at least.
>
> On Wed, Sep 24, 2014 at 5:02 AM, Michel Thierry <michel.thie...@intel.com> wrote:
>> Following the legacy ring submission example, update the
>> ring->init_context() hook to support the execlist submission mode.
>>
>> Workarounds are defined in bdw_emit_workarounds(), but the emit
>> now depends on the ring submission mode.
>>
>> v2: Updated after "Cleanup pre prod workarounds"
>>
>> For: VIZ-4092
>> Signed-off-by: Michel Thierry <michel.thie...@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_gem_context.c |  2 +-
>>  drivers/gpu/drm/i915/intel_lrc.c        | 66 +
>>  drivers/gpu/drm/i915/intel_ringbuffer.c | 75 +++--
>>  drivers/gpu/drm/i915/intel_ringbuffer.h | 11 -
>>  4 files changed, 120 insertions(+), 34 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>> index 7b73b36..d1ed21a 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -657,7 +657,7 @@ done:
>>  	if (uninitialized) {
>>  		if (ring->init_context) {
>> -			ret = ring->init_context(ring);
>> +			ret = ring->init_context(ring->buffer);
>>  			if (ret)
>>  				DRM_ERROR("ring init context: %d\n", ret);
>>  		}
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index d64d518..a0aa3f0 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1020,6 +1020,62 @@ int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, int num_dwords)
>>  	return 0;
>>  }
>>
>> +static inline void intel_logical_ring_emit_wa(struct intel_ringbuffer *ringbuf,
>> +					      u32 addr, u32 value)
>> +{
>> +	struct intel_engine_cs *ring = ringbuf->ring;
>> +	struct drm_device *dev = ring->dev;
>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>> +
>> +	if (WARN_ON(dev_priv->num_wa_regs >= I915_MAX_WA_REGS))
>> +		return;
>> +
>> +	intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
>> +	intel_logical_ring_emit(ringbuf, addr);
>> +	intel_logical_ring_emit(ringbuf, value);
>> +
>> +	dev_priv->intel_wa_regs[dev_priv->num_wa_regs].addr = addr;
>> +	dev_priv->intel_wa_regs[dev_priv->num_wa_regs].mask = value & 0x;
>> +	/* value is updated with the status of remaining bits of this
>> +	 * register when it is read from debugfs file
>> +	 */
>> +	dev_priv->intel_wa_regs[dev_priv->num_wa_regs].value = value;
>> +	dev_priv->num_wa_regs++;
>> +}
>> +
>> +static int bdw_init_logical_workarounds(struct intel_ringbuffer *ringbuf)
>> +{
>> +	int ret;
>> +	struct intel_engine_cs *ring = ringbuf->ring;
>> +	struct drm_device *dev = ring->dev;
>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>> +
>> +	/*
>> +	 * workarounds applied in this fn are part of register state context,
>> +	 * they need to be re-initialized followed by gpu reset, suspend/resume,
>> +	 * module reload.
>> +	 */
>> +	dev_priv->num_wa_regs = 0;
>> +	memset(dev_priv->intel_wa_regs, 0, sizeof(dev_priv->intel_wa_regs));
>> +
>> +	/*
>> +	 * update the number of dwords required based on the
>> +	 * actual number of workarounds applied
>> +	 */
>> +	ret = intel_logical_ring_begin(ringbuf, BDW_WA_DWORDS_SIZE);
>> +	if (ret)
>> +		return ret;
>> +
>> +	bdw_emit_workarounds(ringbuf);
>> +
>> +	intel_logical_ring_advance(ringbuf);
>> +
>> +	DRM_DEBUG_DRIVER("Number of Workarounds applied: %d\n",
>> +			 dev_priv->num_wa_regs);
>> +
>> +	return 0;
>> +}
>> +
>>  static int gen8_init_common_ring(struct intel_engine_cs *ring)
>>  {
>>  	struct drm_device *dev = ring->dev;
>> @@ -1315,6 +1371,10 @@ static int logical_render_ring_init(struct drm_device *dev)
>>  	if (HAS_L3_DPF(dev))
>>  		ring->irq_keep_mask |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
>>
>> +	if (IS_BROADWELL(dev))
>> +		ring->init_context = bdw_init_logical_workarounds;
>> +	ring->emit_wa = intel_logical_ring_emit_wa;
>> +
>>  	ring->init = gen8_init_render_ring;
>>  	ring->cleanup = intel_fini_pipe_control;
>>  	ring->get_seqno = gen8_get_seqno;
>> @@ -1802,6 +1862,12 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>>  	}
>>
>>  	if (ring->id == RCS && !ctx->rcs_initialized) {
>> +		if (ring->init_context) {
>> +			ret = ring->init_context(ringbuf);
>> +			if (ret)
>> +				DRM_ERROR("ring init context: %d\n", ret);
>> +		}
>> +
>>  		ret =
Re: [Intel-gfx] [PATCH 0/3] drm/i915/chv: Add new WA and remove pre-production ones
On 28/10/2014 18:33, Arun Siluvery wrote:
> The patches in this series add two new workarounds for CHV and remove
> pre-production ones. Based on review comments from Ville, the add/remove
> patches are split up, which helps in reverting them if required.
>
> The initial patch can be found at,
> https://patchwork.kernel.org/patch/5178021/

Hi Ville,

Patches are split up as you suggested.
Please let me know if further changes are required.

regards
Arun

> Arun Siluvery (3):
>   drm/i915/chv: Remove pre-production workarounds
>   drm/i915/chv: Combine GEN8_ROW_CHICKEN w/a
>   drm/i915/chv: Add new workarounds for chv
>
>  drivers/gpu/drm/i915/i915_reg.h         |  1 +
>  drivers/gpu/drm/i915/intel_pm.c         | 12
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 22 +++---
>  3 files changed, 12 insertions(+), 23 deletions(-)
Re: [Intel-gfx] [PATCH 1/2] drm/i915/chv: Add few more CHV workarounds
On 28/10/2014 12:23, Ville Syrjälä wrote:
> On Tue, Oct 28, 2014 at 11:57:50AM +, Arun Siluvery wrote:
>> WaDisableInstructionShootdown:chv
>> WaForceEnableNonCoherent:chv
>> WaHdcDisableFetchWhenMasked:chv
>> WaDisableFenceDestinationToSLM:chv (pre-production)
>>
>> s/WaDisableDopClockGating/WaDisableRowChickenDopClockGating, because
>> another CHV WA is defined with the same name in intel_pm.c for a
>> different reg.
>>
>> For: VIZ-4090
>> Signed-off-by: Arun Siluvery <arun.siluv...@linux.intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_reg.h         |  2 ++
>>  drivers/gpu/drm/i915/intel_ringbuffer.c | 20 ++--
>>  2 files changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>> index 77fce96..840e5d9 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -5024,6 +5024,7 @@ enum punit_power_well {
>>  /* GEN8 chicken */
>>  #define HDC_CHICKEN0				0x7300
>>  #define  HDC_FORCE_NON_COHERENT			(1<<4)
>> +#define  HDC_DONOT_FETCH_MEM_WHEN_MASKED	(1<<11)
>>  #define  HDC_FENCE_DEST_SLM_DISABLE		(1<<14)
>>
>>  /* WaCatErrorRejectionIssue */
>> @@ -5941,6 +5942,7 @@ enum punit_power_well {
>>  #define GEN9_DG_MIRROR_FIX_ENABLE		(1<<5)
>>
>>  #define GEN8_ROW_CHICKEN			0xe4f0
>> +#define   INSTRUCTION_SHOOTDOWN_DISABLE		(1<<9)
>>  #define   PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE	(1<<8)
>>  #define   STALL_DOP_GATING_DISABLE		(1<<5)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index a8f72e8..2c07a02 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -788,14 +788,30 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
>>  	struct drm_i915_private *dev_priv = dev->dev_private;
>>
>>  	/* WaDisablePartialInstShootdown:chv */
>> +	/* WaDisableInstructionShootdown:chv */
>>  	WA_SET_BIT_MASKED(GEN8_ROW_CHICKEN,
>> -		  PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE);
>> +		  PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE |
>> +		  (dev->pdev->revision < 0x06 ?
>> +		   INSTRUCTION_SHOOTDOWN_DISABLE : 0));
>
> I think we should just drop the current early pre-prod workarounds, and
> not add more of them.

ok I will drop this.
Is there any guideline on particular revision for bdw, chv below which we should drop that workaround?

>>
>>  	/* WaDisableThreadStallDopClockGating:chv */
>>  	WA_SET_BIT_MASKED(GEN8_ROW_CHICKEN,
>>  		  STALL_DOP_GATING_DISABLE);
>>
>> -	/* WaDisableDopClockGating:chv (pre-production hw) */
>> +	/* Use Force Non-Coherent whenever executing a 3D context. This is a
>> +	 * workaround for a possible hang in the unlikely event a TLB
>> +	 * invalidation occurs during a PSD flush.
>> +	 */
>
> We haven't generally documented the w/as in any great detail. Does it
> help someone if we start doing that?

This was already documented for bdw hence I included it for chv also.

regards
Arun

>> +	/* WaForceEnableNonCoherent:chv */
>> +	/* WaHdcDisableFetchWhenMasked:chv */
>> +	/* WaDisableFenceDestinationToSLM:chv (pre-production) */
>> +	WA_SET_BIT_MASKED(HDC_CHICKEN0,
>> +			  HDC_FORCE_NON_COHERENT |
>> +			  HDC_DONOT_FETCH_MEM_WHEN_MASKED |
>> +			  (dev->pdev->revision < 0x06 ?
>> +			   HDC_FENCE_DEST_SLM_DISABLE : 0));
>> +
>> +	/* WaDisableRowChickenDopClockGating:chv (pre-production hw) */
>>  	WA_SET_BIT_MASKED(GEN7_ROW_CHICKEN2,
>>  		  DOP_CLOCK_GATING_DISABLE);
>>
>> --
>> 2.1.2
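The WA_SET_BIT_MASKED() calls quoted above target what i915 calls masked registers. As a rough illustration, assuming the usual convention that the upper 16 bits of the written value select which of the lower 16 bits may change, the behaviour can be modelled in plain C. The helper names below are hypothetical and the "register" is just an integer; only the two bit definitions come from the quoted i915_reg.h hunk.

```c
#include <stdint.h>

/* Bit values as listed in the quoted i915_reg.h hunk. */
#define PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE	(1u << 8)
#define STALL_DOP_GATING_DISABLE		(1u << 5)

/* Value to write in order to set 'bits': write-enable mask plus the bits. */
static uint32_t masked_bit_enable(uint32_t bits)
{
	return (bits << 16) | bits;
}

/* Value to write in order to clear 'bits': write-enable mask only. */
static uint32_t masked_bit_disable(uint32_t bits)
{
	return bits << 16;
}

/* Hypothetical model of how a masked register reacts to a write. */
static uint32_t masked_reg_write(uint32_t old, uint32_t val)
{
	uint32_t wr_mask = val >> 16;

	/* Unselected bits keep their old value, selected bits take the new one. */
	return (old & ~wr_mask & 0xffff) | (val & wr_mask);
}
```

Under this model two workarounds that live in the same register can be combined into one write by OR-ing their bits, which is what the thread discusses for GEN8_ROW_CHICKEN.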
Re: [Intel-gfx] [PATCH] drm/i915/chv: Add new WA and remove pre-production ones
On 28/10/2014 17:06, Ville Syrjälä wrote:
> On Tue, Oct 28, 2014 at 03:48:24PM +, Arun Siluvery wrote:
>> +WaForceEnableNonCoherent:chv
>> +WaHdcDisableFetchWhenMasked:chv
>>
>> -WaDisableDopClockGating:chv
>> -WaDisableSamplerPowerBypass:chv
>> -WaDisableGunitClockGating:chv
>> -WaDisableFfDopClockGating:chv
>> -WaDisableDopClockGating:chv
>>
>> WaDisablePartialInstShootdown:chv and
>> WaDisableThreadStallDopClockGating:chv are related to the same register
>> so combine them.
>
> Please split into at least two patches (one to add new w/as and another
> to remove old ones). Otherwise reverting is a pita in case we find
> that one of the dropped w/as was actually still needed.

I thought of doing that but then combined them as these are early pre-production ones and we may not need them in future but I agree splitting them helps in reverting them if required.

>> v2: Remove pre-production WA instead of restricting them
>> based on revision id (Ville)
>>
>> For: VIZ-4090
>> Signed-off-by: Arun Siluvery <arun.siluv...@linux.intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_reg.h         |  1 +
>>  drivers/gpu/drm/i915/intel_pm.c         | 12
>>  drivers/gpu/drm/i915/intel_ringbuffer.c | 22 +++---
>>  3 files changed, 12 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>> index 77fce96..9d39700 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -5024,6 +5024,7 @@ enum punit_power_well {
>>  /* GEN8 chicken */
>>  #define HDC_CHICKEN0				0x7300
>>  #define  HDC_FORCE_NON_COHERENT			(1<<4)
>> +#define  HDC_DONOT_FETCH_MEM_WHEN_MASKED	(1<<11)
>>  #define  HDC_FENCE_DEST_SLM_DISABLE		(1<<14)
>>
>>  /* WaCatErrorRejectionIssue */
>> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
>> index 7a69eba..93db25f 100644
>> --- a/drivers/gpu/drm/i915/intel_pm.c
>> +++ b/drivers/gpu/drm/i915/intel_pm.c
>> @@ -5944,18 +5944,6 @@ static void cherryview_init_clock_gating(struct drm_device *dev)
>>  	/* WaDisableSDEUnitClockGating:chv */
>>  	I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) |
>>  		   GEN8_SDEUNIT_CLOCK_GATE_DISABLE);
>> -
>> -	/* WaDisableGunitClockGating:chv (pre-production hw) */
>> -	I915_WRITE(VLV_GUNIT_CLOCK_GATE, I915_READ(VLV_GUNIT_CLOCK_GATE) |
>> -		   GINT_DIS);
>
> OK
>
>> -
>> -	/* WaDisableFfDopClockGating:chv (pre-production hw) */
>> -	I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL,
>> -		   _MASKED_BIT_ENABLE(GEN8_FF_DOP_CLOCK_GATE_DISABLE));
>
> OK
>
>> -
>> -	/* WaDisableDopClockGating:chv (pre-production hw) */
>> -	I915_WRITE(GEN6_UCGCTL1, I915_READ(GEN6_UCGCTL1) |
>> -		   GEN6_EU_TCUNIT_CLOCK_GATE_DISABLE);
>
> OK, I think. This was the weird w/a where it seemed hard to figure out
> what it needed. Nothing in BSpec about needing this bit on chv.
>
>>  }
>>
>>  static void g4x_init_clock_gating(struct drm_device *dev)
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index a8f72e8..368b20a 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -788,20 +788,20 @@ static int chv_init_workarounds(struct intel_engine_cs *ring)
>>  	struct drm_i915_private *dev_priv = dev->dev_private;
>>
>>  	/* WaDisablePartialInstShootdown:chv */
>> -	WA_SET_BIT_MASKED(GEN8_ROW_CHICKEN,
>> -		  PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE);
>> -
>>  	/* WaDisableThreadStallDopClockGating:chv */
>>  	WA_SET_BIT_MASKED(GEN8_ROW_CHICKEN,
>> -		  STALL_DOP_GATING_DISABLE);
>> -
>> -	/* WaDisableDopClockGating:chv (pre-production hw) */
>> -	WA_SET_BIT_MASKED(GEN7_ROW_CHICKEN2,
>> -		  DOP_CLOCK_GATING_DISABLE);
>
> OK, again the weird w/a but Bspec seems to agree at least.
>
>> +			  PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE |
>> +			  STALL_DOP_GATING_DISABLE);
>
> Bspec says bit 5 is MBZ now, and yet the w/a database says it's forever.
> And the hardware accepts 1 there so it's not like many other MBZ bits
> that you can't set even if you try. Also Bspec has three different
> definitions for this bit on gen8, all disagree with each other and one
> definition even manages to disagree with itself. And reading the hsd stuff
> I'm not the only one that has been confused by this, and yet I see no
> conclusion there as to how this bit should be configured.
>
> Oh well, I guess we can leave it set for now and maybe eventually
> someone will figure out what we're supposed to do.

I am using w/a database as reference and it says forever, spec seems to disagree but probably not yet updated.

regards
Arun

>>
>> -	/* WaDisableSamplerPowerBypass:chv (pre-production hw) */
>> -	WA_SET_BIT_MASKED(HALF_SLICE_CHICKEN3,
>> -		  GEN8_SAMPLER_POWER_BYPASS_DIS);
>
> OK
>
>> +	/* Use Force Non-Coherent whenever executing a 3D context. This is a
>> +	 * workaround for a possible hang in the unlikely event a TLB
>> +	 *
Re: [Intel-gfx] [PATCH] drm/i915: Emit even number of dwords when emitting LRIs
On 23/10/2014 14:41, Ville Syrjälä wrote:
> On Thu, Oct 23, 2014 at 01:50:23PM +0100, Chris Wilson wrote:
>> On Thu, Oct 23, 2014 at 01:42:38PM +0100, Damien Lespiau wrote:
>>> On Thu, Oct 23, 2014 at 02:21:02PM +0200, Daniel Vetter wrote:
>>>> On Wed, Oct 22, 2014 at 06:59:52PM +0100, Arun Siluvery wrote:
>>>>> The number of DWords should be even when doing ring emits as
>>>>> command sequences require QWord alignment.
>>>>>
>>>>> v2: use LRI variant that can write multiple regs in one go (Damien).
>>>>> We can simply insert one NOP at the end instead of one per register write.
>>>>>
>>>>> Cc: Mika Kuoppala <mika.kuopp...@intel.com>
>>>>> Signed-off-by: Arun Siluvery <arun.siluv...@linux.intel.com>
>>>>> ---
>>>>>  drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++--
>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>>>> index 497b836..a8f72e8 100644
>>>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>>>> @@ -680,15 +680,16 @@ static int intel_ring_workarounds_emit(struct intel_engine_cs *ring)
>>>>>  	if (ret)
>>>>>  		return ret;
>>>>>
>>>>> -	ret = intel_ring_begin(ring, w->count * 3);
>>>>> +	ret = intel_ring_begin(ring, (w->count * 2 + 2));
>>>>>  	if (ret)
>>>>>  		return ret;
>>>>>
>>>>> +	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(w->count));
>>>>
>>>> Afaik there's a limit to the size of an MI_LRI. Where's the check for
>>>> that (probably with a WARN_ON for now to avoid unnecessary complexity)?
>>>
>>> I guess there's always the size of the length field, I don't see any
>>> other indication. Note that I can't find the documentation of the
>>> multi-registers version of LRI either. So, well, we probably should
>>> double check it does work.
>>
>> It does work. The max is around 60 iirc (the max length of the command).
>
> The maximum length seems to be 0xff on gen6+ and 0x3f before that, which
> would mean at most 128 or 32 registers.
>
> Also the context image is full of these multi register LRIs. Based on a
> quick glance the longest LRI in there is 0x5f on IVB, 0xcf on HSW, and
> 0xdf on BDW, which translate to 48, 104, and 108 registers per LRI. So
> we know at least those must work or context restore would not work.
> Before gen7 the context doesn't seem to resemble a batch, so I can't
> tell anything about those platforms based on the context image.

w->count is already checked against max workarounds which is 16 now so we are well within the limit; I think additional check would be redundant here and it is unlikely to have more than 128 workarounds.

regards
Arun
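The dword arithmetic discussed in this thread can be written down as a short sketch. It assumes, as the patch and the figures quoted by Ville suggest, that a multi-register LRI consists of one header dword followed by an (address, value) pair per register, and that the command's length field holds the total dword count minus two. The function names are hypothetical and nothing here touches hardware.

```c
#include <stdint.h>

/*
 * Dwords to reserve for one multi-register LRI: 1 header dword, one
 * (address, value) pair per register, plus 1 NOOP so the total is even.
 * This matches the "w->count * 2 + 2" in the quoted patch.
 */
static uint32_t lri_dwords(uint32_t nregs)
{
	return 1 + 2 * nregs + 1;
}

/*
 * Registers covered by an LRI whose length field is 'len', assuming the
 * field encodes (header + pairs - 2), i.e. 2 * nregs - 1.
 */
static uint32_t lri_regs_from_len(uint32_t len)
{
	return (len + 1) / 2;
}
```

With these assumptions a length field of 0xff corresponds to 128 registers and 0x3f to 32, which agrees with the limits mentioned above, and the 16-entry workaround table needs 34 dwords.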
Re: [Intel-gfx] [PATCH] drm/i915: Add means to apply WA conditionally
On 23/10/2014 16:51, Daniel Vetter wrote:
> On Thu, Oct 23, 2014 at 04:29:30PM +0100, Arun Siluvery wrote:
>> We would want to apply some of the workarounds based on a condition to a
>> particular platform or Gen but we may not know all possible controlling
>> parameters in advance hence allow to define open conditions; a WA makes it
>> to the list only if the condition is true.
>>
>> With the appropriate conditions we can combine all of the workarounds and
>> apply them from a single place irrespective of platform instead of having
>> them in separate functions.
>>
>> For: VIZ-4090
>> Signed-off-by: Arun Siluvery <arun.siluv...@linux.intel.com>
>
> Imo we should just pull the condition out into proper control flow. Hiding
> it like that in the macro doesn't seem to buy us anything at all, but
> obfuscates the code.

No we are not hiding the condition, I thought it would be easier to read it this way, e.g.,
WA_SET_BIT_MASKED_IF(IS_BDW_GT3(dev), WA_REG, WA_MASK);

do you prefer adding if(cond) to each WA?

regards
Arun

> -Daniel
>
>> ---
>>  drivers/gpu/drm/i915/intel_ringbuffer.c | 35 +
>>  1 file changed, 35 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 497b836..0525a5d 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -736,6 +736,41 @@ static int wa_add(struct drm_i915_private *dev_priv,
>>
>>  #define WA_WRITE(addr, val) WA_REG(addr, val, 0x)
>>
>> +#define WA_SET_BIT_MASKED_IF(cond, addr, mask) \
>> +	do { \
>> +		if (cond) { \
>> +			WA_SET_BIT_MASKED(addr, mask); \
>> +		} \
>> +	} while(0)
>> +
>> +#define WA_CLR_BIT_MASKED_IF(cond, addr, mask) \
>> +	do { \
>> +		if (cond) { \
>> +			WA_CLR_BIT_MASKED(addr, mask); \
>> +		} \
>> +	} while(0)
>> +
>> +#define WA_SET_BIT_IF(cond, addr, mask) \
>> +	do { \
>> +		if (cond) { \
>> +			WA_SET_BIT(addr, mask); \
>> +		} \
>> +	} while(0)
>> +
>> +#define WA_CLR_BIT_IF(cond, addr, mask) \
>> +	do { \
>> +		if (cond) { \
>> +			WA_CLR_BIT(addr, mask); \
>> +		} \
>> +	} while(0)
>> +
>> +#define WA_WRITE_IF(cond, addr, val) \
>> +	do { \
>> +		if (cond) { \
>> +			WA_WRITE(addr, val); \
>> +		} \
>> +	} while(0)
>> +
>>  static int bdw_init_workarounds(struct intel_engine_cs *ring)
>>  {
>>  	struct drm_device *dev = ring->dev;
>> --
>> 2.1.2
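To make the two styles debated above concrete, here is a toy comparison. It is only a sketch: `wa_list`, `wa_set`, `WA_SET_IF` and the two init functions are hypothetical, and the register offsets are reused from the thread purely as placeholder values.

```c
#include <stdint.h>

#define MAX_WA 16

struct wa_list {
	uint32_t addr[MAX_WA];
	uint32_t count;
};

/* Record one workaround address, silently ignoring overflow in this toy. */
static void wa_set(struct wa_list *l, uint32_t addr)
{
	if (l->count < MAX_WA)
		l->addr[l->count++] = addr;
}

/* Macro style from the patch: the condition travels with the workaround. */
#define WA_SET_IF(cond, list, addr) \
	do { \
		if (cond) \
			wa_set(list, addr); \
	} while (0)

static void init_with_macro(struct wa_list *l, int is_gt3)
{
	wa_set(l, 0x7300);
	WA_SET_IF(is_gt3, l, 0xe4f0);
}

/* Explicit control flow, as suggested in the review. */
static void init_with_if(struct wa_list *l, int is_gt3)
{
	wa_set(l, 0x7300);
	if (is_gt3)
		wa_set(l, 0xe4f0);
}
```

Both functions build the same list; the disagreement in the thread is about readability, not behaviour. The `do { } while (0)` wrapper is what lets the macro be used safely as a single statement, for example inside an unbraced `if`/`else`.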
Re: [Intel-gfx] [PATCH] drm/i915: add missing forcewake put on i915_wa_registers()
On 22/10/2014 08:35, Ville Syrjälä wrote:
> On Tue, Oct 21, 2014 at 07:40:35PM +0200, Daniel Vetter wrote:
>> On Tue, Oct 21, 2014 at 02:58:08PM -0200, Paulo Zanoni wrote:
>>> From: Paulo Zanoni <paulo.r.zan...@intel.com>
>>>
>>> Otherwise, a simple cat to the debugfs file can make the machine use
>>> much more power than needed, and prevent it from runtime suspending.
>>>
>>> Related commit:
>>>
>>>     commit 8452e1d173a16d9812422a2272c4ab0f0ba81057
>>>     Author: Mika Kuoppala <mika.kuopp...@linux.intel.com>
>>>     Date:   Tue Oct 7 17:21:26 2014 +0300
>>>         drm/i915: Build workaround list in ring initialization
>>>
>>> Cc: Mika Kuoppala <mika.kuopp...@linux.intel.com>
>>> Cc: Arun Siluvery <arun.siluv...@linux.intel.com>
>>> Testcase: igt/pm_rpm/debugfs-read
>>> Signed-off-by: Paulo Zanoni <paulo.r.zan...@intel.com>
>>
>> tbh I'm not even sure we want to do the manual forcewake get here -
>> I915_READ will do it for us, and this is a debug interface. So no one
>> should care about perf.
>>
>> Mika, is that right? If so I'd like to merge the inverse patch which drops
>> the fw_get.
>
> Don't we still need the idle msg disable+poll CSPWRFSM trick here on
> gen8? That also needs forcewake around it.

I had a chat with Mika on this yesterday and he seems to agree that forcewake is probably not required here. I couldn't send the patch yesterday but as per Ville's comments looks like we need forcewake here?

regards
Arun

>> -Daniel
>>
>>> ---
>>>  drivers/gpu/drm/i915/i915_debugfs.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>>> index 9600285..36a4baa 100644
>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>> @@ -2671,6 +2671,7 @@ static int i915_wa_registers(struct seq_file *m, void *unused)
>>>  			   addr, value, mask, read, ok ? "OK" : "FAIL");
>>>  	}
>>>
>>> +	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
>>>  	intel_runtime_pm_put(dev_priv);
>>>  	mutex_unlock(&dev->struct_mutex);
>>>
>>> --
>>> 2.1.1
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
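The bug fixed by the patch above is an unbalanced get/put pair. A toy model of why that matters, using a plain counter in place of the real forcewake machinery (all names here are hypothetical):

```c
/* Toy stand-in for a wake reference: non-zero means "keep the GT awake". */
struct fw_sketch {
	int wake_count;
};

static void fw_get(struct fw_sketch *fw)
{
	fw->wake_count++;
}

static void fw_put(struct fw_sketch *fw)
{
	fw->wake_count--;
}

/* Sums the "registers" while holding the wake reference. */
static unsigned int dump_regs(struct fw_sketch *fw,
			      const unsigned int *regs, int n)
{
	unsigned int sum = 0;
	int i;

	fw_get(fw);
	for (i = 0; i < n; i++)
		sum += regs[i];	/* stands in for the register reads */
	fw_put(fw);	/* the kind of call the patch adds; without it the count never drops */

	return sum;
}
```

If the `fw_put()` were missing, every read of the debugfs file would leave the counter one higher, which in the real driver is what keeps the hardware awake and blocks runtime suspend.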
Re: [Intel-gfx] [PATCH 3/4] drm/i915: Build workaround list in ring initialization
On 07/10/2014 15:21, Mika Kuoppala wrote:
> If we build the workaround list in ring initialization
> and decouple it from the actual writing of values, we
> gain the ability to decide where and how we want to apply
> the values.
>
> The advantage of this will become more clear when
> we need to initialize workarounds on older gens where
> it is not possible to write all the registers through ring
> LRIs.
>
> v2: rebase on newest bdw workarounds
>
> Cc: Arun Siluvery <arun.siluv...@linux.intel.com>
> Cc: Damien Lespiau <damien.lesp...@intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuopp...@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c     |  20 ++--
>  drivers/gpu/drm/i915/i915_drv.h         |  28 ++---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 185 ++--
>  3 files changed, 130 insertions(+), 103 deletions(-)

Hi Daniel,

Patches 3, 4 in this series are independent of the first two.
Could you please pull-in these patches?

regards
Arun

> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index da4036d..87482f8 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2655,18 +2655,20 @@ static int i915_wa_registers(struct seq_file *m, void *unused)
>
>  	intel_runtime_pm_get(dev_priv);
>
> -	seq_printf(m, "Workarounds applied: %d\n", dev_priv->num_wa_regs);
> -	for (i = 0; i < dev_priv->num_wa_regs; ++i) {
> +	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
> +
> +	seq_printf(m, "Workarounds applied: %d\n", dev_priv->workarounds.count);
> +	for (i = 0; i < dev_priv->workarounds.count; ++i) {
>  		u32 addr, mask;
>
> -		addr = dev_priv->intel_wa_regs[i].addr;
> -		mask = dev_priv->intel_wa_regs[i].mask;
> -		dev_priv->intel_wa_regs[i].value = I915_READ(addr) | mask;
> -		if (dev_priv->intel_wa_regs[i].addr)
> +		addr = dev_priv->workarounds.reg[i].addr;
> +		mask = dev_priv->workarounds.reg[i].mask;
> +		dev_priv->workarounds.reg[i].value = I915_READ(addr) | mask;
> +		if (dev_priv->workarounds.reg[i].addr)
>  			seq_printf(m, "0x%X: 0x%08X, mask: 0x%08X\n",
> -				   dev_priv->intel_wa_regs[i].addr,
> -				   dev_priv->intel_wa_regs[i].value,
> -				   dev_priv->intel_wa_regs[i].mask);
> +				   dev_priv->workarounds.reg[i].addr,
> +				   dev_priv->workarounds.reg[i].value,
> +				   dev_priv->workarounds.reg[i].mask);
>  	}
>
>  	intel_runtime_pm_put(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 1e476b5..f7265bf 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1448,6 +1448,20 @@ struct i915_frontbuffer_tracking {
>  	unsigned flip_bits;
>  };
>
> +struct i915_wa_reg {
> +	u32 addr;
> +	u32 value;
> +	/* bitmask representing WA bits */
> +	u32 mask;
> +};
> +
> +#define I915_MAX_WA_REGS 16
> +
> +struct i915_workarounds {
> +	struct i915_wa_reg reg[I915_MAX_WA_REGS];
> +	u32 count;
> +};
> +
>  struct drm_i915_private {
>  	struct drm_device *dev;
>  	struct kmem_cache *slab;
> @@ -1590,19 +1604,7 @@ struct drm_i915_private {
>  	struct intel_shared_dpll shared_dplls[I915_NUM_PLLS];
>  	int dpio_phy_iosf_port[I915_NUM_PHYS_VLV];
>
> -	/*
> -	 * workarounds are currently applied at different places and
> -	 * changes are being done to consolidate them so exact count is
> -	 * not clear at this point, use a max value for now.
> -	 */
> -#define I915_MAX_WA_REGS 16
> -	struct {
> -		u32 addr;
> -		u32 value;
> -		/* bitmask representing WA bits */
> -		u32 mask;
> -	} intel_wa_regs[I915_MAX_WA_REGS];
> -	u32 num_wa_regs;
> +	struct i915_workarounds workarounds;
>
>  	/* Reclocking support */
>  	bool render_reclock_avail;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 816a692..12a546f 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -665,80 +665,107 @@ err:
>  	return ret;
>  }
>
> -static inline void intel_ring_emit_wa(struct intel_engine_cs *ring,
> -				       u32 addr, u32 value)
> +static int intel_ring_workarounds_emit(struct intel_engine_cs *ring)
>  {
> +	int ret, i;
>  	struct drm_device *dev = ring->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct i915_workarounds *w = &dev_priv->workarounds;
>
> -	if (WARN_ON(dev_priv->num_wa_regs >= I915_MAX_WA_REGS))
> -		return;
> +	if (WARN_ON(w->count == 0))
> +		return 0;
>
> -	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
> -	intel_ring_emit(ring, addr);
> -	intel_ring_emit(ring, value);
> +	ring->gpu_caches_dirty = true;
> +	ret = intel_ring_flush_all_caches(ring);
> +	if (ret)
> +		return ret;
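The design described in the commit message above, building a table first and deciding later how to write it, can be sketched in a few lines of standalone C. This is an illustration under stated assumptions rather than the driver's implementation: the struct layout mirrors the quoted `i915_wa_reg`/`i915_workarounds`, while `wa_table_add` and `wa_table_emit` are hypothetical helpers and the output buffer stands in for a ring.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_WA_REGS 16	/* same limit as I915_MAX_WA_REGS in the quoted patch */

struct wa_reg {
	uint32_t addr;
	uint32_t value;
	uint32_t mask;	/* bits that belong to the workaround */
};

struct wa_table {
	struct wa_reg reg[MAX_WA_REGS];
	uint32_t count;
};

/* Step 1 (init time): record a workaround, refusing to overflow the table. */
static int wa_table_add(struct wa_table *t, uint32_t addr,
			uint32_t mask, uint32_t value)
{
	if (t->count >= MAX_WA_REGS)
		return -ENOSPC;

	t->reg[t->count].addr = addr;
	t->reg[t->count].value = value;
	t->reg[t->count].mask = mask;
	t->count++;
	return 0;
}

/*
 * Step 2 (later, wherever suits the platform): turn the table into
 * (address, value) pairs. 'out' must hold 2 * t->count entries.
 * Returns the number of dwords written.
 */
static uint32_t wa_table_emit(const struct wa_table *t, uint32_t *out)
{
	uint32_t i, n = 0;

	for (i = 0; i < t->count; i++) {
		out[n++] = t->reg[i].addr;
		out[n++] = t->reg[i].value;
	}
	return n;
}
```

Because the table outlives the emit step, the same data can also back a later verification pass such as the debugfs dump in this series.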
Re: [Intel-gfx] [PATCH 4/4] drm/i915: Check workaround status on dfs read time
On 07/10/2014 15:21, Mika Kuoppala wrote:
> As the workaround list has the value as initialization time constant,
> we can do the simple checking on the go without neglecting igt.
>
> Signed-off-by: Mika Kuoppala <mika.kuopp...@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 87482f8..dbd5dc5 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2659,16 +2659,16 @@ static int i915_wa_registers(struct seq_file *m, void *unused)
>
>  	seq_printf(m, "Workarounds applied: %d\n", dev_priv->workarounds.count);
>  	for (i = 0; i < dev_priv->workarounds.count; ++i) {
> -		u32 addr, mask;
> +		u32 addr, mask, value, read;
> +		bool ok;
>
>  		addr = dev_priv->workarounds.reg[i].addr;
>  		mask = dev_priv->workarounds.reg[i].mask;
> -		dev_priv->workarounds.reg[i].value = I915_READ(addr) | mask;
> -		if (dev_priv->workarounds.reg[i].addr)
> -			seq_printf(m, "0x%X: 0x%08X, mask: 0x%08X\n",
> -				   dev_priv->workarounds.reg[i].addr,
> -				   dev_priv->workarounds.reg[i].value,
> -				   dev_priv->workarounds.reg[i].mask);
> +		value = dev_priv->workarounds.reg[i].value;
> +		read = I915_READ(addr);
> +		ok = (value & mask) == (read & mask);
> +		seq_printf(m, "0x%X: 0x%08X, mask: 0x%08X, read: 0x%08x, status: %s\n",
> +			   addr, value, mask, read, ok ? "OK" : "FAIL");
>  	}
>
>  	intel_runtime_pm_put(dev_priv);

Reviewed-by: Arun Siluvery <arun.siluv...@linux.intel.com>

regards
Arun
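The status check introduced by this patch boils down to a masked comparison. A minimal standalone version, with a hypothetical function name and plain integers in place of register reads, might look like this:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A workaround counts as applied when the bits selected by 'mask' read
 * back with the expected value; all other bits of the register are
 * ignored, since they do not belong to the workaround.
 */
static bool wa_is_applied(uint32_t expected, uint32_t mask, uint32_t read)
{
	return (expected & mask) == (read & mask);
}
```

Ignoring the unmasked bits is the point of storing a mask per entry: unrelated bits in the same register can change without the check reporting FAIL.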
Re: [Intel-gfx] [PATCH igt] gem_workarounds: intel_wa_registers is now prefixed with i915
On 30/08/2014 22:46, Damien Lespiau wrote:

Signed-off-by: Damien Lespiau damien.lesp...@intel.com
---
 tests/gem_workarounds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/gem_workarounds.c b/tests/gem_workarounds.c
index 6826562..32156d2 100644
--- a/tests/gem_workarounds.c
+++ b/tests/gem_workarounds.c
@@ -184,7 +184,7 @@ igt_main
 	devid = intel_get_drm_devid(drm_fd);
 	batch = intel_batchbuffer_alloc(bufmgr, devid);

-	fd = igt_debugfs_open("intel_wa_registers", O_RDONLY);
+	fd = igt_debugfs_open("i915_wa_registers", O_RDONLY);
 	igt_assert(fd >= 0);

 	file = fdopen(fd, "r");

Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

regards
Arun
Re: [Intel-gfx] [PATCH 0/5] A few fixes on top of the wa_regs patches
On 30/08/2014 16:50, Damien Lespiau wrote:

Hi Arun,

I've compiled a few patches that I think solve some small-ish issues around your wa_regs series. Could you please have a look at them and comment/give your r-b tag if you judge appropriate?

On top of those patches, I'd love some comments on the issues I raised in the other mail and possible follow up patches to address them.

http://lists.freedesktop.org/archives/intel-gfx/2014-August/051514.html

At some point, we'll also need a bit of coherence with what Mika has been doing:

http://lists.freedesktop.org/archives/intel-gfx/2014-August/05.html

Hi Daniel,

Since the new workaround design/implementation will take time, could you please pull the patches in this series to fix the issues, and also the patch that changes the filename in igt?

For the series,
Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

regards
Arun
Re: [Intel-gfx] [PATCH 0/5] A few fixes on top of the wa_regs patches
On 01/09/2014 10:08, Daniel Vetter wrote: On Sun, Aug 31, 2014 at 08:32:55PM +0100, Siluvery, Arun wrote: On 30/08/2014 16:50, Damien Lespiau wrote: Hi Arun, I've compiled a few patches that I think solve some small-ish issues around your wa_regs series. Could you please have a look at them and comment/give your r-b tag if you judge appropriate? On top of those patches, I'd love some comments on the issues I raised in the other mail and possible follow up patches to address them. http://lists.freedesktop.org/archives/intel-gfx/2014-August/051514.html Hi Damien, I really appreciate you taking time to not just give review comments but also sending patches to fix those issues. Chris suggested a way of emitting all LRIs using a simple function and I really wanted to rework everything based on that suggestion. The LRIs are now organized in an array as opposed to sending them individually also debugfs patch can make use of it. I have removed the temporary array included in driver private structure. I think now it looks clean and we can easily add new w/a with minimal changes. Since all of the patches are modified I think it is better to squash them with the merged ones rather than updating them with new patches so I have folded your patches during rework and will send them after testing, please review them and give your comments. Please don't squash fixup patches when I've merged your patch already - usually I only drop patches when they're terminally broken, so if you send me a new version I have to fiddle things to make it all apply. But squashing in a fixup patch is simpler. And imo also easier to review. In this case all of the code is newly added so most of it should apply cleanly but if the preference is to not squash them I can send fix-up patches accordingly. regards Arun And once we deal in fixup patches it's ok to have a bunch of them imo, too. 
-Daniel
Re: [Intel-gfx] [PATCH 1/5] drm/i915: Rename intel_wa_registers with a i915_ prefix
On 30/08/2014 16:50, Damien Lespiau wrote:

Those debugfs files are prefixed by i915, the name of the kernel module, presumably to make the difference with files exposed by core DRM.

Also, add a ',' at the end of the last entry. This is to ease the conflict resolution when rebasing internal patches that add a member at the end of the array. Without it, wiggle can't do its job as we need to modify an existing line (appending the ',').

Signed-off-by: Damien Lespiau damien.lesp...@intel.com
---
 drivers/gpu/drm/i915/i915_debugfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 1467cc1..fc3d582a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2628,7 +2628,7 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused)
 	return 0;
 }

-static int intel_wa_registers(struct seq_file *m, void *unused)
+static int i915_wa_registers(struct seq_file *m, void *unused)
 {
 	int i;
 	int ret;
@@ -4198,7 +4198,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_semaphore_status", i915_semaphore_status, 0},
 	{"i915_shared_dplls_info", i915_shared_dplls_info, 0},
 	{"i915_dp_mst_info", i915_dp_mst_info, 0},
-	{"intel_wa_registers", intel_wa_registers, 0}
+	{"i915_wa_registers", i915_wa_registers, 0},
 };

 #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)

Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

This is only for this patch; the remaining patches are not required in the rework.

regards
Arun
Re: [Intel-gfx] [PATCH 5/5] drm/i915: Don't restrict i915_wa_registers to BDW
On 30/08/2014 16:51, Damien Lespiau wrote:

We have CHV code that already makes the test obsolete. Besides, when num_wa_regs is 0 (platforms not gathering that W/A data), we expose something sensible already.

Signed-off-by: Damien Lespiau damien.lesp...@intel.com
---
 drivers/gpu/drm/i915/i915_debugfs.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index fc3d582a..cd4f045 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2636,11 +2636,6 @@ static int i915_wa_registers(struct seq_file *m, void *unused)
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;

-	if (!IS_BROADWELL(dev)) {
-		DRM_DEBUG_DRIVER("Workaround table not available !!\n");
-		return -EINVAL;
-	}
-
 	ret = mutex_lock_interruptible(&dev->struct_mutex);
 	if (ret)
 		return ret;

This one can also be taken, so that makes patches 1 and 5 of this series.

Reviewed-by: Arun Siluvery arun.siluv...@linux.intel.com

regards
Arun
Re: [Intel-gfx] [PATCH 4/4] drm/i915: Rework workaround data exporting to debugfs
On 01/09/2014 15:06, Damien Lespiau wrote: On Mon, Sep 01, 2014 at 02:28:53PM +0100, Arun Siluvery wrote: Now w/a are organized in an array so we know exactly how many of them are applied; use the same array while exporting data to debugfs and remove the temporary array we currently have in driver priv structure. Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_debugfs.c | 41 +++-- drivers/gpu/drm/i915/i915_drv.h | 14 --- drivers/gpu/drm/i915/intel_ringbuffer.c | 15 drivers/gpu/drm/i915/intel_ringbuffer.h | 8 +++ 4 files changed, 52 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 2727bda..bab0408 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -2465,6 +2465,14 @@ static int i915_wa_registers(struct seq_file *m, void *unused) struct drm_info_node *node = (struct drm_info_node *) m-private; struct drm_device *dev = node-minor-dev; struct drm_i915_private *dev_priv = dev-dev_private; + struct intel_ring_context_rodata ro_data; + + ret = ring_context_rodata(dev, ro_data); + if (ret) { + seq_printf(m, Workarounds applied: 0\n); seq_puts() + DRM_DEBUG_DRIVER(Workaround table not available !!\n); + return -EINVAL; + } ret = mutex_lock_interruptible(dev-struct_mutex); if (ret) @@ -2472,18 +2480,27 @@ static int i915_wa_registers(struct seq_file *m, void *unused) intel_runtime_pm_get(dev_priv); - seq_printf(m, Workarounds applied: %d\n, dev_priv-num_wa_regs); - for (i = 0; i dev_priv-num_wa_regs; ++i) { - u32 addr, mask; - - addr = dev_priv-intel_wa_regs[i].addr; - mask = dev_priv-intel_wa_regs[i].mask; - dev_priv-intel_wa_regs[i].value = I915_READ(addr) | mask; - if (dev_priv-intel_wa_regs[i].addr) - seq_printf(m, 0x%X: 0x%08X, mask: 0x%08X\n, - dev_priv-intel_wa_regs[i].addr, - dev_priv-intel_wa_regs[i].value, - dev_priv-intel_wa_regs[i].mask); + seq_printf(m, Workarounds applied: %d\n, ro_data.num_items/2); + 
for (i = 0; i ro_data.num_items; i += 2) { + u32 addr, mask, value; + + addr = ro_data.init_context[i]; + /* +* Most of workarounds are masked registers; +* to set a bit in lower 16-bits we set a mask bit in +* upper 16-bits so we can take either of them as mask but +* it doesn't work if the w/a is about clearing a bit so +* use upper 16-bits to cover both cases. +*/ + mask = ro_data.init_context[i+1] 16; Most doesn't seem good here. Either it's all and we're happy, or we need a generic way to describe the W/A (masked register or not). value + mask is generic enough to code for both cases. It seems some of them could be unmasked registers. We can use 'mask' itself to determine whether it is a masked/unmasked register. mask == 0 if it is an unmasked register. + + /* +* value represents the status of other bits in the +* register besides w/a bits +*/ + value = I915_READ(addr) | mask; + seq_printf(m, 0x%X: 0x%08X, mask: 0x%08X\n, + addr, value, mask); } I still don't get it. 'value' is supposed to be the reference value for the W/A, but you're or'ing the mask here, so you treat the mask as if it were the reference value. This won't work if the W/A is about setting multi-bits fields or about clearing a bit. The comment is still not clear enough. You're saying other bits besides the w/a bits, but or'ing the mask doesn't do that. Why do we care about the other bits in the reference value? they don't matter. Why use something else than (ro_data.init_context[i+1] 0x) for the value here (as long we're talking about masked registers)? I have always considered value as the register value (remaining bits of the register and w/a bits) and now I see your point. Yes lower 16-bits can be used as reference value, depending on whether it is a masked/unmasked we can use/not use the mask in conjunction with value in the test. regards Arun ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
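The mask/value point argued above can be illustrated with a small sketch. It assumes the masked-register layout described in the thread, where the upper 16 bits of the written dword select which of the lower 16 bits take effect. `struct wa_decoded`, `wa_decode_masked()` and `wa_matches()` are hypothetical names, the `0xFFFF` constant is my reading of "lower 16-bits", and unmasked registers are deliberately not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of one masked-register write. */
struct wa_decoded {
	uint32_t mask;   /* bits the workaround touches (from the upper 16 bits) */
	uint32_t value;  /* reference state of those bits (from the lower 16 bits) */
};

/* Split a masked-register payload into mask and reference value. */
struct wa_decoded wa_decode_masked(uint32_t payload)
{
	struct wa_decoded d;

	d.mask = payload >> 16;      /* write-enable bits */
	d.value = payload & 0xFFFF;  /* desired state of the enabled bits */
	return d;
}

/* Compare only the bits the workaround owns. */
bool wa_matches(struct wa_decoded d, uint32_t read)
{
	return (read & d.mask) == (d.value & d.mask);
}
```

With a bit-clearing workaround the lower 16 bits are all zero, so taking them as the mask would compare nothing. Taking the mask from the upper 16 bits covers both the set and the clear case, which is the argument made in the review.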
Re: [Intel-gfx] [PATCH 0/5] A few fixes on top of the wa_regs patches
On 30/08/2014 16:50, Damien Lespiau wrote: Hi Arun, I've compiled a few patches that I think solve some small-ish issues around your wa_regs series. Could you please have a look at them and comment/give your r-b tag if you judge appropriate? On top of those patches, I'd love some comments on the issues I raised in the other mail and possible follow up patches to address them. http://lists.freedesktop.org/archives/intel-gfx/2014-August/051514.html Hi Damien, I really appreciate you taking time to not just give review comments but also sending patches to fix those issues. Chris suggested a way of emitting all LRIs using a simple function and I really wanted to rework everything based on that suggestion. The LRIs are now organized in an array as opposed to sending them individually also debugfs patch can make use of it. I have removed the temporary array included in driver private structure. I think now it looks clean and we can easily add new w/a with minimal changes. Since all of the patches are modified I think it is better to squash them with the merged ones rather than updating them with new patches so I have folded your patches during rework and will send them after testing, please review them and give your comments. regards Arun At some point, we'll also need a bit of coherence with what Mika has been doing: http://lists.freedesktop.org/archives/intel-gfx/2014-August/05.html ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs
On 30/08/2014 16:10, Damien Lespiau wrote: On Tue, Aug 26, 2014 at 02:44:51PM +0100, Arun Siluvery wrote: The workarounds that are applied are exported to a debugfs file; this is used to verify their state after the test case (reset or suspend/resume etc). This patch is only required to support i-g-t. I'm really, really confused. Please bear with me. I have reworked all the patches hopefully things will be more clear this time :) 1. We only deal with masked registers AFAICS. Those registers have the high 16 bits masking the writes. 2. The values given to intel_ring_emit_wa() are the actual values we're going to write in the register, so they include those mask bits. say: intel_ring_emit_wa(ring, GEN7_ROW_CHICKEN2, _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); 3. We then record in intel_wa_regs the reg address and two fields named mask and value. 3. a) mask intel_wa_regs[dev_priv-num_wa_regs].mask = value 0x; You're selecting the low 16bits and put it in mask. But the masked bits are the upper 16bits? This may work when the W/A is about setting bits, but we have a bug if we ever have a W/A that is about clearing a bit. It would seem better to me to grab the upper bits which are, after all, the bitmask we're interested in. it is a valid issue, changed to use upper 16-bits as mask. 3. b) value /* value is updated with the status of remaining bits of this * register when it is read from debugfs file */ dev_priv-intel_wa_regs[dev_priv-num_wa_regs].value = value; I don't understand what the comment explains. The *why* we need to do that is missing and, frankly, having to update the reference values we capture at intel_ring_emit_wa() time sounds like a bug to me. I also take a note that, at this, point, intel_wa_regs.value contains both the value and the mask. Weird. I agree the why part is missing. 
The idea is value represents the status of other bits in this register besides w/a bit; this is actually redundant here, I guess I added because I wanted to initialize all members. This change is not applicable in the new patches. 4. Time to expose that intel_wa_regs array to user space 4. a) mask mask = dev_priv-intel_wa_regs[i].mask Straigthforward enough, except that these are still the lower 16 bits, so the value really. 4. b) value dev_priv-intel_wa_regs[i].value = I915_READ(addr) | mask; Hum? This really started my journey to dig futher. So we: - override the reference value from intel_ring_emit_wa() with whatever we have in the register at that moment - Or it with a mask that's not really a mask (but the reference value) 5. igt test So you grab those mask and value fields from the debugs file and read the register through mapped MMIO. and then status = (current_wa[i].value current_wa[i].mask) != (wa_regs[i].value wa_regs[i].mask); So that's where I'm starting to put things back together and understand what the intention is. I still think that's not quite right, especially how we get the mask and why we read back the register in the debugfs file. We read the register value after the test case (eg reset) and compare it with a known value that is exported to debugfs file. regards Arun Or am I just missing something? In any case, having to spend that much time trying to understand what's going on is a maintainability problem, we need code that a least looks straightforward. HTH, ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC] libdrm_intel: Rework BO allocs to avoid rounding up to bucket size
On 29/08/2014 11:16, Chris Wilson wrote: On Fri, Aug 29, 2014 at 11:02:01AM +0100, Arun Siluvery wrote: From: Garry Lancaster garry.lancas...@intel.com libdrm includes a scheme where freed buffer objects (BOs) are held in a cache. This allows incoming allocation requests to be serviced by re-using an old BO, instead of requiring a new object to be allocated. This is a performance enhancement. The cache is divided into buckets. Each bucket holds unused BOs of a pre-determined size. When a BO allocation request is seen, the bucket for BOs of this size or larger is selected. Any BO currently in the bucket will be re-used for the allocation. If the bucket was empty, a new BO is created. However, the BO is created with the size determined by the selected bucket (i.e. the size is rounded up to the bucket size), rather than being created with the originally requested size. This is so that when the BO is freed, it can be released into the bucket and re-used by any other allocation which selects the same bucket. Depending upon the size of the allocation, this rounding up can result in a significant wastage of memory when allocating a BO. For example, a BO request just over 132K allocated during GLES context creation was rounded up to the next bucket size of 160K. Such wastage can be critical on devices with low memory. This commit reworks the BO allocation code. On a BO allocation request, if the selected bucket contains any BOs, each of them is checked to see if any is large enough to fulfill the allocation request. If not, a new BO is created, but (due to the new check) it is no longer necessary to round up its size to match the size determined by the selected bucket. So, previously, buckets contained BOs that were all the same size. But now the BOs in a bucket can be different sizes: in the range from the size of the next smaller, nominal, bucket size to the current, nominal, bucket size. 
On a 1GB system, the following reductions in BO memory usage were seen: BaseMark X 1.0:324.4MB - 306.0MB (-18.4MB; 5.7% saving) BaseMark X 1.1 Medium Quality: 206.9MB - 201.2MB (- 5.7MB; 2.8% saving) GFXBench 3.0 TRex: 216.6MB - 200.0MB (-16.6MB; 8.3% saving) GFXBench 3.0 Manhattan:281.4MB - 246.8MB (-34.6MB; 12.3% saving) No performance change was seen on BaseMarkX. GFXBench 3.0 showed small performance increases (~0.5fps on Manhattan, ~1-2fps on TRex) which may be due to reduced activity of the OOM killer. The principle for rounding up was to increase the cache hit rate and thereby reduce allocations. Might be interesting to know whether the number of bo allocated also changes. If not, the argument is that the working set is pretty stable and has a natural set of sizes which it reuses. A counter example might then be uxa, glamor, compositors which off-the-top-of-my-head would have more variable object sizes. Reducing the impact of thrashing should itself be measurable, and a useful statistic to track. As a corollary to exact allocations, you can then reduce the number of buckets again (the number was increased to allow finer-grained allocations). Again, it is hard to judge whether handing back larger objects will lead to memory wastage. So yet another statistic to track is requested versus allocated memory sizes. Reducing number of buckets would lead to more wastage of memory right? The current bucket sizes are, Bucket[0]: 4K Bucket[1]: 8K Bucket[2]: 12K Bucket[3]: 16K Bucket[4]: 20K Bucket[5]: 24K Bucket[6]: 28K Bucket[7]: 32K Bucket[8]: 40K Bucket[9]: 48K Bucket[10]: 56K Bucket[11]: 64K Bucket[12]: 80K Bucket[13]: 96K Bucket[14]: 112K Bucket[15]: 128K Bucket[16]: 160K Bucket[17]: 192K Bucket[18]: 224K Bucket[19]: 256K ... ... If there are more objects with size 132K we would end up allocating 160K. We can track requested vs allocated but that depends on the application and usage, what would be the best measure to track this? 
I mean, do we measure over a given period of time, or by some other criterion?

Also it is important to state what type of system you are measuring the impact of allocations for -- the behaviour of a cache miss is dramatically different between LLC and non-LLC systems.

The current data is from a non-LLC system.

regards
Arun

-Chris
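To make the rounding cost concrete, here is a small sketch that reproduces the bucket sizes listed in this thread (4K, 8K, 12K, then each power of two from 16K upwards split into four steps). It is not the libdrm code: `bucket_size_for()` is a hypothetical helper, and the 64 MiB cut-off above which no bucket is used is an assumption made so the loop terminates.

```c
#include <stddef.h>

#define PAGE_SZ    ((size_t)4096)
#define MAX_BUCKET ((size_t)64 * 1024 * 1024)  /* assumed upper limit */

/* Return the nominal bucket size a request would be rounded up to. */
size_t bucket_size_for(size_t request)
{
	size_t base, size;

	if (request > MAX_BUCKET)
		return request;              /* too large for any bucket */

	if (request <= 3 * PAGE_SZ) {        /* 4K, 8K and 12K buckets */
		size_t pages = (request + PAGE_SZ - 1) / PAGE_SZ;

		return (pages ? pages : 1) * PAGE_SZ;
	}

	/* For each power of two P >= 16K: P, 1.25P, 1.5P, 1.75P. */
	for (base = 4 * PAGE_SZ; ; base *= 2)
		for (size = base; size < 2 * base; size += base / 4)
			if (request <= size)
				return size;
}
```

Under these assumptions a 132K request lands in the 160K bucket, so 28K of that allocation is rounding overhead. Summing `bucket_size_for(n) - n` over the requests seen in a workload would be one way to get the requested-versus-allocated statistic discussed above.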
Re: [Intel-gfx] [PATCH 2/2] drm/i915/bdw: Export workaround data to debugfs
On 27/08/2014 16:44, Daniel Vetter wrote: On Tue, Aug 26, 2014 at 02:44:51PM +0100, Arun Siluvery wrote: The workarounds that are applied are exported to a debugfs file; this is used to verify their state after the test case (reset or suspend/resume etc). This patch is only required to support i-g-t. Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_debugfs.c | 40 + drivers/gpu/drm/i915/i915_drv.h | 14 drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++ 3 files changed, 77 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index d42db6b..f0d63f6 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -2451,20 +2451,59 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused) seq_printf(m, dpll_md: 0x%08x\n, pll-hw_state.dpll_md); seq_printf(m, fp0: 0x%08x\n, pll-hw_state.fp0); seq_printf(m, fp1: 0x%08x\n, pll-hw_state.fp1); seq_printf(m, wrpll: 0x%08x\n, pll-hw_state.wrpll); } drm_modeset_unlock_all(dev); return 0; } +static int intel_wa_registers(struct seq_file *m, void *unused) +{ + int i; + int ret; + struct drm_info_node *node = (struct drm_info_node *) m-private; + struct drm_device *dev = node-minor-dev; + struct drm_i915_private *dev_priv = dev-dev_private; + + if (!IS_BROADWELL(dev)) { + DRM_DEBUG_DRIVER(Workaround table not available !!\n); + return -EINVAL; + } + + ret = mutex_lock_interruptible(dev-struct_mutex); + if (ret) + return ret; + + intel_runtime_pm_get(dev_priv); + + seq_printf(m, Workarounds applied: %d\n, dev_priv-num_wa_regs); + for (i = 0; i dev_priv-num_wa_regs; ++i) { + u32 addr, mask; + + addr = dev_priv-intel_wa_regs[i].addr; + mask = dev_priv-intel_wa_regs[i].mask; + dev_priv-intel_wa_regs[i].value = I915_READ(addr) | mask; + if (dev_priv-intel_wa_regs[i].addr) + seq_printf(m, 0x%X: 0x%08X, mask: 0x%08X\n, + dev_priv-intel_wa_regs[i].addr, + dev_priv-intel_wa_regs[i].value, + 
dev_priv-intel_wa_regs[i].mask); + } + + intel_runtime_pm_put(dev_priv); + mutex_unlock(dev-struct_mutex); + + return 0; +} + struct pipe_crc_info { const char *name; struct drm_device *dev; enum pipe pipe; }; static int i915_dp_mst_info(struct seq_file *m, void *unused) { struct drm_info_node *node = (struct drm_info_node *) m-private; struct drm_device *dev = node-minor-dev; @@ -3980,20 +4019,21 @@ static const struct drm_info_list i915_debugfs_list[] = { {i915_llc, i915_llc, 0}, {i915_edp_psr_status, i915_edp_psr_status, 0}, {i915_sink_crc_eDP1, i915_sink_crc, 0}, {i915_energy_uJ, i915_energy_uJ, 0}, {i915_pc8_status, i915_pc8_status, 0}, {i915_power_domain_info, i915_power_domain_info, 0}, {i915_display_info, i915_display_info, 0}, {i915_semaphore_status, i915_semaphore_status, 0}, {i915_shared_dplls_info, i915_shared_dplls_info, 0}, {i915_dp_mst_info, i915_dp_mst_info, 0}, + {intel_wa_registers, intel_wa_registers, 0} }; #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list) static const struct i915_debugfs_files { const char *name; const struct file_operations *fops; } i915_debugfs_files[] = { {i915_wedged, i915_wedged_fops}, {i915_max_freq, i915_max_freq_fops}, {i915_min_freq, i915_min_freq_fops}, diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index bcf79f0..49b7be7 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1546,20 +1546,34 @@ struct drm_i915_private { wait_queue_head_t pending_flip_queue; #ifdef CONFIG_DEBUG_FS struct intel_pipe_crc pipe_crc[I915_MAX_PIPES]; #endif int num_shared_dpll; struct intel_shared_dpll shared_dplls[I915_NUM_PLLS]; int dpio_phy_iosf_port[I915_NUM_PHYS_VLV]; + /* +* workarounds are currently applied at different places and +* changes are being done to consolidate them so exact count is +* not clear at this point, use a max value for now. 
+*/ +#define I915_MAX_WA_REGS 16 + struct { + u32 addr; + u32 value; + /* bitmask representing WA bits */ + u32 mask; + } intel_wa_regs[I915_MAX_WA_REGS]; + u32 num_wa_regs; + /* Reclocking support */ bool render_reclock_avail; bool
Re: [Intel-gfx] [PATCH] igt/gem_workarounds: igt to test workaround registers
On 27/08/2014 16:59, Chris Wilson wrote:
On Wed, Aug 27, 2014 at 05:50:16PM +0200, Daniel Vetter wrote:
On Tue, Aug 26, 2014 at 02:50:28PM +0100, Arun Siluvery wrote:

Some of the workarounds are lost following a gpu reset or suspend/resume; this patch adds a test which compares register state before and after the test scenario. This test currently verifies only bdw workarounds.

The existing tool didn't need kernel help (other than forcewake). Why was that not used as a starting point?
-Chris

Do you mean intel_reg_checker()? This new test uses kernel help to get the initial state of workarounds which are exported to debugfs. We could add this known state to the test itself but Daniel is not ok with that. The debugfs part is only added to support the test.

regards
Arun
Re: [Intel-gfx] [PATCH] igt/gem_workarounds: igt to test workaround registers
On 27/08/2014 17:23, Chris Wilson wrote:
On Wed, Aug 27, 2014 at 05:17:11PM +0100, Siluvery, Arun wrote:
On 27/08/2014 16:59, Chris Wilson wrote:
On Wed, Aug 27, 2014 at 05:50:16PM +0200, Daniel Vetter wrote:
On Tue, Aug 26, 2014 at 02:50:28PM +0100, Arun Siluvery wrote:

Some of the workarounds are lost following a gpu reset or suspend/resume; this patch adds a test which compares register state before and after the test scenario. This test currently verifies only bdw workarounds.

The existing tool didn't need kernel help (other than forcewake). Why was that not used as a starting point?
-Chris

Do you mean intel_reg_checker()? This new test uses kernel help to get the initial state of workarounds which are exported to debugfs. We could add this known state to the test itself but Daniel is not ok with that. The debugfs part is only added to support the test.

I disagree vehemently with Daniel here then. The kernel lies.
-Chris

Just to clarify, he was not ok because the list we maintain in the test can get out of sync with the workarounds we apply in the driver, which can be avoided if it is generated by the kernel itself. It may be ok to maintain the list in the test in this case, considering the list is fairly small, but it is not my call.

regards
Arun
Re: [Intel-gfx] [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function
On 25/08/2014 13:18, Ville Syrjälä wrote: On Fri, Aug 22, 2014 at 08:39:11PM +0100, Arun Siluvery wrote: For BDW workarounds are currently initialized in init_clock_gating() but they are lost during reset, suspend/resume etc; this patch moves the WAs that are part of register state context to render ring init fn otherwise default context ends up with incorrect values as they don't get initialized until init_clock_gating fn. v2: Add workarounds to golden render state This method has its own issues, first of all this is different for each gen and it is generated using a tool so adding new workaround and mainitaining them across gens is not a straightforward process. v3: Use LRIs to emit these workarounds (Ville) Instead of modifying the golden render state the same LRIs are emitted from within the driver. For: VIZ-4092 Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_gem_context.c | 6 +++ drivers/gpu/drm/i915/intel_pm.c | 48 -- drivers/gpu/drm/i915/intel_ringbuffer.c | 70 + drivers/gpu/drm/i915/intel_ringbuffer.h | 7 4 files changed, 83 insertions(+), 48 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 9683e62..2debce4 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring, } uninitialized = !to-legacy_hw_ctx.initialized from == NULL; to-legacy_hw_ctx.initialized = true; done: i915_gem_context_reference(to); ring-last_context = to; if (uninitialized) { + if (IS_BROADWELL(ring-dev)) { + ret = bdw_init_workarounds(ring); + if (ret) + DRM_ERROR(init workarounds: %d\n, ret); + } + ret = i915_gem_render_state_init(ring); if (ret) DRM_ERROR(init render state: %d\n, ret); } return 0; unpin_out: if (ring-id == RCS) i915_gem_object_ggtt_unpin(to-legacy_hw_ctx.rcs_state); diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index 
c8f744c..668acd9 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -5507,101 +5507,53 @@ static void gen8_init_clock_gating(struct drm_device *dev) struct drm_i915_private *dev_priv = dev-dev_private; enum pipe pipe; I915_WRITE(WM3_LP_ILK, 0); I915_WRITE(WM2_LP_ILK, 0); I915_WRITE(WM1_LP_ILK, 0); /* FIXME(BDW): Check all the w/a, some might only apply to * pre-production hw. */ - /* WaDisablePartialInstShootdown:bdw */ - I915_WRITE(GEN8_ROW_CHICKEN, - _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); - - /* WaDisableThreadStallDopClockGating:bdw */ - /* FIXME: Unclear whether we really need this on production bdw. */ - I915_WRITE(GEN8_ROW_CHICKEN, - _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); - /* -* This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for -* pre-production hardware -*/ - I915_WRITE(HALF_SLICE_CHICKEN3, - _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS)); - I915_WRITE(HALF_SLICE_CHICKEN3, - _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE)); I915_WRITE(_3D_CHICKEN3, _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2))); - I915_WRITE(COMMON_SLICE_CHICKEN2, - _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); - - I915_WRITE(GEN7_HALF_SLICE_CHICKEN1, - _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); - - /* WaDisableDopClockGating:bdw May not be needed for production */ - I915_WRITE(GEN7_ROW_CHICKEN2, - _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); /* WaSwitchSolVfFArbitrationPriority:bdw */ I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL); /* WaPsrDPAMaskVBlankInSRD:bdw */ I915_WRITE(CHICKEN_PAR1_1, I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD); /* WaPsrDPRSUnmaskVBlankInSRD:bdw */ for_each_pipe(pipe) { I915_WRITE(CHICKEN_PIPESL_1(pipe), I915_READ(CHICKEN_PIPESL_1(pipe)) | BDW_DPRS_MASK_VBLANK_SRD); } - /* Use Force Non-Coherent whenever executing a 3D context. 
This is a -* workaround for for a possible hang in the unlikely event a TLB -* invalidation occurs during a PSD flush. -*/ - I915_WRITE(HDC_CHICKEN0, -
Re: [Intel-gfx] [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function
On 26/08/2014 11:09, Chris Wilson wrote:
On Tue, Aug 26, 2014 at 10:33:16AM +0100, Arun Siluvery wrote:

For BDW, workarounds are currently initialized in init_clock_gating(), but they are lost during reset, suspend/resume etc. This patch moves the WAs that are part of the register state context to the render ring init fn; otherwise the default context ends up with incorrect values, as they don't get initialized until the init_clock_gating fn.

v2: Add workarounds to golden render state. This method has its own issues: first of all it is different for each gen, and it is generated using a tool, so adding new workarounds and maintaining them across gens is not a straightforward process.

v3: Use LRIs to emit these workarounds (Ville). Instead of modifying the golden render state, the same LRIs are emitted from within the driver.

For: VIZ-4092
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
drivers/gpu/drm/i915/i915_gem_context.c | 6 +++
drivers/gpu/drm/i915/intel_pm.c | 48
drivers/gpu/drm/i915/intel_ringbuffer.c | 78 +
drivers/gpu/drm/i915/intel_ringbuffer.h | 1 +
4 files changed, 85 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 9683e62..2debce4 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring,
 }
 uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
 to->legacy_hw_ctx.initialized = true;
 done:
 i915_gem_context_reference(to);
 ring->last_context = to;
 if (uninitialized) {
+ if (IS_BROADWELL(ring->dev)) {
+ ret = bdw_init_workarounds(ring);
+ if (ret)
+ DRM_ERROR("init workarounds: %d\n", ret);

A good rule of thumb is that if you are exporting gen specific routines, the layering and abstraction is fishy.
-Chris

ok, so something like i915_init_workarounds() is ok? with a check for bdw/gen8 done inside that function.

regards
Arun
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function
On 26/08/2014 11:34, Chris Wilson wrote:
On Tue, Aug 26, 2014 at 11:16:29AM +0100, Siluvery, Arun wrote:
On 26/08/2014 11:09, Chris Wilson wrote:
On Tue, Aug 26, 2014 at 10:33:16AM +0100, Arun Siluvery wrote:

For BDW, workarounds are currently initialized in init_clock_gating(), but they are lost during reset, suspend/resume etc. This patch moves the WAs that are part of the register state context to the render ring init fn; otherwise the default context ends up with incorrect values, as they don't get initialized until the init_clock_gating fn.

v2: Add workarounds to golden render state. This method has its own issues: first of all it is different for each gen, and it is generated using a tool, so adding new workarounds and maintaining them across gens is not a straightforward process.

v3: Use LRIs to emit these workarounds (Ville). Instead of modifying the golden render state, the same LRIs are emitted from within the driver.

For: VIZ-4092
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
drivers/gpu/drm/i915/i915_gem_context.c | 6 +++
drivers/gpu/drm/i915/intel_pm.c | 48
drivers/gpu/drm/i915/intel_ringbuffer.c | 78 +
drivers/gpu/drm/i915/intel_ringbuffer.h | 1 +
4 files changed, 85 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 9683e62..2debce4 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring,
 }
 uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
 to->legacy_hw_ctx.initialized = true;
 done:
 i915_gem_context_reference(to);
 ring->last_context = to;
 if (uninitialized) {
+ if (IS_BROADWELL(ring->dev)) {
+ ret = bdw_init_workarounds(ring);
+ if (ret)
+ DRM_ERROR("init workarounds: %d\n", ret);

A good rule of thumb is that if you are exporting gen specific routines, the layering and abstraction is fishy.
-Chris

ok, so something like i915_init_workarounds() is ok? with a check for bdw/gen8 done inside that function.

Except that init_workarounds is quite useless as a function name, and we already have a structure that is customised per-engine and per-gen that you could hook into. engine->ring_init_context() ?
-Chris

Ok thanks, I can create a new fn ring_init_context().

regards
Arun
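As a rough illustration of the hook discussed above, the sketch below models an optional per-engine function pointer that the generic context-switch path calls only when the engine provides one. Every name in it (struct engine, bdw_ring_init_context, switch_to_uninitialized_context) is a hypothetical stand-in, not the i915 type or function, so it should be read as a picture of the layering rather than as driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an engine; not the real intel_engine_cs. */
struct engine {
	const char *name;
	int wa_applied;                         /* number of workaround passes */
	int (*init_context)(struct engine *e);  /* optional per-gen hook */
};

/* Hypothetical gen-specific routine; it stays private to engine setup. */
static int bdw_ring_init_context(struct engine *e)
{
	e->wa_applied++;
	return 0;
}

/* Generic caller: no platform check, only "call the hook if present". */
static int switch_to_uninitialized_context(struct engine *e)
{
	if (e->init_context)
		return e->init_context(e);
	return 0;
}
```

With this shape the generic path never names a platform: an engine without a hook is simply skipped, and a new gen only has to fill in its own pointer at engine setup time.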
Re: [Intel-gfx] [PATCH 1/2] drm/i915/bdw: Apply workarounds using the golden render state
On 26/08/2014 13:53, Daniel Vetter wrote:
On Fri, Aug 22, 2014 at 01:10:26PM +0100, Siluvery, Arun wrote:
On 22/08/2014 12:06, Mika Kuoppala wrote:
Ville Syrjälä ville.syrj...@linux.intel.com writes:
On Wed, Aug 20, 2014 at 03:19:17PM +0100, Arun Siluvery wrote:

Workarounds for bdw are currently applied in init_clock_gating() but they are lost following a gpu reset. Some of the WA registers are part of register state context and they are restored with every context switch, so initializing them in golden render state ensures that they are applied even when we start with an uninitialized context or during hw initialization followed by a reset.

v2: Add comments corresponding to WAs in golden render state (Chris). The generation of render state is not a straightforward process; it would be ideal to add the WA values during render state setup as opposed to using a tool, but that would be a follow up patch.

I'd still prefer just emitting the LRIs from code rather than mucking about with null batch. Fewer hoops to jump through when adding a new w/a.

I agree with this. We should aim to keep null state as per gen. Workaround set is different for gtX inside a particular gen, so we would then need multiple null states per gen.

After a brief chat with Ville, I think that the correct spot to init the context specific workarounds is after MI_SET_CONTEXT to default and right before null batch is run. If we do these by emitting LRIs to the ring, we should be safe as they are then saved with default ctx. The default ctx is then used as a 'parent' for newly created contexts. Of course, if registers get clobbered, then we inherit crap.

If we have the per gen null state and the ring is initializing workarounds for the default context, then in future we can save this state as 'read only golden context' and use it as the initial state for all newly created contexts.

Then the full plan how to init would look like this:

#1 reset the gpu (on driver load, on resume or on hang recovery)
#2 if we have 'read only golden context', copy it to default ctx
#3 switch to default context
#4 if we had 'read only golden context' we are done with the init.
---
#5 if this is driver load, there is no 'read only golden context' yet.
#6 init workarounds through ring LRIs
#7 run null/golden state batch
#8 save this state as a 'read only golden context'
---
#9 for each new context, initialize ctx obj with 'read only golden context' (either by memcpy or restoring from it when switching to new)

I understand applying WAs using null batch has its issues, but as I mentioned in the commit msg I will fix this as a follow up patch. It is going to take some time though to change the patch as per the new sequence. The patch in its current state helps fix WA issues after reset; so it can only be accepted if it is updated as per the new sequence?

We already have a lot of "let's fix it later" experiments running, so I don't want to overload the ship. So I highly prefer to merge the revised version directly.
-Daniel

I understand, a revised version with LRIs emitted from the driver is already submitted and is being reviewed.

regards
Arun
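The numbered init plan in this thread can be sketched as a small state machine. This is only a model under stated assumptions: struct gpu, struct ctx_image and the helper names are invented for illustration, the "registers" are plain array slots, and the values written by the LRI and null-batch stand-ins are arbitrary. Step #3 (the actual switch to the default context) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CTX_WORDS 4

/* Hypothetical context image: a few register slots. */
struct ctx_image { unsigned int regs[CTX_WORDS]; };

/* Hypothetical device model, not an i915 structure. */
struct gpu {
	struct ctx_image default_ctx;
	struct ctx_image golden;     /* 'read only golden context' */
	bool have_golden;
	int lri_passes;
	int null_batch_runs;
};

/* #6: stand-in for emitting workaround LRIs to the ring. */
static void emit_workaround_lris(struct gpu *g)
{
	g->default_ctx.regs[0] |= 0x40;
	g->lri_passes++;
}

/* #7: stand-in for running the null/golden state batch. */
static void run_null_batch(struct gpu *g)
{
	g->default_ctx.regs[1] = 0x1;
	g->null_batch_runs++;
}

static void gpu_init_after_reset(struct gpu *g)
{
	/* #1: the reset loses whatever the default context held. */
	memset(&g->default_ctx, 0, sizeof(g->default_ctx));

	if (g->have_golden) {
		/* #2 and #4: copy the golden image and stop. */
		g->default_ctx = g->golden;
		return;
	}

	/* #5 to #8: first load, build and save the golden image. */
	emit_workaround_lris(g);
	run_null_batch(g);
	g->golden = g->default_ctx;
	g->have_golden = true;
}

/* #9: every new context starts from the golden image. */
static void new_context_init(const struct gpu *g, struct ctx_image *ctx)
{
	*ctx = g->golden;
}
```

The point of the split is that the LRIs and the null batch run once, on first load; every later reset or new context is a copy of the saved image.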
Re: [Intel-gfx] [PATCH] drm/i915/bdw: Render moot context reset and switch with Execlists
On 26/08/2014 06:59, Chris Wilson wrote:
On Mon, Aug 25, 2014 at 10:39:39PM +0200, Daniel Vetter wrote:
On Wed, Aug 20, 2014 at 04:36:05PM +0100, Chris Wilson wrote:
On Wed, Aug 20, 2014 at 04:29:24PM +0100, Thomas Daniel wrote:

These two functions make no sense in a Logical Ring Context Execlists world.

v2: We got rid of lrc_enabled and centralized everything in the sanitized i915.enable_execlists instead.

Signed-off-by: Oscar Mateo oscar.ma...@intel.com

v3: Rebased. Corrected a typo in comment for i915_switch_context and added a comment that it should not be called in execlist mode. Added WARN_ON if i915_switch_context is called in execlist mode. Moved check for execlist mode out of i915_switch_context and into callers. Added comment in context_reset explaining why nothing is done in execlist mode.

No, this is not the way. The requirement is to reduce the number of special cases, not increase them. These should be evaluated to be no-ops when execlists is used.

I think it's ok-ish for now. Maybe we need to reconsider when we wire up lrc reclaim, which is the real user of the switch_context in gpu_idle. The problem I have though is that I can't parse the subject of the patch; someone please translate that to simplified English for me. I can do the replacement while applying.

No, it is not. execlists is badly designed and this is a further symptom of that.
-Chris

Thomas is not available and I am replying on his behalf. Is the following subject good for this patch?

Don't execute context reset and switch when using Execlists

regards
Arun
Re: [Intel-gfx] [PATCH 1/2] drm/i915/bdw: Apply workarounds in render ring init function
On 26/08/2014 15:37, Ville Syrjälä wrote:
On Tue, Aug 26, 2014 at 02:44:50PM +0100, Arun Siluvery wrote:

For BDW, workarounds are currently initialized in init_clock_gating(), but they are lost during reset, suspend/resume etc. This patch moves the WAs that are part of the register state context to the render ring init fn; otherwise the default context ends up with incorrect values, as they don't get initialized until the init_clock_gating fn.

v2: Add workarounds to golden render state. This method has its own issues: first of all it is different for each gen, and it is generated using a tool, so adding new workarounds and maintaining them across gens is not a straightforward process.

v3: Use LRIs to emit these workarounds (Ville). Instead of modifying the golden render state, the same LRIs are emitted from within the driver.

v4: Use abstract name when exporting gen specific routines (Chris)

For: VIZ-4092
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com

This one looks good as far as I'm concerned.

Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com

Do you plan to give other platforms the same treatment? We need at least CHV converted ASAP. But if you don't have a test machine I can take care of that myself.

I don't have CHV hardware. I can borrow some and try, but since it is needed as soon as possible, could you please modify it for CHV?
regards Arun --- drivers/gpu/drm/i915/i915_gem_context.c | 6 +++ drivers/gpu/drm/i915/intel_pm.c | 48 drivers/gpu/drm/i915/intel_ringbuffer.c | 79 + drivers/gpu/drm/i915/intel_ringbuffer.h | 2 + 4 files changed, 87 insertions(+), 48 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 9683e62..0a9bb0e 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -631,20 +631,26 @@ static int do_switch(struct intel_engine_cs *ring, } uninitialized = !to-legacy_hw_ctx.initialized from == NULL; to-legacy_hw_ctx.initialized = true; done: i915_gem_context_reference(to); ring-last_context = to; if (uninitialized) { + if (ring-init_context) { + ret = ring-init_context(ring); + if (ret) + DRM_ERROR(ring init context: %d\n, ret); + } + ret = i915_gem_render_state_init(ring); if (ret) DRM_ERROR(init render state: %d\n, ret); } return 0; unpin_out: if (ring-id == RCS) i915_gem_object_ggtt_unpin(to-legacy_hw_ctx.rcs_state); diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index c8f744c..668acd9 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -5507,101 +5507,53 @@ static void gen8_init_clock_gating(struct drm_device *dev) struct drm_i915_private *dev_priv = dev-dev_private; enum pipe pipe; I915_WRITE(WM3_LP_ILK, 0); I915_WRITE(WM2_LP_ILK, 0); I915_WRITE(WM1_LP_ILK, 0); /* FIXME(BDW): Check all the w/a, some might only apply to * pre-production hw. */ - /* WaDisablePartialInstShootdown:bdw */ - I915_WRITE(GEN8_ROW_CHICKEN, - _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); - - /* WaDisableThreadStallDopClockGating:bdw */ - /* FIXME: Unclear whether we really need this on production bdw. 
*/ - I915_WRITE(GEN8_ROW_CHICKEN, - _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); - /* -* This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for -* pre-production hardware -*/ - I915_WRITE(HALF_SLICE_CHICKEN3, - _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS)); - I915_WRITE(HALF_SLICE_CHICKEN3, - _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE)); I915_WRITE(_3D_CHICKEN3, _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2))); - I915_WRITE(COMMON_SLICE_CHICKEN2, - _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); - - I915_WRITE(GEN7_HALF_SLICE_CHICKEN1, - _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); - - /* WaDisableDopClockGating:bdw May not be needed for production */ - I915_WRITE(GEN7_ROW_CHICKEN2, - _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); /* WaSwitchSolVfFArbitrationPriority:bdw */ I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL); /* WaPsrDPAMaskVBlankInSRD:bdw */ I915_WRITE(CHICKEN_PAR1_1, I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD); /* WaPsrDPRSUnmaskVBlankInSRD:bdw */
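Several of the writes moved by this patch use _MASKED_BIT_ENABLE(), which is why two consecutive writes to the same register (for example GEN8_ROW_CHICKEN above) do not overwrite each other. A minimal sketch of that behaviour follows, assuming the usual masked-register convention in which the upper 16 bits of the written value select which of the lower 16 bits are updated. The function names are illustrative, masked_reg_write() is a software model rather than anything in the driver, and the bit positions used in the note below are arbitrary examples.

```c
#include <assert.h>
#include <stdint.h>

/* Value to write in order to set one bit: mask in [31:16], data in [15:0]. */
static uint32_t masked_bit_enable(uint32_t bit)
{
	return (bit << 16) | bit;
}

/* Value to write in order to clear one bit: mask set, data left at zero. */
static uint32_t masked_bit_disable(uint32_t bit)
{
	return bit << 16;
}

/* Software model of how a masked register applies a write. */
static uint32_t masked_reg_write(uint32_t old, uint32_t val)
{
	uint32_t mask = val >> 16;

	/* Only the bits named in the mask change; the rest keep their value. */
	return (old & ~mask & 0xffff) | (val & mask);
}
```

Enabling bit 8 and then bit 5 through this model leaves both bits set, with no read-modify-write, which matches how the back-to-back I915_WRITE() calls in the patch are meant to behave.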
Re: [Intel-gfx] [PATCH 1/2] drm/i915/bdw: Apply workarounds using the golden render state
On 22/08/2014 12:06, Mika Kuoppala wrote:
Ville Syrjälä ville.syrj...@linux.intel.com writes:
On Wed, Aug 20, 2014 at 03:19:17PM +0100, Arun Siluvery wrote:

Workarounds for bdw are currently applied in init_clock_gating() but they are lost following a gpu reset. Some of the WA registers are part of register state context and they are restored with every context switch, so initializing them in golden render state ensures that they are applied even when we start with an uninitialized context or during hw initialization followed by a reset.

v2: Add comments corresponding to WAs in golden render state (Chris). The generation of render state is not a straightforward process; it would be ideal to add the WA values during render state setup as opposed to using a tool, but that would be a follow up patch.

I'd still prefer just emitting the LRIs from code rather than mucking about with null batch. Fewer hoops to jump through when adding a new w/a.

I agree with this. We should aim to keep null state as per gen. Workaround set is different for gtX inside a particular gen, so we would then need multiple null states per gen.

After a brief chat with Ville, I think that the correct spot to init the context specific workarounds is after MI_SET_CONTEXT to default and right before null batch is run. If we do these by emitting LRIs to the ring, we should be safe as they are then saved with default ctx. The default ctx is then used as a 'parent' for newly created contexts. Of course, if registers get clobbered, then we inherit crap.

If we have the per gen null state and the ring is initializing workarounds for the default context, then in future we can save this state as 'read only golden context' and use it as the initial state for all newly created contexts.

Then the full plan how to init would look like this:

#1 reset the gpu (on driver load, on resume or on hang recovery)
#2 if we have 'read only golden context', copy it to default ctx
#3 switch to default context
#4 if we had 'read only golden context' we are done with the init.
---
#5 if this is driver load, there is no 'read only golden context' yet.
#6 init workarounds through ring LRIs
#7 run null/golden state batch
#8 save this state as a 'read only golden context'
---
#9 for each new context, initialize ctx obj with 'read only golden context' (either by memcpy or restoring from it when switching to new)

I understand applying WAs using null batch has its issues, but as I mentioned in the commit msg I will fix this as a follow up patch. It is going to take some time though to change the patch as per the new sequence. The patch in its current state helps fix WA issues after reset; so it can only be accepted if it is updated as per the new sequence?

regards
Arun

-Mika
Re: [Intel-gfx] [PATCH 2/2] igt/gem_workarounds: igt to test workaround registers
On 20/08/2014 16:37, Thomas Wood wrote: On 20 August 2014 15:52, Arun Siluvery arun.siluv...@linux.intel.com wrote: Some of the workarounds are lost followed by a gpu reset, suspend/resume; this patch adds a test which compares register state before and after the test scenario. This test currently verifies only bdw workarounds. Just a few points from an igt perspective: could you add the binary to tests/.gitignore and perhaps consider using igt_debug or igt_info instead of printf? There are also some debugfs helpers in igt_debugfs Thank you for the comments, corrected the patch locally, will send the updated version along with any other comments. regards Arun to open/fopen debugfs files. There are also a few other tests that implement GPU hangs, so it would be good to share code to do this between them, but not essential for this patch. Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- tests/Makefile.sources | 1 + tests/gem_workarounds.c | 238 2 files changed, 239 insertions(+) create mode 100644 tests/gem_workarounds.c diff --git a/tests/Makefile.sources b/tests/Makefile.sources index 0eb9369..a17acd1 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -127,20 +127,21 @@ TESTS_progs = \ gem_storedw_loop_vebox \ gem_threaded_access_tiled \ gem_tiled_fence_blits \ gem_tiled_pread \ gem_tiled_pread_pwrite \ gem_tiled_swapping \ gem_tiling_max_stride \ gem_unfence_active_buffers \ gem_unref_active_buffers \ gem_wait_render_timeout \ + gem_workarounds \ gen3_mixed_blits \ gen3_render_linear_blits \ gen3_render_mixed_blits \ gen3_render_tiledx_blits \ gen3_render_tiledy_blits \ gen7_forcewake_mt \ kms_force_connector \ kms_sink_crc_basic \ kms_fence_pin_leak \ pm_psr \ diff --git a/tests/gem_workarounds.c b/tests/gem_workarounds.c new file mode 100644 index 000..56bf4b1 --- /dev/null +++ b/tests/gem_workarounds.c @@ -0,0 +1,238 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person 
obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ * + * Authors: + * Arun Siluvery arun.siluv...@linux.intel.com + * + */ + +#define _GNU_SOURCE +#include stdbool.h +#include unistd.h +#include stdlib.h +#include stdio.h +#include string.h +#include fcntl.h +#include inttypes.h +#include errno.h +#include sys/stat.h +#include sys/ioctl.h +#include sys/mman.h +#include time.h +#include signal.h + +#include ioctl_wrappers.h +#include drmtest.h +#include igt_debugfs.h +#include igt_aux.h +#include intel_chipset.h +#include intel_io.h + +enum operation { + GPU_RESET = 0x01, + SUSPEND_RESUME = 0x02, +}; + +struct intel_wa_reg { + uint32_t addr; + uint32_t value; + uint32_t mask; +}; + +int drm_fd; +uint32_t devid; +static drm_intel_bufmgr *bufmgr; +struct intel_batchbuffer *batch; +int num_wa; +struct intel_wa_reg *wa_regs; + + +static void test_hang_gpu(void) +{ + int retry_count = 30; + enum stop_ring_flags flags; + struct drm_i915_gem_execbuffer2 execbuf; + struct drm_i915_gem_exec_object2 gem_exec; + uint32_t b[2] = {MI_BATCH_BUFFER_END}; + + igt_assert(retry_count); + igt_set_stop_rings(STOP_RING_DEFAULTS); + + memset(gem_exec, 0, sizeof(gem_exec)); + gem_exec.handle = gem_create(drm_fd, 4096); + gem_write(drm_fd, gem_exec.handle, 0, b, sizeof(b)); + + memset(execbuf, 0, sizeof(execbuf)); + execbuf.buffers_ptr = (uintptr_t)gem_exec; + execbuf.buffer_count = 1; + execbuf.batch_len = sizeof(b); + + drmIoctl(drm_fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, execbuf); + + while(retry_count--) { + flags = igt_get_stop_rings(); + if (flags == 0) +
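The test source is cut off before the comparison step, so the following is only a guess at how the addr/value/mask triple in struct intel_wa_reg could be used: read each register after the reset or resume and count the entries whose masked bits no longer match. The reader callback, the fake register and the address 0x7004 are assumptions made for the sake of a self-contained sketch; a real test would go through the igt register access helpers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as the struct in the patch. */
struct intel_wa_reg {
	uint32_t addr;
	uint32_t value;
	uint32_t mask;
};

/* Hypothetical register reader, so the check can run without hardware. */
typedef uint32_t (*reg_read_fn)(uint32_t addr);

/* Count workarounds whose masked bits differ from the expected value. */
static int count_lost_workarounds(const struct intel_wa_reg *regs, size_t n,
				  reg_read_fn rd)
{
	int lost = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t cur = rd(regs[i].addr) & regs[i].mask;

		if (cur != (regs[i].value & regs[i].mask))
			lost++;
	}
	return lost;
}

/* Fake single-register file used only for illustration. */
static uint32_t fake_reg_value;

static uint32_t fake_read(uint32_t addr)
{
	return addr == 0x7004 ? fake_reg_value : 0;
}
```

Comparing only the masked bits keeps the test from failing on unrelated bits that legitimately change across a reset.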
Re: [Intel-gfx] [RFC] drm/i915/bdw: Apply workarounds to the golden render state
On 08/08/2014 10:57, Chris Wilson wrote:
On Fri, Aug 08, 2014 at 10:52:57AM +0100, arun.siluv...@linux.intel.com wrote:
From: Arun Siluvery arun.siluv...@linux.intel.com

Workarounds for bdw are currently applied in init_clock_gating() but they are lost following a gpu reset. Some of the registers are part of register state context and they are restored with every context switch, so initializing WAs in golden render state ensures that they are applied even when we start with an uninitialized context or during hw initialization followed by a reset.

Interesting, but let's try to keep the opaque blobs minimal. The comments for w/a are even more valuable than the code.

I agree, I will add comments to each workaround. We are looking at adding the workarounds to the null batch in the render state setup function itself. Do you have any comments on that approach?

regards
Arun

-Chris
Re: [Intel-gfx] [RFC] drm/i915/bdw: Apply workarounds to the golden render state
On 08/08/2014 13:20, Ville Syrjälä wrote:
On Fri, Aug 08, 2014 at 10:52:57AM +0100, arun.siluv...@linux.intel.com wrote:
From: Arun Siluvery arun.siluv...@linux.intel.com

Workarounds for bdw are currently applied in init_clock_gating() but they are lost following a gpu reset. Some of the registers are part of register state context and they are restored with every context switch, so initializing WAs in golden render state ensures that they are applied even when we start with an uninitialized context or during hw initialization followed by a reset.

This approach might require separate null states for BDW vs. CHV and IVB vs. HSW vs. VLV, which seems a bit unfortunate. Might be better to just issue the w/a register writes via LRIs from the code as part of the null state load.

Yes, this is a better approach. I am currently changing the code to achieve this, not sure how easy it would be.

Although I don't actually understand how this improves things as opposed to just applying the w/as via mmio writes. Does it?

I observed random behaviour with CACHE_MODE_1, which simply used to lose the applied workaround on the first context switch even though it is loaded with inhibit==1; register values are not supposed to change, but it was changing. I think it is better to add them in null batch to ensure hardware starts with WAs applied.

regards
Arun

Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
drivers/gpu/drm/i915/intel_pm.c | 50 -
drivers/gpu/drm/i915/intel_renderstate_gen8.c | 62 +--
2 files changed, 39 insertions(+), 73 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 1ddd4df..ab64b64 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5402,38 +5402,11 @@ static void gen8_init_clock_gating(struct drm_device *dev)
 /* FIXME(BDW): Check all the w/a, some might only apply to * pre-production hw.
*/ - /* WaDisablePartialInstShootdown:bdw */ - I915_WRITE(GEN8_ROW_CHICKEN, - _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE)); - - /* WaDisableThreadStallDopClockGating:bdw */ - /* FIXME: Unclear whether we really need this on production bdw. */ - I915_WRITE(GEN8_ROW_CHICKEN, - _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); - - /* -* This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for -* pre-production hardware -*/ - I915_WRITE(HALF_SLICE_CHICKEN3, - _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS)); - I915_WRITE(HALF_SLICE_CHICKEN3, - _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS)); I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE)); I915_WRITE(_3D_CHICKEN3, _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2))); - I915_WRITE(COMMON_SLICE_CHICKEN2, - _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE)); - - I915_WRITE(GEN7_HALF_SLICE_CHICKEN1, - _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE)); - - /* WaDisableDopClockGating:bdw May not be needed for production */ - I915_WRITE(GEN7_ROW_CHICKEN2, - _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); - /* WaSwitchSolVfFArbitrationPriority:bdw */ I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL); @@ -5448,41 +5421,18 @@ static void gen8_init_clock_gating(struct drm_device *dev) BDW_DPRS_MASK_VBLANK_SRD); } - /* Use Force Non-Coherent whenever executing a 3D context. This is a -* workaround for for a possible hang in the unlikely event a TLB -* invalidation occurs during a PSD flush. -*/ - I915_WRITE(HDC_CHICKEN0, - I915_READ(HDC_CHICKEN0) | - _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT)); - /* WaVSRefCountFullforceMissDisable:bdw */ /* WaDSRefCountFullforceMissDisable:bdw */ I915_WRITE(GEN7_FF_THREAD_MODE, I915_READ(GEN7_FF_THREAD_MODE) ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME)); - /* -* BSpec recommends 8x4 when MSAA is used, -* however in practice 16x4 seems fastest. 
-* -* Note that PS/WM thread counts depend on the WIZ hashing -* disable bit, which we don't touch here, but it's good -* to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). -*/ - I915_WRITE(GEN7_GT_MODE, - GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); - I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL, _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE)); /* WaDisableSDEUnitClockGating:bdw */ I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) | GEN8_SDEUNIT_CLOCK_GATE_DISABLE); - -
Re: [Intel-gfx] [RFC 2/2] igt/gem_workarounds: igt to test workaround registers
On 08/08/2014 15:12, Daniel Vetter wrote: On Fri, Aug 08, 2014 at 10:54:56AM +0100, arun.siluv...@linux.intel.com wrote: From: Arun Siluvery arun.siluv...@linux.intel.com Some of the workarounds are lost followed by a gpu reset, suspend/resume; this patch adds a test which captures register state before and after the test scenario. This test currently verifies only bdw workarounds. Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com Some comments below. --- lib/intel_reg.h | 8 ++ tests/Makefile.sources | 1 + tests/gem_workarounds.c | 211 3 files changed, 220 insertions(+) create mode 100644 tests/gem_workarounds.c diff --git a/lib/intel_reg.h b/lib/intel_reg.h index 86175bb..d015c36 100644 --- a/lib/intel_reg.h +++ b/lib/intel_reg.h @@ -3628,4 +3628,12 @@ typedef enum { #define GEN6_WIZ_HASHING_16x4 GEN6_WIZ_HASHING(1, 0) #define GEN6_WIZ_HASHING_MASK (GEN6_WIZ_HASHING(1, 1) 16) +#define GAMTARBMODE0x04a08 +#define _3D_CHICKEN3 0x02090 +#define GAM_ECOCHK 0x4090 +#define CHICKEN_PAR1_1 0x42080 +#define GEN7_FF_THREAD_MODE0x20a0 +#define GEN6_RC_SLEEP_PSMI_CONTROL 0x2050 +#define GEN8_UCGCTL6 0x9430 + #endif /* _I810_REG_H */ diff --git a/tests/Makefile.sources b/tests/Makefile.sources index 0eb9369..a17acd1 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -134,6 +134,7 @@ TESTS_progs = \ gem_unfence_active_buffers \ gem_unref_active_buffers \ gem_wait_render_timeout \ + gem_workarounds \ gen3_mixed_blits \ gen3_render_linear_blits \ gen3_render_mixed_blits \ diff --git a/tests/gem_workarounds.c b/tests/gem_workarounds.c new file mode 100644 index 000..35d1aa7 --- /dev/null +++ b/tests/gem_workarounds.c @@ -0,0 +1,211 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, 
publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + * Arun Siluvery arun.siluv...@linux.intel.com + * + */ + +#define _GNU_SOURCE +#include stdbool.h +#include unistd.h +#include stdlib.h +#include stdio.h +#include string.h +#include fcntl.h +#include inttypes.h +#include errno.h +#include sys/stat.h +#include sys/ioctl.h +#include sys/mman.h +#include time.h +#include signal.h + +#include ioctl_wrappers.h +#include drmtest.h +#include igt_debugfs.h +#include igt_aux.h +#include intel_chipset.h +#include intel_io.h + +int drm_fd; +static drm_intel_bufmgr *bufmgr; +struct intel_batchbuffer *batch; +uint32_t devid; + +enum operation { + GPU_RESET, + SUSPEND_RESUME, The suspend test doesn't seem to be wire up ... Also I think it would be worth to have a module-reload version here too. Suspend/Resume is not working; device is not resuming even after the timer is elapsed. Do we know suspend/resume works correctly on nightly? 
+}; + +struct workaround { + const char *reg_name; + uint32_t address; +}; + +static struct workaround bdw_workarounds[] = +{ + { GEN8_ROW_CHICKEN, GEN8_ROW_CHICKEN }, + { GEN7_ROW_CHICKEN2, GEN7_ROW_CHICKEN2 }, + { HALF_SLICE_CHICKEN3, HALF_SLICE_CHICKEN3 }, + { GEN7_HALF_SLICE_CHICKEN1, GEN7_HALF_SLICE_CHICKEN1 }, + { COMMON_SLICE_CHICKEN2, COMMON_SLICE_CHICKEN2 }, + { HDC_CHICKEN0, HDC_CHICKEN0 }, + { GEN7_CACHE_MODE_1, GEN7_CACHE_MODE_1 }, + { GEN7_GT_MODE, GEN7_GT_MODE }, + { GAMTARBMODE, GAMTARBMODE }, + { _3D_CHICKEN3, _3D_CHICKEN3 }, + { GAM_ECOCHK, GAM_ECOCHK }, + { CHICKEN_PAR1_1, CHICKEN_PAR1_1 }, + { GEN7_FF_THREAD_MODE, GEN7_FF_THREAD_MODE }, + { GEN6_RC_SLEEP_PSMI_CONTROL, GEN6_RC_SLEEP_PSMI_CONTROL }, + { GEN8_UCGCTL6, GEN8_UCGCTL6 }, + { NULL, 0x }, +}; Crazy idea I've just had to
Re: [Intel-gfx] [RFC] Move BDW workarounds to ring init fn
On 28/07/2014 18:26, Ville Syrjälä wrote:
On Mon, Jul 28, 2014 at 05:31:45PM +0100, arun.siluv...@linux.intel.com wrote:
From: Arun Siluvery arun.siluv...@linux.intel.com

This patch moves BDW workarounds from init_clock_gating() to the render ring init fn; otherwise they are lost when the gpu is reset. In case of execlists, some of the workarounds modify registers that are part of the register state context, which doesn't get initialized until init_clock_gating(); this results in a default context with incorrect values, as it is restored and saved before being updated by the workarounds.

I don't think it has to do with execlists. Many of the registers are part of the context image even in ring buffer mode AFAIK.

Open issue: For Wa4x4STCOptimizationDisable, we set CACHE_MODE_1[6:6] = 1. At the time when HW contexts are enabled, after rings are initialized with default context, this workaround is valid, but after a context switch it is getting reset; please see the log snippet below.

This is a bit weird. The default context should have restore inhibit==1 so it shouldn't clobber the CACHE_MODE_1 register. There was a specific magic dance you're supposed to do when accessing such registers with mmio, but here we do the write even before the first context switch. Apparently there was some kind of problem with CACHE_MODE_0 on snb too:

commit 3a69ddd6f872180b6f61fda87152b37202118fbc
Author: Kenneth Graunke kenn...@whitecape.org
Date: Fri Apr 27 12:44:41 2012 -0700
drm/i915: Set the Stencil Cache eviction policy to non-LRA mode.

but IIRC I wasn't able to reproduce it when I tried.

Similar to this register, I am also applying this in the render ring init fn.

Maybe we need to delay these register writes until we've switched to the default context?

In its current state (WAs applied in init_clock_gating()) we are writing these registers after switching to default context. When a new hw context is created, do all the registers that are part of the context start with default values, or do they sample the current state?
and at what point this sampling takes place? As a test I have updated CACHE_MODE_1 after mi_set_context() then the workaround was valid with every context switch but I think it may not be the right way otherwise we will have to update other WA registers also at this point with every context switch. regards Arun ... [5.978209] [drm:i915_pages_create_for_stolen] offset=0x0, size=8294400 [5.978213] [drm:intel_alloc_plane_obj] plane fb obj 8801472e [5.978215] [drm:i915_gem_setup_global_gtt] reserving preallocated space: 0 + 7e9000 [5.978216] [drm:i915_gem_setup_global_gtt] clearing unused GTT space: [7e9000, f000] [5.979613] [drm:i915_gem_init] CACHE_MODE_1: 0x0180 [5.981372] [drm:gen8_ppgtt_init] Allocated 4 pages for page directories (0 wasted) [5.981373] [drm:gen8_ppgtt_init] Allocated 2048 pages for page tables (0 wasted) [5.981376] [drm:i915_gem_context_init] HW context support initialized [5.981462] [drm:i915_gem_init_hw] CACHE_MODE_1: 0x0180 [5.981467] [drm:i915_gem_init_rings] CACHE_MODE_1: 0x0180 [5.981704] [drm:bdw_init_workarounds] CACHE_MODE_1: 0x01C0 [5.981716] [drm:init_status_page] bsd ring hws offset: 0x0081e000 [5.981792] [drm:init_status_page] blitter ring hws offset: 0x0083f000 [5.981910] [drm:init_status_page] video enhancement ring hws offset: 0x0086 [5.982001] [drm:i915_gem_init_hw] CACHE_MODE_1: 0x01C0 [5.982104] [drm:i915_gem_context_enable] Switch render ring to default_context [5.982106] [drm:i915_gem_render_state_init] render ring: Render state init [5.982120] [drm:do_switch] render ring, CACHE_MODE_1: 0x01C0, uninitialized: 1 [5.982121] [drm:i915_gem_context_enable] Switch bsd ring to default_context [5.982122] [drm:do_switch] bsd ring, CACHE_MODE_1: 0x01C0, uninitialized: 0 [5.982123] [drm:i915_gem_context_enable] Switch blitter ring to default_context [5.982126] [drm:do_switch] blitter ring, CACHE_MODE_1: 0x01C0, uninitialized: 0 [5.982126] [drm:i915_gem_context_enable] Switch video enhancement ring to default_context [5.982128] 
[drm:do_switch] video enhancement ring, CACHE_MODE_1: 0x01C0, uninitialized: 0 [5.982133] [drm:i915_gem_init] CACHE_MODE_1: 0x01C0 [5.982258] [drm:intel_init_clock_gating] ... [ 10.037019] [drm:do_switch] blitter ring, CACHE_MODE_1: 0x0180, uninitialized: 0 ... [ 10.488145] [drm:do_switch] render ring, CACHE_MODE_1: 0x0180, uninitialized: 0 ... I am currently testing this with an igt which triggers a gpu reset and compares WA register contents before and after reset but the test fails because of this register hence not sending it now. Please let me know how to keep this WA valid after a context switch. Arun Siluvery (1): drm/i915/bdw: Initialize BDW workarounds in render ring init fn drivers/gpu/drm/i915/i915_debugfs.c | 46 ++
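
For readers following the log, the two CACHE_MODE_1 values it reports (0x01C0 after bdw_init_workarounds, 0x0180 after the later context switches) differ only in bit 6, the bit the cover letter says Wa4x4STCOptimizationDisable sets. A minimal sketch of that check follows; the macro and function names are illustrative assumptions, not the driver's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit 6 of CACHE_MODE_1, per "CACHE_MODE_1[6:6] = 1" in the cover letter.
 * EXAMPLE_STC_OPT_DISABLE_BIT is a made-up name for this sketch only.
 */
#define EXAMPLE_STC_OPT_DISABLE_BIT (1u << 6)

/* Returns true when the workaround bit is set in a logged register value. */
static bool example_wa_bit_set(uint32_t cache_mode_1)
{
    return (cache_mode_1 & EXAMPLE_STC_OPT_DISABLE_BIT) != 0;
}
```

Feeding it the logged values gives true for 0x01C0 and false for 0x0180, which matches the reported loss of the workaround after a context switch.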
Re: [Intel-gfx] [RFC] drm/i915/bdw: Initialize BDW workarounds in render ring init fn
On 28/07/2014 20:22, Daniel Vetter wrote:
On Mon, Jul 28, 2014 at 08:00:39PM +0300, Ville Syrjälä wrote:
On Mon, Jul 28, 2014 at 05:31:46PM +0100, arun.siluv...@linux.intel.com wrote:
From: Arun Siluvery arun.siluv...@linux.intel.com

The workarounds at the moment are initialized in init_clock_gating() but they are lost during reset. In case of execlists some workarounds modify registers that are part of register state context; since these are not initialized until init_clock_gating(), the default context ends up with incorrect values as the render context is restored and saved before being updated by the workarounds, hence move them to render ring init fn. This should be ok as these workarounds are not related to display clock gating.

Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
 drivers/gpu/drm/i915/i915_debugfs.c | 46 ++
 drivers/gpu/drm/i915/intel_pm.c | 59
 drivers/gpu/drm/i915/intel_ringbuffer.c | 68 +
 3 files changed, 114 insertions(+), 59 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 083683c..cf7da30 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2397,20 +2397,65 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused)
 		seq_printf(m, "dpll_md: 0x%08x\n", pll->hw_state.dpll_md);
 		seq_printf(m, "fp0: 0x%08x\n", pll->hw_state.fp0);
 		seq_printf(m, "fp1: 0x%08x\n", pll->hw_state.fp1);
 		seq_printf(m, "wrpll: 0x%08x\n", pll->hw_state.wrpll);
 	}
 	drm_modeset_unlock_all(dev);
 	return 0;
 }
+static int i915_workaround_info(struct seq_file *m, void *unused)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		return ret;
+
+	if (IS_BROADWELL(dev)) {
+		seq_printf(m, "GEN8_ROW_CHICKEN:\t0x%08x\n",
+			   I915_READ(GEN8_ROW_CHICKEN));
+		seq_printf(m, "HALF_SLICE_CHICKEN3:\t0x%08x\n",
+			   I915_READ(HALF_SLICE_CHICKEN3));
+		seq_printf(m, "GAMTARBMODE:\t0x%08x\n", I915_READ(GAMTARBMODE));
+		seq_printf(m, "_3D_CHICKEN3:\t0x%08x\n",
+			   I915_READ(_3D_CHICKEN3));
+		seq_printf(m, "COMMON_SLICE_CHICKEN2:\t0x%08x\n",
+			   I915_READ(COMMON_SLICE_CHICKEN2));
+		seq_printf(m, "GEN7_HALF_SLICE_CHICKEN1:\t0x%08x\n",
+			   I915_READ(GEN7_HALF_SLICE_CHICKEN1));
+		seq_printf(m, "GEN7_ROW_CHICKEN2:\t0x%08x\n",
+			   I915_READ(GEN7_ROW_CHICKEN2));
+		seq_printf(m, "GAM_ECOCHK:\t0x%08x\n",
+			   I915_READ(GAM_ECOCHK));
+		seq_printf(m, "HDC_CHICKEN0:\t0x%08x\n",
+			   I915_READ(HDC_CHICKEN0));
+		seq_printf(m, "GEN7_FF_THREAD_MODE:\t0x%08x\n",
+			   I915_READ(GEN7_FF_THREAD_MODE));
+		seq_printf(m, "GEN8_UCGCTL6:\t0x%08x\n",
+			   I915_READ(GEN8_UCGCTL6));
+		seq_printf(m, "GEN6_RC_SLEEP_PSMI_CONTROL:\t0x%08x\n",
+			   I915_READ(GEN6_RC_SLEEP_PSMI_CONTROL));
+		seq_printf(m, "CACHE_MODE_1:\t0x%08x\n",
+			   I915_READ(CACHE_MODE_1));
+	} else
+		DRM_DEBUG_DRIVER("Not available for Gen%d\n",
+				 INTEL_INFO(dev)->gen);
+
+	mutex_unlock(&dev->struct_mutex);
+	return 0;
+}
+

This smells like a separate patch. But I'm not sure we want it at all since intel_reg_read will provide the same information.

Yeah, debugfs files that just do what intel_reg_read does are just an additional maintenance burden. I know that we have a few that dump lots of registers, but most of them dump a lot of other information, too.
-Daniel

I've added this mainly for testing workarounds, and it can be extended further as we move WAs for other chipsets, but I agree it can be done with intel_reg_read.

regards
Arun
___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC] drm/i915/bdw: Initialize BDW workarounds in render ring init fn
On 28/07/2014 18:00, Ville Syrjälä wrote:
On Mon, Jul 28, 2014 at 05:31:46PM +0100, arun.siluv...@linux.intel.com wrote:
From: Arun Siluvery arun.siluv...@linux.intel.com

The workarounds at the moment are initialized in init_clock_gating() but they are lost during reset. In case of execlists some workarounds modify registers that are part of register state context; since these are not initialized until init_clock_gating(), the default context ends up with incorrect values as the render context is restored and saved before being updated by the workarounds, hence move them to render ring init fn. This should be ok as these workarounds are not related to display clock gating.

Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
 drivers/gpu/drm/i915/i915_debugfs.c | 46 ++
 drivers/gpu/drm/i915/intel_pm.c | 59
 drivers/gpu/drm/i915/intel_ringbuffer.c | 68 +
 3 files changed, 114 insertions(+), 59 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 083683c..cf7da30 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2397,20 +2397,65 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused)
 		seq_printf(m, "dpll_md: 0x%08x\n", pll->hw_state.dpll_md);
 		seq_printf(m, "fp0: 0x%08x\n", pll->hw_state.fp0);
 		seq_printf(m, "fp1: 0x%08x\n", pll->hw_state.fp1);
 		seq_printf(m, "wrpll: 0x%08x\n", pll->hw_state.wrpll);
 	}
 	drm_modeset_unlock_all(dev);
 	return 0;
 }
+static int i915_workaround_info(struct seq_file *m, void *unused)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		return ret;
+
+	if (IS_BROADWELL(dev)) {
+		seq_printf(m, "GEN8_ROW_CHICKEN:\t0x%08x\n",
+			   I915_READ(GEN8_ROW_CHICKEN));
+		seq_printf(m, "HALF_SLICE_CHICKEN3:\t0x%08x\n",
+			   I915_READ(HALF_SLICE_CHICKEN3));
+		seq_printf(m, "GAMTARBMODE:\t0x%08x\n", I915_READ(GAMTARBMODE));
+		seq_printf(m, "_3D_CHICKEN3:\t0x%08x\n",
+			   I915_READ(_3D_CHICKEN3));
+		seq_printf(m, "COMMON_SLICE_CHICKEN2:\t0x%08x\n",
+			   I915_READ(COMMON_SLICE_CHICKEN2));
+		seq_printf(m, "GEN7_HALF_SLICE_CHICKEN1:\t0x%08x\n",
+			   I915_READ(GEN7_HALF_SLICE_CHICKEN1));
+		seq_printf(m, "GEN7_ROW_CHICKEN2:\t0x%08x\n",
+			   I915_READ(GEN7_ROW_CHICKEN2));
+		seq_printf(m, "GAM_ECOCHK:\t0x%08x\n",
+			   I915_READ(GAM_ECOCHK));
+		seq_printf(m, "HDC_CHICKEN0:\t0x%08x\n",
+			   I915_READ(HDC_CHICKEN0));
+		seq_printf(m, "GEN7_FF_THREAD_MODE:\t0x%08x\n",
+			   I915_READ(GEN7_FF_THREAD_MODE));
+		seq_printf(m, "GEN8_UCGCTL6:\t0x%08x\n",
+			   I915_READ(GEN8_UCGCTL6));
+		seq_printf(m, "GEN6_RC_SLEEP_PSMI_CONTROL:\t0x%08x\n",
+			   I915_READ(GEN6_RC_SLEEP_PSMI_CONTROL));
+		seq_printf(m, "CACHE_MODE_1:\t0x%08x\n",
+			   I915_READ(CACHE_MODE_1));
+	} else
+		DRM_DEBUG_DRIVER("Not available for Gen%d\n",
+				 INTEL_INFO(dev)->gen);
+
+	mutex_unlock(&dev->struct_mutex);
+	return 0;
+}
+

This smells like a separate patch. But I'm not sure we want it at all since intel_reg_read will provide the same information.

 struct pipe_crc_info {
 	const char *name;
 	struct drm_device *dev;
 	enum pipe pipe;
 };
 static int i915_pipe_crc_open(struct inode *inode, struct file *filep)
 {
 	struct pipe_crc_info *info = inode->i_private;
 	struct drm_i915_private *dev_priv = info->dev->dev_private;
@@ -3904,20 +3949,21 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_ppgtt_info", i915_ppgtt_info, 0},
 	{"i915_llc", i915_llc, 0},
 	{"i915_edp_psr_status", i915_edp_psr_status, 0},
 	{"i915_sink_crc_eDP1", i915_sink_crc, 0},
 	{"i915_energy_uJ", i915_energy_uJ, 0},
 	{"i915_pc8_status", i915_pc8_status, 0},
 	{"i915_power_domain_info", i915_power_domain_info, 0},
 	{"i915_display_info", i915_display_info, 0},
 	{"i915_semaphore_status", i915_semaphore_status, 0},
 	{"i915_shared_dplls_info", i915_shared_dplls_info, 0},
+	{"i915_workaround_info", i915_workaround_info, 0},
 };
 #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
 static const struct i915_debugfs_files {
 	const char *name;
 	const struct
Re: [Intel-gfx] WAs in init_clock_gating?
On 07/07/2014 22:24, Daniel Vetter wrote:
On Mon, Jul 7, 2014 at 11:16 PM, Jesse Barnes jbar...@virtuousgeek.org wrote:

I don't think it's unreasonable to use a macro that checks a global list for whether to apply a given WA. They'll be scattered all over, but at least it'll be easy to see: 1) whether we implement a given workaround and 2) which platforms and steppings it applies to based on the table.

Oh, I agree it's not unreasonable. But I've kinda been begging for the simple solution for months (years?) and haven't gotten it, while still getting a steady stream of bug reports and issues. So I've readjusted my expectations ;-) If someone delivers the real deal I certainly won't reject it.
-Daniel

I am moving bdw workarounds from clock_gating fn to render ring init fn and testing this before and after gpu reset. One of the workarounds is to disable STC optimization, reg CACHE_MODE_1 bit6 set to 1. I observed that sometimes after boot this gets reset to 0 (default value) even after applying workarounds; other than the workarounds no one else seems to write to this register. Any ideas about this behaviour?

regards
Arun
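
The table-plus-macro idea Jesse describes in the quoted text could look roughly like the sketch below. Every identifier here (the enum, the struct, the table, `EXAMPLE_NEEDS_WA`) is hypothetical and is not taken from the i915 driver; the stepping range is a placeholder, not real errata data. Only the workaround name and its association with BDW come from this list's discussion.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical platform identifiers for this sketch only. */
enum example_platform { EXAMPLE_PLATFORM_BDW, EXAMPLE_PLATFORM_OTHER };

/* One row of the global workaround table. */
struct example_wa_entry {
    const char *name;               /* workaround name */
    enum example_platform platform; /* platform it applies to */
    int first_stepping;             /* inclusive, placeholder values */
    int last_stepping;              /* inclusive, placeholder values */
};

/* Placeholder content: the stepping range is NOT real hardware data. */
static const struct example_wa_entry example_wa_table[] = {
    { "Wa4x4STCOptimizationDisable", EXAMPLE_PLATFORM_BDW, 0, 255 },
};

/* Look a workaround up by name and check platform and stepping. */
static bool example_wa_applies(const char *name,
                               enum example_platform platform, int stepping)
{
    size_t n = sizeof(example_wa_table) / sizeof(example_wa_table[0]);

    for (size_t i = 0; i < n; i++) {
        const struct example_wa_entry *e = &example_wa_table[i];

        if (strcmp(e->name, name) == 0 && e->platform == platform &&
            stepping >= e->first_stepping && stepping <= e->last_stepping)
            return true;
    }
    return false;
}

/* The macro call sites would use; the name doubles as a grep target. */
#define EXAMPLE_NEEDS_WA(name, platform, stepping) \
    example_wa_applies(#name, (platform), (stepping))
```

With this shape, both questions Jesse lists (is a workaround implemented, and where does it apply) are answered by reading one table, at the cost of a string lookup at each call site.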
Re: [Intel-gfx] WAs in init_clock_gating?
On 24/07/2014 13:33, Daniel Vetter wrote:
On Thu, Jul 24, 2014 at 11:43:11AM +0100, Siluvery, Arun wrote:
On 07/07/2014 22:24, Daniel Vetter wrote:
On Mon, Jul 7, 2014 at 11:16 PM, Jesse Barnes jbar...@virtuousgeek.org wrote:

I don't think it's unreasonable to use a macro that checks a global list for whether to apply a given WA. They'll be scattered all over, but at least it'll be easy to see: 1) whether we implement a given workaround and 2) which platforms and steppings it applies to based on the table.

Oh, I agree it's not unreasonable. But I've kinda been begging for the simple solution for months (years?) and haven't gotten it, while still getting a steady stream of bug reports and issues. So I've readjusted my expectations ;-) If someone delivers the real deal I certainly won't reject it.
-Daniel

I am moving bdw workarounds from clock_gating fn to render ring init fn and testing this before and after gpu reset.

Testing = with an igt? Because I'll ask for this ;-)

Yes, triggering gpu reset with igt; at the moment the test fails because of this register.

One of the workarounds is to disable STC optimization, reg CACHE_MODE_1 bit6 set to 1. I observed that sometimes after boot this gets reset to 0 (default value) even after applying workarounds; other than the workarounds no one else seems to write to this register. Any ideas about this behaviour?

gpu init tends to do this, since clock_gating is run before that.

Thanks, I will take a look.

regards
Arun

-Daniel
Re: [Intel-gfx] [RFC] drm/i915: Add variable gem object size support to i915
On 25/06/2014 12:14, Damien Lespiau wrote:
On Wed, Jun 25, 2014 at 11:51:33AM +0100, Damien Lespiau wrote:

(These are not necessarily things one would need to take into account for this work, just a few thoughts.)

One thing I'm wondering is how fitting the size parameter really is when talking about inherently 2D buffers. For instance, let's take a Y-tiled texture with MIPLAYOUT_RIGHT: if we want to allocate mip map levels 0 and 1, and use the ioctl naively to reserve the LOD1 region in one go, we'll end up over-allocating the space below LOD1 (if I'm not mistaken, that is). This can be mitigated by several calls to this fallocate ioctl, to reserve columns of pages (in the case above, columns for the LOD1 region). So, how about trying to reduce this ioctl overhead by providing a list of (start, length) in the ioctl structure?

One more thing to factor in is (let's assume some future hardware will support that): https://www.opengl.org/registry/specs/ARB/sparse_texture.txt So maybe what we really want is to be able to specify a region of pages in terms of (x, y, width, height, stride)? (The idea popped up when talking to Neil Roberts (I now have someone working on Mesa in the office).)

Hi Damien,

Thank you for your comments and the idea to improve this ioctl. At the moment the start and end of a region are expected to be page-aligned; the ioctl can be modified to accept multiple ranges and modify them in one go to reduce the overhead of the ioctl. We can define how we want to specify multiple ranges; if userspace can provide the list as (start, end) pairs the kernel can use them directly, but what would be the preferred way from the user's point of view?

regards
Arun
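
To make the multi-range proposal concrete, here is a rough sketch of what an argument structure carrying a list of (start, length) pairs, plus the page-alignment check mentioned above, might look like. None of these names or fields come from the actual patch: the struct layout, the `example_` identifiers and the 4 KiB page size are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE 4096ull /* assumed page size */

/* One (start, length) pair, both in bytes, as suggested in the thread. */
struct example_falloc_range {
    uint64_t start;
    uint64_t length;
};

/* Hypothetical ioctl argument: one call carries many ranges. */
struct example_falloc_args {
    uint32_t handle;     /* object to modify */
    uint32_t mode;       /* mark as scratch or as usable */
    uint32_t num_ranges; /* entries behind ranges_ptr */
    uint32_t pad;        /* keeps the 64-bit field aligned */
    uint64_t ranges_ptr; /* user pointer to example_falloc_range[] */
};

/* Accept only non-empty, page-aligned ranges that fit inside the object. */
static bool example_ranges_valid(const struct example_falloc_range *r,
                                 size_t n, uint64_t obj_size)
{
    for (size_t i = 0; i < n; i++) {
        if (r[i].length == 0)
            return false;
        if (r[i].start % EXAMPLE_PAGE_SIZE || r[i].length % EXAMPLE_PAGE_SIZE)
            return false;
        /* Written this way so start + length cannot overflow. */
        if (r[i].start > obj_size || r[i].length > obj_size - r[i].start)
            return false;
    }
    return true;
}
```

A (start, length) list and a (start, end) list carry the same information; the sketch uses lengths only because that is the form proposed in the quoted message.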
Re: [Intel-gfx] [RFC] manage multiple entries of scratch page with scatterlist
On 12/06/2014 08:26, Daniel Vetter wrote:
On Thu, Jun 12, 2014 at 12:49:47AM +0100, Siluvery, Arun wrote:

Hi,

I am working on a feature to implement support for gem objects to have variable size and realized a problem with the current implementation. Please advise me how to handle this situation efficiently.

In this implementation the backing store of the object is replaced with scratch pages according to the input range. Initially I store table entries in an array, replace the relevant entries with scratch pages, and use sg_alloc_table_from_pages() to create a new sg_table which is assigned to the object. This implementation works as expected but I realized it is wasting memory as the scratch page count increases.

Consider the worst case scenario where all pages are replaced with scratch pages. The fn sg_alloc_table_from_pages() first computes the number of chunks based on the page frame numbers. PFNs that are consecutive form a chunk, and it allocates a scatterlist for each chunk; together these form the sg_table. Scratch pages all have the same pfn, so sg_alloc_table_from_pages() does not treat them as part of a chunk and allocates a scatterlist structure for each scratch page, which takes a lot of memory as the object size increases.

I have tried to modify the sg_alloc_table_from_pages() implementation to check for the scratch pfn and consider them as a single chunk, but after the update, when iterating with for_each_sg_page(), I am seeing different page addresses instead of all pointing to the scratch page.

Eg. In an object of size 8 pages, scratch_page = ea000112 and pfn: 0x00044800, the result I get is,
page[0]: ea000112, pfn: 0x00044800,
page[1]: ea0001120040, pfn: 0x00044801,
page[2]: ea0001120080, pfn: 0x00044802,
page[3]: ea00011200c0, pfn: 0x00044803,
page[4]: ea0001120100, pfn: 0x00044804,
page[5]: ea0001120140, pfn: 0x00044805,
page[6]: ea0001120180, pfn: 0x00044806,
page[7]: ea00011201c0, pfn: 0x00044807,

How to manage multiple pages that have the same pfn with a single scatterlist and still have its length equal to (PAGE_SIZE*chunk_size)? I would really appreciate any suggestions to improve this implementation.

sg tables don't have the idea of repeating a given page, since it doesn't make a lot of sense. Is the memory overhead really a big problem?

One other use case where it can be useful is the creation of a blanking buffer. Considering a frame buffer size of 8MB = 2K pages, each scatterlist is 32 bytes, which takes 64K for an 8MB object. I think this overhead is acceptable, and accepting it also simplifies the implementation.

Extending the sg implementation with a flag somewhere to repeat a given page instead of incrementing might be possible. But it will be a bit of effort to push that through the process since we'll touch code outside of drm.

I will explore this option if we see any issues with the overhead. Thank you for your comments.

regards
Arun

-Daniel
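
The overhead estimate in the reply above (8MB object, 2K pages, 32 bytes per scatterlist, 64K total) can be reproduced with a few lines of arithmetic. This is only a worked calculation: the 32-byte entry size is the figure quoted in the thread, the 4 KiB page size is implied by "8MB = 2K pages", and the function name is made up for this sketch.

```c
#include <stdint.h>

/*
 * Worst case from the thread: every page of the object is a scratch page,
 * so each page gets its own scatterlist entry.
 */
static uint64_t example_sg_overhead_bytes(uint64_t object_bytes,
                                          uint64_t page_bytes,
                                          uint64_t entry_bytes)
{
    /* Round up so a partially used last page still counts as one page. */
    uint64_t pages = (object_bytes + page_bytes - 1) / page_bytes;

    return pages * entry_bytes;
}
```

Called with an 8 MiB object, 4096-byte pages and 32-byte entries it returns 65536 bytes, i.e. the 64K mentioned above, or about 0.8% of the object size.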
[Intel-gfx] [RFC] manage multiple entries of scratch page with scatterlist
Hi,

I am working on a feature to implement support for gem objects to have variable size and realized a problem with the current implementation. Please advise me how to handle this situation efficiently.

In this implementation the backing store of the object is replaced with scratch pages according to the input range. Initially I store table entries in an array, replace the relevant entries with scratch pages, and use sg_alloc_table_from_pages() to create a new sg_table which is assigned to the object. This implementation works as expected but I realized it is wasting memory as the scratch page count increases.

Consider the worst case scenario where all pages are replaced with scratch pages. The fn sg_alloc_table_from_pages() first computes the number of chunks based on the page frame numbers. PFNs that are consecutive form a chunk, and it allocates a scatterlist for each chunk; together these form the sg_table. Scratch pages all have the same pfn, so sg_alloc_table_from_pages() does not treat them as part of a chunk and allocates a scatterlist structure for each scratch page, which takes a lot of memory as the object size increases.

I have tried to modify the sg_alloc_table_from_pages() implementation to check for the scratch pfn and consider them as a single chunk, but after the update, when iterating with for_each_sg_page(), I am seeing different page addresses instead of all pointing to the scratch page.

Eg. In an object of size 8 pages, scratch_page = ea000112 and pfn: 0x00044800, the result I get is,
page[0]: ea000112, pfn: 0x00044800,
page[1]: ea0001120040, pfn: 0x00044801,
page[2]: ea0001120080, pfn: 0x00044802,
page[3]: ea00011200c0, pfn: 0x00044803,
page[4]: ea0001120100, pfn: 0x00044804,
page[5]: ea0001120140, pfn: 0x00044805,
page[6]: ea0001120180, pfn: 0x00044806,
page[7]: ea00011201c0, pfn: 0x00044807,

How to manage multiple pages that have the same pfn with a single scatterlist and still have its length equal to (PAGE_SIZE*chunk_size)? I would really appreciate any suggestions to improve this implementation.

regards
Arun
Re: [Intel-gfx] [RFC] drm/i915: Add variable gem object size support to i915
On 12/05/2014 18:02, Eric Anholt wrote:
arun.siluv...@linux.intel.com writes:
From: Siluvery, Arun arun.siluv...@intel.com

This patch adds support for gem objects of variable size. The size of the gem object, obj->size, is always constant and this fact is tightly coupled in the driver; this implementation allows its effective size to vary using an interface similar to fallocate().

A new ioctl() is introduced to mark a range as scratch/usable. Once marked as scratch, the associated backing store is released and the region is filled with scratch pages. The region can also be unmarked at a later point, in which case new backing pages are created. The range can be anywhere within the object space; there can be multiple ranges, possibly overlapping and forming a large contiguous range.

There is only one single scratch page and the kernel allows writes to this page; userspace needs to keep track of the scratch page range, otherwise any subsequent writes to these pages will overwrite previous content.

This feature is useful where the exact size of the object is not clear at the time of its creation; in such cases we usually create an object with more than the required size but end up using it partially. On devices with tight memory constraints it would be useful to release that additional space which is currently unused. Using this interface the region can simply be marked as scratch, which releases its backing store and thus reduces the memory pressure on the kernel.

Many thanks to Daniel, ChrisW, Tvrtko, Bob for the idea and feedback on this implementation.

v2: fix holes in error handling and use consistent data types (Tvrtko)
- If page allocation fails simply return error; do not try to invoke shrinker to free backing store.
- Release new pages created by us in case of error during page allocation or sg_table update.
- Use 64-bit data types for start and length values to avoid truncation.

The idea sounds nice to have for Mesa. We've got this ugly code right now for guessing how many levels a miptree is going to be, and then do copies if we find out we were wrong about how many the app was going to use. This will let us allocate for a maximum-depth miptree, and mark the smaller levels as unused until an image gets put there.

The problem I see with this plan is if the page table twiddling ends up being too expensive in our BO reallocation path (right now, if we make the same guess on every allocation, we'll reuse cached BOs with the same size and no mapping cost). It would be nice to see some performance data from real applications, if possible. But then, I don't think I've seen any real applications hit the copy path.

The way I am planning to test is to measure the time it takes to falloc a big object. Could you suggest the best way to test the performance of this change?

regards
Arun
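
As one possible starting point for the timing measurement mentioned above, the sketch below averages the wall-clock cost of an operation over several iterations using clock_gettime(CLOCK_MONOTONIC). It is not a recommendation from anyone on this thread. The callback type and all `example_` names are assumptions, and the memset() workload is only a stand-in for the real falloc ioctl call, which is not reproduced here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() */
#endif
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Operation under test; in a real test this would issue the ioctl. */
typedef void (*example_op_fn)(void *arg);

/* Stand-in workload: touch every byte of a buffer. */
struct example_buffer {
    unsigned char *data;
    size_t size;
    unsigned int calls;
};

static void example_touch(void *arg)
{
    struct example_buffer *b = arg;

    memset(b->data, 0xff, b->size);
    b->calls++;
}

/* Monotonic timestamp in nanoseconds. */
static uint64_t example_now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average cost of op(arg) in nanoseconds over the given iterations. */
static uint64_t example_avg_ns(example_op_fn op, void *arg,
                               unsigned int iterations)
{
    uint64_t begin, end;

    if (iterations == 0)
        return 0;

    begin = example_now_ns();
    for (unsigned int i = 0; i < iterations; i++)
        op(arg);
    end = example_now_ns();

    return (end - begin) / iterations;
}
```

Timing a single large falloc in isolation would not capture the BO reallocation path Eric is worried about, so a harness like this would probably need to wrap the whole allocate, mark and reuse cycle to be representative.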