Re: [Intel-gfx] [PULL] drm-intel-next
On Tue, 06 May 2014, Daniel Vetter dan...@ffwll.ch wrote: On Tue, May 06, 2014 at 10:04:17PM +0200, Knut Petersen wrote: On 06.05.2014 20:59, Daniel Vetter wrote: On Tue, May 06, 2014 at 04:30:45PM +0300, Jani Nikula wrote: On Tue, 06 May 2014, Knut Petersen knut_peter...@t-online.de wrote: On 28.04.2014 15:26, Daniel Vetter wrote: The patch below is a clear candidate for 3.15 as it fixes a regression introduced after 3.14 Egbert Eich (1): drm/i915/SDVO: For sysfs link put directory and target in correct order Daniel, want me to pick that for -fixes? Yeah I geuss so. People never complained that the link isn't there, us adding it for one kernel release to the wrong place doesn't really sound too bothersome. But if this unbreaks a system somewhere then it makes sense to apply to -fixes. Knut, how exactly does your system blow up? Well, it's only the warning generated at boot time that would irritate users. As a patch is known and tested it's probably a good idea to fix it before people complain to lkml and bugzilla. Ah, I didn't realize there's a warning, I've thought only the symlink goes the wrong way. In that case it's good to go for -fixes. Done. -- Jani Nikula, Intel Open Source Technology Center ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915/SDVO: For sysfs link put directory and target in correct order
On Tue, 15 Apr 2014, Daniel Vetter dan...@ffwll.ch wrote: On Fri, Apr 11, 2014 at 08:26:25PM +0300, Imre Deak wrote: On Fri, 2014-04-11 at 19:07 +0200, Egbert Eich wrote: When linking the i2c sysfs file into the connector's directory pass directory and link target in the right order. This code was introduced with: commit 931c1c26983b4f84e33b78579fc8d57e4a14c6b4 Author: Imre Deak imre.d...@intel.com Date: Tue Feb 11 17:12:51 2014 +0200 drm/i915: sdvo: add i2c sysfs symlink to the connector's directory This is the same what we do for DP connectors, so make things more consistent. Signed-off-by: Egbert Eich e...@suse.de Oops, good catch. The fix looks ok, fwiw: Reviewed-by: Imre Deak imre.d...@intel.com Queued for -next, thanks for the patch. And picked up for -fixes and queuing for 3.15. BR, Jani. -- Jani Nikula, Intel Open Source Technology Center ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/9] drm/i915: Extract node allocation from bind
On Tue, May 06, 2014 at 10:21:31PM -0700, Ben Widawsky wrote: The DRM node allocation code was already a bit of an ugly bit of code within a complex function. Removing it serves the purpose of cleaning the function up. More importantly, it provides a way to have a preallocated (address space) VMA to easily skip this code. Something we're very likely to need. This is essentially a wrapper for DRM node allocation with an eviction + retry. It is something which could be moved to core DRM eventually, if other drivers had similar eviction semantics. This removes a goto used for something other than error unwinding, a generally reviled (I am neutral) practice, and replaces it with a function call to itself that should have the same effect. Note that it's not a recursive function as all the problem set reduction is done in the eviction code. Some might say this change is worse than before because we are using stack for each subsequent call. Frankly, I'd rather overflow the stack and blow it up than loop forever. In either case, this is addressed in the next patch. I believe, and intend, that other than the stack usage, there is no functional change here. Nope, but decoupling evict_something() further does introduce lots more implied magic. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC] drm/i915: Scratch page optimization for blanking buffer
On Wed, May 07, 2014 at 11:19:57AM +0530, Akash Goel wrote: On Mon, 2014-05-05 at 16:07 +0200, Daniel Vetter wrote: On Mon, May 05, 2014 at 05:13:18PM +0530, akash.g...@intel.com wrote: From: Akash Goel akash.g...@intel.com There is a use case, when user space (display compositor) tries to directly flip a fb (without any prior rendering) on primary plane. So the backing pages of the object are allocated at page flip time only, which takes time. Since, this buffer is supposed to serve as a blanking buffer (black colored), we can setup all the GTT entries of that blanking buffer with scratch page (which is already zeroed out). This saves the time in allocation of real backing physical space for the blanking buffer and flushing of CPU cache. Signed-off-by: Akash Goel akash.g...@intel.com That sounds very much like a special case of the fallocate ioctl where we simply allocat 0 real backing storage pages. -Daniel Hi Daniel, Yes no real backing storage is needed, but still the object should be allowed to get mapped into the GGTT, using the scratch page (already zeroed out). Is there a patch for a new drm/i915 ioctl similar to fallocate, which could be used here? Geez, you guys shouldn't just dump patches onto intel-gfx but read a bit what other people are doing. Especially when they work for the same team ;-) http://comments.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/36351 Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915/bdw: Only use 2g GGTT for 32b platforms
On Tue, May 06, 2014 at 09:58:59PM -0700, Ben Widawsky wrote: From: Ben Widawsky benjamin.widaw...@linux.intel.com Daniel requested in the bug that I use a 3GB fallback size. Since this is not in the spec as a valid size, I decided against it. We could potentially add a patch to bump it to 3GB on top of this one. This probably should be CC: stable - but I'll let the powers that be decide that one. v2: Change ifdef to 32b, instead of ifndef update comment Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76619 Signed-off-by: Ben Widawsky b...@bwidawsk.net --- drivers/gpu/drm/i915/i915_gem_gtt.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 846b6ee..d03a540 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1759,6 +1759,14 @@ static inline unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl) bdw_gmch_ctl = BDW_GMCH_GGMS_MASK; if (bdw_gmch_ctl) bdw_gmch_ctl = 1 bdw_gmch_ctl; + +#ifdef CONFIG_32BIT + /* Limit 32B platforms to a 2GB GGTT + 4 20 / pte size * PAGE_SIZE */ + if (bdw_gmch_ctl 4) + bdw_gmch_ctl = 4; Comment needs CodingStyle polish and we might as well return the right value directly instead of adjusting bdw_gmch_ctl. With that polish applied this is Reviewed-by: Daniel Vetter daniel.vet...@ffwll.cho And yes most definitely Cc: sta...@vger.kernel.org since it's a regression. Aside: Please add the regression tags when handling bugs, I need those for tracking and stats. -Daniel +#endif + return bdw_gmch_ctl 20; } -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 1/9] drm/i915: Use topdown allocation for PPGTT PDEs on gen6/7
On Tue, May 06, 2014 at 10:21:30PM -0700, Ben Widawsky wrote: It was always the intention to do the topdown allocation for context objects (Chris' idea originally). Unfortunately, I never managed to land the patch, but someone else did, so now we can use it. As a reminder, hardware contexts never need to be in the precious GTT aperture space - which is what is what happens with the normal bottom up allocation we do today. Doing a top down allocation increases the odds that the HW contexts can get out of the way, especially with per FD contexts as is done in full PPGTT Signed-off-by: Ben Widawsky b...@bwidawsk.net Queued for -next, thanks for the patch. -Daniel --- drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index f6354e0..66fcfc9 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1035,8 +1035,7 @@ alloc: ppgtt-node, GEN6_PD_SIZE, GEN6_PD_ALIGN, 0, 0, dev_priv-gtt.base.total, - DRM_MM_SEARCH_DEFAULT, - DRM_MM_CREATE_DEFAULT); + DRM_MM_TOPDOWN); if (ret == -ENOSPC !retried) { ret = i915_gem_evict_something(dev, dev_priv-gtt.base, GEN6_PD_SIZE, GEN6_PD_ALIGN, -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 4/9] drm/i915: Limit the number of node allocation retries
On Tue, May 06, 2014 at 10:21:33PM -0700, Ben Widawsky wrote: AFAICT, it's impossible to actually infinitely retry the allocation in our current code. However, a small oversight on my part, slight bug, or future bug, could easily change that. To avoid a potential breakage, a simple retry variable of 16 bits will help us prevent infinitely running. Retry is limited to 100 as a mostly random number. Some consideration about stack usage was taken into account. As an example, if we allowed 256 retries on a 32b arch (and my memory serves that all arguments are passed on the stack for such architectures), thats 33 bytes * 256 retries + (fudge for return address and such)... it's way more than we want to use already. 64b architectures might be slightly better, since 6? of the first args will get passed through registers, but it's still bad. If anything, we might want to do way less than 100, like 3. Signed-off-by: Ben Widawsky b...@bwidawsk.net So you replace the retry loop with a tailrecursive version in patch 2 and then add duct-tape later on here? Nope. -Daniel --- drivers/gpu/drm/i915/i915_gem.c | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index b6965a2..de98b26 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3216,6 +3216,7 @@ static void i915_gem_verify_gtt(struct drm_device *dev) #endif } +#define MAX_VMA_FIND_RETRY 100 static int i915_gem_find_vm_space(struct i915_address_space *vm, struct drm_mm_node *node, @@ -3224,7 +3225,8 @@ i915_gem_find_vm_space(struct i915_address_space *vm, unsigned long color, unsigned long start, unsigned long end, -uint32_t flags) +uint32_t flags, +uint8_t retry) { int ret; ret = drm_mm_insert_node_in_range_generic(vm-mm, node, @@ -3232,7 +3234,7 @@ i915_gem_find_vm_space(struct i915_address_space *vm, start, end, DRM_MM_SEARCH_DEFAULT, DRM_MM_CREATE_DEFAULT); - if (ret) { + if (ret (retry MAX_VMA_FIND_RETRY)) { if (WARN_ON(ret != -ENOSPC)) return ret; @@ -3241,7 +3243,8 @@ i915_gem_find_vm_space(struct i915_address_space *vm, if (ret == 0) return i915_gem_find_vm_space(vm, node, size, align, color, - start, end, flags); + start, end, flags, + retry++); } return ret; @@ -3306,8 +3309,9 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj, if (IS_ERR(vma)) goto err_unpin; - ret = i915_gem_find_vm_space(vm, vma-node, size, alignment, - obj-cache_level, 0, gtt_max, flags); + ret = i915_gem_find_vm_space(vm, vma-node, + size, alignment, obj-cache_level, + 0, gtt_max, flags, 1); if (ret) goto err_free_vma; -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 6/9] drm/i915: Wrap VMA binding
On Tue, May 06, 2014 at 10:21:35PM -0700, Ben Widawsky wrote: This will be useful for some upcoming patches which do more platform specific work. Having it in one central place just makes things a bit cleaner and easier. There is a small functional change here. There are more calls to the tracepoints. NOTE: I didn't actually end up using this patch for the intended purpose, but I thought it was a nice patch to keep around. v2: s/i915_gem_bind_vma/i915_gem_vma_bind/ s/i915_gem_unbind_vma/i915_gem_vma_unbind/ (Chris) v3: Missed one spot Signed-off-by: Ben Widawsky b...@bwidawsk.net You change the semantics of the vma bind/unbind tracepoints - previously they've meant we reserve/unreserve a ppgtt address for an object, not they're called every time we touch the ptes (which is what vma_bind|unbind actually do). That doesn't look too terribly useful for tracing any more. What's the use case for these added tracepoints? -Daniel --- drivers/gpu/drm/i915/i915_drv.h| 3 +++ drivers/gpu/drm/i915/i915_gem.c| 13 ++--- drivers/gpu/drm/i915/i915_gem_context.c| 2 +- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +++-- drivers/gpu/drm/i915/i915_gem_gtt.c| 16 ++-- 5 files changed, 27 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index c1d2bea..30cde3d 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2264,6 +2264,9 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object *o, struct i915_address_space *vm); unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o, struct i915_address_space *vm); +void i915_gem_vma_bind(struct i915_vma *vma, enum i915_cache_level, +unsigned flags); +void i915_gem_vma_unbind(struct i915_vma *vma); struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj, struct i915_address_space *vm); struct i915_vma * diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index e9599e9..1567911 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2771,7 +2771,7 @@ int i915_vma_unbind(struct i915_vma *vma) trace_i915_vma_unbind(vma); - vma-unbind_vma(vma); + i915_gem_vma_unbind(vma); i915_gem_gtt_finish_object(obj); @@ -3342,9 +3342,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj, WARN_ON(flags PIN_MAPPABLE !obj-map_and_fenceable); - trace_i915_vma_bind(vma, flags); - vma-bind_vma(vma, obj-cache_level, - flags (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0); + i915_gem_vma_bind(vma, obj-cache_level, + flags (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0); i915_gem_verify_gtt(dev); return vma; @@ -3548,8 +3547,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj, list_for_each_entry(vma, obj-vma_list, vma_link) if (drm_mm_node_allocated(vma-node)) - vma-bind_vma(vma, cache_level, - obj-has_global_gtt_mapping ? GLOBAL_BIND : 0); + i915_gem_vma_bind(vma, cache_level, + obj-has_global_gtt_mapping ? GLOBAL_BIND : 0); } list_for_each_entry(vma, obj-vma_list, vma_link) @@ -3914,7 +3913,7 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj, } if (flags PIN_GLOBAL !obj-has_global_gtt_mapping) - vma-bind_vma(vma, obj-cache_level, GLOBAL_BIND); + i915_gem_vma_bind(vma, obj-cache_level, GLOBAL_BIND); vma-pin_count++; if (flags PIN_MAPPABLE) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index f77b4c1..00c7d88 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -655,7 +655,7 @@ static int do_switch(struct intel_ring_buffer *ring, if (!to-obj-has_global_gtt_mapping) { struct i915_vma *vma = i915_gem_obj_to_vma(to-obj, dev_priv-gtt.base); - vma-bind_vma(vma, to-obj-cache_level, GLOBAL_BIND); + i915_gem_vma_bind(vma, to-obj-cache_level, GLOBAL_BIND); } if (!to-is_initialized || i915_gem_context_is_default(to)) diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 6cc004f..211bbbd 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -369,7 +369,8 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj, struct i915_vma *vma =
Re: [Intel-gfx] [PATCH 7/9] drm/i915: Make aliasing a 2nd class VM
On Tue, May 06, 2014 at 10:21:36PM -0700, Ben Widawsky wrote: There is a good debate to be had about how best to fit the aliasing PPGTT into the code. However, as it stands right now, getting aliasing PPGTT bindings is a hack, and done through implicit arguments. To make this absolutely clear, WARN and return an error if a driver writer tries to do something they shouldn't. I have no issue with an eventual revert of this patch. It makes sense for what we have today. Signed-off-by: Ben Widawsky b...@bwidawsk.net Queued for -next, thanks for the patch. -Daniel --- drivers/gpu/drm/i915/i915_gem.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 1567911..c54668c 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3877,9 +3877,13 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj, uint32_t alignment, unsigned flags) { + struct drm_i915_private *dev_priv = obj-base.dev-dev_private; struct i915_vma *vma; int ret; + if (WARN_ON(vm == dev_priv-mm.aliasing_ppgtt-base)) + return -ENODEV; + if (WARN_ON(flags (PIN_GLOBAL | PIN_MAPPABLE) !i915_is_ggtt(vm))) return -EINVAL; -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 9/9] drm/i915: Split out aliasing binds
On Tue, May 06, 2014 at 10:21:38PM -0700, Ben Widawsky wrote: This patch finishes off actually separating the aliasing and global finds. Prior to this, all global binds would be aliased. Now if aliasing binds are required, they must be explicitly asked for. So far, we have no users of this outside of execbuf - but Mika has already submitted a patch requiring just this. A nice benefit of this is we should no longer be able to clobber GTT only objects from the aliasing PPGTT. TEST=gem_storedw_batches_loop Hm, how does this testcase test this bug? I'm surprised ... Also the usual format is Testcase: igt/gem_stored_batches_loop i.e. the testcase name as used by piglit. Anyway I want these last two patches from this series since they fix up a regression introduced by full ppgtt. Which is blocking the cmd parser work (among other stuff). But I'm not sold on the earlier parts, so can you please rebase just these two? Also we need to triple-check that we have testcases for all the rebind and set_cache_level corner-case fun in igt. -Daniel Signed-off-by: Ben Widawsky b...@bwidawsk.net --- drivers/gpu/drm/i915/i915_drv.h| 2 +- drivers/gpu/drm/i915/i915_gem.c| 6 -- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 +- drivers/gpu/drm/i915/i915_gem_gtt.c| 3 +++ 4 files changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 413b114..ba897f0 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2316,7 +2316,7 @@ i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj, uint32_t alignment, unsigned flags) { - return i915_gem_object_pin(obj, obj_to_ggtt(obj), alignment, flags | PIN_GLOBAL_ALIASED); + return i915_gem_object_pin(obj, obj_to_ggtt(obj), alignment, flags | PIN_GLOBAL); } static inline int diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 86cec5c..b2056f3 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3346,8 +3346,10 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj, WARN_ON(flags PIN_MAPPABLE !obj-map_and_fenceable); - if (flags PIN_GLOBAL_ALIASED) - vma_bind_flags = GLOBAL_BIND | ALIASING_BIND; + if (flags PIN_ALIASING) + vma_bind_flags = ALIASING_BIND; + if (flags PIN_GLOBAL) + vma_bind_flags = GLOBAL_BIND; i915_gem_vma_bind(vma, obj-cache_level, vma_bind_flags); diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index bcb3ae8..60049b4 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -549,7 +549,7 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma, unsigned flags; int ret; - flags = 0; + flags = PIN_ALIASING; need_fence = has_fenced_gpu_access diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 3f2f84e..846b6ee 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1569,6 +1569,9 @@ static void ggtt_bind_vma(struct i915_vma *vma, } } + if (!(flags ALIASING_BIND)) + return; + if (dev_priv-mm.aliasing_ppgtt (!obj-has_aliasing_ppgtt_mapping || (cache_level != obj-cache_level))) { -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC] drm/i915 : Reduce the shmem page allocation time by using blitter engines for clearing pages.
On Tue, 2014-05-06 at 13:12 +, Chris Wilson wrote: On Tue, May 06, 2014 at 12:59:37PM +, Gupta, Sourab wrote: On Tue, 2014-05-06 at 11:34 +, Chris Wilson wrote: On Tue, May 06, 2014 at 04:40:58PM +0530, sourab.gu...@intel.com wrote: From: Sourab Gupta sourab.gu...@intel.com This patch is in continuation of and is dependent on earlier patch series to 'reduce the time for which device mutex is kept locked'. (http://lists.freedesktop.org/archives/intel-gfx/2014-May/044596.html) This patch aims to reduce the allocation time of pages from shmem by using blitter engines for clearing the pages freshly alloced. This is utilized in case of fresh pages allocated in shmem_preallocate routines in execbuffer path and page_fault path only. Even though the CPU memset routine is optimized, but still sometimes the time consumed in clearing the pages of a large buffer comes in the order of milliseconds. We intend to make this operation asynchronous by using blitter engine, so irrespective of the size of buffer to be cleared, the execbuffer ioctl processing time will not be affected. Use of blitter engine will make the overall execution time of 'execbuffer' ioctl lesser for large buffers. There may be power implications here on using blitter engines, and we have to evaluate this. As a next step, we can selectively enable this HW based memset only for large buffers, where the overhead of adding commands in a blitter ring(which will otherwise be idle), cross ring synchronization, will be negligible compared to using the CPU to clear out the buffer. You leave a lot of holes by which you leak the uncleared pages to userspace. -Chris Hi Chris, Are you ok with the overall design as such, and the shmem_read_mapping_page_gfp_noclear interface? Is the leak of uncleared pages happening due to implementation issues? If so, we'll try to mitigate them. Actually, along similar lines there is an even more fundamental issue. You should only clear the objects if the pages have not been prepopulated. -Chris Hi Chris, This patch is in continuation of the shmem preallocate patch sent by Akash earlier. (http://lists.freedesktop.org/archives/intel-gfx/2014-May/044597.html) We employ this method only in case of the preallocate routine, which will be called in the first page fault of the object resulting in fresh allocation of pages. This is controlled by means of a flag 'require_clear' which is set in preallocate routine(which will be come into picture only in case of fresh allocation). If pages are already populated for the object, this won't come into picture. Also, we'll try to fix the leak of uncleared pages due to any implementation issues. Regards, Sourab ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC] drm/i915 : Reduce the shmem page allocation time by using blitter engines for clearing pages.
On Tue, 2014-05-06 at 17:56 +, Eric Anholt wrote: sourab.gu...@intel.com writes: From: Sourab Gupta sourab.gu...@intel.com This patch is in continuation of and is dependent on earlier patch series to 'reduce the time for which device mutex is kept locked'. (http://lists.freedesktop.org/archives/intel-gfx/2014-May/044596.html) One of userspace's assumptions is that when you allocate a new BO, you can map it and start writing data into it without needing to wait on the GPU. I expect this patch to mostly hurt performance on apps (and I note that the patch doesn't come with any actual performance data) that get more stalls as a result. Hi Eric, Yes, it may hurt the performance on apps, in case of small buffers and if blitter engine is busy as there is a synchronous wait for rendering in the gem_fault handler. If that is the case, we can drop this from the gem_fault routine and employ it only in the do_execbuffer routine. Its useful there because there is no synchronous wait required in sw, due to cross ring synchronization. We'll gather the numbers to quantify the performance benefit we have while using blitter engines in this way for different buffer sizes. More importantly, though, it breaks existing userspace that relies on buffers being idle on allocation, for the unsychronized maps used in intel_bufferobj_subdata() and intel_bufferobj_map_range(GL_INVALIDATE_BUFFER_BIT | GL_UNSYNCHRONIZED_BIT) Sorry, I miss your point here. It may not break this assumption due to the fact that we employ this method only in case of the preallocate routine, which will be called in the first page fault of the object (gem_fault handler) resulting in fresh allocation of pages. So, in case of unsynchronized maps, there may be a wait involved in the first page fault. Also, that wait time may be lesser than the time required for CPU memset (resulting in no performance hit). There won't be any subsequent waits afterwards for that buffer object. Though, we'll have performance hit in the case when blitter engine is already busy and may not be available to immediately start the memset of freshly allocated mmaped buffers. Am I missing something here? Does the userspace requirement for unsynchronized mapped objects involve complete idleness of object on gpu even when object page faults for the first time? Regards, Sourab ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC] libdrm_intel: Add support for userptr objects
On 05/02/2014 06:15 PM, Ben Widawsky wrote: On Fri, May 02, 2014 at 11:27:45AM +0100, Tvrtko Ursulin wrote: Some people expressed interest in tiling so I thought to preserve it in the API. Kernel even allows it, and it is just because Chris found bspec references saying it will break badly on some platforms, that I (or maybe it was both of us) decided to disable it on this level for time being. So I think it is just a matter of coming up with a blacklist and it would be possible to use it then. Well again, maybe this is specific to my usecase, but I can never imagine a use for such fields at this stage in the buffer's lifecycle. Could you get some people to describe how they want to use it? Actually thinking about it more, when I collated requirements from various groups they were not all that interested. But Chris was actually in favour of keeping it in the kernel rather than disabling it altogether. So I decided to keep it in the userspace API and only reject the attempts to use it for time being. I think a param for USERPTR is warranted. Or did we decide not to do this already? I asked for it, but people with authority said no. It is all very weak, fragile and dangerous anyway - param or feature detect like the above. We would really need a proper feature detect mechanism, like text based in sysfs or something. I don't see why a param is fragile. Feature detect OTOH is almost always implemented in a fragile manner. Not if we had a text file in sysfs with names rather than numbers (getparam) without a central allocation authority. HAS_PARAM(FEATURE1) actually just tricks you into thinking it's fine, while actually you are just asking Do I have feature 48. What is feature 48? Who knows... some features never make it to upstream, some do and then get their HAS_PARAM number reallocated so it is really weak. Something like grep userptr /sys/kernel/debug/dri/0/i915_features, or stat /sys/kernel/debug/dri/0/i915/features/userptr would be much better. This way or the other, it seems to be there is not consensus with upstream gate keepers whether to have it or not. It was Chris actually who ripped it out. Personally I can see both arguments which is why I think we should come up with something better. Probably don't need the special function pointer yet since I don't think we can yet envision use cases which will require any kind of special handling. A simple has_userptr in bufmgr_gem will probably suffice. But I don't care too much either way. What do you mean? Don't add bo_alloc_userptr to the bufmgr. Just add the prototype to intel_bufmgr.h. Not sure, wouldn't like to make inconsistent. I'm pretty close to running this with most of the changes I had asked you for. I need to see how much of your igt test I can reuse now. Cool, so there is nothing for me to do. :) Regards, Tvrtko ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC] drm/i915: Scratch page optimization for blanking buffer
On Wed, May 7, 2014 at 9:39 AM, Daniel Vetter dan...@ffwll.ch wrote: Yes no real backing storage is needed, but still the object should be allowed to get mapped into the GGTT, using the scratch page (already zeroed out). Is there a patch for a new drm/i915 ioctl similar to fallocate, which could be used here? Geez, you guys shouldn't just dump patches onto intel-gfx but read a bit what other people are doing. Especially when they work for the same team ;-) http://comments.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/36351 Oops, I've mixed up names and you're not actually on the same team. Sorry about that. Still when you do upstream development please try to follow a bit what's happening in general so that we can maximally exploit opportunities for collaboration. That's why we have such a big emphasis on open discussion fora like mailing lists or irc channel. So I very much want people to jump into discussions of patches from other groups (or even outside from Intel) if they see some overlap (or other possible benefits like sharing learned lessons) with their own work. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 1/2] rendercopy/bdw: Enable hw-generated binding tables
I quickly cobbled together a hsw version of this and gave it a whirl on one machine. Seems to work just fine here, and no lockups when switching between hw and sw binding tables. Did you get the lockups on hsw even with rendercopy? Here's my hsw version: From 17eeb8021815e2c18d6ba9b2185a37904296c2d9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= ville.syrj...@linux.intel.com Date: Wed, 7 May 2014 12:33:01 +0300 Subject: [PATCH] rendercopy: use resource streamer on hsw MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com --- lib/gen7_render.h | 16 +++- lib/rendercopy_gen7.c | 103 +- 2 files changed, 115 insertions(+), 4 deletions(-) diff --git a/lib/gen7_render.h b/lib/gen7_render.h index 1661d4c..58a88ef 100644 --- a/lib/gen7_render.h +++ b/lib/gen7_render.h @@ -155,8 +155,11 @@ #define GEN7_PIPE_CONTROL_IS_FLUSH (1 11) #define GEN7_PIPE_CONTROL_TC_FLUSH (1 10) #define GEN7_PIPE_CONTROL_NOTIFY_ENABLE (1 8) -#define GEN7_PIPE_CONTROL_GLOBAL_GTT(1 2) -#define GEN7_PIPE_CONTROL_LOCAL_PGTT(0 2) +#define GEN7_PIPE_CONTROL_FLUSH(1 7) +#define GEN7_PIPE_CONTROL_DC_FLUSH (1 5) +#define GEN7_PIPE_CONTROL_VF_INVALIDATE (1 4) +#define GEN7_PIPE_CONTROL_CC_INVALIDATE (1 2) +#define GEN7_PIPE_CONTROL_SC_INVALIDATE (1 2) #define GEN7_PIPE_CONTROL_STALL_AT_SCOREBOARD (1 1) #define GEN7_PIPE_CONTROL_DEPTH_CACHE_FLUSH(1 0) @@ -1361,4 +1364,13 @@ typedef enum { EXTEND_COUNT } sampler_extend_t; +/* HSW+ resource streamer */ +#define HSW_3DSTATE_BINDING_TABLE_POOL_ALLOC GEN7_3D(3, 1, 0x19) +# define BINDING_TABLE_POOL_ENABLE (1 11) +#define HSW_3DSTATE_BINDING_TABLE_EDIT_VS GEN7_3D(3, 0, 0x43) +#define HSW_3DSTATE_BINDING_TABLE_EDIT_GS GEN7_3D(3, 0, 0x44) +#define HSW_3DSTATE_BINDING_TABLE_EDIT_HS GEN7_3D(3, 0, 0x45) +#define HSW_3DSTATE_BINDING_TABLE_EDIT_DS GEN7_3D(3, 0, 0x46) +#define HSW_3DSTATE_BINDING_TABLE_EDIT_PS GEN7_3D(3, 0, 0x47) + #endif diff --git a/lib/rendercopy_gen7.c b/lib/rendercopy_gen7.c index 5131d8f..4efccb9 100644 --- a/lib/rendercopy_gen7.c +++ b/lib/rendercopy_gen7.c @@ -21,6 +21,9 @@ #include gen7_render.h #include intel_reg.h +#ifndef I915_EXEC_RESOURCE_STREAMER +#define I915_EXEC_RESOURCE_STREAMER (113) +#endif static const uint32_t ps_kernel[][4] = { { 0x0080005a, 0x2e2077bd, 0x00c0, 0x008d0040 }, @@ -73,11 +76,14 @@ gen7_render_flush(struct intel_batchbuffer *batch, drm_intel_context *context, uint32_t batch_end) { int ret; + uint32_t flags = I915_EXEC_RENDER; ret = drm_intel_bo_subdata(batch-bo, 0, 4096, batch-buffer); + if (batch-use_resource_streamer) + flags |= I915_EXEC_RESOURCE_STREAMER; if (ret == 0) ret = drm_intel_gem_bo_context_exec(batch-bo, context, - batch_end, 0); + batch_end, flags); assert(ret == 0); } @@ -219,6 +225,75 @@ static void gen7_emit_vertex_buffer(struct intel_batchbuffer *batch, OUT_BATCH(0); } +static void +gen7_hw_binding_table(struct intel_batchbuffer *batch, bool enable) +{ + if (!enable) { + OUT_BATCH(MI_RS_CONTROL | 0x0); + + OUT_BATCH(HSW_3DSTATE_BINDING_TABLE_POOL_ALLOC | (3 - 2)); + /* binding table pool base address */ + OUT_BATCH(3 5); + /* Upper bound */ + OUT_BATCH(0); + + OUT_BATCH(GEN7_PIPE_CONTROL | (4 - 2)); + OUT_BATCH(GEN7_PIPE_CONTROL_CS_STALL | GEN7_PIPE_CONTROL_STALL_AT_SCOREBOARD); + OUT_BATCH(0); + OUT_BATCH(0); + + OUT_BATCH(GEN7_PIPE_CONTROL | (4 - 2)); + OUT_BATCH(GEN7_PIPE_CONTROL_SC_INVALIDATE); + OUT_BATCH(0); + OUT_BATCH(0); + + return; +} + OUT_BATCH(HSW_3DSTATE_BINDING_TABLE_POOL_ALLOC | (3 - 2)); + + /* binding table pool base address */ + OUT_RELOC(batch-hw_bt_pool_bo, I915_GEM_DOMAIN_SAMPLER, 0, + BINDING_TABLE_POOL_ENABLE | (3 5)); + + /* Upper bound */ + OUT_RELOC(batch-hw_bt_pool_bo, I915_GEM_DOMAIN_SAMPLER, 0, + batch-hw_bt_pool_bo-size); + + OUT_BATCH(GEN7_PIPE_CONTROL | (4 - 2)); + OUT_BATCH(GEN7_PIPE_CONTROL_CS_STALL | GEN7_PIPE_CONTROL_STALL_AT_SCOREBOARD); + OUT_BATCH(0); + OUT_BATCH(0); + + OUT_BATCH(GEN7_PIPE_CONTROL | (4 - 2)); + OUT_BATCH(GEN7_PIPE_CONTROL_SC_INVALIDATE); + OUT_BATCH(0); + OUT_BATCH(0); +} + +static uint32_t +gen7_rs_bind_surfaces(struct intel_batchbuffer *batch, + struct igt_buf *src, + struct igt_buf *dst, +
Re: [Intel-gfx] [PATCH 1/2] rendercopy/bdw: Enable hw-generated binding tables
On Wed, May 07, 2014 at 02:49:31PM +0300, Ville Syrjälä wrote: I quickly cobbled together a hsw version of this and gave it a whirl on one machine. Seems to work just fine here, and no lockups when switching between hw and sw binding tables. Did you get the lockups on hsw even with rendercopy? This intrigues me that we should do a context continuation rendercpy (with the state setup split across multiple batches) and lots of context switching. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 00/66] runtime pm for DPMS
On Fri, 2014-04-25 at 10:45 +0200, Daniel Vetter wrote: Ok, review assignements. Please complain if you don't have the bandwidth so that I can find someone else. On Thu, Apr 24, 2014 at 11:54 PM, Daniel Vetter daniel.vet...@ffwll.ch wrote: Daniel Vetter (66): drm/i915: Make encoder-mode_set callbacks optional drm/i915/dvo: Remove -mode_set callback drm/i915/tv: extract set_tv_mode_timings drm/i915/tv: extract set_color_conversion drm/i915/tv: De-magic device check drm/i915/tv: Rip out pipe-disabling nonsense from -mode_set drm/i915/tv: Remove -mode_set callback drm/i915/crt: Remove -mode_set callback drm/i915/sdvo: Remove -mode_set callback Removal of encoder-mode_set callbacks, part 1 Reviewer: Imre drm/i915/hdmi: Enable hdmi mode on g4x, too drm/i915: Track hdmi mode in the pipe config drm/i915/sdvo: Use pipe_config-limited_color_range consistently drm/i915: state readout and cross checking for limited_color_range drm/i915/sdvo: use config-has_hdmi_sink drm/i915: Simplify audio handling on DDI ports drm/i915: Track has_audio in the pipe config drm/i915/dp: Move port A pll setup to g4x_pre_enable_dp drm/i915/dp: Remove -mode_set callback drm/i915/hdmi: Remove redundant IS_VLV checks drm/i915/hdmi: Remove -mode_set callback Removal of the encoder-mode_set callbacks for hdmi/sdvo/dp with small interludes to move a bit of the hdmi/audio state into the pipe config. Reviewer: Naresh Kumar drm/i915/lvds: Remove -mode_set callback drm/i915/ddi: Remove -mode_set callback drm/i915/dsi: Remove -mode_set callback drm/i915: Stop calling encoder-mode_set Final removals of encoder-mode_set callbacks Reviewer: Imre On patches 21-24: Reviewed-by: Imre Deak imre.d...@intel.com drm/i915: Make -update_primary_plane infallible drm/i915: More cargo-culted locking for intel_update_fbc drm/i915: Sprinkle intel_edp_psr_update over crtc_enable/disable drm/i915: Inline set_base into crtc_mode_set drm/i915: Move fb pinning into __intel_set_mode Some shuffling to get the primary-fb handling out of crtc mode_set callbacks Reviewer: Akash Goel drm/i915: Shovel hw setup code out of i9xx_crtc_mode_set drm/i915: Move lowfreq_avail around a bit in ilk/hsw_crtc_mode_set drm/i915: Shovel hw setup code out of ilk_crtc_mode_set drm/i915: Shovel hw setup code out of hsw_crtc_mode_set drm/i915: Extract i9xx_set_pll_dividers drm/i915: Extract vlv_prepare_pll gmch pll moved out of crtc mode_set callbacks into -enable hooks Reviewer: Shobhit Kumar drm/i915: Only update shared dpll state when needed drm/i915: Extract intel_prepare_shared_dpll drm/i915: s/ironlake_/intel_ for the enable_share_dpll function Prep polish on the existing shared_dpll code Reviewer: Damien (same comment as below) drm/i915: Check hw state in assert_can_disable_lcpll drm/i915: Remove spll_refcount for hsw drm/i915: Clean up WRPLL/SPLL #defines drm/i915: Make intel_wait_for_pipe_off static drm/i915: Disable pipe before ports on ilk drm/i915: Pass port explicitly to intel_ddi_get_hw_state drm/i915: Unexport intel_ddi_connector_get_hw_state drm/i915: Move hsw_fdi_link_train into intel_crt.c drm/i915: Move pch fifo underrun report enabling to hsw_crt_pre_enable drm/i915: Move the SPLL enabling into hsw_crt_pre_enable drm/i915: Move lpt_pch_enable int hsw_crt_enable drm/i915: Move the pch fifo underrun handling into hsw_crt_disable drm/i915: Move lpt_disable_pch_transcoder into the hsw crt encoder drm/i915: Move pch fifo underrun report re-enabling into hsw_crt_post_disable drm/i915: Move the hsw fdi disabling into hsw_crt_post_disable drm/i915: Move SPLL disabling into hsw_crt_post_disable Create a new hsw-specific crt encoder which subsumes the entire fdi/pch handling on haswell. This has the nice upshot to make SPLL logically a port-private clock and so removes it from further considerations. Reviewer: Paulo drm/i915: Add a debugfs file for the shared dpll state drm/i915: Move ddi_pll_sel into the pipe config drm/i915: State readout and cross-checking for ddi_pll_sel drm/i915: Precompute static ddi_pll_sel values in encoders drm/i915: Basic shared dpll support for WRPLLs drm/i915: Document that the pll-mode_set hook is optional drm/i915: State readout support for WRPLLs drm/i915: -disable hook for WRPLLs drm/i915: -enable hook for WRPLLs drm/i915: Switch to common shared dpll framework for WRPLLs drm/i915: Only touch WRPLL hw state in enable/disable hooks Convert wrpll handling to the common shared_dpll framework. We need this since runtime pm for dpms requires us to separately track pll refernces from crtcs and active usage by crtcs Reviewer: Damien (maybe find someone from the vpg guys who do the pll
[Intel-gfx] [PATCH] tests/gem_flink_race, prime_self_import: fix object counts
We need to add one drm_open_any() before getting the object counts as first call to drm_open_any() allocates file descriptors for exit handlers and thus is not symmetrical Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77867 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77875 Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com --- tests/gem_flink_race.c|8 tests/prime_self_import.c |8 2 files changed, 16 insertions(+) diff --git a/tests/gem_flink_race.c b/tests/gem_flink_race.c index 3353b83..f129cb3 100644 --- a/tests/gem_flink_race.c +++ b/tests/gem_flink_race.c @@ -163,6 +163,11 @@ static void test_flink_close(void) int r, i, num_threads; int obj_count; void *status; + int fake; + + /* Allocate exit handler fds in here so that we dont screw +* up the counts */ + fake = drm_open_any(); obj_count = get_object_count(); @@ -193,6 +198,9 @@ static void test_flink_close(void) obj_count = get_object_count() - obj_count; printf(leaked %i objects\n, obj_count); + + close(fake); + igt_assert(obj_count == 0); } diff --git a/tests/prime_self_import.c b/tests/prime_self_import.c index efdd24f..41c203f 100644 --- a/tests/prime_self_import.c +++ b/tests/prime_self_import.c @@ -258,6 +258,11 @@ static void test_reimport_close_race(void) int obj_count; void *status; uint32_t handle; + int fake; + + /* Allocate exit handler fds in here so that we dont screw +* up the counts */ + fake = drm_open_any(); obj_count = get_object_count(); @@ -294,6 +299,9 @@ static void test_reimport_close_race(void) obj_count = get_object_count() - obj_count; printf(leaked %i objects\n, obj_count); + + close(fake); + igt_assert(obj_count == 0); } -- 1.7.9.5 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 01/10] drm/i915/bdw: Implement a basic PM interrupt handler
On Mon, May 05, 2014 at 06:17:30PM +0530, deepa...@linux.intel.com wrote: From: Ben Widawsky benjamin.widaw...@intel.com Almost all of it is reusable from the existing code. The primary difference is we need to do even less in the interrupt handler, since interrupts are not shared in the same way. The patch is mostly a copy-paste of the existing snb+ code, with updates to the relevant parts requiring changes to the interrupt handling. As such it /should/ be relatively trivial. It's highly likely that I missed some places where I need a gen8 version of the PM interrupts, but it has become invisible to me by now. This patch could probably be split into adding the new functions, followed by actually handling the interrupts. Since the code is currently disabled (and broken) I think the patch stands better by itself. v2: Move the commit about not touching the ringbuffer interrupt to the snb_* function where it belongs (Rodrigo) v3: Rebased on Paulo's runtime PM changes v4: Not well validated, but rebase on commit 730488b2eddded4497f63f70867b1256cd9e117c Author: Paulo Zanoni paulo.r.zan...@intel.com Date: Fri Mar 7 20:12:32 2014 -0300 drm/i915: kill dev_priv-pm.regsave v5: Rebased on latest code base. (Deepak) v6: Remove conflict markers, Unnecessary empty line and use right IIR interrupt (Ville) Signed-off-by: Ben Widawsky b...@bwidawsk.net Signed-off-by: Deepak S deepa...@linux.intel.com --- drivers/gpu/drm/i915/i915_irq.c | 75 ++-- drivers/gpu/drm/i915/i915_reg.h | 1 + drivers/gpu/drm/i915/intel_drv.h | 2 ++ drivers/gpu/drm/i915/intel_pm.c | 38 ++-- 4 files changed, 112 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 2d76183..6af51ad 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -248,6 +248,49 @@ static bool ivb_can_enable_err_int(struct drm_device *dev) return true; } +/** + * bdw_update_pm_irq - update GT interrupt 2 + * @dev_priv: driver private + * @interrupt_mask: mask of interrupt bits to update + * @enabled_irq_mask: mask of interrupt bits to enable + * + * Copied from the snb function, updated with relevant register offsets + */ +static void bdw_update_pm_irq(struct drm_i915_private *dev_priv, + uint32_t interrupt_mask, + uint32_t enabled_irq_mask) +{ + uint32_t new_val; + + assert_spin_locked(dev_priv-irq_lock); + + if (dev_priv-pm.irqs_disabled) { + WARN(1, IRQs disabled\n); + return; + } Could be just 'if (WARN...' just like in the snb function. + + new_val = dev_priv-pm_irq_mask; + new_val = ~interrupt_mask; + new_val |= (~enabled_irq_mask interrupt_mask); + + if (new_val != dev_priv-pm_irq_mask) { + dev_priv-pm_irq_mask = new_val; + I915_WRITE(GEN8_GT_IMR(2), I915_READ(GEN8_GT_IMR(2)) | +dev_priv-pm_irq_mask); I just realized that this doesn't actually clear the old bits. I think we should just kill the RMW here and mandate that all users of GT_IMR(2) need to use this function. Also you're forgetting to initialize pm_irq_mask. I guess it should be initialized in gen8_gt_irq_postinstall() to match the gen6 code. + POSTING_READ(GEN8_GT_IMR(2)); + } +} + +void bdw_enable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask) +{ + bdw_update_pm_irq(dev_priv, mask, mask); +} + +void bdw_disable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask) +{ + bdw_update_pm_irq(dev_priv, mask, 0); +} + static bool cpt_can_enable_serr_int(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev-dev_private; @@ -1098,8 +1141,12 @@ static void gen6_pm_rps_work(struct work_struct *work) spin_lock_irq(dev_priv-irq_lock); pm_iir = dev_priv-rps.pm_iir; dev_priv-rps.pm_iir = 0; - /* Make sure not to corrupt PMIMR state used by ringbuffer code */ - snb_enable_pm_irq(dev_priv, dev_priv-pm_rps_events); + if (IS_BROADWELL(dev_priv-dev)) + bdw_enable_pm_irq(dev_priv, dev_priv-pm_rps_events); + else { + /* Make sure not to corrupt PMIMR state used by ringbuffer */ + snb_enable_pm_irq(dev_priv, dev_priv-pm_rps_events); + } spin_unlock_irq(dev_priv-irq_lock); /* Make sure we didn't queue anything we're not going to process. */ @@ -1296,6 +1343,19 @@ static void snb_gt_irq_handler(struct drm_device *dev, ivybridge_parity_error_irq_handler(dev, gt_iir); } +static void gen8_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir) +{ + if ((pm_iir dev_priv-pm_rps_events) == 0) + return; + + spin_lock(dev_priv-irq_lock); + dev_priv-rps.pm_iir |=
Re: [Intel-gfx] [PATCH 02/10] drm/i915: Enable PM Interrupts target via Display Interface.
On Mon, May 05, 2014 at 06:17:31PM +0530, deepa...@linux.intel.com wrote: From: Deepak S deepa...@linux.intel.com In BDW, Apart from unmasking up/down threshold interrupts. we need to umask bit 32 of PM_INTRMASK to route interrupts to target via Display Interface. v2: Add (131) mask (Ville) Signed-off-by: Deepak S deepa...@linux.intel.com --- drivers/gpu/drm/i915/i915_reg.h | 1 + drivers/gpu/drm/i915/intel_pm.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index ca4f8b9..c850254 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -5112,6 +5112,7 @@ enum punit_power_well { #define GEN6_RC6p_THRESHOLD 0xA0BC #define GEN6_RC6pp_THRESHOLD 0xA0C0 #define GEN6_PMINTRMSK 0xA168 +#define GEN8_PMINTR_REDIRECT_TO_NON_DISP (131) #define GEN6_PMISR 0x44020 #define GEN6_PMIMR 0x44024 /* rps_lock */ diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index 0e69c97..ebb5c88 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -3114,6 +3114,8 @@ static u32 gen6_rps_pm_mask(struct drm_i915_private *dev_priv, u8 val) if (INTEL_INFO(dev_priv-dev)-gen = 7 !IS_HASWELL(dev_priv-dev)) mask |= GEN6_PM_RP_UP_EI_EXPIRED; + mask |= GEN8_PMINTR_REDIRECT_TO_NON_DISP; + Didn't Ben want a gen check here? return ~mask; } -- 1.9.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Ville Syrjälä Intel OTC ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] tests/gem_flink_race, prime_self_import: fix object counts
On Wed, May 07, 2014 at 04:55:28PM +0300, Mika Kuoppala wrote: We need to add one drm_open_any() before getting the object counts as first call to drm_open_any() allocates file descriptors for exit handlers and thus is not symmetrical Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77867 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77875 Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com Ah, makes sense. Reviewed-by: Daniel Vetter daniel.vet...@ffwll.ch When you push this please add a citation of the commit which introduced this regression like we do it for kernel patches. Thanks, Daniel --- tests/gem_flink_race.c|8 tests/prime_self_import.c |8 2 files changed, 16 insertions(+) diff --git a/tests/gem_flink_race.c b/tests/gem_flink_race.c index 3353b83..f129cb3 100644 --- a/tests/gem_flink_race.c +++ b/tests/gem_flink_race.c @@ -163,6 +163,11 @@ static void test_flink_close(void) int r, i, num_threads; int obj_count; void *status; + int fake; + + /* Allocate exit handler fds in here so that we dont screw + * up the counts */ + fake = drm_open_any(); obj_count = get_object_count(); @@ -193,6 +198,9 @@ static void test_flink_close(void) obj_count = get_object_count() - obj_count; printf(leaked %i objects\n, obj_count); + + close(fake); + igt_assert(obj_count == 0); } diff --git a/tests/prime_self_import.c b/tests/prime_self_import.c index efdd24f..41c203f 100644 --- a/tests/prime_self_import.c +++ b/tests/prime_self_import.c @@ -258,6 +258,11 @@ static void test_reimport_close_race(void) int obj_count; void *status; uint32_t handle; + int fake; + + /* Allocate exit handler fds in here so that we dont screw + * up the counts */ + fake = drm_open_any(); obj_count = get_object_count(); @@ -294,6 +299,9 @@ static void test_reimport_close_race(void) obj_count = get_object_count() - obj_count; printf(leaked %i objects\n, obj_count); + + close(fake); + igt_assert(obj_count == 0); } -- 1.7.9.5 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] Design review request: DRM color manager
Hello all, As per previous discussions, I am sending the drm-color-manager design for review, as inline text. Please have a look and let me know your feedback. (I tried hard to align the inline text diagram in mail, hope you can see the proper picture too) = DRM Color Manager = - Color manager provides a common interface for all color correction / enhancement properties supported by various hardwares. - Color manager will be one Umbrella DRM property (blob type) - Driver can register the color correction properties of its HW during the init time, and all the corrections can be applied using the same interface. How does a driver register color properties with DRM color manager: --- - A DRM driver will call drm_register_color_manager() function with following information: .prop_set_handler .prop_get_hanlder .color_prop_identifier structure {porp_name, prop_id} {porp_name, prop_id} {porp_name, prop_id} For example: I915 driver can register as: .prop_set_handler = intel_clrmgr_set() .prop_get_hanlder = intel_clrmgr_get() .color_prop_identifier structure {gamma_correction, 0} {csc_correction, 1} {hue_saturation, 2} How will color_manager_set/get() functions work: - Color EDID: == || property || Enable/ || Pipe/Plane/ || No of|| || ID || Disable || Identifier || Data bytes ||data.. || 1 Byte || 1 Byte || 1 Byte || 1 Byte || == - Color EDID is a protocol to extract the color correction characterictics from a blob, coming from DRM layer as property_set_blob() or loading a blob for property_get_blob() - Userspace will load this blob with corresponding values and use the drm_set_blob(color-manager) interface. - DRM layer of get/set_color_manager() will validate the entry, extract the following information from blob - property - enable/disable - identifier - ptr to start of raw data - With this information, drm_get/set_color_manager() layer will call driver's .get/set_handler() which will do the correction using those values. - Based on the driver's requirement, user space can send any type of values in byte stream, like direct reg values, correction coefficients or any other form of data. Benefits of this common interface: -- - Single interface for all color properties. No need to create multiple properties. - HW agonist. Its in hands of driver to register the properties. - Any no of properties can be registered. - Can be clubbed with modeset implementations. Regards Shashank On 5/7/2014 7:41 PM, Sharma, Shashank wrote: FYI -Original Message- From: Sharma, Shashank Sent: Tuesday, April 22, 2014 8:32 PM To: Daniel Vetter Cc: David Herrmann; intel-gfx@lists.freedesktop.org; Cn, Ramakrishnan; Jindal, Sonika; Shankar, Uma Subject: RE: [Intel-gfx] Design review request: DRM color manager David, My apologies for starting a pre-mature design discussion. Daniel, Thanks for pointing out first two things, It was not known to me, I will take care of this in future. First time I presented color-manager design, in internal display design forum, where most of the reviewers were not there. We took the feedback from people who were present, and implemented the design. When we shared color manager implementation, that design was rejected and one of the feedbacks was that it would be better to discuss it on dri-devel where people outside Intel can give their opinion, and that’s the only reason why I added dri-devel for the new design (Please see the attached mail, I replied to all who were in last communication). Please let me know how do we want to proceed now. Regards Shashank -Original Message- From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel Vetter Sent: Tuesday, April 22, 2014 7:18 PM To: Sharma, Shashank Cc: David Herrmann; intel-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; Thierry Reding; Cn, Ramakrishnan; Alex Deucher; Jindal, Sonika; Shankar, Uma Subject: Re: [Intel-gfx] Design review request: DRM color manager On Tue, Apr 22, 2014 at 12:07:41PM +, Sharma, Shashank wrote: Thanks again David, Comments inline. Three things: - Please don't send out .pptx files to upstream/public mailing lists, that's just not how the upstream community works. - Please either fix up ms outlook to do proper in-line quoting or switch to a proper mail client for discussions on dri-devel. I'm ok with this on intel-gfx to some extend since that's our own turf, but on dri-devel the
Re: [Intel-gfx] [PATCH 10/10] drm/i915/chv: Freq(opcode) request for CHV.
On Mon, May 05, 2014 at 06:17:39PM +0530, deepa...@linux.intel.com wrote: From: Deepak S deepa...@linux.intel.com On CHV, All the freq request should be even. So, we need to make sure we request the opcode accordingly. Signed-off-by: Deepak S deepa...@linux.intel.com Reviewed-by: Ben Widawsky b...@bwidawsk.net --- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_irq.c | 9 +++-- drivers/gpu/drm/i915/i915_reg.h | 3 +++ 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index e70a9f0..3966ff2 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1318,6 +1318,7 @@ struct drm_i915_private { u32 gt_irq_mask; u32 pm_irq_mask; u32 pm_rps_events; + u32 pm_rps_freq_req; u32 pipestat_irq_mask[I915_MAX_PIPES]; struct work_struct hotplug_work; diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 6af51ad..3e8bcca 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -1162,7 +1162,7 @@ static void gen6_pm_rps_work(struct work_struct *work) if (adj 0) adj *= 2; else - adj = 1; + adj = dev_priv-pm_rps_freq_req; new_delay = dev_priv-rps.cur_freq + adj; /* @@ -1181,7 +1181,7 @@ static void gen6_pm_rps_work(struct work_struct *work) if (adj 0) adj *= 2; else - adj = -1; + adj = -1 * dev_priv-pm_rps_freq_req; I'm not sure pm_rps_freq_req is very descriptive. Maybe it's better to just open code the thing and add a few comments. Eg.: /* CHV requires even values */ adj = IS_CHERRYVIEW(dev) ? 2 : 1; ... /* CHV requires even values */ adj = IS_CHERRYVIEW(dev) ? -2 : -1; new_delay = dev_priv-rps.cur_freq + adj; In any case this doesn't seem to enough to guarantee an even value. You'd also need to make sure efficient_freq, min_freq_softlimit and max_freq_softlimit are even. The alternative is to just make the final value even in valleyview_set_rps(), but then you might still trip these WARN_ON(val dev_priv-rps.max_freq_softlimit); WARN_ON(val dev_priv-rps.min_freq_softlimit); in there. So I guess it's back to option 1 and making sure all the limits are also even. Also leaving it up to valleyview_set_rps() could introduce a slight delay before the frequency starts to change since it could effectively ignore the first interrupt (depending on which way it's going and which we we would round the final value). } else { /* unknown event */ new_delay = dev_priv-rps.cur_freq; @@ -4088,6 +4088,11 @@ void intel_irq_init(struct drm_device *dev) /* Let's track the enabled rps events */ dev_priv-pm_rps_events = GEN6_PM_RPS_EVENTS; + if (IS_CHERRYVIEW(dev)) + dev_priv-pm_rps_freq_req = CHV_GPU_FREQ_REQ; + else + dev_priv-pm_rps_freq_req = GEN6_GPU_FREQ_REQ; + setup_timer(dev_priv-gpu_error.hangcheck_timer, i915_hangcheck_elapsed, (unsigned long) dev); diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 3ff34c4..4998d6b 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -529,6 +529,9 @@ enum punit_power_well { #define CHV_FB_RPE_FREQ_SHIFT8 #define CHV_FB_RPE_FREQ_MASK 0xff +#define CHV_GPU_FREQ_REQ 2 +#define GEN6_GPU_FREQ_REQ1 I don't think these defines buy us anything. Instead I think just adding a comment or two to where the even vs. odd magic happens should be a better option. + #define IOSF_NC_FB_GFX_FREQ_FUSE 0x1c #define FB_GFX_MAX_FREQ_FUSE_SHIFT 3 #define FB_GFX_MAX_FREQ_FUSE_MASK 0x07f8 -- 1.9.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Ville Syrjälä Intel OTC ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 4/9] drm/i915: Limit the number of node allocation retries
On Wed, May 07, 2014 at 09:49:57AM +0200, Daniel Vetter wrote: On Tue, May 06, 2014 at 10:21:33PM -0700, Ben Widawsky wrote: AFAICT, it's impossible to actually infinitely retry the allocation in our current code. However, a small oversight on my part, slight bug, or future bug, could easily change that. To avoid a potential breakage, a simple retry variable of 16 bits will help us prevent infinitely running. Retry is limited to 100 as a mostly random number. Some consideration about stack usage was taken into account. As an example, if we allowed 256 retries on a 32b arch (and my memory serves that all arguments are passed on the stack for such architectures), thats 33 bytes * 256 retries + (fudge for return address and such)... it's way more than we want to use already. 64b architectures might be slightly better, since 6? of the first args will get passed through registers, but it's still bad. If anything, we might want to do way less than 100, like 3. Signed-off-by: Ben Widawsky b...@bwidawsk.net So you replace the retry loop with a tailrecursive version in patch 2 and then add duct-tape later on here? Nope. -Daniel I feel like you're implying the retry loop will end, ever. The retry timeout should probably come first though, I Agree with that much. --- drivers/gpu/drm/i915/i915_gem.c | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index b6965a2..de98b26 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3216,6 +3216,7 @@ static void i915_gem_verify_gtt(struct drm_device *dev) #endif } +#define MAX_VMA_FIND_RETRY 100 static int i915_gem_find_vm_space(struct i915_address_space *vm, struct drm_mm_node *node, @@ -3224,7 +3225,8 @@ i915_gem_find_vm_space(struct i915_address_space *vm, unsigned long color, unsigned long start, unsigned long end, - uint32_t flags) + uint32_t flags, + uint8_t retry) { int ret; ret = drm_mm_insert_node_in_range_generic(vm-mm, node, @@ -3232,7 +3234,7 @@ i915_gem_find_vm_space(struct i915_address_space *vm, start, end, DRM_MM_SEARCH_DEFAULT, DRM_MM_CREATE_DEFAULT); - if (ret) { + if (ret (retry MAX_VMA_FIND_RETRY)) { if (WARN_ON(ret != -ENOSPC)) return ret; @@ -3241,7 +3243,8 @@ i915_gem_find_vm_space(struct i915_address_space *vm, if (ret == 0) return i915_gem_find_vm_space(vm, node, size, align, color, - start, end, flags); + start, end, flags, + retry++); } return ret; @@ -3306,8 +3309,9 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj, if (IS_ERR(vma)) goto err_unpin; - ret = i915_gem_find_vm_space(vm, vma-node, size, alignment, -obj-cache_level, 0, gtt_max, flags); + ret = i915_gem_find_vm_space(vm, vma-node, +size, alignment, obj-cache_level, +0, gtt_max, flags, 1); if (ret) goto err_free_vma; -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- Ben Widawsky, Intel Open Source Technology Center ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] UXA: Wait until a pageflip actually completes to report it.
UXA was reporting page-flip completion as soon as the flip was scheduled with the kernel, instead of waiting for the kernel to indicate that the flip had actually completed. Moving the DRI2SwapComplete call to the right place fixes all of our Piglit tests for OML_sync_control when run on xf86-video-intel/UXA, aside from a bit of difficult-to-reproduce flakiness when using a divisor 1. This also eliminates a compile-time and run-time warning when built against an xserver with Warn on DRI2SwapComplete with constant UST/MSC applied. Signed-off-by: Jamey Sharp ja...@minilop.net Cc: Theo Hill theo0...@gmail.com Cc: Eric Anholt e...@anholt.net --- src/uxa/intel_dri.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/src/uxa/intel_dri.c b/src/uxa/intel_dri.c index ca58052..3745767 100644 --- a/src/uxa/intel_dri.c +++ b/src/uxa/intel_dri.c @@ -932,10 +932,6 @@ I830DRI2ScheduleFlip(struct intel_screen_private *intel, /* Then flip DRI2 pointers and update the screen pixmap */ I830DRI2ExchangeBuffers(intel, info-front, info-back); - DRI2SwapComplete(info-client, draw, 0, 0, 0, -DRI2_EXCHANGE_COMPLETE, -info-event_complete, -info-event_data); return TRUE; } @@ -1090,6 +1086,12 @@ void I830DRI2FlipEventHandler(unsigned int frame, unsigned int tv_sec, assert(intel-pending_flip[flip_info-pipe] == flip_info); intel-pending_flip[flip_info-pipe] = NULL; + DRI2SwapComplete(flip_info-client, drawable, +frame, tv_sec, tv_usec, +DRI2_EXCHANGE_COMPLETE, +flip_info-event_complete, +flip_info-event_data); + chain = flip_info-chain; if (chain) { DrawablePtr chain_drawable = NULL; -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/9] drm/i915: Extract node allocation from bind
On Wed, May 07, 2014 at 08:02:38AM +0100, Chris Wilson wrote: On Tue, May 06, 2014 at 10:21:31PM -0700, Ben Widawsky wrote: The DRM node allocation code was already a bit of an ugly bit of code within a complex function. Removing it serves the purpose of cleaning the function up. More importantly, it provides a way to have a preallocated (address space) VMA to easily skip this code. Something we're very likely to need. This is essentially a wrapper for DRM node allocation with an eviction + retry. It is something which could be moved to core DRM eventually, if other drivers had similar eviction semantics. This removes a goto used for something other than error unwinding, a generally reviled (I am neutral) practice, and replaces it with a function call to itself that should have the same effect. Note that it's not a recursive function as all the problem set reduction is done in the eviction code. Some might say this change is worse than before because we are using stack for each subsequent call. Frankly, I'd rather overflow the stack and blow it up than loop forever. In either case, this is addressed in the next patch. I believe, and intend, that other than the stack usage, there is no functional change here. Nope, but decoupling evict_something() further does introduce lots more implied magic. -Chris Hmm. I really don't see what's actually upsetting. Can you be a bit more explicit about what's so bothersome to you for my edification? -- Ben Widawsky, Intel Open Source Technology Center ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/9] drm/i915: Extract node allocation from bind
On Wed, May 07, 2014 at 08:45:38AM -0700, Ben Widawsky wrote: On Wed, May 07, 2014 at 08:02:38AM +0100, Chris Wilson wrote: On Tue, May 06, 2014 at 10:21:31PM -0700, Ben Widawsky wrote: The DRM node allocation code was already a bit of an ugly bit of code within a complex function. Removing it serves the purpose of cleaning the function up. More importantly, it provides a way to have a preallocated (address space) VMA to easily skip this code. Something we're very likely to need. This is essentially a wrapper for DRM node allocation with an eviction + retry. It is something which could be moved to core DRM eventually, if other drivers had similar eviction semantics. This removes a goto used for something other than error unwinding, a generally reviled (I am neutral) practice, and replaces it with a function call to itself that should have the same effect. Note that it's not a recursive function as all the problem set reduction is done in the eviction code. Some might say this change is worse than before because we are using stack for each subsequent call. Frankly, I'd rather overflow the stack and blow it up than loop forever. In either case, this is addressed in the next patch. I believe, and intend, that other than the stack usage, there is no functional change here. Nope, but decoupling evict_something() further does introduce lots more implied magic. -Chris Hmm. I really don't see what's actually upsetting. Can you be a bit more explicit about what's so bothersome to you for my edification? evict_something() make assumptions about the ranges of the vm which it searches. At the moment, they mirror its parent's function. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 6/9] drm/i915: Wrap VMA binding
On Wed, May 07, 2014 at 09:55:49AM +0200, Daniel Vetter wrote: On Tue, May 06, 2014 at 10:21:35PM -0700, Ben Widawsky wrote: This will be useful for some upcoming patches which do more platform specific work. Having it in one central place just makes things a bit cleaner and easier. There is a small functional change here. There are more calls to the tracepoints. NOTE: I didn't actually end up using this patch for the intended purpose, but I thought it was a nice patch to keep around. v2: s/i915_gem_bind_vma/i915_gem_vma_bind/ s/i915_gem_unbind_vma/i915_gem_vma_unbind/ (Chris) v3: Missed one spot Signed-off-by: Ben Widawsky b...@bwidawsk.net You change the semantics of the vma bind/unbind tracepoints - previously they've meant we reserve/unreserve a ppgtt address for an object, not they're called every time we touch the ptes (which is what vma_bind|unbind actually do). That doesn't look too terribly useful for tracing any more. What's the use case for these added tracepoints? -Daniel You're right. The tracepoint is incredibly useful for debug purposes, but not as an actual tracepoint. The current tracepoint is badly named, IMO, but that's fine. I am not sure it's worth fixing though, since nobody has really spoken up for the patch. --- drivers/gpu/drm/i915/i915_drv.h| 3 +++ drivers/gpu/drm/i915/i915_gem.c| 13 ++--- drivers/gpu/drm/i915/i915_gem_context.c| 2 +- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +++-- drivers/gpu/drm/i915/i915_gem_gtt.c| 16 ++-- 5 files changed, 27 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index c1d2bea..30cde3d 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2264,6 +2264,9 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object *o, struct i915_address_space *vm); unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o, struct i915_address_space *vm); +void i915_gem_vma_bind(struct i915_vma *vma, enum i915_cache_level, + unsigned flags); +void i915_gem_vma_unbind(struct i915_vma *vma); struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj, struct i915_address_space *vm); struct i915_vma * diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index e9599e9..1567911 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2771,7 +2771,7 @@ int i915_vma_unbind(struct i915_vma *vma) trace_i915_vma_unbind(vma); - vma-unbind_vma(vma); + i915_gem_vma_unbind(vma); i915_gem_gtt_finish_object(obj); @@ -3342,9 +3342,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj, WARN_ON(flags PIN_MAPPABLE !obj-map_and_fenceable); - trace_i915_vma_bind(vma, flags); - vma-bind_vma(vma, obj-cache_level, - flags (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0); + i915_gem_vma_bind(vma, obj-cache_level, + flags (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0); i915_gem_verify_gtt(dev); return vma; @@ -3548,8 +3547,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj, list_for_each_entry(vma, obj-vma_list, vma_link) if (drm_mm_node_allocated(vma-node)) - vma-bind_vma(vma, cache_level, - obj-has_global_gtt_mapping ? GLOBAL_BIND : 0); + i915_gem_vma_bind(vma, cache_level, + obj-has_global_gtt_mapping ? GLOBAL_BIND : 0); } list_for_each_entry(vma, obj-vma_list, vma_link) @@ -3914,7 +3913,7 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj, } if (flags PIN_GLOBAL !obj-has_global_gtt_mapping) - vma-bind_vma(vma, obj-cache_level, GLOBAL_BIND); + i915_gem_vma_bind(vma, obj-cache_level, GLOBAL_BIND); vma-pin_count++; if (flags PIN_MAPPABLE) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index f77b4c1..00c7d88 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -655,7 +655,7 @@ static int do_switch(struct intel_ring_buffer *ring, if (!to-obj-has_global_gtt_mapping) { struct i915_vma *vma = i915_gem_obj_to_vma(to-obj, dev_priv-gtt.base); - vma-bind_vma(vma, to-obj-cache_level, GLOBAL_BIND); + i915_gem_vma_bind(vma, to-obj-cache_level, GLOBAL_BIND); } if (!to-is_initialized || i915_gem_context_is_default(to))
Re: [Intel-gfx] [PATCH] UXA: Wait until a pageflip actually completes to report it.
On Mon, May 05, 2014 at 11:05:07PM -0700, Jamey Sharp wrote: UXA was reporting page-flip completion as soon as the flip was scheduled with the kernel, instead of waiting for the kernel to indicate that the flip had actually completed. Moving the DRI2SwapComplete call to the right place fixes all of our Piglit tests for OML_sync_control when run on xf86-video-intel/UXA, aside from a bit of difficult-to-reproduce flakiness when using a divisor 1. The violation is intentional, as it gives us triple buffering by default. It can be disabled. Also you could do an improved version for recent Xorg, still using DRI2. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 6/9] drm/i915: Wrap VMA binding
On Wed, May 07, 2014 at 08:54:56AM -0700, Ben Widawsky wrote: On Wed, May 07, 2014 at 09:55:49AM +0200, Daniel Vetter wrote: On Tue, May 06, 2014 at 10:21:35PM -0700, Ben Widawsky wrote: This will be useful for some upcoming patches which do more platform specific work. Having it in one central place just makes things a bit cleaner and easier. There is a small functional change here. There are more calls to the tracepoints. NOTE: I didn't actually end up using this patch for the intended purpose, but I thought it was a nice patch to keep around. v2: s/i915_gem_bind_vma/i915_gem_vma_bind/ s/i915_gem_unbind_vma/i915_gem_vma_unbind/ (Chris) v3: Missed one spot Signed-off-by: Ben Widawsky b...@bwidawsk.net You change the semantics of the vma bind/unbind tracepoints - previously they've meant we reserve/unreserve a ppgtt address for an object, not they're called every time we touch the ptes (which is what vma_bind|unbind actually do). That doesn't look too terribly useful for tracing any more. What's the use case for these added tracepoints? -Daniel You're right. The tracepoint is incredibly useful for debug purposes, but not as an actual tracepoint. The current tracepoint is badly named, IMO, but that's fine. I am not sure it's worth fixing though, since nobody has really spoken up for the patch. Yeah I think a new tracepoint we use every time we frob ptes could be useful indeed. But conflating that operation with the vm operation isn't good, especially since the vm operation lacks crucial information like the actual caching bits we're using with the ptes. -Daniel --- drivers/gpu/drm/i915/i915_drv.h| 3 +++ drivers/gpu/drm/i915/i915_gem.c| 13 ++--- drivers/gpu/drm/i915/i915_gem_context.c| 2 +- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +++-- drivers/gpu/drm/i915/i915_gem_gtt.c| 16 ++-- 5 files changed, 27 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index c1d2bea..30cde3d 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2264,6 +2264,9 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object *o, struct i915_address_space *vm); unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o, struct i915_address_space *vm); +void i915_gem_vma_bind(struct i915_vma *vma, enum i915_cache_level, +unsigned flags); +void i915_gem_vma_unbind(struct i915_vma *vma); struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj, struct i915_address_space *vm); struct i915_vma * diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index e9599e9..1567911 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2771,7 +2771,7 @@ int i915_vma_unbind(struct i915_vma *vma) trace_i915_vma_unbind(vma); - vma-unbind_vma(vma); + i915_gem_vma_unbind(vma); i915_gem_gtt_finish_object(obj); @@ -3342,9 +3342,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj, WARN_ON(flags PIN_MAPPABLE !obj-map_and_fenceable); - trace_i915_vma_bind(vma, flags); - vma-bind_vma(vma, obj-cache_level, - flags (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0); + i915_gem_vma_bind(vma, obj-cache_level, + flags (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0); i915_gem_verify_gtt(dev); return vma; @@ -3548,8 +3547,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj, list_for_each_entry(vma, obj-vma_list, vma_link) if (drm_mm_node_allocated(vma-node)) - vma-bind_vma(vma, cache_level, - obj-has_global_gtt_mapping ? GLOBAL_BIND : 0); + i915_gem_vma_bind(vma, cache_level, + obj-has_global_gtt_mapping ? GLOBAL_BIND : 0); } list_for_each_entry(vma, obj-vma_list, vma_link) @@ -3914,7 +3913,7 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj, } if (flags PIN_GLOBAL !obj-has_global_gtt_mapping) - vma-bind_vma(vma, obj-cache_level, GLOBAL_BIND); + i915_gem_vma_bind(vma, obj-cache_level, GLOBAL_BIND); vma-pin_count++; if (flags PIN_MAPPABLE) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index f77b4c1..00c7d88 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -655,7 +655,7 @@ static int do_switch(struct intel_ring_buffer *ring,
Re: [Intel-gfx] [PATCH 2/9] drm/i915: Extract node allocation from bind
On Wed, May 07, 2014 at 04:53:08PM +0100, Chris Wilson wrote: On Wed, May 07, 2014 at 08:45:38AM -0700, Ben Widawsky wrote: On Wed, May 07, 2014 at 08:02:38AM +0100, Chris Wilson wrote: On Tue, May 06, 2014 at 10:21:31PM -0700, Ben Widawsky wrote: The DRM node allocation code was already a bit of an ugly bit of code within a complex function. Removing it serves the purpose of cleaning the function up. More importantly, it provides a way to have a preallocated (address space) VMA to easily skip this code. Something we're very likely to need. This is essentially a wrapper for DRM node allocation with an eviction + retry. It is something which could be moved to core DRM eventually, if other drivers had similar eviction semantics. This removes a goto used for something other than error unwinding, a generally reviled (I am neutral) practice, and replaces it with a function call to itself that should have the same effect. Note that it's not a recursive function as all the problem set reduction is done in the eviction code. Some might say this change is worse than before because we are using stack for each subsequent call. Frankly, I'd rather overflow the stack and blow it up than loop forever. In either case, this is addressed in the next patch. I believe, and intend, that other than the stack usage, there is no functional change here. Nope, but decoupling evict_something() further does introduce lots more implied magic. -Chris Hmm. I really don't see what's actually upsetting. Can you be a bit more explicit about what's so bothersome to you for my edification? evict_something() make assumptions about the ranges of the vm which it searches. At the moment, they mirror its parent's function. -Chris Ah, thanks. So is plumbing starting eviction offset into evict something an appealing solution here? -- Chris Wilson, Intel Open Source Technology Centre -- Ben Widawsky, Intel Open Source Technology Center ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Use hash tables for the command parser
Could someone help to review this patch please? It provides a nice improvement to the command parser's performance, so I'd like to get this one in. Thanks, Brad On Mon, Apr 28, 2014 at 08:22:08AM -0700, Volkin, Bradley D wrote: From: Brad Volkin bradley.d.vol...@intel.com For clients that submit large batch buffers the command parser has a substantial impact on performance. On my HSW ULT system performance drops as much as ~20% on some tests. Most of the time is spent in the command lookup code. Converting that from the current naive search to a hash table lookup reduces the performance impact by as much as ~10%. The choice of value for I915_CMD_HASH_ORDER allows all commands currently used in the parser tables to hash to their own bucket (except for one collision on the render ring). The tradeoff is that it wastes memory. Because the opcodes for the commands in the tables are not particularly well distributed, reducing the order still leaves many buckets empty. The increased collisions don't seem to have a huge impact on the performance gain, but for now anyhow, the parser trades memory for performance. Signed-off-by: Brad Volkin bradley.d.vol...@intel.com --- drivers/gpu/drm/i915/i915_cmd_parser.c | 136 drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/intel_ringbuffer.c | 2 + drivers/gpu/drm/i915/intel_ringbuffer.h | 11 ++- 4 files changed, 116 insertions(+), 34 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c index 9bac097..9dca899 100644 --- a/drivers/gpu/drm/i915/i915_cmd_parser.c +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c @@ -498,16 +498,18 @@ static u32 gen7_blt_get_cmd_length_mask(u32 cmd_header) return 0; } -static bool validate_cmds_sorted(struct intel_ring_buffer *ring) +static bool validate_cmds_sorted(struct intel_ring_buffer *ring, + const struct drm_i915_cmd_table *cmd_tables, + int cmd_table_count) { int i; bool ret = true; - if (!ring-cmd_tables || ring-cmd_table_count == 0) + if (!cmd_tables || cmd_table_count == 0) return true; - for (i = 0; i ring-cmd_table_count; i++) { - const struct drm_i915_cmd_table *table = ring-cmd_tables[i]; + for (i = 0; i cmd_table_count; i++) { + const struct drm_i915_cmd_table *table = cmd_tables[i]; u32 previous = 0; int j; @@ -557,6 +559,60 @@ static bool validate_regs_sorted(struct intel_ring_buffer *ring) ring-master_reg_count); } +struct cmd_node { + const struct drm_i915_cmd_descriptor *desc; + struct hlist_node node; +}; + +/* + * Different command ranges have different numbers of bits for the opcode. + * In order to use the opcode bits, and only the opcode bits, for the hash key + * we should use the MI_* command opcode mask (since those commands use the + * fewest bits for the opcode.) + */ +#define CMD_HASH_MASK STD_MI_OPCODE_MASK + +static int init_hash_table(struct intel_ring_buffer *ring, +const struct drm_i915_cmd_table *cmd_tables, +int cmd_table_count) +{ + int i, j; + + hash_init(ring-cmd_hash); + + for (i = 0; i cmd_table_count; i++) { + const struct drm_i915_cmd_table *table = cmd_tables[i]; + + for (j = 0; j table-count; j++) { + const struct drm_i915_cmd_descriptor *desc = + table-table[j]; + struct cmd_node *desc_node = + kmalloc(sizeof(*desc_node), GFP_KERNEL); + + if (!desc_node) + return -ENOMEM; + + desc_node-desc = desc; + hash_add(ring-cmd_hash, desc_node-node, + desc-cmd.value CMD_HASH_MASK); + } + } + + return 0; +} + +static void fini_hash_table(struct intel_ring_buffer *ring) +{ + struct hlist_node *tmp; + struct cmd_node *desc_node; + int i; + + hash_for_each_safe(ring-cmd_hash, i, tmp, desc_node, node) { + hash_del(desc_node-node); + kfree(desc_node); + } +} + /** * i915_cmd_parser_init_ring() - set cmd parser related fields for a ringbuffer * @ring: the ringbuffer to initialize @@ -567,18 +623,21 @@ static bool validate_regs_sorted(struct intel_ring_buffer *ring) */ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring) { + const struct drm_i915_cmd_table *cmd_tables; + int cmd_table_count; + if (!IS_GEN7(ring-dev)) return; switch (ring-id) { case RCS: if (IS_HASWELL(ring-dev)) { - ring-cmd_tables =
Re: [Intel-gfx] [PATCH 2/9] drm/i915: Extract node allocation from bind
On Wed, May 07, 2014 at 09:00:16AM -0700, Ben Widawsky wrote: On Wed, May 07, 2014 at 04:53:08PM +0100, Chris Wilson wrote: On Wed, May 07, 2014 at 08:45:38AM -0700, Ben Widawsky wrote: Hmm. I really don't see what's actually upsetting. Can you be a bit more explicit about what's so bothersome to you for my edification? evict_something() make assumptions about the ranges of the vm which it searches. At the moment, they mirror its parent's function. Ah, thanks. So is plumbing starting eviction offset into evict something an appealing solution here? Yes. I have to admit to not being overly sold on the code migration yet. I guess you have an ulterior motive... Evictable vm? -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH v2] drm/i915: remove user GTT mappings early during runtime suspend
Currently user space can access GEM buffers mapped to GTT through existing mappings concurrently while the platform specific suspend handlers are running. Since these handlers may change the HW state in a way that would break such accesses, remove the mappings before calling the handlers. Spotted by Ville. Also Chris pointed out that the lists that i915_gem_release_all_mmaps() walks through need dev-struct_mutex, so take this lock. There is a potential deadlock against a concurrent RPM resume, resolve this by aborting and rescheduling the suspend (Daniel). v2: - take struct_mutex around i915_gem_release_all_mmaps() (Chris, Daniel) Signed-off-by: Imre Deak imre.d...@intel.com --- drivers/gpu/drm/i915/i915_drv.c | 27 +-- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 4024e16..0c9858c 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -36,6 +36,7 @@ #include linux/console.h #include linux/module.h +#include linux/pm_runtime.h #include drm/drm_crtc_helper.h static struct drm_driver driver; @@ -1315,6 +1316,30 @@ static int intel_runtime_suspend(struct device *device) DRM_DEBUG_KMS(Suspending device\n); /* +* We could deadlock here in case another thread holding struct_mutex +* calls RPM suspend concurrently, since the RPM suspend will wait +* first for this RPM suspend to finish. In this case the concurrent +* RPM resume will be followed by its RPM suspend counterpart. Still +* for consistency return -EAGAIN, which will reschedule this suspend. +*/ + if (!mutex_trylock(dev-struct_mutex)) { + DRM_DEBUG_KMS(device lock contention, deffering suspend\n); + /* +* Bump the expiration timestamp, otherwise the suspend won't +* be rescheduled. +*/ + pm_runtime_mark_last_busy(device); + + return -EAGAIN; + } + /* +* We are safe here against re-faults, since the fault handler takes +* an RPM reference. +*/ + i915_gem_release_all_mmaps(dev_priv); + mutex_unlock(dev-struct_mutex); + + /* * rps.work can't be rearmed here, since we get here only after making * sure the GPU is idle and the RPS freq is set to the minimum. See * intel_mark_idle(). @@ -1340,8 +1365,6 @@ static int intel_runtime_suspend(struct device *device) return ret; } - i915_gem_release_all_mmaps(dev_priv); - del_timer_sync(dev_priv-gpu_error.hangcheck_timer); dev_priv-pm.suspended = true; -- 1.8.4 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 08/13] drm/i915: Implement MI decode for gen8
On Wed, Apr 30, 2014 at 02:21:15PM +0300, Ville Syrjälä wrote: On Tue, Apr 29, 2014 at 02:52:35PM -0700, Ben Widawsky wrote: From: Ben Widawsky benjamin.widaw...@linux.intel.com This is needed to implement ipehr_is_semaphore_wait Signed-off-by: Ben Widawsky b...@bwidawsk.net --- drivers/gpu/drm/i915/i915_irq.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 2d76183..bfd21c7 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2561,12 +2561,9 @@ static bool ipehr_is_semaphore_wait(struct drm_device *dev, u32 ipehr) { if (INTEL_INFO(dev)-gen = 8) { - /* -* FIXME: gen8 semaphore support - currently we don't emit -* semaphores on bdw anyway, but this needs to be addressed when -* we merge that code. -*/ - return false; + /* Broadwell's semaphore wait is 3 dwords. We hope IPEHR is the +* first dword. */ + return (ipehr 23) == 0x1c; } else { ipehr = ~MI_SEMAPHORE_SYNC_MASK; return ipehr == (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE | @@ -2586,6 +2583,8 @@ semaphore_wait_to_signaller_ring(struct intel_ring_buffer *ring, u32 ipehr) * FIXME: gen8 semaphore support - currently we don't emit * semaphores on bdw anyway, but this needs to be addressed when * we merge that code. +* +* XXX: Gen8 needs more than just IPEHR. */ I believe something like this should take care of the remaining gap. Thanks for the help. At this point I just want to get the damn thing merged, and want to avoid any extra risk. Would you mind resending this as a distinct patch (authored by you), after we get the rest merged? diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 2446e61..cd1069e 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2590,19 +2590,21 @@ ipehr_is_semaphore_wait(struct drm_device *dev, u32 ipehr) } static struct intel_ring_buffer * -semaphore_wait_to_signaller_ring(struct intel_ring_buffer *ring, u32 ipehr) +semaphore_wait_to_signaller_ring(struct intel_ring_buffer *ring, + u32 ipehr, u64 offset) { struct drm_i915_private *dev_priv = ring-dev-dev_private; struct intel_ring_buffer *signaller; int i; if (INTEL_INFO(dev_priv-dev)-gen = 8) { - /* - * FIXME: gen8 semaphore support - currently we don't emit - * semaphores on bdw anyway, but this needs to be addressed when - * we merge that code. - */ - return NULL; + for_each_ring(signaller, dev_priv, i) { + if (ring == signaller) + continue; + + if (offset == signaller-semaphore.signal_gtt[ring-id]) + return signaller; + } } else { u32 sync_bits = ipehr MI_SEMAPHORE_SYNC_MASK; @@ -2627,6 +2629,7 @@ semaphore_waits_for(struct intel_ring_buffer *ring, u32 *seqno) { struct drm_i915_private *dev_priv = ring-dev-dev_private; u32 cmd, ipehr, head; + u64 offset = 0; int i; ipehr = I915_READ(RING_IPEHR(ring-mmio_base)); @@ -2662,7 +2665,12 @@ semaphore_waits_for(struct intel_ring_buffer *ring, u32 *seqno) return NULL; *seqno = ioread32(ring-virtual_start + head + 4) + 1; - return semaphore_wait_to_signaller_ring(ring, ipehr); + if (INTEL_INFO(dev_priv-dev)-gen = 8) { + offset = ioread32(ring-virtual_start + head + 12); + offset = 32; + offset |= ioread32(ring-virtual_start + head + 8); + } + return semaphore_wait_to_signaller_ring(ring, ipehr, offset); } static int semaphore_passed(struct intel_ring_buffer *ring) -- 1.8.3.2 return NULL; } else { -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Ville Syrjälä Intel OTC ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Ben Widawsky, Intel Open Source Technology Center ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 08/13] drm/i915: Implement MI decode for gen8
On Wed, May 07, 2014 at 09:59:14AM -0700, Ben Widawsky wrote: On Wed, Apr 30, 2014 at 02:21:15PM +0300, Ville Syrjälä wrote: On Tue, Apr 29, 2014 at 02:52:35PM -0700, Ben Widawsky wrote: From: Ben Widawsky benjamin.widaw...@linux.intel.com This is needed to implement ipehr_is_semaphore_wait Signed-off-by: Ben Widawsky b...@bwidawsk.net --- drivers/gpu/drm/i915/i915_irq.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 2d76183..bfd21c7 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2561,12 +2561,9 @@ static bool ipehr_is_semaphore_wait(struct drm_device *dev, u32 ipehr) { if (INTEL_INFO(dev)-gen = 8) { - /* - * FIXME: gen8 semaphore support - currently we don't emit - * semaphores on bdw anyway, but this needs to be addressed when - * we merge that code. - */ - return false; + /* Broadwell's semaphore wait is 3 dwords. We hope IPEHR is the + * first dword. */ + return (ipehr 23) == 0x1c; } else { ipehr = ~MI_SEMAPHORE_SYNC_MASK; return ipehr == (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE | @@ -2586,6 +2583,8 @@ semaphore_wait_to_signaller_ring(struct intel_ring_buffer *ring, u32 ipehr) * FIXME: gen8 semaphore support - currently we don't emit * semaphores on bdw anyway, but this needs to be addressed when * we merge that code. + * + * XXX: Gen8 needs more than just IPEHR. */ I believe something like this should take care of the remaining gap. Thanks for the help. At this point I just want to get the damn thing merged, and want to avoid any extra risk. Would you mind resending this as a distinct patch (authored by you), after we get the rest merged? Can do. diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 2446e61..cd1069e 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2590,19 +2590,21 @@ ipehr_is_semaphore_wait(struct drm_device *dev, u32 ipehr) } static struct intel_ring_buffer * -semaphore_wait_to_signaller_ring(struct intel_ring_buffer *ring, u32 ipehr) +semaphore_wait_to_signaller_ring(struct intel_ring_buffer *ring, +u32 ipehr, u64 offset) { struct drm_i915_private *dev_priv = ring-dev-dev_private; struct intel_ring_buffer *signaller; int i; if (INTEL_INFO(dev_priv-dev)-gen = 8) { - /* -* FIXME: gen8 semaphore support - currently we don't emit -* semaphores on bdw anyway, but this needs to be addressed when -* we merge that code. -*/ - return NULL; + for_each_ring(signaller, dev_priv, i) { + if (ring == signaller) + continue; + + if (offset == signaller-semaphore.signal_gtt[ring-id]) + return signaller; + } } else { u32 sync_bits = ipehr MI_SEMAPHORE_SYNC_MASK; @@ -2627,6 +2629,7 @@ semaphore_waits_for(struct intel_ring_buffer *ring, u32 *seqno) { struct drm_i915_private *dev_priv = ring-dev-dev_private; u32 cmd, ipehr, head; + u64 offset = 0; int i; ipehr = I915_READ(RING_IPEHR(ring-mmio_base)); @@ -2662,7 +2665,12 @@ semaphore_waits_for(struct intel_ring_buffer *ring, u32 *seqno) return NULL; *seqno = ioread32(ring-virtual_start + head + 4) + 1; - return semaphore_wait_to_signaller_ring(ring, ipehr); + if (INTEL_INFO(dev_priv-dev)-gen = 8) { + offset = ioread32(ring-virtual_start + head + 12); + offset = 32; + offset |= ioread32(ring-virtual_start + head + 8); + } + return semaphore_wait_to_signaller_ring(ring, ipehr, offset); } static int semaphore_passed(struct intel_ring_buffer *ring) -- 1.8.3.2 return NULL; } else { -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Ville Syrjälä Intel OTC ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Ben Widawsky, Intel Open Source Technology Center -- Ville Syrjälä Intel OTC ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v2] drm/i915: remove user GTT mappings early during runtime suspend
On Wed, 2014-05-07 at 19:57 +0300, Imre Deak wrote: Currently user space can access GEM buffers mapped to GTT through existing mappings concurrently while the platform specific suspend handlers are running. Since these handlers may change the HW state in a way that would break such accesses, remove the mappings before calling the handlers. Spotted by Ville. Also Chris pointed out that the lists that i915_gem_release_all_mmaps() walks through need dev-struct_mutex, so take this lock. There is a potential deadlock against a concurrent RPM resume, resolve this by aborting and rescheduling the suspend (Daniel). v2: - take struct_mutex around i915_gem_release_all_mmaps() (Chris, Daniel) Signed-off-by: Imre Deak imre.d...@intel.com --- drivers/gpu/drm/i915/i915_drv.c | 27 +-- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 4024e16..0c9858c 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -36,6 +36,7 @@ #include linux/console.h #include linux/module.h +#include linux/pm_runtime.h #include drm/drm_crtc_helper.h static struct drm_driver driver; @@ -1315,6 +1316,30 @@ static int intel_runtime_suspend(struct device *device) DRM_DEBUG_KMS(Suspending device\n); /* + * We could deadlock here in case another thread holding struct_mutex + * calls RPM suspend concurrently, since the RPM suspend will wait resume^ resume^ + * first for this RPM suspend to finish. In this case the concurrent + * RPM resume will be followed by its RPM suspend counterpart. Still + * for consistency return -EAGAIN, which will reschedule this suspend. + */ + if (!mutex_trylock(dev-struct_mutex)) { + DRM_DEBUG_KMS(device lock contention, deffering suspend\n); + /* + * Bump the expiration timestamp, otherwise the suspend won't + * be rescheduled. + */ + pm_runtime_mark_last_busy(device); + + return -EAGAIN; + } + /* + * We are safe here against re-faults, since the fault handler takes + * an RPM reference. + */ + i915_gem_release_all_mmaps(dev_priv); + mutex_unlock(dev-struct_mutex); + + /* * rps.work can't be rearmed here, since we get here only after making * sure the GPU is idle and the RPS freq is set to the minimum. See * intel_mark_idle(). @@ -1340,8 +1365,6 @@ static int intel_runtime_suspend(struct device *device) return ret; } - i915_gem_release_all_mmaps(dev_priv); - del_timer_sync(dev_priv-gpu_error.hangcheck_timer); dev_priv-pm.suspended = true; signature.asc Description: This is a digitally signed message part ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 05/10] drm/i915: Implement MI decode for gen8
From: Ben Widawsky benjamin.widaw...@linux.intel.com This is needed to implement ipehr_is_semaphore_wait Signed-off-by: Ben Widawsky b...@bwidawsk.net --- drivers/gpu/drm/i915/i915_irq.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 2d76183..bfd21c7 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2561,12 +2561,9 @@ static bool ipehr_is_semaphore_wait(struct drm_device *dev, u32 ipehr) { if (INTEL_INFO(dev)-gen = 8) { - /* -* FIXME: gen8 semaphore support - currently we don't emit -* semaphores on bdw anyway, but this needs to be addressed when -* we merge that code. -*/ - return false; + /* Broadwell's semaphore wait is 3 dwords. We hope IPEHR is the +* first dword. */ + return (ipehr 23) == 0x1c; } else { ipehr = ~MI_SEMAPHORE_SYNC_MASK; return ipehr == (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE | @@ -2586,6 +2583,8 @@ semaphore_wait_to_signaller_ring(struct intel_ring_buffer *ring, u32 ipehr) * FIXME: gen8 semaphore support - currently we don't emit * semaphores on bdw anyway, but this needs to be addressed when * we merge that code. +* +* XXX: Gen8 needs more than just IPEHR. */ return NULL; } else { -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 03/10] drm/i915/bdw: implement semaphore signal
Semaphore signalling works similarly to previous GENs with the exception that the per ring mailboxes no longer exist. Instead you must define your own space, somewhere in the GTT. The comments in the code define the layout I've opted for, which should be fairly future proof. Ie. I tried to define offsets in abstract terms (NUM_RINGS, seqno size, etc). NOTE: If one wanted to move this to the HWSP they could. I've decided one 4k object would be easier to deal with, and provide potential wins with cache locality, but that's all speculative. v2: Update the macro to not need the other ring's ring-id (Chris) Update the comment to use the correct formula (Chris) v3: Move the macros the ringbuffer.h to prevent churn in next patch (Ville) v4: Fixed compilation rebase conflict commit 1ec9e26ddab06459e89a890431b2de064c5d1056 Author: Daniel Vetter daniel.vet...@ffwll.ch Date: Fri Feb 14 14:01:11 2014 +0100 drm/i915: Consolidate binding parameters into flags v5: VCS2 rebase Replace hweight_long with hweight32 Signed-off-by: Ben Widawsky b...@bwidawsk.net --- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_reg.h | 5 +- drivers/gpu/drm/i915/intel_ringbuffer.c | 166 ++-- drivers/gpu/drm/i915/intel_ringbuffer.h | 78 +-- 4 files changed, 190 insertions(+), 60 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 26253c0..1ede818 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1355,6 +1355,7 @@ struct drm_i915_private { struct pci_dev *bridge_dev; struct intel_ring_buffer ring[I915_NUM_RINGS]; + struct drm_i915_gem_object *semaphore_obj; uint32_t last_seqno, next_seqno; drm_dma_handle_t *status_page_dmah; diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 03ffc57..dc25084 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -232,7 +232,7 @@ #define MI_DISPLAY_FLIP_IVB_SPRITE_B (3 19) #define MI_DISPLAY_FLIP_IVB_PLANE_C (4 19) #define MI_DISPLAY_FLIP_IVB_SPRITE_C (5 19) -#define MI_SEMAPHORE_MBOX MI_INSTR(0x16, 1) /* gen6+ */ +#define MI_SEMAPHORE_MBOX MI_INSTR(0x16, 1) /* gen6, gen7 */ #define MI_SEMAPHORE_GLOBAL_GTT(122) #define MI_SEMAPHORE_UPDATE (121) #define MI_SEMAPHORE_COMPARE (120) @@ -258,6 +258,8 @@ #define MI_RESTORE_EXT_STATE_EN (12) #define MI_FORCE_RESTORE (11) #define MI_RESTORE_INHIBIT (10) +#define MI_SEMAPHORE_SIGNALMI_INSTR(0x1b, 0) /* GEN8+ */ +#define MI_SEMAPHORE_TARGET(engine) ((engine)15) #define MI_STORE_DWORD_IMM MI_INSTR(0x20, 1) #define MI_MEM_VIRTUAL (1 22) /* 965+ only */ #define MI_STORE_DWORD_INDEX MI_INSTR(0x21, 1) @@ -352,6 +354,7 @@ #define PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE(110) /* GM45+ only */ #define PIPE_CONTROL_INDIRECT_STATE_DISABLE (19) #define PIPE_CONTROL_NOTIFY (18) +#define PIPE_CONTROL_FLUSH_ENABLE(17) /* gen7+ */ #define PIPE_CONTROL_VF_CACHE_INVALIDATE (14) #define PIPE_CONTROL_CONST_CACHE_INVALIDATE (13) #define PIPE_CONTROL_STATE_CACHE_INVALIDATE (12) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 1d52c8a..b69eceb 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -650,6 +650,13 @@ static int init_render_ring(struct intel_ring_buffer *ring) static void render_ring_cleanup(struct intel_ring_buffer *ring) { struct drm_device *dev = ring-dev; + struct drm_i915_private *dev_priv = dev-dev_private; + + if (dev_priv-semaphore_obj) { + i915_gem_object_ggtt_unpin(dev_priv-semaphore_obj); + drm_gem_object_unreference(dev_priv-semaphore_obj-base); + dev_priv-semaphore_obj = NULL; + } if (ring-scratch.obj == NULL) return; @@ -663,6 +670,85 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring) ring-scratch.obj = NULL; } +static int gen8_rcs_signal(struct intel_ring_buffer *signaller, + unsigned int num_dwords) +{ +#define MBOX_UPDATE_DWORDS 8 + struct drm_device *dev = signaller-dev; + struct drm_i915_private *dev_priv = dev-dev_private; + struct intel_ring_buffer *waiter; + int i, ret, num_rings; + + num_rings = hweight32(INTEL_INFO(dev)-ring_mask); + num_dwords += (num_rings-1) * MBOX_UPDATE_DWORDS; +#undef MBOX_UPDATE_DWORDS + + ret = intel_ring_begin(signaller, num_dwords); + if (ret) + return ret; + + for_each_ring(waiter, dev_priv, i) { + u64 gtt_offset = signaller-semaphore.signal_ggtt[i]; + if (gtt_offset ==
[Intel-gfx] [PATCH 06/10] drm/i915: Extract semaphore error collection
Signed-off-by: Ben Widawsky b...@bwidawsk.net --- drivers/gpu/drm/i915/i915_gpu_error.c | 30 ++ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 2d81985..a7eaab2 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -744,6 +744,23 @@ static void i915_gem_record_fences(struct drm_device *dev, } } + +static void gen6_record_semaphore_state(struct drm_i915_private *dev_priv, + struct intel_ring_buffer *ring, + struct drm_i915_error_ring *ering) +{ + ering-semaphore_mboxes[0] = I915_READ(RING_SYNC_0(ring-mmio_base)); + ering-semaphore_mboxes[1] = I915_READ(RING_SYNC_1(ring-mmio_base)); + ering-semaphore_seqno[0] = ring-semaphore.sync_seqno[0]; + ering-semaphore_seqno[1] = ring-semaphore.sync_seqno[1]; + + if (HAS_VEBOX(dev_priv-dev)) { + ering-semaphore_mboxes[2] = + I915_READ(RING_SYNC_2(ring-mmio_base)); + ering-semaphore_seqno[2] = ring-semaphore.sync_seqno[2]; + } +} + static void i915_record_ring_state(struct drm_device *dev, struct intel_ring_buffer *ring, struct drm_i915_error_ring *ering) @@ -753,18 +770,7 @@ static void i915_record_ring_state(struct drm_device *dev, if (INTEL_INFO(dev)-gen = 6) { ering-rc_psmi = I915_READ(ring-mmio_base + 0x50); ering-fault_reg = I915_READ(RING_FAULT_REG(ring)); - ering-semaphore_mboxes[0] - = I915_READ(RING_SYNC_0(ring-mmio_base)); - ering-semaphore_mboxes[1] - = I915_READ(RING_SYNC_1(ring-mmio_base)); - ering-semaphore_seqno[0] = ring-semaphore.sync_seqno[0]; - ering-semaphore_seqno[1] = ring-semaphore.sync_seqno[1]; - } - - if (HAS_VEBOX(dev)) { - ering-semaphore_mboxes[2] = - I915_READ(RING_SYNC_2(ring-mmio_base)); - ering-semaphore_seqno[2] = ring-semaphore.sync_seqno[2]; + gen6_record_semaphore_state(dev_priv, ring, ering); } if (INTEL_INFO(dev)-gen = 4) { -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 09/10] drm/i915/bdw: poll semaphores
As Ville points out, it's possible/probable we don't actually need this. Potentially, this validates the letter of the spec, and not the spirit. Ville: I discussed this on irc w/ Ben, and I was suggesting we don't need to poll. Polling apparently can be used as a workaround for certain hardware issues, but it looks like those issues shouldn't affect us, for the momemnt at least. So my suggestion was to try w/o polling first (since there could be some power cost to polling) and add the poll bit if problems arise. Signed-off-by: Ben Widawsky b...@bwidawsk.net --- drivers/gpu/drm/i915/intel_ringbuffer.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 20fc2b1..66b1c6e 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -842,6 +842,7 @@ gen8_ring_sync(struct intel_ring_buffer *waiter, intel_ring_emit(waiter, MI_SEMAPHORE_WAIT | MI_SEMAPHORE_GLOBAL_GTT | + MI_SEMAPHORE_POLL | MI_SEMAPHORE_SAD_GTE_SDD); intel_ring_emit(waiter, seqno); intel_ring_emit(waiter, -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 08/10] drm/i915: semaphore debugfs
Simple debugfs file to display the current state of semaphores. This is useful if you want to see the state without hanging the GPU. NOTE: This patch is optional to the series. NOTE2: Like the GPU error state collection, the reads are currently incoherent. Signed-off-by: Ben Widawsky b...@bwidawsk.net --- drivers/gpu/drm/i915/i915_debugfs.c | 70 + 1 file changed, 70 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 18b3565..d9c1414 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -2335,6 +2335,75 @@ static int i915_display_info(struct seq_file *m, void *unused) return 0; } +static int i915_semaphore_status(struct seq_file *m, void *unused) +{ + struct drm_info_node *node = (struct drm_info_node *) m-private; + struct drm_device *dev = node-minor-dev; + struct drm_i915_private *dev_priv = dev-dev_private; + struct intel_ring_buffer *ring; + int i, j, ret; + + if (!i915_semaphore_is_enabled(dev)) { + seq_puts(m, Semaphores are disabled\n); + return 0; + } + + ret = mutex_lock_interruptible(dev-struct_mutex); + if (ret) + return ret; + + if (IS_BROADWELL(dev)) { + struct page *page; + uint64_t *seqno; + + page = i915_gem_object_get_page(dev_priv-semaphore_obj, 0); + + seqno = (uint64_t *)kmap_atomic(page); + for_each_ring(ring, dev_priv, i) { + uint64_t offset; + + seq_printf(m, %s\n, ring-name); + + seq_puts(m, Last signal:); + for (j = 0; j I915_NUM_RINGS; j++) { + offset = i * I915_NUM_RINGS + j; + seq_printf(m, 0x%08llx (0x%02llx) , + seqno[offset], offset * 8); + } + seq_putc(m, '\n'); + + seq_puts(m, Last wait: ); + for (j = 0; j I915_NUM_RINGS; j++) { + offset = i + (j * I915_NUM_RINGS); + seq_printf(m, 0x%08llx (0x%02llx) , + seqno[offset], offset * 8); + } + seq_putc(m, '\n'); + + } + kunmap_atomic(seqno); + } else { + seq_puts(m, Last signal:); + for_each_ring(ring, dev_priv, i) + for (j = 0; j I915_NUM_RINGS; j++) + seq_printf(m, 0x%08x\n, + I915_READ(ring-semaphore.mbox.signal[j])); + seq_putc(m, '\n'); + } + + seq_puts(m, \nSync seqno:\n); + for_each_ring(ring, dev_priv, i) { + for (j = 0; j I915_NUM_RINGS - 1; j++) { + seq_printf(m, 0x%08x , ring-semaphore.sync_seqno[j]); + } + seq_putc(m, '\n'); + } + seq_putc(m, '\n'); + + mutex_unlock(dev-struct_mutex); + return 0; +} + struct pipe_crc_info { const char *name; struct drm_device *dev; @@ -3781,6 +3850,7 @@ static const struct drm_info_list i915_debugfs_list[] = { {i915_pc8_status, i915_pc8_status, 0}, {i915_power_domain_info, i915_power_domain_info, 0}, {i915_display_info, i915_display_info, 0}, + {i915_semaphore_status, i915_semaphore_status, 0}, }; #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list) -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 00/10] Semaphores again. Needs shepherd.
The series is mostly done, but it needs a few last tweaks to make it acceptable for merge. Chris noticed that the series doesn't properly work with the new VCS2 code that Daniel merged a couple of weeks ago. I too noticed this, but was remaining silent in the hopes that we could actually try to make progress. I raised concerns about this during the VCS2 merge on IRC. I only took a quick look, and it seems it's not too bad to implement, but it requires a bit of brain power to make sure it's correct. The series is broken in another way because the new semaphore decoding which Daniel added. Ville posted a fix that looked sane to me, but I haven't a machine to test with. I'd recommend someone try to beat that into shape and test it. The series should be /okay/ without that though. So here are the latest patches that I have. I just don't have enough time before my vacation to get these things done. They've gone through 5 rounds of rebase since the first series was posted on 12/13/13. They get the first thumbs up from Ville on Mar 5 (a couple of minor tweaks were required and could have been easily done by the maintainer). I've fixed everything I am aware. If someone could take over the patches, and get them merged, I think it'd really help in getting this useful feature added to our driver for gen8. * The last two patches aren't needed, but are here for posterity. If nobody has taken this over when I return I will see if I can complete it. Ben Widawsky (10): drm/i915: Make semaphore updates more precise drm/i915: gen specific ring init drm/i915/bdw: implement semaphore signal drm/i915/bdw: implement semaphore wait drm/i915: Implement MI decode for gen8 drm/i915: Extract semaphore error collection drm/i915/bdw: collect semaphore error state drm/i915: semaphore debugfs drm/i915/bdw: poll semaphores DONT_MERGE drm/i915: FORCE_RESTORE for gen8 semaphores drivers/gpu/drm/i915/i915_debugfs.c | 70 +++ drivers/gpu/drm/i915/i915_drv.h | 2 + drivers/gpu/drm/i915/i915_gem_context.c | 7 + drivers/gpu/drm/i915/i915_gpu_error.c | 79 ++-- drivers/gpu/drm/i915/i915_irq.c | 11 +- drivers/gpu/drm/i915/i915_reg.h | 8 +- drivers/gpu/drm/i915/intel_ringbuffer.c | 342 +++- drivers/gpu/drm/i915/intel_ringbuffer.h | 78 +++- 8 files changed, 474 insertions(+), 123 deletions(-) -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/9] drm/i915: Extract node allocation from bind
On Wed, May 07, 2014 at 05:55:00PM +0100, Chris Wilson wrote: On Wed, May 07, 2014 at 09:00:16AM -0700, Ben Widawsky wrote: On Wed, May 07, 2014 at 04:53:08PM +0100, Chris Wilson wrote: On Wed, May 07, 2014 at 08:45:38AM -0700, Ben Widawsky wrote: Hmm. I really don't see what's actually upsetting. Can you be a bit more explicit about what's so bothersome to you for my edification? evict_something() make assumptions about the ranges of the vm which it searches. At the moment, they mirror its parent's function. Ah, thanks. So is plumbing starting eviction offset into evict something an appealing solution here? Yes. I have to admit to not being overly sold on the code migration yet. I guess you have an ulterior motive... Evictable vm? -Chris The only immediate goal is to be able to get this bit logic out of bind_to_vm, so I can use preallocated nodes a bit more cleanly. At least for the present, I have no plans with it other than that. I did like the resuse of the PDE allocation for gen7. If I think of a better reason, I'll let you know. -- Ben Widawsky, Intel Open Source Technology Center ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 01/10] drm/i915: Make semaphore updates more precise
With the ring mask we now have an easy way to know the number of rings in the system, and therefore can accurately predict the number of dwords to emit for semaphore signalling. This was not possible (easily) previously. There should be no functional impact, simply fewer instructions emitted. While we're here, simply do the round up to 2 instead of the fancier rounding we did before, which rounding up per mbox, ie 4. This also allows us to drop the unnecessary MI_NOOP, so not really 4, 3. v2: Use 3 dwords instead of 4 (Ville) Do the proper calculation to get the number of dwords to emit (Ville) Conditionally set .sync_to when semaphores are enabled (Ville) v3: Rebased on VCS2 Replace hweight_long with hweight32 (Ville) v4: Pull out the accidentally squashed hunk from the next patch after rebase (Daniel). Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com (v1) Signed-off-by: Ben Widawsky b...@bwidawsk.net --- drivers/gpu/drm/i915/intel_ringbuffer.c | 25 + 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 40a7aa4..d211fbc 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -666,24 +666,19 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring) static int gen6_signal(struct intel_ring_buffer *signaller, unsigned int num_dwords) { +#define MBOX_UPDATE_DWORDS 3 struct drm_device *dev = signaller-dev; struct drm_i915_private *dev_priv = dev-dev_private; struct intel_ring_buffer *useless; - int i, ret; + int i, ret, num_rings; - /* NB: In order to be able to do semaphore MBOX updates for varying -* number of rings, it's easiest if we round up each individual update -* to a multiple of 2 (since ring updates must always be a multiple of -* 2) even though the actual update only requires 3 dwords. -*/ -#define MBOX_UPDATE_DWORDS 4 - if (i915_semaphore_is_enabled(dev)) - num_dwords += ((I915_NUM_RINGS-1) * MBOX_UPDATE_DWORDS); + num_rings = hweight32(INTEL_INFO(dev)-ring_mask); + num_dwords += round_up((num_rings-1) * MBOX_UPDATE_DWORDS, 2); +#undef MBOX_UPDATE_DWORDS ret = intel_ring_begin(signaller, num_dwords); if (ret) return ret; -#undef MBOX_UPDATE_DWORDS for_each_ring(useless, dev_priv, i) { u32 mbox_reg = signaller-semaphore.mbox.signal[i]; @@ -691,15 +686,13 @@ static int gen6_signal(struct intel_ring_buffer *signaller, intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1)); intel_ring_emit(signaller, mbox_reg); intel_ring_emit(signaller, signaller-outstanding_lazy_seqno); - intel_ring_emit(signaller, MI_NOOP); - } else { - intel_ring_emit(signaller, MI_NOOP); - intel_ring_emit(signaller, MI_NOOP); - intel_ring_emit(signaller, MI_NOOP); - intel_ring_emit(signaller, MI_NOOP); } } + /* If num_dwords was rounded, make sure the tail pointer is correct */ + if (num_rings % 2 == 0) + intel_ring_emit(signaller, MI_NOOP); + return 0; } -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC] drm/i915 : Reduce the shmem page allocation time by using blitter engines for clearing pages.
Gupta, Sourab sourab.gu...@intel.com writes: On Tue, 2014-05-06 at 17:56 +, Eric Anholt wrote: sourab.gu...@intel.com writes: From: Sourab Gupta sourab.gu...@intel.com This patch is in continuation of and is dependent on earlier patch series to 'reduce the time for which device mutex is kept locked'. (http://lists.freedesktop.org/archives/intel-gfx/2014-May/044596.html) One of userspace's assumptions is that when you allocate a new BO, you can map it and start writing data into it without needing to wait on the GPU. I expect this patch to mostly hurt performance on apps (and I note that the patch doesn't come with any actual performance data) that get more stalls as a result. Hi Eric, Yes, it may hurt the performance on apps, in case of small buffers and if blitter engine is busy as there is a synchronous wait for rendering in the gem_fault handler. If that is the case, we can drop this from the gem_fault routine and employ it only in the do_execbuffer routine. Its useful there because there is no synchronous wait required in sw, due to cross ring synchronization. We'll gather the numbers to quantify the performance benefit we have while using blitter engines in this way for different buffer sizes. More importantly, though, it breaks existing userspace that relies on buffers being idle on allocation, for the unsychronized maps used in intel_bufferobj_subdata() and intel_bufferobj_map_range(GL_INVALIDATE_BUFFER_BIT | GL_UNSYNCHRONIZED_BIT) Sorry, I miss your point here. It may not break this assumption due to the fact that we employ this method only in case of the preallocate routine, which will be called in the first page fault of the object (gem_fault handler) resulting in fresh allocation of pages. So, in case of unsynchronized maps, there may be a wait involved in the first page fault. Also, that wait time may be lesser than the time required for CPU memset (resulting in no performance hit). There won't be any subsequent waits afterwards for that buffer object. Though, we'll have performance hit in the case when blitter engine is already busy and may not be available to immediately start the memset of freshly allocated mmaped buffers. Am I missing something here? Does the userspace requirement for unsynchronized mapped objects involve complete idleness of object on gpu even when object page faults for the first time? Oh, I mised how this works. So at pagefault time, you're firing off the blit, then immediately stalling on it? This sounds even less like a possible performance win than I was initially thinking. pgp1fWzwjC48O.pgp Description: PGP signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Improve fallback ring waiting
On Mon, May 05, 2014 at 01:07:32AM -0700, Chris Wilson wrote: A few improvements to the fallback method for waiting upon ring space: 1. Fix the start/end wait tracepoints to always be paired. 2. Increase responsiveness of checking 3. Mark the process as waiting upon io 4. Check for signal interruptions Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk These seem like reasonable improvements. Reviewed-by: Brad Volkin bradley.d.vol...@intel.com --- drivers/gpu/drm/i915/intel_ringbuffer.c | 24 +--- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index e0c7bf27eafd..d6b7b884adf9 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1546,7 +1546,6 @@ static int ring_wait_for_space(struct intel_ring_buffer *ring, int n) /* force the tail write in case we have been skipping them */ __intel_ring_advance(ring); - trace_i915_ring_wait_begin(ring); /* With GEM the hangcheck timer should kick us out of the loop, * leaving it early runs the risk of corrupting GEM state (due * to running on almost untested codepaths). But on resume @@ -1554,12 +1553,13 @@ static int ring_wait_for_space(struct intel_ring_buffer *ring, int n) * case by choosing an insanely large timeout. */ end = jiffies + 60 * HZ; + trace_i915_ring_wait_begin(ring); do { ring-head = I915_READ_HEAD(ring); ring-space = ring_space(ring); if (ring-space = n) { - trace_i915_ring_wait_end(ring); - return 0; + ret = 0; + break; } if (!drm_core_check_feature(dev, DRIVER_MODESET) @@ -1569,15 +1569,25 @@ static int ring_wait_for_space(struct intel_ring_buffer *ring, int n) master_priv-sarea_priv-perf_boxes |= I915_BOX_WAIT; } - msleep(1); + io_schedule_timeout(1); + + if (dev_priv-mm.interruptible signal_pending(current)) { + ret = -ERESTARTSYS; + break; + } ret = i915_gem_check_wedge(dev_priv-gpu_error, dev_priv-mm.interruptible); if (ret) - return ret; - } while (!time_after(jiffies, end)); + break; + + if (time_after(jiffies, end)) { + ret = -EBUSY; + break; + } + } while (1); trace_i915_ring_wait_end(ring); - return -EBUSY; + return ret; } static int intel_wrap_ring_buffer(struct intel_ring_buffer *ring) -- 2.0.0.rc0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 1/2] rendercopy/bdw: Enable hw-generated binding tables
On Wednesday, May 07, 2014 02:49:31 PM Ville Syrjälä wrote: I quickly cobbled together a hsw version of this and gave it a whirl on one machine. Seems to work just fine here, and no lockups when switching between hw and sw binding tables. Did you get the lockups on hsw even with rendercopy? Here's my hsw version: +static void +gen7_hw_binding_table(struct intel_batchbuffer *batch, bool enable) +{ + if (!enable) { + OUT_BATCH(MI_RS_CONTROL | 0x0); + + OUT_BATCH(HSW_3DSTATE_BINDING_TABLE_POOL_ALLOC | (3 - 2)); + /* binding table pool base address */ This bit I missed and the source of my troubles for the past few months. + OUT_BATCH(3 5); Yep, I confirm toggling on HSW does work quite well now. I'll now update the patches to include HSW path on the kernel. I also take back my previous statement that RS is broken on HSW! :) Thanks a lot Ville! + /* Upper bound */ + OUT_BATCH(0); + + OUT_BATCH(GEN7_PIPE_CONTROL | (4 - 2)); + OUT_BATCH(GEN7_PIPE_CONTROL_CS_STALL | GEN7_PIPE_CONTROL_STALL_AT_SCOREBOARD); + OUT_BATCH(0); + OUT_BATCH(0); + + OUT_BATCH(GEN7_PIPE_CONTROL | (4 - 2)); + OUT_BATCH(GEN7_PIPE_CONTROL_SC_INVALIDATE); + OUT_BATCH(0); + OUT_BATCH(0); + + return; +} ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/2] drm/i915: Flush request queue when waiting for ring space
On Mon, May 05, 2014 at 01:07:33AM -0700, Chris Wilson wrote: During the review of commit 1f70999f9052f5a1b0ce1a55aff3808f2ec9fe42 Author: Chris Wilson ch...@chris-wilson.co.uk Date: Mon Jan 27 22:43:07 2014 + drm/i915: Prevent recursion by retiring requests when the ring is full Ville raised the point that our interaction with request-tail was likely to foul up other uses elsewhere (such as hang check comparing ACTHD against requests). However, we also need to restore the implicit retire requests that certain test cases depend upon (e.g. igt/gem_exec_lut_handle), this raises the spectre that the ppgtt will randomly call i915_gpu_idle() and recurse back into intel_ring_begin(). Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78023 There's one minor cleanup suggested below. Overall I think the code is better after the patch. I don't really like reintroducing the potential recursion, but I suppose that's a separate issue. With the cleanup... Reviewed-by: Brad Volkin bradley.d.vol...@intel.com --- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_gem.c | 3 +-- drivers/gpu/drm/i915/intel_ringbuffer.c | 36 - 3 files changed, 15 insertions(+), 25 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 5f4f631aecef..69a4e161ff37 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2155,6 +2155,7 @@ struct drm_i915_gem_request * i915_gem_find_active_request(struct intel_ring_buffer *ring); bool i915_gem_retire_requests(struct drm_device *dev); +void i915_gem_retire_requests_ring(struct intel_ring_buffer *ring); int __must_check i915_gem_check_wedge(struct i915_gpu_error *error, bool interruptible); static inline bool i915_reset_in_progress(struct i915_gpu_error *error) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index dae51c3701f3..2f2a668c01ac 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -64,7 +64,6 @@ static unsigned long i915_gem_inactive_scan(struct shrinker *shrinker, static unsigned long i915_gem_purge(struct drm_i915_private *dev_priv, long target); static unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv); static void i915_gem_object_truncate(struct drm_i915_gem_object *obj); -static void i915_gem_retire_requests_ring(struct intel_ring_buffer *ring); static bool cpu_cache_is_coherent(struct drm_device *dev, enum i915_cache_level level) @@ -2448,7 +2447,7 @@ void i915_gem_reset(struct drm_device *dev) /** * This function clears the request list as sequence numbers are passed. */ -static void +void i915_gem_retire_requests_ring(struct intel_ring_buffer *ring) { uint32_t seqno; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index d6b7b884adf9..9dce15cbce73 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -40,14 +40,19 @@ */ #define CACHELINE_BYTES 64 -static inline int ring_space(struct intel_ring_buffer *ring) +static inline int __ring_space(int head, int tail, int size) { - int space = (ring-head HEAD_ADDR) - (ring-tail + I915_RING_FREE_SPACE); + int space = head - (tail + I915_RING_FREE_SPACE); if (space 0) - space += ring-size; + space += size; return space; } +static inline int ring_space(struct intel_ring_buffer *ring) +{ + return __ring_space(ring-head HEAD_ADDR, ring-tail, ring-size); +} + static bool intel_ring_stopped(struct intel_ring_buffer *ring) { struct drm_i915_private *dev_priv = ring-dev-dev_private; @@ -1495,26 +1500,11 @@ static int intel_ring_wait_request(struct intel_ring_buffer *ring, int n) } list_for_each_entry(request, ring-request_list, list) { - int space; - - if (request-tail == -1) - continue; - - space = request-tail - (ring-tail + I915_RING_FREE_SPACE); - if (space 0) - space += ring-size; - if (space = n) { + if (__ring_space(request-tail, ring-tail, ring-size) = n) { seqno = request-seqno; tail = request-tail; We don't actually use the local 'tail' anymore. break; } - - /* Consume this request in case we need more space than - * is available and so need to prevent a race between - * updating last_retired_head and direct reads of - * I915_RING_HEAD. It also provides a nice sanity check. - */ - request-tail
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Improve fallback ring waiting
On Mon, May 05, 2014 at 09:07:32AM +0100, Chris Wilson wrote: A few improvements to the fallback method for waiting upon ring space: 1. Fix the start/end wait tracepoints to always be paired. 2. Increase responsiveness of checking 3. Mark the process as waiting upon io 4. Check for signal interruptions Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- drivers/gpu/drm/i915/intel_ringbuffer.c | 24 +--- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index e0c7bf27eafd..d6b7b884adf9 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1546,7 +1546,6 @@ static int ring_wait_for_space(struct intel_ring_buffer *ring, int n) /* force the tail write in case we have been skipping them */ __intel_ring_advance(ring); - trace_i915_ring_wait_begin(ring); /* With GEM the hangcheck timer should kick us out of the loop, * leaving it early runs the risk of corrupting GEM state (due * to running on almost untested codepaths). But on resume @@ -1554,12 +1553,13 @@ static int ring_wait_for_space(struct intel_ring_buffer *ring, int n) * case by choosing an insanely large timeout. */ end = jiffies + 60 * HZ; + trace_i915_ring_wait_begin(ring); do { ring-head = I915_READ_HEAD(ring); ring-space = ring_space(ring); if (ring-space = n) { - trace_i915_ring_wait_end(ring); - return 0; + ret = 0; + break; } if (!drm_core_check_feature(dev, DRIVER_MODESET) @@ -1569,15 +1569,25 @@ static int ring_wait_for_space(struct intel_ring_buffer *ring, int n) master_priv-sarea_priv-perf_boxes |= I915_BOX_WAIT; } - msleep(1); + io_schedule_timeout(1); This isn't exported so doesn't compile too well. I've undone this change for now. Otherwise both patches merged. -Daniel + + if (dev_priv-mm.interruptible signal_pending(current)) { + ret = -ERESTARTSYS; + break; + } ret = i915_gem_check_wedge(dev_priv-gpu_error, dev_priv-mm.interruptible); if (ret) - return ret; - } while (!time_after(jiffies, end)); + break; + + if (time_after(jiffies, end)) { + ret = -EBUSY; + break; + } + } while (1); trace_i915_ring_wait_end(ring); - return -EBUSY; + return ret; } static int intel_wrap_ring_buffer(struct intel_ring_buffer *ring) -- 2.0.0.rc0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915/bdw: Only use 2g GGTT for 32b platforms
On Wed, May 07, 2014 at 09:42:57AM +0200, Daniel Vetter wrote: On Tue, May 06, 2014 at 09:58:59PM -0700, Ben Widawsky wrote: From: Ben Widawsky benjamin.widaw...@linux.intel.com Daniel requested in the bug that I use a 3GB fallback size. Since this is not in the spec as a valid size, I decided against it. We could potentially add a patch to bump it to 3GB on top of this one. This probably should be CC: stable - but I'll let the powers that be decide that one. v2: Change ifdef to 32b, instead of ifndef update comment Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76619 Signed-off-by: Ben Widawsky b...@bwidawsk.net --- drivers/gpu/drm/i915/i915_gem_gtt.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 846b6ee..d03a540 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1759,6 +1759,14 @@ static inline unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl) bdw_gmch_ctl = BDW_GMCH_GGMS_MASK; if (bdw_gmch_ctl) bdw_gmch_ctl = 1 bdw_gmch_ctl; + +#ifdef CONFIG_32BIT + /* Limit 32B platforms to a 2GB GGTT + 4 20 / pte size * PAGE_SIZE */ + if (bdw_gmch_ctl 4) + bdw_gmch_ctl = 4; Comment needs CodingStyle polish and we might as well return the right value directly instead of adjusting bdw_gmch_ctl. With that polish applied this is Reviewed-by: Daniel Vetter daniel.vet...@ffwll.cho And yes most definitely Cc: sta...@vger.kernel.org since it's a regression. Aside: Please add the regression tags when handling bugs, I need those for tracking and stats. -Daniel I don't know what a regression tag is. +#endif + return bdw_gmch_ctl 20; } -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Ben Widawsky, Intel Open Source Technology Center ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] [v3] drm/i915/bdw: Only use 2g GGTT for 32b platforms
Daniel requested in the bug that I use a 3GB fallback size. Since this is not in the spec as a valid size, I decided against it. We could potentially add a patch to bump it to 3GB on top of this one. This probably should be CC: stable - but I'll let the powers that be decide that one. Regression from a revert of the revert: commit 7907f45bf9f67a1c5e5d4ae05bab428d7c2f43b2 Author: Ben Widawsky benjamin.widaw...@intel.com Date: Wed Feb 19 22:05:46 2014 -0800 Revert drm/i915/bdw: Limit GTT to 2GB v2: Change ifdef to 32b, instead of ifndef update comment v3. Update comment to not wrap (Daniel). Update commit message Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76619 Cc: sta...@vger.kernel.org Reviewed-by: Daniel Vetter daniel.vet...@ffwll.ch Signed-off-by: Ben Widawsky b...@bwidawsk.net --- drivers/gpu/drm/i915/i915_gem_gtt.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 4d87bf2..8d6d531 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -2640,6 +2640,13 @@ static inline unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl) bdw_gmch_ctl = BDW_GMCH_GGMS_MASK; if (bdw_gmch_ctl) bdw_gmch_ctl = 1 bdw_gmch_ctl; + +#ifdef CONFIG_32BIT + /* Limit 32B platforms to a 2GB GGTT: 4 20 / pte size * PAGE_SIZE */ + if (bdw_gmch_ctl 4) + bdw_gmch_ctl = 4; +#endif + return bdw_gmch_ctl 20; } -- 1.9.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] UXA: Wait until a pageflip actually completes to report it.
On Wed, May 07, 2014 at 04:55:18PM +0100, Chris Wilson wrote: On Mon, May 05, 2014 at 11:05:07PM -0700, Jamey Sharp wrote: UXA was reporting page-flip completion as soon as the flip was scheduled with the kernel, instead of waiting for the kernel to indicate that the flip had actually completed. Moving the DRI2SwapComplete call to the right place fixes all of our Piglit tests for OML_sync_control when run on xf86-video-intel/UXA, aside from a bit of difficult-to-reproduce flakiness when using a divisor 1. The violation is intentional, as it gives us triple buffering by default. It can be disabled. As far as I can tell, this patch has no effect on triple-buffering. I verified that by logging new_front-handle in intel_do_pageflip: It rotates through three different BO's on successive flips. Jamey signature.asc Description: Digital signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx