date:20140507

Re: [Intel-gfx] [PULL] drm-intel-next

2014-05-07 Thread Jani Nikula

On Tue, 06 May 2014, Daniel Vetter dan...@ffwll.ch wrote:
 On Tue, May 06, 2014 at 10:04:17PM +0200, Knut Petersen wrote:
 On 06.05.2014 20:59, Daniel Vetter wrote:
 On Tue, May 06, 2014 at 04:30:45PM +0300, Jani Nikula wrote:
 On Tue, 06 May 2014, Knut Petersen knut_peter...@t-online.de wrote:
 On 28.04.2014 15:26, Daniel Vetter wrote:
 The patch below is a clear candidate for 3.15 as it fixes a regression 
 introduced after 3.14
 Egbert Eich (1):
 drm/i915/SDVO: For sysfs link put directory and target in 
  correct order
 Daniel, want me to pick that for -fixes?
 Yeah I geuss so. People never complained that the link isn't there, us
 adding it for one kernel release to the wrong place doesn't really sound
 too bothersome.
 
 But if this unbreaks a system somewhere then it makes sense to apply to
 -fixes.
 
 Knut, how exactly does your system blow up?
 Well, it's only the warning generated at boot time that would irritate users.
 As a patch is known and tested it's probably a good idea to fix it before
 people complain to lkml and bugzilla.

 Ah, I didn't realize there's a warning, I've thought only the symlink goes
 the wrong way. In that case it's good to go for -fixes.

Done.

-- 
Jani Nikula, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915/SDVO: For sysfs link put directory and target in correct order

2014-05-07 Thread Jani Nikula

On Tue, 15 Apr 2014, Daniel Vetter dan...@ffwll.ch wrote:
 On Fri, Apr 11, 2014 at 08:26:25PM +0300, Imre Deak wrote:
 On Fri, 2014-04-11 at 19:07 +0200, Egbert Eich wrote:
  When linking the i2c sysfs file into the connector's directory
  pass directory and link target in the right order.
  This code was introduced with:
  
commit 931c1c26983b4f84e33b78579fc8d57e4a14c6b4
Author: Imre Deak imre.d...@intel.com
Date:   Tue Feb 11 17:12:51 2014 +0200
  
  drm/i915: sdvo: add i2c sysfs symlink to the connector's directory
  
  This is the same what we do for DP connectors, so make things more
  consistent.
  
  Signed-off-by: Egbert Eich e...@suse.de
 
 Oops, good catch. The fix looks ok, fwiw:
 Reviewed-by: Imre Deak imre.d...@intel.com

 Queued for -next, thanks for the patch.

And picked up for -fixes and queuing for 3.15.

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 2/9] drm/i915: Extract node allocation from bind

2014-05-07 Thread Chris Wilson

On Tue, May 06, 2014 at 10:21:31PM -0700, Ben Widawsky wrote:
 The DRM node allocation code was already a bit of an ugly bit of code
 within a complex function. Removing it serves the purpose of cleaning
 the function up. More importantly, it provides a way to have a
 preallocated (address space) VMA to easily skip this code. Something
 we're very likely to need.
 
 This is essentially a wrapper for DRM node allocation with an eviction +
 retry. It is something which could be moved to core DRM eventually, if
 other drivers had similar eviction semantics.
 
 This removes a goto used for something other than error unwinding, a
 generally reviled (I am neutral) practice, and replaces it with a
 function call to itself that should have the same effect. Note that it's
 not a recursive function as all the problem set reduction is done in the
 eviction code.
 
 Some might say this change is worse than before because we are using
 stack for each subsequent call. Frankly, I'd rather overflow the stack
 and blow it up than loop forever. In either case, this is addressed in
 the next patch.
 
 I believe, and intend, that other than the stack usage, there is no
 functional change here.

Nope, but decoupling evict_something() further does introduce lots more
implied magic.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [RFC] drm/i915: Scratch page optimization for blanking buffer

2014-05-07 Thread Daniel Vetter

On Wed, May 07, 2014 at 11:19:57AM +0530, Akash Goel wrote:
 On Mon, 2014-05-05 at 16:07 +0200, Daniel Vetter wrote:
  On Mon, May 05, 2014 at 05:13:18PM +0530, akash.g...@intel.com wrote:
   From: Akash Goel akash.g...@intel.com
   
   There is a use case, when user space (display compositor) tries
   to directly flip a fb (without any prior rendering) on primary
   plane. So the backing pages of the object are allocated at page
   flip time only, which takes time. Since, this buffer is supposed to
   serve as a blanking buffer (black colored), we can setup all the GTT 
   entries
   of that blanking buffer with scratch page (which is already zeroed out).
   This saves the time in allocation of real backing physical space for the
   blanking buffer and flushing of CPU cache.
   
   Signed-off-by: Akash Goel akash.g...@intel.com
  
  That sounds very much like a special case of the fallocate ioctl where we
  simply allocat 0 real backing storage pages.
  -Daniel
  
 
 Hi Daniel,
 
 Yes no real backing storage is needed, but still the object should be
 allowed to get mapped into the GGTT, using the scratch page (already
 zeroed out).
 
 Is there a patch for a new drm/i915 ioctl similar to fallocate, which
 could be used here?

Geez, you guys shouldn't just dump patches onto intel-gfx but read a bit
what other people are doing. Especially when they work for the same team
;-)

http://comments.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/36351

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915/bdw: Only use 2g GGTT for 32b platforms

2014-05-07 Thread Daniel Vetter

On Tue, May 06, 2014 at 09:58:59PM -0700, Ben Widawsky wrote:
 From: Ben Widawsky benjamin.widaw...@linux.intel.com
 
 Daniel requested in the bug that I use a 3GB fallback size. Since this
 is not in the spec as a valid size, I decided against it. We could
 potentially add a patch to bump it to 3GB on top of this one.
 
 This probably should be CC: stable - but I'll let the powers that be
 decide that one.
 
 v2: Change ifdef to 32b, instead of ifndef
 update comment
 
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76619
 Signed-off-by: Ben Widawsky b...@bwidawsk.net
 ---
  drivers/gpu/drm/i915/i915_gem_gtt.c | 8 
  1 file changed, 8 insertions(+)
 
 diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
 b/drivers/gpu/drm/i915/i915_gem_gtt.c
 index 846b6ee..d03a540 100644
 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
 +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
 @@ -1759,6 +1759,14 @@ static inline unsigned int gen8_get_total_gtt_size(u16 
 bdw_gmch_ctl)
   bdw_gmch_ctl = BDW_GMCH_GGMS_MASK;
   if (bdw_gmch_ctl)
   bdw_gmch_ctl = 1  bdw_gmch_ctl;
 +
 +#ifdef CONFIG_32BIT
 + /* Limit 32B platforms to a 2GB GGTT
 + 4  20 / pte size * PAGE_SIZE */
 + if (bdw_gmch_ctl  4)
 + bdw_gmch_ctl = 4;

Comment needs CodingStyle polish and we might as well return the right
value directly instead of adjusting bdw_gmch_ctl. With that polish applied
this is Reviewed-by: Daniel Vetter daniel.vet...@ffwll.cho

And yes most definitely Cc: sta...@vger.kernel.org since it's a
regression.

Aside: Please add the regression tags when handling bugs, I need those for
tracking and stats.
-Daniel

 +#endif
 +
   return bdw_gmch_ctl  20;
  }
  
 -- 
 1.9.2
 
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 1/9] drm/i915: Use topdown allocation for PPGTT PDEs on gen6/7

2014-05-07 Thread Daniel Vetter

On Tue, May 06, 2014 at 10:21:30PM -0700, Ben Widawsky wrote:
 It was always the intention to do the topdown allocation for context
 objects (Chris' idea originally). Unfortunately, I never managed to land
 the patch, but someone else did, so now we can use it.
 
 As a reminder, hardware contexts never need to be in the precious GTT
 aperture space - which is what is what happens with the normal bottom up
 allocation we do today. Doing a top down allocation increases the odds
 that the HW contexts can get out of the way, especially with per FD
 contexts as is done in full PPGTT
 
 Signed-off-by: Ben Widawsky b...@bwidawsk.net

Queued for -next, thanks for the patch.
-Daniel

 ---
  drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
 b/drivers/gpu/drm/i915/i915_gem_gtt.c
 index f6354e0..66fcfc9 100644
 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
 +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
 @@ -1035,8 +1035,7 @@ alloc:
 ppgtt-node, GEN6_PD_SIZE,
 GEN6_PD_ALIGN, 0,
 0, dev_priv-gtt.base.total,
 -   DRM_MM_SEARCH_DEFAULT,
 -   DRM_MM_CREATE_DEFAULT);
 +   DRM_MM_TOPDOWN);
   if (ret == -ENOSPC  !retried) {
   ret = i915_gem_evict_something(dev, dev_priv-gtt.base,
  GEN6_PD_SIZE, GEN6_PD_ALIGN,
 -- 
 1.9.2
 
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 4/9] drm/i915: Limit the number of node allocation retries

2014-05-07 Thread Daniel Vetter

On Tue, May 06, 2014 at 10:21:33PM -0700, Ben Widawsky wrote:
 AFAICT, it's impossible to actually infinitely retry the allocation in
 our current code. However, a small oversight on my part, slight bug, or
 future bug, could easily change that.
 
 To avoid a potential breakage, a simple retry variable of 16 bits will
 help us prevent infinitely running.
 
 Retry is limited to 100 as a mostly random number. Some consideration
 about stack usage was taken into account. As an example, if we allowed
 256 retries on a 32b arch (and my memory serves that all arguments are
 passed on the stack for such architectures), thats 33 bytes * 256
 retries + (fudge for return address and such)... it's way more than we
 want to use already. 64b architectures might be slightly better, since
 6? of the first args will get passed through registers, but it's still
 bad.
 
 If anything, we might want to do way less than 100, like 3.
 
 Signed-off-by: Ben Widawsky b...@bwidawsk.net

So you replace the retry loop with a tailrecursive version in patch 2 and
then add duct-tape later on here? Nope.
-Daniel

 ---
  drivers/gpu/drm/i915/i915_gem.c | 14 +-
  1 file changed, 9 insertions(+), 5 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
 index b6965a2..de98b26 100644
 --- a/drivers/gpu/drm/i915/i915_gem.c
 +++ b/drivers/gpu/drm/i915/i915_gem.c
 @@ -3216,6 +3216,7 @@ static void i915_gem_verify_gtt(struct drm_device *dev)
  #endif
  }
  
 +#define MAX_VMA_FIND_RETRY 100
  static int
  i915_gem_find_vm_space(struct i915_address_space *vm,
  struct drm_mm_node *node,
 @@ -3224,7 +3225,8 @@ i915_gem_find_vm_space(struct i915_address_space *vm,
  unsigned long color,
  unsigned long start,
  unsigned long end,
 -uint32_t flags)
 +uint32_t flags,
 +uint8_t retry)
  {
   int ret;
   ret = drm_mm_insert_node_in_range_generic(vm-mm, node,
 @@ -3232,7 +3234,7 @@ i915_gem_find_vm_space(struct i915_address_space *vm,
 start, end,
 DRM_MM_SEARCH_DEFAULT,
 DRM_MM_CREATE_DEFAULT);
 - if (ret) {
 + if (ret  (retry  MAX_VMA_FIND_RETRY)) {
   if (WARN_ON(ret != -ENOSPC))
   return ret;
  
 @@ -3241,7 +3243,8 @@ i915_gem_find_vm_space(struct i915_address_space *vm,
   if (ret == 0)
   return i915_gem_find_vm_space(vm, node,
 size, align, color,
 -   start, end, flags);
 +   start, end, flags,
 +   retry++);
   }
  
   return ret;
 @@ -3306,8 +3309,9 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object 
 *obj,
   if (IS_ERR(vma))
   goto err_unpin;
  
 - ret = i915_gem_find_vm_space(vm, vma-node, size, alignment,
 -  obj-cache_level, 0, gtt_max, flags);
 + ret = i915_gem_find_vm_space(vm, vma-node,
 +  size, alignment, obj-cache_level,
 +  0, gtt_max, flags, 1);
   if (ret)
   goto err_free_vma;
  
 -- 
 1.9.2
 
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 6/9] drm/i915: Wrap VMA binding

2014-05-07 Thread Daniel Vetter

On Tue, May 06, 2014 at 10:21:35PM -0700, Ben Widawsky wrote:
 This will be useful for some upcoming patches which do more platform
 specific work. Having it in one central place just makes things a bit
 cleaner and easier.
 
 There is a small functional change here. There are more calls to the
 tracepoints.
 
 NOTE: I didn't actually end up using this patch for the intended purpose, but 
 I
 thought it was a nice patch to keep around.
 
 v2: s/i915_gem_bind_vma/i915_gem_vma_bind/
 s/i915_gem_unbind_vma/i915_gem_vma_unbind/
 (Chris)
 
 v3: Missed one spot
 
 Signed-off-by: Ben Widawsky b...@bwidawsk.net

You change the semantics of the vma bind/unbind tracepoints - previously
they've meant we reserve/unreserve a ppgtt address for an object, not
they're called every time we touch the ptes (which is what
vma_bind|unbind actually do). That doesn't look too terribly useful for
tracing any more.

What's the use case for these added tracepoints?
-Daniel

 ---
  drivers/gpu/drm/i915/i915_drv.h|  3 +++
  drivers/gpu/drm/i915/i915_gem.c| 13 ++---
  drivers/gpu/drm/i915/i915_gem_context.c|  2 +-
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 +++--
  drivers/gpu/drm/i915/i915_gem_gtt.c| 16 ++--
  5 files changed, 27 insertions(+), 12 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
 index c1d2bea..30cde3d 100644
 --- a/drivers/gpu/drm/i915/i915_drv.h
 +++ b/drivers/gpu/drm/i915/i915_drv.h
 @@ -2264,6 +2264,9 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
   struct i915_address_space *vm);
  unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
   struct i915_address_space *vm);
 +void i915_gem_vma_bind(struct i915_vma *vma, enum i915_cache_level,
 +unsigned flags);
 +void i915_gem_vma_unbind(struct i915_vma *vma);
  struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
struct i915_address_space *vm);
  struct i915_vma *
 diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
 index e9599e9..1567911 100644
 --- a/drivers/gpu/drm/i915/i915_gem.c
 +++ b/drivers/gpu/drm/i915/i915_gem.c
 @@ -2771,7 +2771,7 @@ int i915_vma_unbind(struct i915_vma *vma)
  
   trace_i915_vma_unbind(vma);
  
 - vma-unbind_vma(vma);
 + i915_gem_vma_unbind(vma);
  
   i915_gem_gtt_finish_object(obj);
  
 @@ -3342,9 +3342,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object 
 *obj,
  
   WARN_ON(flags  PIN_MAPPABLE  !obj-map_and_fenceable);
  
 - trace_i915_vma_bind(vma, flags);
 - vma-bind_vma(vma, obj-cache_level,
 -   flags  (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0);
 + i915_gem_vma_bind(vma, obj-cache_level,
 +   flags  (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 
 0);
  
   i915_gem_verify_gtt(dev);
   return vma;
 @@ -3548,8 +3547,8 @@ int i915_gem_object_set_cache_level(struct 
 drm_i915_gem_object *obj,
  
   list_for_each_entry(vma, obj-vma_list, vma_link)
   if (drm_mm_node_allocated(vma-node))
 - vma-bind_vma(vma, cache_level,
 -   obj-has_global_gtt_mapping ? 
 GLOBAL_BIND : 0);
 + i915_gem_vma_bind(vma, cache_level,
 +   obj-has_global_gtt_mapping ? 
 GLOBAL_BIND : 0);
   }
  
   list_for_each_entry(vma, obj-vma_list, vma_link)
 @@ -3914,7 +3913,7 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
   }
  
   if (flags  PIN_GLOBAL  !obj-has_global_gtt_mapping)
 - vma-bind_vma(vma, obj-cache_level, GLOBAL_BIND);
 + i915_gem_vma_bind(vma, obj-cache_level, GLOBAL_BIND);
  
   vma-pin_count++;
   if (flags  PIN_MAPPABLE)
 diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
 b/drivers/gpu/drm/i915/i915_gem_context.c
 index f77b4c1..00c7d88 100644
 --- a/drivers/gpu/drm/i915/i915_gem_context.c
 +++ b/drivers/gpu/drm/i915/i915_gem_context.c
 @@ -655,7 +655,7 @@ static int do_switch(struct intel_ring_buffer *ring,
   if (!to-obj-has_global_gtt_mapping) {
   struct i915_vma *vma = i915_gem_obj_to_vma(to-obj,
  dev_priv-gtt.base);
 - vma-bind_vma(vma, to-obj-cache_level, GLOBAL_BIND);
 + i915_gem_vma_bind(vma, to-obj-cache_level, GLOBAL_BIND);
   }
  
   if (!to-is_initialized || i915_gem_context_is_default(to))
 diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
 b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
 index 6cc004f..211bbbd 100644
 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
 +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
 @@ -369,7 +369,8 @@ i915_gem_execbuffer_relocate_entry(struct 
 drm_i915_gem_object *obj,
   struct i915_vma *vma =

Re: [Intel-gfx] [PATCH 7/9] drm/i915: Make aliasing a 2nd class VM

2014-05-07 Thread Daniel Vetter

On Tue, May 06, 2014 at 10:21:36PM -0700, Ben Widawsky wrote:
 There is a good debate to be had about how best to fit the aliasing
 PPGTT into the code. However, as it stands right now, getting aliasing
 PPGTT bindings is a hack, and done through implicit arguments. To make
 this absolutely clear, WARN and return an error if a driver writer tries
 to do something they shouldn't.
 
 I have no issue with an eventual revert of this patch. It makes sense
 for what we have today.
 
 Signed-off-by: Ben Widawsky b...@bwidawsk.net

Queued for -next, thanks for the patch.
-Daniel

 ---
  drivers/gpu/drm/i915/i915_gem.c | 4 
  1 file changed, 4 insertions(+)
 
 diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
 index 1567911..c54668c 100644
 --- a/drivers/gpu/drm/i915/i915_gem.c
 +++ b/drivers/gpu/drm/i915/i915_gem.c
 @@ -3877,9 +3877,13 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
   uint32_t alignment,
   unsigned flags)
  {
 + struct drm_i915_private *dev_priv = obj-base.dev-dev_private;
   struct i915_vma *vma;
   int ret;
  
 + if (WARN_ON(vm == dev_priv-mm.aliasing_ppgtt-base))
 + return -ENODEV;
 +
   if (WARN_ON(flags  (PIN_GLOBAL | PIN_MAPPABLE)  !i915_is_ggtt(vm)))
   return -EINVAL;
  
 -- 
 1.9.2
 
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 9/9] drm/i915: Split out aliasing binds

2014-05-07 Thread Daniel Vetter

On Tue, May 06, 2014 at 10:21:38PM -0700, Ben Widawsky wrote:
 This patch finishes off  actually separating the aliasing and global
 finds. Prior to this, all global binds would be aliased. Now if aliasing
 binds are required, they must be explicitly asked for. So far, we have
 no users of this outside of execbuf - but Mika has already submitted a
 patch requiring just this.
 
 A nice benefit of this is we should no longer be able to clobber GTT
 only objects from the aliasing PPGTT.
 
 TEST=gem_storedw_batches_loop

Hm, how does this testcase test this bug? I'm surprised ...

Also the usual format is

Testcase: igt/gem_stored_batches_loop

i.e. the testcase name as used by piglit.

Anyway I want these last two patches from this series since they fix up a
regression introduced by full ppgtt. Which is blocking the cmd parser work
(among other stuff). But I'm not sold on the earlier parts, so can you
please rebase just these two?

Also we need to triple-check that we have testcases for all the rebind and
set_cache_level corner-case fun in igt.
-Daniel

 Signed-off-by: Ben Widawsky b...@bwidawsk.net
 ---
  drivers/gpu/drm/i915/i915_drv.h| 2 +-
  drivers/gpu/drm/i915/i915_gem.c| 6 --
  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 +-
  drivers/gpu/drm/i915/i915_gem_gtt.c| 3 +++
  4 files changed, 9 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
 index 413b114..ba897f0 100644
 --- a/drivers/gpu/drm/i915/i915_drv.h
 +++ b/drivers/gpu/drm/i915/i915_drv.h
 @@ -2316,7 +2316,7 @@ i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
 uint32_t alignment,
 unsigned flags)
  {
 - return i915_gem_object_pin(obj, obj_to_ggtt(obj), alignment, flags | 
 PIN_GLOBAL_ALIASED);
 + return i915_gem_object_pin(obj, obj_to_ggtt(obj), alignment, flags | 
 PIN_GLOBAL);
  }
  
  static inline int
 diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
 index 86cec5c..b2056f3 100644
 --- a/drivers/gpu/drm/i915/i915_gem.c
 +++ b/drivers/gpu/drm/i915/i915_gem.c
 @@ -3346,8 +3346,10 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object 
 *obj,
  
   WARN_ON(flags  PIN_MAPPABLE  !obj-map_and_fenceable);
  
 - if (flags  PIN_GLOBAL_ALIASED)
 - vma_bind_flags = GLOBAL_BIND | ALIASING_BIND;
 + if (flags  PIN_ALIASING)
 + vma_bind_flags = ALIASING_BIND;
 + if (flags  PIN_GLOBAL)
 + vma_bind_flags = GLOBAL_BIND;
  
   i915_gem_vma_bind(vma, obj-cache_level, vma_bind_flags);
  
 diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
 b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
 index bcb3ae8..60049b4 100644
 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
 +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
 @@ -549,7 +549,7 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
   unsigned flags;
   int ret;
  
 - flags = 0;
 + flags = PIN_ALIASING;
  
   need_fence =
   has_fenced_gpu_access 
 diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
 b/drivers/gpu/drm/i915/i915_gem_gtt.c
 index 3f2f84e..846b6ee 100644
 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
 +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
 @@ -1569,6 +1569,9 @@ static void ggtt_bind_vma(struct i915_vma *vma,
   }
   }
  
 + if (!(flags  ALIASING_BIND))
 + return;
 +
   if (dev_priv-mm.aliasing_ppgtt 
   (!obj-has_aliasing_ppgtt_mapping ||
(cache_level != obj-cache_level))) {
 -- 
 1.9.2
 
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [RFC] drm/i915 : Reduce the shmem page allocation time by using blitter engines for clearing pages.

2014-05-07 Thread Gupta, Sourab

On Tue, 2014-05-06 at 13:12 +, Chris Wilson wrote:
 On Tue, May 06, 2014 at 12:59:37PM +, Gupta, Sourab wrote:
  On Tue, 2014-05-06 at 11:34 +, Chris Wilson wrote:
   On Tue, May 06, 2014 at 04:40:58PM +0530, sourab.gu...@intel.com wrote:
From: Sourab Gupta sourab.gu...@intel.com

This patch is in continuation of and is dependent on earlier patch
series to 'reduce the time for which device mutex is kept locked'.
(http://lists.freedesktop.org/archives/intel-gfx/2014-May/044596.html)

This patch aims to reduce the allocation time of pages from shmem
by using blitter engines for clearing the pages freshly alloced.
This is utilized in case of fresh pages allocated in shmem_preallocate
routines in execbuffer path and page_fault path only.

Even though the CPU memset routine is optimized, but still sometimes
the time consumed in clearing the pages of a large buffer comes in
the order of milliseconds.
We intend to make this operation asynchronous by using blitter engine,
so irrespective of the size of buffer to be cleared, the execbuffer
ioctl processing time will not be affected. Use of blitter engine will
make the overall execution time of 'execbuffer' ioctl lesser for
large buffers.

There may be power implications here on using blitter engines, and
we have to evaluate this. As a next step, we can selectively enable
this HW based memset only for large buffers, where the overhead of
adding commands in a blitter ring(which will otherwise be idle),
cross ring synchronization, will be negligible compared to using the
CPU to clear out the buffer.
   
   You leave a lot of holes by which you leak the uncleared pages to
   userspace.
   -Chris
   
  Hi Chris,
  
  Are you ok with the overall design as such, and the
  shmem_read_mapping_page_gfp_noclear interface?
  Is the leak of uncleared pages happening due to implementation issues?
  If so, we'll try to mitigate them.
 
 Actually, along similar lines there is an even more fundamental issue.
 You should only clear the objects if the pages have not been
 prepopulated.
 -Chris
 
Hi Chris,
This patch is in continuation of the shmem preallocate patch sent by
Akash earlier.
(http://lists.freedesktop.org/archives/intel-gfx/2014-May/044597.html)
We employ this method only in case of the preallocate routine, which
will be called in the first page fault of the object resulting in fresh
allocation of pages.
This is controlled by means of a flag 'require_clear' which is set in
preallocate routine(which will be come into picture only in case of
fresh allocation). If pages are already populated for the object, this
won't come into picture.
Also, we'll try to fix the leak of uncleared pages due to any
implementation issues.

Regards,
Sourab

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [RFC] drm/i915 : Reduce the shmem page allocation time by using blitter engines for clearing pages.

2014-05-07 Thread Gupta, Sourab

On Tue, 2014-05-06 at 17:56 +, Eric Anholt wrote:
 sourab.gu...@intel.com writes:
 
  From: Sourab Gupta sourab.gu...@intel.com
 
  This patch is in continuation of and is dependent on earlier patch
  series to 'reduce the time for which device mutex is kept locked'.
  (http://lists.freedesktop.org/archives/intel-gfx/2014-May/044596.html)
 
 One of userspace's assumptions is that when you allocate a new BO, you
 can map it and start writing data into it without needing to wait on the
 GPU.  I expect this patch to mostly hurt performance on apps (and I note
 that the patch doesn't come with any actual performance data) that get
 more stalls as a result.
 
Hi Eric,
Yes, it may hurt the performance on apps, in case of small buffers and 
if blitter engine is busy as there is a synchronous wait for rendering 
in the gem_fault handler. If that is the case, we can drop this from the 
gem_fault routine and employ it only in the do_execbuffer routine. Its 
useful there because there is no synchronous wait required in sw, due 
to cross ring synchronization.
We'll gather the numbers to quantify the performance benefit we have
while using blitter engines in this way for different buffer sizes.

 More importantly, though, it breaks existing userspace that relies on
 buffers being idle on allocation, for the unsychronized maps used in
 intel_bufferobj_subdata() and
 intel_bufferobj_map_range(GL_INVALIDATE_BUFFER_BIT |
 GL_UNSYNCHRONIZED_BIT)

Sorry, I miss your point here. It may not break this assumption due to
the fact that we employ this method only in case of the preallocate
routine, which will be called in the first page fault of the object
(gem_fault handler) resulting in fresh allocation of pages. 


So, in case of unsynchronized maps, there may be a wait involved in the
first page fault. Also, that wait time may be lesser than the time
required for CPU memset (resulting in no performance hit).
There won't be any subsequent waits afterwards for that buffer object.

Though, we'll have performance hit in the case when blitter engine is
already busy and may not be available to immediately start the memset of
freshly allocated mmaped buffers.

Am I missing something here? Does the userspace requirement for
unsynchronized mapped objects involve complete idleness of object on gpu
even when object page faults for the first time?

Regards,
Sourab

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [RFC] libdrm_intel: Add support for userptr objects

2014-05-07 Thread Tvrtko Ursulin



On 05/02/2014 06:15 PM, Ben Widawsky wrote:

On Fri, May 02, 2014 at 11:27:45AM +0100, Tvrtko Ursulin wrote:

Some people expressed interest in tiling so I thought to preserve it in the
API.

Kernel even allows it, and it is just because Chris found bspec references
saying it will break badly on some platforms, that I (or maybe it was both
of us) decided to disable it on this level for time being.

So I think it is just a matter of coming up with a blacklist and it would be
possible to use it then.



Well again, maybe this is specific to my usecase, but I can never
imagine a use for such fields at this stage in the buffer's lifecycle.
Could you get some people to describe how they want to use it?


Actually thinking about it more, when I collated requirements from 
various groups they were not all that interested. But Chris was actually 
in favour of keeping it in the kernel rather than disabling it 
altogether. So I decided to keep it in the userspace API and only reject 
the attempts to use it for time being.



I think a param for USERPTR is warranted. Or did we decide not to do
this already?


I asked for it, but people with authority said no. It is all very weak,
fragile and dangerous anyway - param or feature detect like the above.

We would really need a proper feature detect mechanism, like text based in
sysfs or something.



I don't see why a param is fragile. Feature detect OTOH is almost always
implemented in a fragile manner.


Not if we had a text file in sysfs with names rather than numbers 
(getparam) without a central allocation authority.


HAS_PARAM(FEATURE1) actually just tricks you into thinking it's fine, 
while actually you are just asking Do I have feature 48. What is 
feature 48? Who knows... some features never make it to upstream, some 
do and then get their HAS_PARAM number reallocated so it is really weak.


Something like grep userptr /sys/kernel/debug/dri/0/i915_features, or 
stat /sys/kernel/debug/dri/0/i915/features/userptr would be much better.


This way or the other, it seems to be there is not consensus with 
upstream gate keepers whether to have it or not. It was Chris actually 
who ripped it out. Personally I can see both arguments which is why I 
think we should come up with something better.



Probably don't need the special function pointer yet since I don't think
we can yet envision use cases which will require any kind of special
handling. A simple has_userptr in bufmgr_gem will probably suffice. But
I don't care too much either way.


What do you mean?


Don't add bo_alloc_userptr to the bufmgr. Just add the prototype to
intel_bufmgr.h.


Not sure, wouldn't like to make inconsistent.


I'm pretty close to running this with most of the changes I had asked
you for. I need to see how much of your igt test I can reuse now.


Cool, so there is nothing for me to do. :)

Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [RFC] drm/i915: Scratch page optimization for blanking buffer

2014-05-07 Thread Daniel Vetter

On Wed, May 7, 2014 at 9:39 AM, Daniel Vetter dan...@ffwll.ch wrote:
 Yes no real backing storage is needed, but still the object should be
 allowed to get mapped into the GGTT, using the scratch page (already
 zeroed out).

 Is there a patch for a new drm/i915 ioctl similar to fallocate, which
 could be used here?

 Geez, you guys shouldn't just dump patches onto intel-gfx but read a bit
 what other people are doing. Especially when they work for the same team
 ;-)

 http://comments.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/36351


Oops, I've mixed up names and you're not actually on the same team.
Sorry about that.

Still when you do upstream development please try to follow a bit
what's happening in general so that we can maximally exploit
opportunities for collaboration. That's why we have such a big
emphasis on open discussion fora like mailing lists or irc channel. So
I very much want people to jump into discussions of patches from other
groups (or even outside from Intel) if they see some overlap (or other
possible benefits like sharing learned lessons) with their own work.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 1/2] rendercopy/bdw: Enable hw-generated binding tables

2014-05-07 Thread Ville Syrjälä

I quickly cobbled together a hsw version of this and gave it a whirl on
one machine. Seems to work just fine here, and no lockups when switching
between hw and sw binding tables. Did you get the lockups on hsw even
with rendercopy?

Here's my hsw version:

From 17eeb8021815e2c18d6ba9b2185a37904296c2d9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= ville.syrj...@linux.intel.com
Date: Wed, 7 May 2014 12:33:01 +0300
Subject: [PATCH] rendercopy: use resource streamer on hsw
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 lib/gen7_render.h |  16 +++-
 lib/rendercopy_gen7.c | 103 +-
 2 files changed, 115 insertions(+), 4 deletions(-)

diff --git a/lib/gen7_render.h b/lib/gen7_render.h
index 1661d4c..58a88ef 100644
--- a/lib/gen7_render.h
+++ b/lib/gen7_render.h
@@ -155,8 +155,11 @@
 #define GEN7_PIPE_CONTROL_IS_FLUSH  (1  11)
 #define GEN7_PIPE_CONTROL_TC_FLUSH  (1  10)
 #define GEN7_PIPE_CONTROL_NOTIFY_ENABLE (1  8)
-#define GEN7_PIPE_CONTROL_GLOBAL_GTT(1  2)
-#define GEN7_PIPE_CONTROL_LOCAL_PGTT(0  2)
+#define GEN7_PIPE_CONTROL_FLUSH(1  7)
+#define GEN7_PIPE_CONTROL_DC_FLUSH  (1  5)
+#define GEN7_PIPE_CONTROL_VF_INVALIDATE (1  4)
+#define GEN7_PIPE_CONTROL_CC_INVALIDATE (1  2)
+#define GEN7_PIPE_CONTROL_SC_INVALIDATE (1  2)
 #define GEN7_PIPE_CONTROL_STALL_AT_SCOREBOARD   (1  1)
 #define GEN7_PIPE_CONTROL_DEPTH_CACHE_FLUSH(1  0)
 
@@ -1361,4 +1364,13 @@ typedef enum {
EXTEND_COUNT
 } sampler_extend_t;
 
+/* HSW+ resource streamer */
+#define HSW_3DSTATE_BINDING_TABLE_POOL_ALLOC   GEN7_3D(3, 1, 0x19)
+# define BINDING_TABLE_POOL_ENABLE (1  11)
+#define HSW_3DSTATE_BINDING_TABLE_EDIT_VS  GEN7_3D(3, 0, 0x43)
+#define HSW_3DSTATE_BINDING_TABLE_EDIT_GS  GEN7_3D(3, 0, 0x44)
+#define HSW_3DSTATE_BINDING_TABLE_EDIT_HS  GEN7_3D(3, 0, 0x45)
+#define HSW_3DSTATE_BINDING_TABLE_EDIT_DS  GEN7_3D(3, 0, 0x46)
+#define HSW_3DSTATE_BINDING_TABLE_EDIT_PS  GEN7_3D(3, 0, 0x47)
+
 #endif
diff --git a/lib/rendercopy_gen7.c b/lib/rendercopy_gen7.c
index 5131d8f..4efccb9 100644
--- a/lib/rendercopy_gen7.c
+++ b/lib/rendercopy_gen7.c
@@ -21,6 +21,9 @@
 #include gen7_render.h
 #include intel_reg.h
 
+#ifndef I915_EXEC_RESOURCE_STREAMER
+#define I915_EXEC_RESOURCE_STREAMER (113)
+#endif
 
 static const uint32_t ps_kernel[][4] = {
{ 0x0080005a, 0x2e2077bd, 0x00c0, 0x008d0040 },
@@ -73,11 +76,14 @@ gen7_render_flush(struct intel_batchbuffer *batch,
  drm_intel_context *context, uint32_t batch_end)
 {
int ret;
+   uint32_t flags = I915_EXEC_RENDER;
 
ret = drm_intel_bo_subdata(batch-bo, 0, 4096, batch-buffer);
+   if (batch-use_resource_streamer)
+   flags |= I915_EXEC_RESOURCE_STREAMER;
if (ret == 0)
ret = drm_intel_gem_bo_context_exec(batch-bo, context,
-   batch_end, 0);
+   batch_end, flags);
assert(ret == 0);
 }
 
@@ -219,6 +225,75 @@ static void gen7_emit_vertex_buffer(struct 
intel_batchbuffer *batch,
OUT_BATCH(0);
 }
 
+static void
+gen7_hw_binding_table(struct intel_batchbuffer *batch, bool enable)
+{
+   if (!enable) {
+   OUT_BATCH(MI_RS_CONTROL | 0x0);
+
+   OUT_BATCH(HSW_3DSTATE_BINDING_TABLE_POOL_ALLOC | (3 - 2));
+   /* binding table pool base address */
+   OUT_BATCH(3  5);
+   /* Upper bound */
+   OUT_BATCH(0);
+
+   OUT_BATCH(GEN7_PIPE_CONTROL | (4 - 2));
+   OUT_BATCH(GEN7_PIPE_CONTROL_CS_STALL | 
GEN7_PIPE_CONTROL_STALL_AT_SCOREBOARD);
+   OUT_BATCH(0);
+   OUT_BATCH(0);
+
+   OUT_BATCH(GEN7_PIPE_CONTROL | (4 - 2));
+   OUT_BATCH(GEN7_PIPE_CONTROL_SC_INVALIDATE);
+   OUT_BATCH(0);
+   OUT_BATCH(0);
+
+   return;
+}
+   OUT_BATCH(HSW_3DSTATE_BINDING_TABLE_POOL_ALLOC | (3 - 2));
+
+   /* binding table pool base address */
+   OUT_RELOC(batch-hw_bt_pool_bo, I915_GEM_DOMAIN_SAMPLER, 0,
+  BINDING_TABLE_POOL_ENABLE | (3  5));
+
+   /* Upper bound */
+   OUT_RELOC(batch-hw_bt_pool_bo, I915_GEM_DOMAIN_SAMPLER, 0,
+  batch-hw_bt_pool_bo-size);
+
+   OUT_BATCH(GEN7_PIPE_CONTROL | (4 - 2));
+   OUT_BATCH(GEN7_PIPE_CONTROL_CS_STALL | 
GEN7_PIPE_CONTROL_STALL_AT_SCOREBOARD);
+   OUT_BATCH(0);
+   OUT_BATCH(0);
+
+   OUT_BATCH(GEN7_PIPE_CONTROL | (4 - 2));
+   OUT_BATCH(GEN7_PIPE_CONTROL_SC_INVALIDATE);
+   OUT_BATCH(0);
+   OUT_BATCH(0);
+}
+
+static uint32_t
+gen7_rs_bind_surfaces(struct intel_batchbuffer *batch,
+ struct igt_buf *src,
+ struct igt_buf *dst,
+

Re: [Intel-gfx] [PATCH 1/2] rendercopy/bdw: Enable hw-generated binding tables

2014-05-07 Thread Chris Wilson

On Wed, May 07, 2014 at 02:49:31PM +0300, Ville Syrjälä wrote:
 I quickly cobbled together a hsw version of this and gave it a whirl on
 one machine. Seems to work just fine here, and no lockups when switching
 between hw and sw binding tables. Did you get the lockups on hsw even
 with rendercopy?

This intrigues me that we should do a context continuation rendercpy
(with the state setup split across multiple batches) and lots of context
switching.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 00/66] runtime pm for DPMS

2014-05-07 Thread Imre Deak

On Fri, 2014-04-25 at 10:45 +0200, Daniel Vetter wrote:
 Ok, review assignements. Please complain if you don't have the
 bandwidth so that I can find someone else.
 
 On Thu, Apr 24, 2014 at 11:54 PM, Daniel Vetter daniel.vet...@ffwll.ch 
 wrote:
 
  Daniel Vetter (66):
drm/i915: Make encoder-mode_set callbacks optional
drm/i915/dvo: Remove -mode_set callback
drm/i915/tv: extract set_tv_mode_timings
drm/i915/tv: extract set_color_conversion
drm/i915/tv: De-magic device check
drm/i915/tv: Rip out pipe-disabling nonsense from -mode_set
drm/i915/tv: Remove -mode_set callback
drm/i915/crt: Remove -mode_set callback
drm/i915/sdvo: Remove -mode_set callback
 
  Removal of encoder-mode_set callbacks, part 1
 
 Reviewer: Imre
 
drm/i915/hdmi: Enable hdmi mode on g4x, too
drm/i915: Track hdmi mode in the pipe config
drm/i915/sdvo: Use pipe_config-limited_color_range consistently
drm/i915: state readout and cross checking for limited_color_range
drm/i915/sdvo: use config-has_hdmi_sink
drm/i915: Simplify audio handling on DDI ports
drm/i915: Track has_audio in the pipe config
drm/i915/dp: Move port A pll setup to g4x_pre_enable_dp
drm/i915/dp: Remove -mode_set callback
drm/i915/hdmi: Remove redundant IS_VLV checks
drm/i915/hdmi: Remove -mode_set callback
 
  Removal of the encoder-mode_set callbacks for hdmi/sdvo/dp with small
  interludes to move a bit of the hdmi/audio state into the pipe config.
 
 Reviewer: Naresh Kumar
 
 
drm/i915/lvds: Remove -mode_set callback
drm/i915/ddi: Remove -mode_set callback
drm/i915/dsi: Remove -mode_set callback
drm/i915: Stop calling encoder-mode_set
 
  Final removals of encoder-mode_set callbacks
 
 Reviewer: Imre

On patches 21-24:
Reviewed-by: Imre Deak imre.d...@intel.com

 
drm/i915: Make -update_primary_plane infallible
drm/i915: More cargo-culted locking for intel_update_fbc
drm/i915: Sprinkle intel_edp_psr_update over crtc_enable/disable
drm/i915: Inline set_base into crtc_mode_set
drm/i915: Move fb pinning into __intel_set_mode
 
  Some shuffling to get the primary-fb handling out of crtc mode_set 
  callbacks
 
 Reviewer: Akash Goel
 
drm/i915: Shovel hw setup code out of i9xx_crtc_mode_set
drm/i915: Move lowfreq_avail around a bit in ilk/hsw_crtc_mode_set
drm/i915: Shovel hw setup code out of ilk_crtc_mode_set
drm/i915: Shovel hw setup code out of hsw_crtc_mode_set
drm/i915: Extract i9xx_set_pll_dividers
drm/i915: Extract vlv_prepare_pll
 
  gmch pll moved out of crtc mode_set callbacks into -enable hooks
 
 Reviewer: Shobhit Kumar
 
drm/i915: Only update shared dpll state when needed
drm/i915: Extract intel_prepare_shared_dpll
drm/i915: s/ironlake_/intel_ for the enable_share_dpll function
 
  Prep polish on the existing shared_dpll code
 
 Reviewer: Damien (same comment as below)
 
drm/i915: Check hw state in assert_can_disable_lcpll
drm/i915: Remove spll_refcount for hsw
drm/i915: Clean up WRPLL/SPLL #defines
drm/i915: Make intel_wait_for_pipe_off static
drm/i915: Disable pipe before ports on ilk
drm/i915: Pass port explicitly to intel_ddi_get_hw_state
drm/i915: Unexport intel_ddi_connector_get_hw_state
drm/i915: Move hsw_fdi_link_train into intel_crt.c
drm/i915: Move pch fifo underrun report enabling to hsw_crt_pre_enable
drm/i915: Move the SPLL enabling into hsw_crt_pre_enable
drm/i915: Move lpt_pch_enable int hsw_crt_enable
drm/i915: Move the pch fifo underrun handling into hsw_crt_disable
drm/i915: Move lpt_disable_pch_transcoder into the hsw crt encoder
drm/i915: Move pch fifo underrun report re-enabling into
  hsw_crt_post_disable
drm/i915: Move the hsw fdi disabling into hsw_crt_post_disable
drm/i915: Move SPLL disabling into hsw_crt_post_disable
 
  Create a new hsw-specific crt encoder which subsumes the entire fdi/pch 
  handling
  on haswell. This has the nice upshot to make SPLL logically a port-private 
  clock
  and so removes it from further considerations.
 
 Reviewer: Paulo
 
drm/i915: Add a debugfs file for the shared dpll state
drm/i915: Move ddi_pll_sel into the pipe config
drm/i915: State readout and cross-checking for ddi_pll_sel
drm/i915: Precompute static ddi_pll_sel values in encoders
drm/i915: Basic shared dpll support for WRPLLs
drm/i915: Document that the pll-mode_set hook is optional
drm/i915: State readout support for WRPLLs
drm/i915: -disable hook for WRPLLs
drm/i915: -enable hook for WRPLLs
drm/i915: Switch to common shared dpll framework for WRPLLs
drm/i915: Only touch WRPLL hw state in enable/disable hooks
 
  Convert wrpll handling to the common shared_dpll framework. We need this 
  since
  runtime pm for dpms requires us to separately track pll refernces from 
  crtcs and
  active usage by crtcs
 
 Reviewer: Damien (maybe find someone from the vpg guys who do the pll

[Intel-gfx] [PATCH] tests/gem_flink_race, prime_self_import: fix object counts

2014-05-07 Thread Mika Kuoppala

We need to add one drm_open_any() before getting the object counts
as first call to drm_open_any() allocates file descriptors for
exit handlers and thus is not symmetrical

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77867
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77875
Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com
---
 tests/gem_flink_race.c|8 
 tests/prime_self_import.c |8 
 2 files changed, 16 insertions(+)

diff --git a/tests/gem_flink_race.c b/tests/gem_flink_race.c
index 3353b83..f129cb3 100644
--- a/tests/gem_flink_race.c
+++ b/tests/gem_flink_race.c
@@ -163,6 +163,11 @@ static void test_flink_close(void)
int r, i, num_threads;
int obj_count;
void *status;
+   int fake;
+
+   /* Allocate exit handler fds in here so that we dont screw
+* up the counts */
+   fake = drm_open_any();
 
obj_count = get_object_count();
 
@@ -193,6 +198,9 @@ static void test_flink_close(void)
obj_count = get_object_count() - obj_count;
 
printf(leaked %i objects\n, obj_count);
+
+   close(fake);
+
igt_assert(obj_count == 0);
 }
 
diff --git a/tests/prime_self_import.c b/tests/prime_self_import.c
index efdd24f..41c203f 100644
--- a/tests/prime_self_import.c
+++ b/tests/prime_self_import.c
@@ -258,6 +258,11 @@ static void test_reimport_close_race(void)
int obj_count;
void *status;
uint32_t handle;
+   int fake;
+
+   /* Allocate exit handler fds in here so that we dont screw
+* up the counts */
+   fake = drm_open_any();
 
obj_count = get_object_count();
 
@@ -294,6 +299,9 @@ static void test_reimport_close_race(void)
obj_count = get_object_count() - obj_count;
 
printf(leaked %i objects\n, obj_count);
+
+   close(fake);
+
igt_assert(obj_count == 0);
 }
 
-- 
1.7.9.5

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 01/10] drm/i915/bdw: Implement a basic PM interrupt handler

2014-05-07 Thread Ville Syrjälä

On Mon, May 05, 2014 at 06:17:30PM +0530, deepa...@linux.intel.com wrote:
 From: Ben Widawsky benjamin.widaw...@intel.com
 
 Almost all of it is reusable from the existing code. The primary
 difference is we need to do even less in the interrupt handler, since
 interrupts are not shared in the same way.
 
 The patch is mostly a copy-paste of the existing snb+ code, with updates
 to the relevant parts requiring changes to the interrupt handling. As
 such it /should/ be relatively trivial. It's highly likely that I missed
 some places where I need a gen8 version of the PM interrupts, but it has
 become invisible to me by now.
 
 This patch could probably be split into adding the new functions,
 followed by actually handling the interrupts. Since the code is
 currently disabled (and broken) I think the patch stands better by
 itself.
 
 v2: Move the commit about not touching the ringbuffer interrupt to the
 snb_* function where it belongs (Rodrigo)
 
 v3: Rebased on Paulo's runtime PM changes
 
 v4: Not well validated, but rebase on
 commit 730488b2eddded4497f63f70867b1256cd9e117c
 Author: Paulo Zanoni paulo.r.zan...@intel.com
 Date:   Fri Mar 7 20:12:32 2014 -0300
 
 drm/i915: kill dev_priv-pm.regsave
 
 v5: Rebased on latest code base. (Deepak)
 
 v6: Remove conflict markers, Unnecessary empty line and use right
 IIR interrupt (Ville)
 
 Signed-off-by: Ben Widawsky b...@bwidawsk.net
 Signed-off-by: Deepak S deepa...@linux.intel.com
 ---
  drivers/gpu/drm/i915/i915_irq.c  | 75 
 ++--
  drivers/gpu/drm/i915/i915_reg.h  |  1 +
  drivers/gpu/drm/i915/intel_drv.h |  2 ++
  drivers/gpu/drm/i915/intel_pm.c  | 38 ++--
  4 files changed, 112 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
 index 2d76183..6af51ad 100644
 --- a/drivers/gpu/drm/i915/i915_irq.c
 +++ b/drivers/gpu/drm/i915/i915_irq.c
 @@ -248,6 +248,49 @@ static bool ivb_can_enable_err_int(struct drm_device 
 *dev)
   return true;
  }
  
 +/**
 +  * bdw_update_pm_irq - update GT interrupt 2
 +  * @dev_priv: driver private
 +  * @interrupt_mask: mask of interrupt bits to update
 +  * @enabled_irq_mask: mask of interrupt bits to enable
 +  *
 +  * Copied from the snb function, updated with relevant register offsets
 +  */
 +static void bdw_update_pm_irq(struct drm_i915_private *dev_priv,
 +   uint32_t interrupt_mask,
 +   uint32_t enabled_irq_mask)
 +{
 + uint32_t new_val;
 +
 + assert_spin_locked(dev_priv-irq_lock);
 +
 + if (dev_priv-pm.irqs_disabled) {
 + WARN(1, IRQs disabled\n);
 + return;
 + }

Could be just 'if (WARN...' just like in the snb function.

 +
 + new_val = dev_priv-pm_irq_mask;
 + new_val = ~interrupt_mask;
 + new_val |= (~enabled_irq_mask  interrupt_mask);
 +
 + if (new_val != dev_priv-pm_irq_mask) {
 + dev_priv-pm_irq_mask = new_val;
 + I915_WRITE(GEN8_GT_IMR(2), I915_READ(GEN8_GT_IMR(2)) |
 +dev_priv-pm_irq_mask);

I just realized that this doesn't actually clear the old bits. I think
we should just kill the RMW here and mandate that all users of GT_IMR(2)
need to use this function.

Also you're forgetting to initialize pm_irq_mask. I guess it should be
initialized in gen8_gt_irq_postinstall() to match the gen6 code.

 + POSTING_READ(GEN8_GT_IMR(2));
 + }
 +}
 +
 +void bdw_enable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask)
 +{
 + bdw_update_pm_irq(dev_priv, mask, mask);
 +}
 +
 +void bdw_disable_pm_irq(struct drm_i915_private *dev_priv, uint32_t mask)
 +{
 + bdw_update_pm_irq(dev_priv, mask, 0);
 +}
 +
  static bool cpt_can_enable_serr_int(struct drm_device *dev)
  {
   struct drm_i915_private *dev_priv = dev-dev_private;
 @@ -1098,8 +1141,12 @@ static void gen6_pm_rps_work(struct work_struct *work)
   spin_lock_irq(dev_priv-irq_lock);
   pm_iir = dev_priv-rps.pm_iir;
   dev_priv-rps.pm_iir = 0;
 - /* Make sure not to corrupt PMIMR state used by ringbuffer code */
 - snb_enable_pm_irq(dev_priv, dev_priv-pm_rps_events);
 + if (IS_BROADWELL(dev_priv-dev))
 + bdw_enable_pm_irq(dev_priv, dev_priv-pm_rps_events);
 + else {
 + /* Make sure not to corrupt PMIMR state used by ringbuffer */
 + snb_enable_pm_irq(dev_priv, dev_priv-pm_rps_events);
 + }
   spin_unlock_irq(dev_priv-irq_lock);
  
   /* Make sure we didn't queue anything we're not going to process. */
 @@ -1296,6 +1343,19 @@ static void snb_gt_irq_handler(struct drm_device *dev,
   ivybridge_parity_error_irq_handler(dev, gt_iir);
  }
  
 +static void gen8_rps_irq_handler(struct drm_i915_private *dev_priv, u32 
 pm_iir)
 +{
 + if ((pm_iir  dev_priv-pm_rps_events) == 0)
 + return;
 +
 + spin_lock(dev_priv-irq_lock);
 + dev_priv-rps.pm_iir |=

Re: [Intel-gfx] [PATCH 02/10] drm/i915: Enable PM Interrupts target via Display Interface.

2014-05-07 Thread Ville Syrjälä

On Mon, May 05, 2014 at 06:17:31PM +0530, deepa...@linux.intel.com wrote:
 From: Deepak S deepa...@linux.intel.com
 
 In BDW, Apart from unmasking up/down threshold interrupts. we need
 to umask bit 32 of PM_INTRMASK to route interrupts to target via Display
 Interface.
 
 v2: Add (131) mask (Ville)
 
 Signed-off-by: Deepak S deepa...@linux.intel.com
 ---
  drivers/gpu/drm/i915/i915_reg.h | 1 +
  drivers/gpu/drm/i915/intel_pm.c | 2 ++
  2 files changed, 3 insertions(+)
 
 diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
 index ca4f8b9..c850254 100644
 --- a/drivers/gpu/drm/i915/i915_reg.h
 +++ b/drivers/gpu/drm/i915/i915_reg.h
 @@ -5112,6 +5112,7 @@ enum punit_power_well {
  #define GEN6_RC6p_THRESHOLD  0xA0BC
  #define GEN6_RC6pp_THRESHOLD 0xA0C0
  #define GEN6_PMINTRMSK   0xA168
 +#define GEN8_PMINTR_REDIRECT_TO_NON_DISP (131)
  
  #define GEN6_PMISR   0x44020
  #define GEN6_PMIMR   0x44024 /* rps_lock */
 diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
 index 0e69c97..ebb5c88 100644
 --- a/drivers/gpu/drm/i915/intel_pm.c
 +++ b/drivers/gpu/drm/i915/intel_pm.c
 @@ -3114,6 +3114,8 @@ static u32 gen6_rps_pm_mask(struct drm_i915_private 
 *dev_priv, u8 val)
   if (INTEL_INFO(dev_priv-dev)-gen = 7  !IS_HASWELL(dev_priv-dev))
   mask |= GEN6_PM_RP_UP_EI_EXPIRED;
  
 + mask |= GEN8_PMINTR_REDIRECT_TO_NON_DISP;
 +

Didn't Ben want a gen check here?

   return ~mask;
  }
  
 -- 
 1.9.1
 
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] tests/gem_flink_race, prime_self_import: fix object counts

2014-05-07 Thread Daniel Vetter

On Wed, May 07, 2014 at 04:55:28PM +0300, Mika Kuoppala wrote:
 We need to add one drm_open_any() before getting the object counts
 as first call to drm_open_any() allocates file descriptors for
 exit handlers and thus is not symmetrical
 
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77867
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77875
 Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com

Ah, makes sense. Reviewed-by: Daniel Vetter daniel.vet...@ffwll.ch

When you push this please add a citation of the commit which introduced
this regression like we do it for kernel patches.

Thanks, Daniel

 ---
  tests/gem_flink_race.c|8 
  tests/prime_self_import.c |8 
  2 files changed, 16 insertions(+)
 
 diff --git a/tests/gem_flink_race.c b/tests/gem_flink_race.c
 index 3353b83..f129cb3 100644
 --- a/tests/gem_flink_race.c
 +++ b/tests/gem_flink_race.c
 @@ -163,6 +163,11 @@ static void test_flink_close(void)
   int r, i, num_threads;
   int obj_count;
   void *status;
 + int fake;
 +
 + /* Allocate exit handler fds in here so that we dont screw
 +  * up the counts */
 + fake = drm_open_any();
  
   obj_count = get_object_count();
  
 @@ -193,6 +198,9 @@ static void test_flink_close(void)
   obj_count = get_object_count() - obj_count;
  
   printf(leaked %i objects\n, obj_count);
 +
 + close(fake);
 +
   igt_assert(obj_count == 0);
  }
  
 diff --git a/tests/prime_self_import.c b/tests/prime_self_import.c
 index efdd24f..41c203f 100644
 --- a/tests/prime_self_import.c
 +++ b/tests/prime_self_import.c
 @@ -258,6 +258,11 @@ static void test_reimport_close_race(void)
   int obj_count;
   void *status;
   uint32_t handle;
 + int fake;
 +
 + /* Allocate exit handler fds in here so that we dont screw
 +  * up the counts */
 + fake = drm_open_any();
  
   obj_count = get_object_count();
  
 @@ -294,6 +299,9 @@ static void test_reimport_close_race(void)
   obj_count = get_object_count() - obj_count;
  
   printf(leaked %i objects\n, obj_count);
 +
 + close(fake);
 +
   igt_assert(obj_count == 0);
  }
  
 -- 
 1.7.9.5
 
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] Design review request: DRM color manager

2014-05-07 Thread Sharma, Shashank


Hello all,

As per previous discussions, I am sending the drm-color-manager design 
for review, as inline text. Please have a look and let me know your 
feedback.


(I tried hard to align the inline text diagram in mail, hope you can see 
the proper picture too)


=
DRM Color Manager
=

- Color manager provides a common interface for all color correction /
  enhancement properties supported by various hardwares.
- Color manager will be one Umbrella DRM property (blob type)
- Driver can register the color correction properties of its HW during
  the init time, and all the corrections can be applied using the same
  interface.

How does a driver register color properties with DRM color manager:
---

- A DRM driver will call drm_register_color_manager() function with 
following information:

.prop_set_handler
.prop_get_hanlder
.color_prop_identifier structure
{porp_name, prop_id}
{porp_name, prop_id}
{porp_name, prop_id}

For example: I915 driver can register as:
.prop_set_handler = intel_clrmgr_set()
.prop_get_hanlder = intel_clrmgr_get()
.color_prop_identifier structure
{gamma_correction, 0}
{csc_correction, 1}
{hue_saturation, 2}



How will color_manager_set/get() functions work:
-

Color EDID:
  ==
|| property || Enable/  || Pipe/Plane/  || No of||  
||  ID  || Disable  || Identifier   || Data bytes   ||data..
|| 1 Byte	|| 1 Byte	|| 1 Byte	|| 1 Byte	|| 
==


- Color EDID is a protocol to extract the color correction
  characterictics from a blob, coming from DRM layer
  as property_set_blob() or loading a blob for property_get_blob()

- Userspace will load this blob with corresponding values and use the
  drm_set_blob(color-manager) interface.
- DRM layer of get/set_color_manager() will validate the entry, extract
  the following information from blob
- property
- enable/disable
- identifier
- ptr to start of raw data
- With this information, drm_get/set_color_manager() layer will call
  driver's .get/set_handler()
  which will do the correction using those values.
- Based on the driver's requirement, user space can send any type of
  values in byte stream, like
  direct reg values, correction coefficients or any other form of data.

Benefits of this common interface:
--
- Single interface for all color properties. No need to create multiple
  properties.
- HW agonist. Its in hands of driver to register the properties.
- Any no of properties can be registered.
- Can be clubbed with modeset implementations.




Regards
Shashank

On 5/7/2014 7:41 PM, Sharma, Shashank wrote:

FYI

-Original Message-
From: Sharma, Shashank
Sent: Tuesday, April 22, 2014 8:32 PM
To: Daniel Vetter
Cc: David Herrmann; intel-gfx@lists.freedesktop.org; Cn, Ramakrishnan; Jindal, 
Sonika; Shankar, Uma
Subject: RE: [Intel-gfx] Design review request: DRM color manager

David,
My apologies for starting a pre-mature design discussion.

Daniel,
Thanks for pointing out first two things, It was not known to me, I will take 
care of this in future.

First time I presented color-manager design, in internal display design forum, 
where most of the reviewers were not there.
We took the feedback from people who were present, and implemented the design.

When we shared color manager implementation, that design was rejected and one 
of the feedbacks was that it would be better to discuss it on dri-devel where 
people outside Intel can give their opinion, and that’s the only reason why I 
added dri-devel for the new design (Please see the attached mail, I replied to 
all who were in last communication).
Please let me know how do we want to proceed now.


Regards
Shashank
-Original Message-
From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel Vetter
Sent: Tuesday, April 22, 2014 7:18 PM
To: Sharma, Shashank
Cc: David Herrmann; intel-gfx@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org; Thierry Reding; Cn, Ramakrishnan; Alex 
Deucher; Jindal, Sonika; Shankar, Uma
Subject: Re: [Intel-gfx] Design review request: DRM color manager

On Tue, Apr 22, 2014 at 12:07:41PM +, Sharma, Shashank wrote:

Thanks again David,
Comments inline.


Three things:
- Please don't send out .pptx files to upstream/public mailing lists,
   that's just not how the upstream community works.

- Please either fix up ms outlook to do proper in-line quoting or switch
   to a proper mail client for discussions on dri-devel. I'm ok with this
   on intel-gfx to some extend since that's our own turf, but on dri-devel
   the

Re: [Intel-gfx] [PATCH 10/10] drm/i915/chv: Freq(opcode) request for CHV.

2014-05-07 Thread Ville Syrjälä

On Mon, May 05, 2014 at 06:17:39PM +0530, deepa...@linux.intel.com wrote:
 From: Deepak S deepa...@linux.intel.com
 
 On CHV, All the freq request should be even. So, we need to make sure we
 request the opcode accordingly.
 
 Signed-off-by: Deepak S deepa...@linux.intel.com
 Reviewed-by: Ben Widawsky b...@bwidawsk.net
 ---
  drivers/gpu/drm/i915/i915_drv.h | 1 +
  drivers/gpu/drm/i915/i915_irq.c | 9 +++--
  drivers/gpu/drm/i915/i915_reg.h | 3 +++
  3 files changed, 11 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
 index e70a9f0..3966ff2 100644
 --- a/drivers/gpu/drm/i915/i915_drv.h
 +++ b/drivers/gpu/drm/i915/i915_drv.h
 @@ -1318,6 +1318,7 @@ struct drm_i915_private {
   u32 gt_irq_mask;
   u32 pm_irq_mask;
   u32 pm_rps_events;
 + u32 pm_rps_freq_req;
   u32 pipestat_irq_mask[I915_MAX_PIPES];
  
   struct work_struct hotplug_work;
 diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
 index 6af51ad..3e8bcca 100644
 --- a/drivers/gpu/drm/i915/i915_irq.c
 +++ b/drivers/gpu/drm/i915/i915_irq.c
 @@ -1162,7 +1162,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
   if (adj  0)
   adj *= 2;
   else
 - adj = 1;
 + adj = dev_priv-pm_rps_freq_req;
   new_delay = dev_priv-rps.cur_freq + adj;
  
   /*
 @@ -1181,7 +1181,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
   if (adj  0)
   adj *= 2;
   else
 - adj = -1;
 + adj = -1 * dev_priv-pm_rps_freq_req;

I'm not sure pm_rps_freq_req is very descriptive. Maybe it's better to
just open code the thing and add a few comments. Eg.:
 /* CHV requires even values */
 adj = IS_CHERRYVIEW(dev) ? 2 : 1;
 ...
 /* CHV requires even values */
 adj = IS_CHERRYVIEW(dev) ? -2 : -1;

   new_delay = dev_priv-rps.cur_freq + adj;

In any case this doesn't seem to enough to guarantee an even value.
You'd also need to make sure efficient_freq, min_freq_softlimit and
max_freq_softlimit are even.

The alternative is to just make the final value even in 
valleyview_set_rps(), but then you might still trip these
 WARN_ON(val  dev_priv-rps.max_freq_softlimit);
 WARN_ON(val  dev_priv-rps.min_freq_softlimit);
in there. So I guess it's back to option 1 and making sure all the
limits are also even. Also leaving it up to valleyview_set_rps() could
introduce a slight delay before the frequency starts to change since it
could effectively ignore the first interrupt (depending on which way
it's going and which we we would round the final value).

   } else { /* unknown event */
   new_delay = dev_priv-rps.cur_freq;
 @@ -4088,6 +4088,11 @@ void intel_irq_init(struct drm_device *dev)
   /* Let's track the enabled rps events */
   dev_priv-pm_rps_events = GEN6_PM_RPS_EVENTS;
  
 + if (IS_CHERRYVIEW(dev))
 + dev_priv-pm_rps_freq_req = CHV_GPU_FREQ_REQ;
 + else
 + dev_priv-pm_rps_freq_req = GEN6_GPU_FREQ_REQ;
 +
   setup_timer(dev_priv-gpu_error.hangcheck_timer,
   i915_hangcheck_elapsed,
   (unsigned long) dev);
 diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
 index 3ff34c4..4998d6b 100644
 --- a/drivers/gpu/drm/i915/i915_reg.h
 +++ b/drivers/gpu/drm/i915/i915_reg.h
 @@ -529,6 +529,9 @@ enum punit_power_well {
  #define CHV_FB_RPE_FREQ_SHIFT8
  #define CHV_FB_RPE_FREQ_MASK 0xff
  
 +#define CHV_GPU_FREQ_REQ 2
 +#define GEN6_GPU_FREQ_REQ1

I don't think these defines buy us anything. Instead I think just
adding a comment or two to where the even vs. odd magic happens
should be a better option.

 +
  #define IOSF_NC_FB_GFX_FREQ_FUSE 0x1c
  #define   FB_GFX_MAX_FREQ_FUSE_SHIFT 3
  #define   FB_GFX_MAX_FREQ_FUSE_MASK  0x07f8
 -- 
 1.9.1
 
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 4/9] drm/i915: Limit the number of node allocation retries

2014-05-07 Thread Ben Widawsky

On Wed, May 07, 2014 at 09:49:57AM +0200, Daniel Vetter wrote:
 On Tue, May 06, 2014 at 10:21:33PM -0700, Ben Widawsky wrote:
  AFAICT, it's impossible to actually infinitely retry the allocation in
  our current code. However, a small oversight on my part, slight bug, or
  future bug, could easily change that.
  
  To avoid a potential breakage, a simple retry variable of 16 bits will
  help us prevent infinitely running.
  
  Retry is limited to 100 as a mostly random number. Some consideration
  about stack usage was taken into account. As an example, if we allowed
  256 retries on a 32b arch (and my memory serves that all arguments are
  passed on the stack for such architectures), thats 33 bytes * 256
  retries + (fudge for return address and such)... it's way more than we
  want to use already. 64b architectures might be slightly better, since
  6? of the first args will get passed through registers, but it's still
  bad.
  
  If anything, we might want to do way less than 100, like 3.
  
  Signed-off-by: Ben Widawsky b...@bwidawsk.net
 
 So you replace the retry loop with a tailrecursive version in patch 2 and
 then add duct-tape later on here? Nope.
 -Daniel
 

I feel like you're implying the retry loop will end, ever. The retry
timeout should probably come first though, I Agree with that much.

  ---
   drivers/gpu/drm/i915/i915_gem.c | 14 +-
   1 file changed, 9 insertions(+), 5 deletions(-)
  
  diff --git a/drivers/gpu/drm/i915/i915_gem.c 
  b/drivers/gpu/drm/i915/i915_gem.c
  index b6965a2..de98b26 100644
  --- a/drivers/gpu/drm/i915/i915_gem.c
  +++ b/drivers/gpu/drm/i915/i915_gem.c
  @@ -3216,6 +3216,7 @@ static void i915_gem_verify_gtt(struct drm_device 
  *dev)
   #endif
   }
   
  +#define MAX_VMA_FIND_RETRY 100
   static int
   i915_gem_find_vm_space(struct i915_address_space *vm,
 struct drm_mm_node *node,
  @@ -3224,7 +3225,8 @@ i915_gem_find_vm_space(struct i915_address_space *vm,
 unsigned long color,
 unsigned long start,
 unsigned long end,
  -  uint32_t flags)
  +  uint32_t flags,
  +  uint8_t retry)
   {
  int ret;
  ret = drm_mm_insert_node_in_range_generic(vm-mm, node,
  @@ -3232,7 +3234,7 @@ i915_gem_find_vm_space(struct i915_address_space *vm,
start, end,
DRM_MM_SEARCH_DEFAULT,
DRM_MM_CREATE_DEFAULT);
  -   if (ret) {
  +   if (ret  (retry  MAX_VMA_FIND_RETRY)) {
  if (WARN_ON(ret != -ENOSPC))
  return ret;
   
  @@ -3241,7 +3243,8 @@ i915_gem_find_vm_space(struct i915_address_space *vm,
  if (ret == 0)
  return i915_gem_find_vm_space(vm, node,
size, align, color,
  - start, end, flags);
  + start, end, flags,
  + retry++);
  }
   
  return ret;
  @@ -3306,8 +3309,9 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object 
  *obj,
  if (IS_ERR(vma))
  goto err_unpin;
   
  -   ret = i915_gem_find_vm_space(vm, vma-node, size, alignment,
  -obj-cache_level, 0, gtt_max, flags);
  +   ret = i915_gem_find_vm_space(vm, vma-node,
  +size, alignment, obj-cache_level,
  +0, gtt_max, flags, 1);
  if (ret)
  goto err_free_vma;
   
  -- 
  1.9.2
  
  ___
  Intel-gfx mailing list
  Intel-gfx@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/intel-gfx
 
 -- 
 Daniel Vetter
 Software Engineer, Intel Corporation
 +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Ben Widawsky, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH] UXA: Wait until a pageflip actually completes to report it.

2014-05-07 Thread Jamey Sharp

UXA was reporting page-flip completion as soon as the flip was scheduled
with the kernel, instead of waiting for the kernel to indicate that the
flip had actually completed.

Moving the DRI2SwapComplete call to the right place fixes all of our
Piglit tests for OML_sync_control when run on xf86-video-intel/UXA,
aside from a bit of difficult-to-reproduce flakiness when using a
divisor  1.

This also eliminates a compile-time and run-time warning when built
against an xserver with Warn on DRI2SwapComplete with constant UST/MSC
applied.

Signed-off-by: Jamey Sharp ja...@minilop.net
Cc: Theo Hill theo0...@gmail.com
Cc: Eric Anholt e...@anholt.net
---
 src/uxa/intel_dri.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/uxa/intel_dri.c b/src/uxa/intel_dri.c
index ca58052..3745767 100644
--- a/src/uxa/intel_dri.c
+++ b/src/uxa/intel_dri.c
@@ -932,10 +932,6 @@ I830DRI2ScheduleFlip(struct intel_screen_private *intel,
 
/* Then flip DRI2 pointers and update the screen pixmap */
I830DRI2ExchangeBuffers(intel, info-front, info-back);
-   DRI2SwapComplete(info-client, draw, 0, 0, 0,
-DRI2_EXCHANGE_COMPLETE,
-info-event_complete,
-info-event_data);
return TRUE;
 }
 
@@ -1090,6 +1086,12 @@ void I830DRI2FlipEventHandler(unsigned int frame, 
unsigned int tv_sec,
assert(intel-pending_flip[flip_info-pipe] == flip_info);
intel-pending_flip[flip_info-pipe] = NULL;
 
+   DRI2SwapComplete(flip_info-client, drawable,
+frame, tv_sec, tv_usec,
+DRI2_EXCHANGE_COMPLETE,
+flip_info-event_complete,
+flip_info-event_data);
+
chain = flip_info-chain;
if (chain) {
DrawablePtr chain_drawable = NULL;
-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 2/9] drm/i915: Extract node allocation from bind

2014-05-07 Thread Ben Widawsky

On Wed, May 07, 2014 at 08:02:38AM +0100, Chris Wilson wrote:
 On Tue, May 06, 2014 at 10:21:31PM -0700, Ben Widawsky wrote:
  The DRM node allocation code was already a bit of an ugly bit of code
  within a complex function. Removing it serves the purpose of cleaning
  the function up. More importantly, it provides a way to have a
  preallocated (address space) VMA to easily skip this code. Something
  we're very likely to need.
  
  This is essentially a wrapper for DRM node allocation with an eviction +
  retry. It is something which could be moved to core DRM eventually, if
  other drivers had similar eviction semantics.
  
  This removes a goto used for something other than error unwinding, a
  generally reviled (I am neutral) practice, and replaces it with a
  function call to itself that should have the same effect. Note that it's
  not a recursive function as all the problem set reduction is done in the
  eviction code.
  
  Some might say this change is worse than before because we are using
  stack for each subsequent call. Frankly, I'd rather overflow the stack
  and blow it up than loop forever. In either case, this is addressed in
  the next patch.
  
  I believe, and intend, that other than the stack usage, there is no
  functional change here.
 
 Nope, but decoupling evict_something() further does introduce lots more
 implied magic.
 -Chris
 

Hmm. I really don't see what's actually upsetting. Can you be a bit more
explicit about what's so bothersome to you for my edification?

-- 
Ben Widawsky, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 2/9] drm/i915: Extract node allocation from bind

2014-05-07 Thread Chris Wilson

On Wed, May 07, 2014 at 08:45:38AM -0700, Ben Widawsky wrote:
 On Wed, May 07, 2014 at 08:02:38AM +0100, Chris Wilson wrote:
  On Tue, May 06, 2014 at 10:21:31PM -0700, Ben Widawsky wrote:
   The DRM node allocation code was already a bit of an ugly bit of code
   within a complex function. Removing it serves the purpose of cleaning
   the function up. More importantly, it provides a way to have a
   preallocated (address space) VMA to easily skip this code. Something
   we're very likely to need.
   
   This is essentially a wrapper for DRM node allocation with an eviction +
   retry. It is something which could be moved to core DRM eventually, if
   other drivers had similar eviction semantics.
   
   This removes a goto used for something other than error unwinding, a
   generally reviled (I am neutral) practice, and replaces it with a
   function call to itself that should have the same effect. Note that it's
   not a recursive function as all the problem set reduction is done in the
   eviction code.
   
   Some might say this change is worse than before because we are using
   stack for each subsequent call. Frankly, I'd rather overflow the stack
   and blow it up than loop forever. In either case, this is addressed in
   the next patch.
   
   I believe, and intend, that other than the stack usage, there is no
   functional change here.
  
  Nope, but decoupling evict_something() further does introduce lots more
  implied magic.
  -Chris
  
 
 Hmm. I really don't see what's actually upsetting. Can you be a bit more
 explicit about what's so bothersome to you for my edification?

evict_something() make assumptions about the ranges of the vm which it
searches. At the moment, they mirror its parent's function.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 6/9] drm/i915: Wrap VMA binding

2014-05-07 Thread Ben Widawsky

On Wed, May 07, 2014 at 09:55:49AM +0200, Daniel Vetter wrote:
 On Tue, May 06, 2014 at 10:21:35PM -0700, Ben Widawsky wrote:
  This will be useful for some upcoming patches which do more platform
  specific work. Having it in one central place just makes things a bit
  cleaner and easier.
  
  There is a small functional change here. There are more calls to the
  tracepoints.
  
  NOTE: I didn't actually end up using this patch for the intended purpose, 
  but I
  thought it was a nice patch to keep around.
  
  v2: s/i915_gem_bind_vma/i915_gem_vma_bind/
  s/i915_gem_unbind_vma/i915_gem_vma_unbind/
  (Chris)
  
  v3: Missed one spot
  
  Signed-off-by: Ben Widawsky b...@bwidawsk.net
 
 You change the semantics of the vma bind/unbind tracepoints - previously
 they've meant we reserve/unreserve a ppgtt address for an object, not
 they're called every time we touch the ptes (which is what
 vma_bind|unbind actually do). That doesn't look too terribly useful for
 tracing any more.
 
 What's the use case for these added tracepoints?
 -Daniel
 

You're right. The tracepoint is incredibly useful for debug purposes,
but not as an actual tracepoint. The current tracepoint is badly named,
IMO, but that's fine. I am not sure it's worth fixing though, since
nobody has really spoken up for the patch.

  ---
   drivers/gpu/drm/i915/i915_drv.h|  3 +++
   drivers/gpu/drm/i915/i915_gem.c| 13 ++---
   drivers/gpu/drm/i915/i915_gem_context.c|  2 +-
   drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 +++--
   drivers/gpu/drm/i915/i915_gem_gtt.c| 16 ++--
   5 files changed, 27 insertions(+), 12 deletions(-)
  
  diff --git a/drivers/gpu/drm/i915/i915_drv.h 
  b/drivers/gpu/drm/i915/i915_drv.h
  index c1d2bea..30cde3d 100644
  --- a/drivers/gpu/drm/i915/i915_drv.h
  +++ b/drivers/gpu/drm/i915/i915_drv.h
  @@ -2264,6 +2264,9 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
  struct i915_address_space *vm);
   unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
  struct i915_address_space *vm);
  +void i915_gem_vma_bind(struct i915_vma *vma, enum i915_cache_level,
  +  unsigned flags);
  +void i915_gem_vma_unbind(struct i915_vma *vma);
   struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
   struct i915_address_space *vm);
   struct i915_vma *
  diff --git a/drivers/gpu/drm/i915/i915_gem.c 
  b/drivers/gpu/drm/i915/i915_gem.c
  index e9599e9..1567911 100644
  --- a/drivers/gpu/drm/i915/i915_gem.c
  +++ b/drivers/gpu/drm/i915/i915_gem.c
  @@ -2771,7 +2771,7 @@ int i915_vma_unbind(struct i915_vma *vma)
   
  trace_i915_vma_unbind(vma);
   
  -   vma-unbind_vma(vma);
  +   i915_gem_vma_unbind(vma);
   
  i915_gem_gtt_finish_object(obj);
   
  @@ -3342,9 +3342,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object 
  *obj,
   
  WARN_ON(flags  PIN_MAPPABLE  !obj-map_and_fenceable);
   
  -   trace_i915_vma_bind(vma, flags);
  -   vma-bind_vma(vma, obj-cache_level,
  - flags  (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0);
  +   i915_gem_vma_bind(vma, obj-cache_level,
  + flags  (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 
  0);
   
  i915_gem_verify_gtt(dev);
  return vma;
  @@ -3548,8 +3547,8 @@ int i915_gem_object_set_cache_level(struct 
  drm_i915_gem_object *obj,
   
  list_for_each_entry(vma, obj-vma_list, vma_link)
  if (drm_mm_node_allocated(vma-node))
  -   vma-bind_vma(vma, cache_level,
  - obj-has_global_gtt_mapping ? 
  GLOBAL_BIND : 0);
  +   i915_gem_vma_bind(vma, cache_level,
  + obj-has_global_gtt_mapping ? 
  GLOBAL_BIND : 0);
  }
   
  list_for_each_entry(vma, obj-vma_list, vma_link)
  @@ -3914,7 +3913,7 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
  }
   
  if (flags  PIN_GLOBAL  !obj-has_global_gtt_mapping)
  -   vma-bind_vma(vma, obj-cache_level, GLOBAL_BIND);
  +   i915_gem_vma_bind(vma, obj-cache_level, GLOBAL_BIND);
   
  vma-pin_count++;
  if (flags  PIN_MAPPABLE)
  diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
  b/drivers/gpu/drm/i915/i915_gem_context.c
  index f77b4c1..00c7d88 100644
  --- a/drivers/gpu/drm/i915/i915_gem_context.c
  +++ b/drivers/gpu/drm/i915/i915_gem_context.c
  @@ -655,7 +655,7 @@ static int do_switch(struct intel_ring_buffer *ring,
  if (!to-obj-has_global_gtt_mapping) {
  struct i915_vma *vma = i915_gem_obj_to_vma(to-obj,
 dev_priv-gtt.base);
  -   vma-bind_vma(vma, to-obj-cache_level, GLOBAL_BIND);
  +   i915_gem_vma_bind(vma, to-obj-cache_level, GLOBAL_BIND);
  }
   
  if (!to-is_initialized || i915_gem_context_is_default(to))

Re: [Intel-gfx] [PATCH] UXA: Wait until a pageflip actually completes to report it.

2014-05-07 Thread Chris Wilson

On Mon, May 05, 2014 at 11:05:07PM -0700, Jamey Sharp wrote:
 UXA was reporting page-flip completion as soon as the flip was scheduled
 with the kernel, instead of waiting for the kernel to indicate that the
 flip had actually completed.
 
 Moving the DRI2SwapComplete call to the right place fixes all of our
 Piglit tests for OML_sync_control when run on xf86-video-intel/UXA,
 aside from a bit of difficult-to-reproduce flakiness when using a
 divisor  1.

The violation is intentional, as it gives us triple buffering by
default. It can be disabled. Also you could do an improved version for
recent Xorg, still using DRI2.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 6/9] drm/i915: Wrap VMA binding

2014-05-07 Thread Daniel Vetter

On Wed, May 07, 2014 at 08:54:56AM -0700, Ben Widawsky wrote:
 On Wed, May 07, 2014 at 09:55:49AM +0200, Daniel Vetter wrote:
  On Tue, May 06, 2014 at 10:21:35PM -0700, Ben Widawsky wrote:
   This will be useful for some upcoming patches which do more platform
   specific work. Having it in one central place just makes things a bit
   cleaner and easier.
   
   There is a small functional change here. There are more calls to the
   tracepoints.
   
   NOTE: I didn't actually end up using this patch for the intended purpose, 
   but I
   thought it was a nice patch to keep around.
   
   v2: s/i915_gem_bind_vma/i915_gem_vma_bind/
   s/i915_gem_unbind_vma/i915_gem_vma_unbind/
   (Chris)
   
   v3: Missed one spot
   
   Signed-off-by: Ben Widawsky b...@bwidawsk.net
  
  You change the semantics of the vma bind/unbind tracepoints - previously
  they've meant we reserve/unreserve a ppgtt address for an object, not
  they're called every time we touch the ptes (which is what
  vma_bind|unbind actually do). That doesn't look too terribly useful for
  tracing any more.
  
  What's the use case for these added tracepoints?
  -Daniel
  
 
 You're right. The tracepoint is incredibly useful for debug purposes,
 but not as an actual tracepoint. The current tracepoint is badly named,
 IMO, but that's fine. I am not sure it's worth fixing though, since
 nobody has really spoken up for the patch.

Yeah I think a new tracepoint we use every time we frob ptes could be
useful indeed. But conflating that operation with the vm operation isn't
good, especially since the vm operation lacks crucial information like the
actual caching bits we're using with the ptes.
-Daniel
 
   ---
drivers/gpu/drm/i915/i915_drv.h|  3 +++
drivers/gpu/drm/i915/i915_gem.c| 13 ++---
drivers/gpu/drm/i915/i915_gem_context.c|  2 +-
drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 +++--
drivers/gpu/drm/i915/i915_gem_gtt.c| 16 ++--
5 files changed, 27 insertions(+), 12 deletions(-)
   
   diff --git a/drivers/gpu/drm/i915/i915_drv.h 
   b/drivers/gpu/drm/i915/i915_drv.h
   index c1d2bea..30cde3d 100644
   --- a/drivers/gpu/drm/i915/i915_drv.h
   +++ b/drivers/gpu/drm/i915/i915_drv.h
   @@ -2264,6 +2264,9 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object 
   *o,
 struct i915_address_space *vm);
unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 struct i915_address_space *vm);
   +void i915_gem_vma_bind(struct i915_vma *vma, enum i915_cache_level,
   +unsigned flags);
   +void i915_gem_vma_unbind(struct i915_vma *vma);
struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
  struct i915_address_space *vm);
struct i915_vma *
   diff --git a/drivers/gpu/drm/i915/i915_gem.c 
   b/drivers/gpu/drm/i915/i915_gem.c
   index e9599e9..1567911 100644
   --- a/drivers/gpu/drm/i915/i915_gem.c
   +++ b/drivers/gpu/drm/i915/i915_gem.c
   @@ -2771,7 +2771,7 @@ int i915_vma_unbind(struct i915_vma *vma)

 trace_i915_vma_unbind(vma);

   - vma-unbind_vma(vma);
   + i915_gem_vma_unbind(vma);

 i915_gem_gtt_finish_object(obj);

   @@ -3342,9 +3342,8 @@ i915_gem_object_bind_to_vm(struct 
   drm_i915_gem_object *obj,

 WARN_ON(flags  PIN_MAPPABLE  !obj-map_and_fenceable);

   - trace_i915_vma_bind(vma, flags);
   - vma-bind_vma(vma, obj-cache_level,
   -   flags  (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0);
   + i915_gem_vma_bind(vma, obj-cache_level,
   +   flags  (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 
   0);

 i915_gem_verify_gtt(dev);
 return vma;
   @@ -3548,8 +3547,8 @@ int i915_gem_object_set_cache_level(struct 
   drm_i915_gem_object *obj,

 list_for_each_entry(vma, obj-vma_list, vma_link)
 if (drm_mm_node_allocated(vma-node))
   - vma-bind_vma(vma, cache_level,
   -   obj-has_global_gtt_mapping ? 
   GLOBAL_BIND : 0);
   + i915_gem_vma_bind(vma, cache_level,
   +   obj-has_global_gtt_mapping ? 
   GLOBAL_BIND : 0);
 }

 list_for_each_entry(vma, obj-vma_list, vma_link)
   @@ -3914,7 +3913,7 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
 }

 if (flags  PIN_GLOBAL  !obj-has_global_gtt_mapping)
   - vma-bind_vma(vma, obj-cache_level, GLOBAL_BIND);
   + i915_gem_vma_bind(vma, obj-cache_level, GLOBAL_BIND);

 vma-pin_count++;
 if (flags  PIN_MAPPABLE)
   diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
   b/drivers/gpu/drm/i915/i915_gem_context.c
   index f77b4c1..00c7d88 100644
   --- a/drivers/gpu/drm/i915/i915_gem_context.c
   +++ b/drivers/gpu/drm/i915/i915_gem_context.c
   @@ -655,7 +655,7 @@ static int do_switch(struct intel_ring_buffer *ring,

Re: [Intel-gfx] [PATCH 2/9] drm/i915: Extract node allocation from bind

2014-05-07 Thread Ben Widawsky

On Wed, May 07, 2014 at 04:53:08PM +0100, Chris Wilson wrote:
 On Wed, May 07, 2014 at 08:45:38AM -0700, Ben Widawsky wrote:
  On Wed, May 07, 2014 at 08:02:38AM +0100, Chris Wilson wrote:
   On Tue, May 06, 2014 at 10:21:31PM -0700, Ben Widawsky wrote:
The DRM node allocation code was already a bit of an ugly bit of code
within a complex function. Removing it serves the purpose of cleaning
the function up. More importantly, it provides a way to have a
preallocated (address space) VMA to easily skip this code. Something
we're very likely to need.

This is essentially a wrapper for DRM node allocation with an eviction +
retry. It is something which could be moved to core DRM eventually, if
other drivers had similar eviction semantics.

This removes a goto used for something other than error unwinding, a
generally reviled (I am neutral) practice, and replaces it with a
function call to itself that should have the same effect. Note that it's
not a recursive function as all the problem set reduction is done in the
eviction code.

Some might say this change is worse than before because we are using
stack for each subsequent call. Frankly, I'd rather overflow the stack
and blow it up than loop forever. In either case, this is addressed in
the next patch.

I believe, and intend, that other than the stack usage, there is no
functional change here.
   
   Nope, but decoupling evict_something() further does introduce lots more
   implied magic.
   -Chris
   
  
  Hmm. I really don't see what's actually upsetting. Can you be a bit more
  explicit about what's so bothersome to you for my edification?
 
 evict_something() make assumptions about the ranges of the vm which it
 searches. At the moment, they mirror its parent's function.
 -Chris
 

Ah, thanks. So is plumbing starting eviction offset into evict something an
appealing solution here?

 -- 
 Chris Wilson, Intel Open Source Technology Centre

-- 
Ben Widawsky, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Use hash tables for the command parser

2014-05-07 Thread Volkin, Bradley D

Could someone help to review this patch please? It provides a nice
improvement to the command parser's performance, so I'd like to get
this one in.

Thanks,
Brad

On Mon, Apr 28, 2014 at 08:22:08AM -0700, Volkin, Bradley D wrote:
 From: Brad Volkin bradley.d.vol...@intel.com
 
 For clients that submit large batch buffers the command parser has
 a substantial impact on performance. On my HSW ULT system performance
 drops as much as ~20% on some tests. Most of the time is spent in the
 command lookup code. Converting that from the current naive search to
 a hash table lookup reduces the performance impact by as much as ~10%.
 
 The choice of value for I915_CMD_HASH_ORDER allows all commands
 currently used in the parser tables to hash to their own bucket (except
 for one collision on the render ring). The tradeoff is that it wastes
 memory. Because the opcodes for the commands in the tables are not
 particularly well distributed, reducing the order still leaves many
 buckets empty. The increased collisions don't seem to have a huge
 impact on the performance gain, but for now anyhow, the parser trades
 memory for performance.
 
 Signed-off-by: Brad Volkin bradley.d.vol...@intel.com
 ---
  drivers/gpu/drm/i915/i915_cmd_parser.c  | 136 
 
  drivers/gpu/drm/i915/i915_drv.h |   1 +
  drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
  drivers/gpu/drm/i915/intel_ringbuffer.h |  11 ++-
  4 files changed, 116 insertions(+), 34 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c 
 b/drivers/gpu/drm/i915/i915_cmd_parser.c
 index 9bac097..9dca899 100644
 --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
 +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
 @@ -498,16 +498,18 @@ static u32 gen7_blt_get_cmd_length_mask(u32 cmd_header)
   return 0;
  }
  
 -static bool validate_cmds_sorted(struct intel_ring_buffer *ring)
 +static bool validate_cmds_sorted(struct intel_ring_buffer *ring,
 +  const struct drm_i915_cmd_table *cmd_tables,
 +  int cmd_table_count)
  {
   int i;
   bool ret = true;
  
 - if (!ring-cmd_tables || ring-cmd_table_count == 0)
 + if (!cmd_tables || cmd_table_count == 0)
   return true;
  
 - for (i = 0; i  ring-cmd_table_count; i++) {
 - const struct drm_i915_cmd_table *table = ring-cmd_tables[i];
 + for (i = 0; i  cmd_table_count; i++) {
 + const struct drm_i915_cmd_table *table = cmd_tables[i];
   u32 previous = 0;
   int j;
  
 @@ -557,6 +559,60 @@ static bool validate_regs_sorted(struct 
 intel_ring_buffer *ring)
ring-master_reg_count);
  }
  
 +struct cmd_node {
 + const struct drm_i915_cmd_descriptor *desc;
 + struct hlist_node node;
 +};
 +
 +/*
 + * Different command ranges have different numbers of bits for the opcode.
 + * In order to use the opcode bits, and only the opcode bits, for the hash 
 key
 + * we should use the MI_* command opcode mask (since those commands use the
 + * fewest bits for the opcode.)
 + */
 +#define CMD_HASH_MASK STD_MI_OPCODE_MASK
 +
 +static int init_hash_table(struct intel_ring_buffer *ring,
 +const struct drm_i915_cmd_table *cmd_tables,
 +int cmd_table_count)
 +{
 + int i, j;
 +
 + hash_init(ring-cmd_hash);
 +
 + for (i = 0; i  cmd_table_count; i++) {
 + const struct drm_i915_cmd_table *table = cmd_tables[i];
 +
 + for (j = 0; j  table-count; j++) {
 + const struct drm_i915_cmd_descriptor *desc =
 + table-table[j];
 + struct cmd_node *desc_node =
 + kmalloc(sizeof(*desc_node), GFP_KERNEL);
 +
 + if (!desc_node)
 + return -ENOMEM;
 +
 + desc_node-desc = desc;
 + hash_add(ring-cmd_hash, desc_node-node,
 +  desc-cmd.value  CMD_HASH_MASK);
 + }
 + }
 +
 + return 0;
 +}
 +
 +static void fini_hash_table(struct intel_ring_buffer *ring)
 +{
 + struct hlist_node *tmp;
 + struct cmd_node *desc_node;
 + int i;
 +
 + hash_for_each_safe(ring-cmd_hash, i, tmp, desc_node, node) {
 + hash_del(desc_node-node);
 + kfree(desc_node);
 + }
 +}
 +
  /**
   * i915_cmd_parser_init_ring() - set cmd parser related fields for a 
 ringbuffer
   * @ring: the ringbuffer to initialize
 @@ -567,18 +623,21 @@ static bool validate_regs_sorted(struct 
 intel_ring_buffer *ring)
   */
  void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
  {
 + const struct drm_i915_cmd_table *cmd_tables;
 + int cmd_table_count;
 +
   if (!IS_GEN7(ring-dev))
   return;
  
   switch (ring-id) {
   case RCS:
   if (IS_HASWELL(ring-dev)) {
 - ring-cmd_tables =

Re: [Intel-gfx] [PATCH 2/9] drm/i915: Extract node allocation from bind

2014-05-07 Thread Chris Wilson

On Wed, May 07, 2014 at 09:00:16AM -0700, Ben Widawsky wrote:
 On Wed, May 07, 2014 at 04:53:08PM +0100, Chris Wilson wrote:
  On Wed, May 07, 2014 at 08:45:38AM -0700, Ben Widawsky wrote:
   Hmm. I really don't see what's actually upsetting. Can you be a bit more
   explicit about what's so bothersome to you for my edification?
  
  evict_something() make assumptions about the ranges of the vm which it
  searches. At the moment, they mirror its parent's function.
 
 Ah, thanks. So is plumbing starting eviction offset into evict something an
 appealing solution here?

Yes. I have to admit to not being overly sold on the code migration yet.
I guess you have an ulterior motive... Evictable vm?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH v2] drm/i915: remove user GTT mappings early during runtime suspend

2014-05-07 Thread Imre Deak

Currently user space can access GEM buffers mapped to GTT through
existing mappings concurrently while the platform specific suspend
handlers are running. Since these handlers may change the HW state in a
way that would break such accesses, remove the mappings before calling
the handlers. Spotted by Ville.

Also Chris pointed out that the lists that i915_gem_release_all_mmaps()
walks through need dev-struct_mutex, so take this lock. There is a
potential deadlock against a concurrent RPM resume, resolve this by
aborting and rescheduling the suspend (Daniel).

v2:
- take struct_mutex around i915_gem_release_all_mmaps() (Chris, Daniel)

Signed-off-by: Imre Deak imre.d...@intel.com
---
 drivers/gpu/drm/i915/i915_drv.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 4024e16..0c9858c 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -36,6 +36,7 @@
 
 #include linux/console.h
 #include linux/module.h
+#include linux/pm_runtime.h
 #include drm/drm_crtc_helper.h
 
 static struct drm_driver driver;
@@ -1315,6 +1316,30 @@ static int intel_runtime_suspend(struct device *device)
DRM_DEBUG_KMS(Suspending device\n);
 
/*
+* We could deadlock here in case another thread holding struct_mutex
+* calls RPM suspend concurrently, since the RPM suspend will wait
+* first for this RPM suspend to finish. In this case the concurrent
+* RPM resume will be followed by its RPM suspend counterpart. Still
+* for consistency return -EAGAIN, which will reschedule this suspend.
+*/
+   if (!mutex_trylock(dev-struct_mutex)) {
+   DRM_DEBUG_KMS(device lock contention, deffering suspend\n);
+   /*
+* Bump the expiration timestamp, otherwise the suspend won't
+* be rescheduled.
+*/
+   pm_runtime_mark_last_busy(device);
+
+   return -EAGAIN;
+   }
+   /*
+* We are safe here against re-faults, since the fault handler takes
+* an RPM reference.
+*/
+   i915_gem_release_all_mmaps(dev_priv);
+   mutex_unlock(dev-struct_mutex);
+
+   /*
 * rps.work can't be rearmed here, since we get here only after making
 * sure the GPU is idle and the RPS freq is set to the minimum. See
 * intel_mark_idle().
@@ -1340,8 +1365,6 @@ static int intel_runtime_suspend(struct device *device)
return ret;
}
 
-   i915_gem_release_all_mmaps(dev_priv);
-
del_timer_sync(dev_priv-gpu_error.hangcheck_timer);
dev_priv-pm.suspended = true;
 
-- 
1.8.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 08/13] drm/i915: Implement MI decode for gen8

2014-05-07 Thread Ben Widawsky

On Wed, Apr 30, 2014 at 02:21:15PM +0300, Ville Syrjälä wrote:
 On Tue, Apr 29, 2014 at 02:52:35PM -0700, Ben Widawsky wrote:
  From: Ben Widawsky benjamin.widaw...@linux.intel.com
  
  This is needed to implement ipehr_is_semaphore_wait
  
  Signed-off-by: Ben Widawsky b...@bwidawsk.net
  ---
   drivers/gpu/drm/i915/i915_irq.c | 11 +--
   1 file changed, 5 insertions(+), 6 deletions(-)
  
  diff --git a/drivers/gpu/drm/i915/i915_irq.c 
  b/drivers/gpu/drm/i915/i915_irq.c
  index 2d76183..bfd21c7 100644
  --- a/drivers/gpu/drm/i915/i915_irq.c
  +++ b/drivers/gpu/drm/i915/i915_irq.c
  @@ -2561,12 +2561,9 @@ static bool
   ipehr_is_semaphore_wait(struct drm_device *dev, u32 ipehr)
   {
  if (INTEL_INFO(dev)-gen = 8) {
  -   /*
  -* FIXME: gen8 semaphore support - currently we don't emit
  -* semaphores on bdw anyway, but this needs to be addressed when
  -* we merge that code.
  -*/
  -   return false;
  +   /* Broadwell's semaphore wait is 3 dwords. We hope IPEHR is the
  +* first dword. */
  +   return (ipehr  23) == 0x1c;
  } else {
  ipehr = ~MI_SEMAPHORE_SYNC_MASK;
  return ipehr == (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE |
  @@ -2586,6 +2583,8 @@ semaphore_wait_to_signaller_ring(struct 
  intel_ring_buffer *ring, u32 ipehr)
   * FIXME: gen8 semaphore support - currently we don't emit
   * semaphores on bdw anyway, but this needs to be addressed when
   * we merge that code.
  +*
  +* XXX: Gen8 needs more than just IPEHR.
   */
 
 I believe something like this should take care of the remaining gap.
 

Thanks for the help. At this point I just want to get the damn thing
merged, and want to avoid any extra risk. Would you mind resending this
as a distinct patch (authored by you), after we get the rest merged?

 diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
 index 2446e61..cd1069e 100644
 --- a/drivers/gpu/drm/i915/i915_irq.c
 +++ b/drivers/gpu/drm/i915/i915_irq.c
 @@ -2590,19 +2590,21 @@ ipehr_is_semaphore_wait(struct drm_device *dev, u32 
 ipehr)
  }
  
  static struct intel_ring_buffer *
 -semaphore_wait_to_signaller_ring(struct intel_ring_buffer *ring, u32 ipehr)
 +semaphore_wait_to_signaller_ring(struct intel_ring_buffer *ring,
 +  u32 ipehr, u64 offset)
  {
   struct drm_i915_private *dev_priv = ring-dev-dev_private;
   struct intel_ring_buffer *signaller;
   int i;
  
   if (INTEL_INFO(dev_priv-dev)-gen = 8) {
 - /*
 -  * FIXME: gen8 semaphore support - currently we don't emit
 -  * semaphores on bdw anyway, but this needs to be addressed when
 -  * we merge that code.
 -  */
 - return NULL;
 + for_each_ring(signaller, dev_priv, i) {
 + if (ring == signaller)
 + continue;
 +
 + if (offset == signaller-semaphore.signal_gtt[ring-id])
 + return signaller;
 + }
   } else {
   u32 sync_bits = ipehr  MI_SEMAPHORE_SYNC_MASK;
  
 @@ -2627,6 +2629,7 @@ semaphore_waits_for(struct intel_ring_buffer *ring, u32 
 *seqno)
  {
   struct drm_i915_private *dev_priv = ring-dev-dev_private;
   u32 cmd, ipehr, head;
 + u64 offset = 0;
   int i;
  
   ipehr = I915_READ(RING_IPEHR(ring-mmio_base));
 @@ -2662,7 +2665,12 @@ semaphore_waits_for(struct intel_ring_buffer *ring, 
 u32 *seqno)
   return NULL;
  
   *seqno = ioread32(ring-virtual_start + head + 4) + 1;
 - return semaphore_wait_to_signaller_ring(ring, ipehr);
 + if (INTEL_INFO(dev_priv-dev)-gen = 8) {
 + offset = ioread32(ring-virtual_start + head + 12);
 + offset = 32;
 + offset |= ioread32(ring-virtual_start + head + 8);
 + }
 + return semaphore_wait_to_signaller_ring(ring, ipehr, offset);
  }
  
  static int semaphore_passed(struct intel_ring_buffer *ring)
 -- 
 1.8.3.2
 
 
  return NULL;
  } else {
  -- 
  1.9.2
  
  ___
  Intel-gfx mailing list
  Intel-gfx@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/intel-gfx
 
 -- 
 Ville Syrjälä
 Intel OTC
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ben Widawsky, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 08/13] drm/i915: Implement MI decode for gen8

2014-05-07 Thread Ville Syrjälä

On Wed, May 07, 2014 at 09:59:14AM -0700, Ben Widawsky wrote:
 On Wed, Apr 30, 2014 at 02:21:15PM +0300, Ville Syrjälä wrote:
  On Tue, Apr 29, 2014 at 02:52:35PM -0700, Ben Widawsky wrote:
   From: Ben Widawsky benjamin.widaw...@linux.intel.com
   
   This is needed to implement ipehr_is_semaphore_wait
   
   Signed-off-by: Ben Widawsky b...@bwidawsk.net
   ---
drivers/gpu/drm/i915/i915_irq.c | 11 +--
1 file changed, 5 insertions(+), 6 deletions(-)
   
   diff --git a/drivers/gpu/drm/i915/i915_irq.c 
   b/drivers/gpu/drm/i915/i915_irq.c
   index 2d76183..bfd21c7 100644
   --- a/drivers/gpu/drm/i915/i915_irq.c
   +++ b/drivers/gpu/drm/i915/i915_irq.c
   @@ -2561,12 +2561,9 @@ static bool
ipehr_is_semaphore_wait(struct drm_device *dev, u32 ipehr)
{
 if (INTEL_INFO(dev)-gen = 8) {
   - /*
   -  * FIXME: gen8 semaphore support - currently we don't emit
   -  * semaphores on bdw anyway, but this needs to be addressed when
   -  * we merge that code.
   -  */
   - return false;
   + /* Broadwell's semaphore wait is 3 dwords. We hope IPEHR is the
   +  * first dword. */
   + return (ipehr  23) == 0x1c;
 } else {
 ipehr = ~MI_SEMAPHORE_SYNC_MASK;
 return ipehr == (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE |
   @@ -2586,6 +2583,8 @@ semaphore_wait_to_signaller_ring(struct 
   intel_ring_buffer *ring, u32 ipehr)
  * FIXME: gen8 semaphore support - currently we don't emit
  * semaphores on bdw anyway, but this needs to be addressed when
  * we merge that code.
   +  *
   +  * XXX: Gen8 needs more than just IPEHR.
  */
  
  I believe something like this should take care of the remaining gap.
  
 
 Thanks for the help. At this point I just want to get the damn thing
 merged, and want to avoid any extra risk. Would you mind resending this
 as a distinct patch (authored by you), after we get the rest merged?

Can do.

 
  diff --git a/drivers/gpu/drm/i915/i915_irq.c 
  b/drivers/gpu/drm/i915/i915_irq.c
  index 2446e61..cd1069e 100644
  --- a/drivers/gpu/drm/i915/i915_irq.c
  +++ b/drivers/gpu/drm/i915/i915_irq.c
  @@ -2590,19 +2590,21 @@ ipehr_is_semaphore_wait(struct drm_device *dev, u32 
  ipehr)
   }
   
   static struct intel_ring_buffer *
  -semaphore_wait_to_signaller_ring(struct intel_ring_buffer *ring, u32 ipehr)
  +semaphore_wait_to_signaller_ring(struct intel_ring_buffer *ring,
  +u32 ipehr, u64 offset)
   {
  struct drm_i915_private *dev_priv = ring-dev-dev_private;
  struct intel_ring_buffer *signaller;
  int i;
   
  if (INTEL_INFO(dev_priv-dev)-gen = 8) {
  -   /*
  -* FIXME: gen8 semaphore support - currently we don't emit
  -* semaphores on bdw anyway, but this needs to be addressed when
  -* we merge that code.
  -*/
  -   return NULL;
  +   for_each_ring(signaller, dev_priv, i) {
  +   if (ring == signaller)
  +   continue;
  +
  +   if (offset == signaller-semaphore.signal_gtt[ring-id])
  +   return signaller;
  +   }
  } else {
  u32 sync_bits = ipehr  MI_SEMAPHORE_SYNC_MASK;
   
  @@ -2627,6 +2629,7 @@ semaphore_waits_for(struct intel_ring_buffer *ring, 
  u32 *seqno)
   {
  struct drm_i915_private *dev_priv = ring-dev-dev_private;
  u32 cmd, ipehr, head;
  +   u64 offset = 0;
  int i;
   
  ipehr = I915_READ(RING_IPEHR(ring-mmio_base));
  @@ -2662,7 +2665,12 @@ semaphore_waits_for(struct intel_ring_buffer *ring, 
  u32 *seqno)
  return NULL;
   
  *seqno = ioread32(ring-virtual_start + head + 4) + 1;
  -   return semaphore_wait_to_signaller_ring(ring, ipehr);
  +   if (INTEL_INFO(dev_priv-dev)-gen = 8) {
  +   offset = ioread32(ring-virtual_start + head + 12);
  +   offset = 32;
  +   offset |= ioread32(ring-virtual_start + head + 8);
  +   }
  +   return semaphore_wait_to_signaller_ring(ring, ipehr, offset);
   }
   
   static int semaphore_passed(struct intel_ring_buffer *ring)
  -- 
  1.8.3.2
  
  
 return NULL;
 } else {
   -- 
   1.9.2
   
   ___
   Intel-gfx mailing list
   Intel-gfx@lists.freedesktop.org
   http://lists.freedesktop.org/mailman/listinfo/intel-gfx
  
  -- 
  Ville Syrjälä
  Intel OTC
  ___
  Intel-gfx mailing list
  Intel-gfx@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/intel-gfx
 
 -- 
 Ben Widawsky, Intel Open Source Technology Center

-- 
Ville Syrjälä
Intel OTC
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH v2] drm/i915: remove user GTT mappings early during runtime suspend

2014-05-07 Thread Imre Deak

On Wed, 2014-05-07 at 19:57 +0300, Imre Deak wrote:
 Currently user space can access GEM buffers mapped to GTT through
 existing mappings concurrently while the platform specific suspend
 handlers are running. Since these handlers may change the HW state in a
 way that would break such accesses, remove the mappings before calling
 the handlers. Spotted by Ville.
 
 Also Chris pointed out that the lists that i915_gem_release_all_mmaps()
 walks through need dev-struct_mutex, so take this lock. There is a
 potential deadlock against a concurrent RPM resume, resolve this by
 aborting and rescheduling the suspend (Daniel).
 
 v2:
 - take struct_mutex around i915_gem_release_all_mmaps() (Chris, Daniel)
 
 Signed-off-by: Imre Deak imre.d...@intel.com
 ---
  drivers/gpu/drm/i915/i915_drv.c | 27 +--
  1 file changed, 25 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
 index 4024e16..0c9858c 100644
 --- a/drivers/gpu/drm/i915/i915_drv.c
 +++ b/drivers/gpu/drm/i915/i915_drv.c
 @@ -36,6 +36,7 @@
  
  #include linux/console.h
  #include linux/module.h
 +#include linux/pm_runtime.h
  #include drm/drm_crtc_helper.h
  
  static struct drm_driver driver;
 @@ -1315,6 +1316,30 @@ static int intel_runtime_suspend(struct device *device)
   DRM_DEBUG_KMS(Suspending device\n);
  
   /*
 +  * We could deadlock here in case another thread holding struct_mutex
 +  * calls RPM suspend concurrently, since the RPM suspend will wait
   resume^ resume^
 +  * first for this RPM suspend to finish. In this case the concurrent
 +  * RPM resume will be followed by its RPM suspend counterpart. Still
 +  * for consistency return -EAGAIN, which will reschedule this suspend.
 +  */
 + if (!mutex_trylock(dev-struct_mutex)) {
 + DRM_DEBUG_KMS(device lock contention, deffering suspend\n);
 + /*
 +  * Bump the expiration timestamp, otherwise the suspend won't
 +  * be rescheduled.
 +  */
 + pm_runtime_mark_last_busy(device);
 +
 + return -EAGAIN;
 + }
 + /*
 +  * We are safe here against re-faults, since the fault handler takes
 +  * an RPM reference.
 +  */
 + i915_gem_release_all_mmaps(dev_priv);
 + mutex_unlock(dev-struct_mutex);
 +
 + /*
* rps.work can't be rearmed here, since we get here only after making
* sure the GPU is idle and the RPS freq is set to the minimum. See
* intel_mark_idle().
 @@ -1340,8 +1365,6 @@ static int intel_runtime_suspend(struct device *device)
   return ret;
   }
  
 - i915_gem_release_all_mmaps(dev_priv);
 -
   del_timer_sync(dev_priv-gpu_error.hangcheck_timer);
   dev_priv-pm.suspended = true;
  



signature.asc
Description: This is a digitally signed message part
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 05/10] drm/i915: Implement MI decode for gen8

2014-05-07 Thread Ben Widawsky

From: Ben Widawsky benjamin.widaw...@linux.intel.com

This is needed to implement ipehr_is_semaphore_wait

Signed-off-by: Ben Widawsky b...@bwidawsk.net
---
 drivers/gpu/drm/i915/i915_irq.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 2d76183..bfd21c7 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2561,12 +2561,9 @@ static bool
 ipehr_is_semaphore_wait(struct drm_device *dev, u32 ipehr)
 {
if (INTEL_INFO(dev)-gen = 8) {
-   /*
-* FIXME: gen8 semaphore support - currently we don't emit
-* semaphores on bdw anyway, but this needs to be addressed when
-* we merge that code.
-*/
-   return false;
+   /* Broadwell's semaphore wait is 3 dwords. We hope IPEHR is the
+* first dword. */
+   return (ipehr  23) == 0x1c;
} else {
ipehr = ~MI_SEMAPHORE_SYNC_MASK;
return ipehr == (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE |
@@ -2586,6 +2583,8 @@ semaphore_wait_to_signaller_ring(struct intel_ring_buffer 
*ring, u32 ipehr)
 * FIXME: gen8 semaphore support - currently we don't emit
 * semaphores on bdw anyway, but this needs to be addressed when
 * we merge that code.
+*
+* XXX: Gen8 needs more than just IPEHR.
 */
return NULL;
} else {
-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 03/10] drm/i915/bdw: implement semaphore signal

2014-05-07 Thread Ben Widawsky

Semaphore signalling works similarly to previous GENs with the exception
that the per ring mailboxes no longer exist. Instead you must define
your own space, somewhere in the GTT.

The comments in the code define the layout I've opted for, which should
be fairly future proof. Ie. I tried to define offsets in abstract terms
(NUM_RINGS, seqno size, etc).

NOTE: If one wanted to move this to the HWSP they could. I've decided
one 4k object would be easier to deal with, and provide potential wins
with cache locality, but that's all speculative.

v2: Update the macro to not need the other ring's ring-id (Chris)
Update the comment to use the correct formula (Chris)

v3: Move the macros the ringbuffer.h to prevent churn in next patch
(Ville)

v4: Fixed compilation rebase conflict
commit 1ec9e26ddab06459e89a890431b2de064c5d1056
Author: Daniel Vetter daniel.vet...@ffwll.ch
Date:   Fri Feb 14 14:01:11 2014 +0100

drm/i915: Consolidate binding parameters into flags

v5: VCS2 rebase
Replace hweight_long with hweight32

Signed-off-by: Ben Widawsky b...@bwidawsk.net
---
 drivers/gpu/drm/i915/i915_drv.h |   1 +
 drivers/gpu/drm/i915/i915_reg.h |   5 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 166 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  78 +--
 4 files changed, 190 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 26253c0..1ede818 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1355,6 +1355,7 @@ struct drm_i915_private {
 
struct pci_dev *bridge_dev;
struct intel_ring_buffer ring[I915_NUM_RINGS];
+   struct drm_i915_gem_object *semaphore_obj;
uint32_t last_seqno, next_seqno;
 
drm_dma_handle_t *status_page_dmah;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 03ffc57..dc25084 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -232,7 +232,7 @@
 #define   MI_DISPLAY_FLIP_IVB_SPRITE_B (3  19)
 #define   MI_DISPLAY_FLIP_IVB_PLANE_C  (4  19)
 #define   MI_DISPLAY_FLIP_IVB_SPRITE_C (5  19)
-#define MI_SEMAPHORE_MBOX  MI_INSTR(0x16, 1) /* gen6+ */
+#define MI_SEMAPHORE_MBOX  MI_INSTR(0x16, 1) /* gen6, gen7 */
 #define   MI_SEMAPHORE_GLOBAL_GTT(122)
 #define   MI_SEMAPHORE_UPDATE  (121)
 #define   MI_SEMAPHORE_COMPARE (120)
@@ -258,6 +258,8 @@
 #define   MI_RESTORE_EXT_STATE_EN  (12)
 #define   MI_FORCE_RESTORE (11)
 #define   MI_RESTORE_INHIBIT   (10)
+#define MI_SEMAPHORE_SIGNALMI_INSTR(0x1b, 0) /* GEN8+ */
+#define   MI_SEMAPHORE_TARGET(engine)  ((engine)15)
 #define MI_STORE_DWORD_IMM MI_INSTR(0x20, 1)
 #define   MI_MEM_VIRTUAL   (1  22) /* 965+ only */
 #define MI_STORE_DWORD_INDEX   MI_INSTR(0x21, 1)
@@ -352,6 +354,7 @@
 #define   PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE(110) /* 
GM45+ only */
 #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE  (19)
 #define   PIPE_CONTROL_NOTIFY  (18)
+#define   PIPE_CONTROL_FLUSH_ENABLE(17) /* gen7+ */
 #define   PIPE_CONTROL_VF_CACHE_INVALIDATE (14)
 #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE  (13)
 #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE  (12)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1d52c8a..b69eceb 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -650,6 +650,13 @@ static int init_render_ring(struct intel_ring_buffer *ring)
 static void render_ring_cleanup(struct intel_ring_buffer *ring)
 {
struct drm_device *dev = ring-dev;
+   struct drm_i915_private *dev_priv = dev-dev_private;
+
+   if (dev_priv-semaphore_obj) {
+   i915_gem_object_ggtt_unpin(dev_priv-semaphore_obj);
+   drm_gem_object_unreference(dev_priv-semaphore_obj-base);
+   dev_priv-semaphore_obj = NULL;
+   }
 
if (ring-scratch.obj == NULL)
return;
@@ -663,6 +670,85 @@ static void render_ring_cleanup(struct intel_ring_buffer 
*ring)
ring-scratch.obj = NULL;
 }
 
+static int gen8_rcs_signal(struct intel_ring_buffer *signaller,
+  unsigned int num_dwords)
+{
+#define MBOX_UPDATE_DWORDS 8
+   struct drm_device *dev = signaller-dev;
+   struct drm_i915_private *dev_priv = dev-dev_private;
+   struct intel_ring_buffer *waiter;
+   int i, ret, num_rings;
+
+   num_rings = hweight32(INTEL_INFO(dev)-ring_mask);
+   num_dwords += (num_rings-1) * MBOX_UPDATE_DWORDS;
+#undef MBOX_UPDATE_DWORDS
+
+   ret = intel_ring_begin(signaller, num_dwords);
+   if (ret)
+   return ret;
+
+   for_each_ring(waiter, dev_priv, i) {
+   u64 gtt_offset = signaller-semaphore.signal_ggtt[i];
+   if (gtt_offset ==

[Intel-gfx] [PATCH 06/10] drm/i915: Extract semaphore error collection

2014-05-07 Thread Ben Widawsky

Signed-off-by: Ben Widawsky b...@bwidawsk.net
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
b/drivers/gpu/drm/i915/i915_gpu_error.c
index 2d81985..a7eaab2 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -744,6 +744,23 @@ static void i915_gem_record_fences(struct drm_device *dev,
}
 }
 
+
+static void gen6_record_semaphore_state(struct drm_i915_private *dev_priv,
+   struct intel_ring_buffer *ring,
+   struct drm_i915_error_ring *ering)
+{
+   ering-semaphore_mboxes[0] = I915_READ(RING_SYNC_0(ring-mmio_base));
+   ering-semaphore_mboxes[1] = I915_READ(RING_SYNC_1(ring-mmio_base));
+   ering-semaphore_seqno[0] = ring-semaphore.sync_seqno[0];
+   ering-semaphore_seqno[1] = ring-semaphore.sync_seqno[1];
+
+   if (HAS_VEBOX(dev_priv-dev)) {
+   ering-semaphore_mboxes[2] =
+   I915_READ(RING_SYNC_2(ring-mmio_base));
+   ering-semaphore_seqno[2] = ring-semaphore.sync_seqno[2];
+   }
+}
+
 static void i915_record_ring_state(struct drm_device *dev,
   struct intel_ring_buffer *ring,
   struct drm_i915_error_ring *ering)
@@ -753,18 +770,7 @@ static void i915_record_ring_state(struct drm_device *dev,
if (INTEL_INFO(dev)-gen = 6) {
ering-rc_psmi = I915_READ(ring-mmio_base + 0x50);
ering-fault_reg = I915_READ(RING_FAULT_REG(ring));
-   ering-semaphore_mboxes[0]
-   = I915_READ(RING_SYNC_0(ring-mmio_base));
-   ering-semaphore_mboxes[1]
-   = I915_READ(RING_SYNC_1(ring-mmio_base));
-   ering-semaphore_seqno[0] = ring-semaphore.sync_seqno[0];
-   ering-semaphore_seqno[1] = ring-semaphore.sync_seqno[1];
-   }
-
-   if (HAS_VEBOX(dev)) {
-   ering-semaphore_mboxes[2] =
-   I915_READ(RING_SYNC_2(ring-mmio_base));
-   ering-semaphore_seqno[2] = ring-semaphore.sync_seqno[2];
+   gen6_record_semaphore_state(dev_priv, ring, ering);
}
 
if (INTEL_INFO(dev)-gen = 4) {
-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 09/10] drm/i915/bdw: poll semaphores

2014-05-07 Thread Ben Widawsky

As Ville points out, it's possible/probable we don't actually need this.
Potentially, this validates the letter of the spec, and not the spirit.

Ville:
 I discussed this on irc w/ Ben, and I was suggesting we don't need to
 poll. Polling apparently can be used as a workaround for certain
 hardware issues, but it looks like those issues shouldn't affect us,
 for the momemnt at least. So my suggestion was to try w/o polling
 first (since there could be some power cost to polling) and add the
 poll bit if problems arise.

Signed-off-by: Ben Widawsky b...@bwidawsk.net
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 20fc2b1..66b1c6e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -842,6 +842,7 @@ gen8_ring_sync(struct intel_ring_buffer *waiter,
 
intel_ring_emit(waiter, MI_SEMAPHORE_WAIT |
MI_SEMAPHORE_GLOBAL_GTT |
+   MI_SEMAPHORE_POLL |
MI_SEMAPHORE_SAD_GTE_SDD);
intel_ring_emit(waiter, seqno);
intel_ring_emit(waiter,
-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 08/10] drm/i915: semaphore debugfs

2014-05-07 Thread Ben Widawsky

Simple debugfs file to display the current state of semaphores. This is
useful if you want to see the state without hanging the GPU.

NOTE: This patch is optional to the series.

NOTE2: Like the GPU error state collection, the reads are currently
incoherent.

Signed-off-by: Ben Widawsky b...@bwidawsk.net
---
 drivers/gpu/drm/i915/i915_debugfs.c | 70 +
 1 file changed, 70 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 18b3565..d9c1414 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2335,6 +2335,75 @@ static int i915_display_info(struct seq_file *m, void 
*unused)
return 0;
 }
 
+static int i915_semaphore_status(struct seq_file *m, void *unused)
+{
+   struct drm_info_node *node = (struct drm_info_node *) m-private;
+   struct drm_device *dev = node-minor-dev;
+   struct drm_i915_private *dev_priv = dev-dev_private;
+   struct intel_ring_buffer *ring;
+   int i, j, ret;
+
+   if (!i915_semaphore_is_enabled(dev)) {
+   seq_puts(m, Semaphores are disabled\n);
+   return 0;
+   }
+
+   ret = mutex_lock_interruptible(dev-struct_mutex);
+   if (ret)
+   return ret;
+
+   if (IS_BROADWELL(dev)) {
+   struct page *page;
+   uint64_t *seqno;
+
+   page = i915_gem_object_get_page(dev_priv-semaphore_obj, 0);
+
+   seqno = (uint64_t *)kmap_atomic(page);
+   for_each_ring(ring, dev_priv, i) {
+   uint64_t offset;
+
+   seq_printf(m, %s\n, ring-name);
+
+   seq_puts(m,   Last signal:);
+   for (j = 0; j  I915_NUM_RINGS; j++) {
+   offset = i * I915_NUM_RINGS + j;
+   seq_printf(m, 0x%08llx (0x%02llx) ,
+  seqno[offset], offset * 8);
+   }
+   seq_putc(m, '\n');
+
+   seq_puts(m,   Last wait:  );
+   for (j = 0; j  I915_NUM_RINGS; j++) {
+   offset = i + (j * I915_NUM_RINGS);
+   seq_printf(m, 0x%08llx (0x%02llx) ,
+  seqno[offset], offset * 8);
+   }
+   seq_putc(m, '\n');
+
+   }
+   kunmap_atomic(seqno);
+   } else {
+   seq_puts(m,   Last signal:);
+   for_each_ring(ring, dev_priv, i)
+   for (j = 0; j  I915_NUM_RINGS; j++)
+   seq_printf(m, 0x%08x\n,
+  
I915_READ(ring-semaphore.mbox.signal[j]));
+   seq_putc(m, '\n');
+   }
+
+   seq_puts(m, \nSync seqno:\n);
+   for_each_ring(ring, dev_priv, i) {
+   for (j = 0; j  I915_NUM_RINGS - 1; j++) {
+   seq_printf(m,   0x%08x , 
ring-semaphore.sync_seqno[j]);
+   }
+   seq_putc(m, '\n');
+   }
+   seq_putc(m, '\n');
+
+   mutex_unlock(dev-struct_mutex);
+   return 0;
+}
+
 struct pipe_crc_info {
const char *name;
struct drm_device *dev;
@@ -3781,6 +3850,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
{i915_pc8_status, i915_pc8_status, 0},
{i915_power_domain_info, i915_power_domain_info, 0},
{i915_display_info, i915_display_info, 0},
+   {i915_semaphore_status, i915_semaphore_status, 0},
 };
 #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
 
-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 00/10] Semaphores again. Needs shepherd.

2014-05-07 Thread Ben Widawsky

The series is mostly done, but it needs a few last tweaks to make it acceptable
for merge.

Chris noticed that the series doesn't properly work with the new VCS2 code that
Daniel merged a couple of weeks ago. I too noticed this, but was remaining
silent in the hopes that we could actually try to make progress. I raised
concerns about this during the VCS2 merge on IRC. I only took a quick look, and
it seems it's not too bad to implement, but it requires a bit of brain power to
make sure it's correct.

The series is broken in another way because the new semaphore decoding which
Daniel added. Ville posted a fix that looked sane to me, but I haven't a
machine to test with. I'd recommend someone try to beat that into shape and
test it. The series should be /okay/ without that though.

So here are the latest patches that I have. I just don't have enough time
before my vacation to get these things done. They've gone through 5 rounds of
rebase since the first series was posted on 12/13/13. They get the first thumbs
up from Ville on Mar 5 (a couple of minor tweaks were required and could have
been easily done by the maintainer). I've fixed everything I am aware. If
someone could take over the patches, and get them merged, I think it'd really
help in getting this useful feature added to our driver for gen8.

* The last two patches aren't needed, but are here for posterity.

If nobody has taken this over when I return I will see if I can complete it.

Ben Widawsky (10):
  drm/i915: Make semaphore updates more precise
  drm/i915: gen specific ring init
  drm/i915/bdw: implement semaphore signal
  drm/i915/bdw: implement semaphore wait
  drm/i915: Implement MI decode for gen8
  drm/i915: Extract semaphore error collection
  drm/i915/bdw: collect semaphore error state
  drm/i915: semaphore debugfs
  drm/i915/bdw: poll semaphores
  DONT_MERGE drm/i915: FORCE_RESTORE for gen8 semaphores

 drivers/gpu/drm/i915/i915_debugfs.c |  70 +++
 drivers/gpu/drm/i915/i915_drv.h |   2 +
 drivers/gpu/drm/i915/i915_gem_context.c |   7 +
 drivers/gpu/drm/i915/i915_gpu_error.c   |  79 ++--
 drivers/gpu/drm/i915/i915_irq.c |  11 +-
 drivers/gpu/drm/i915/i915_reg.h |   8 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 342 +++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  78 +++-
 8 files changed, 474 insertions(+), 123 deletions(-)

-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 2/9] drm/i915: Extract node allocation from bind

2014-05-07 Thread Ben Widawsky

On Wed, May 07, 2014 at 05:55:00PM +0100, Chris Wilson wrote:
 On Wed, May 07, 2014 at 09:00:16AM -0700, Ben Widawsky wrote:
  On Wed, May 07, 2014 at 04:53:08PM +0100, Chris Wilson wrote:
   On Wed, May 07, 2014 at 08:45:38AM -0700, Ben Widawsky wrote:
Hmm. I really don't see what's actually upsetting. Can you be a bit more
explicit about what's so bothersome to you for my edification?
   
   evict_something() make assumptions about the ranges of the vm which it
   searches. At the moment, they mirror its parent's function.
  
  Ah, thanks. So is plumbing starting eviction offset into evict something an
  appealing solution here?
 
 Yes. I have to admit to not being overly sold on the code migration yet.
 I guess you have an ulterior motive... Evictable vm?
 -Chris
 

The only immediate goal is to be able to get this bit logic out of
bind_to_vm, so I can use preallocated nodes a bit more cleanly. At
least for the present, I have no plans with it other than that. I did
like the resuse of the PDE allocation for gen7.

If I think of a better reason, I'll let you know.

-- 
Ben Widawsky, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 01/10] drm/i915: Make semaphore updates more precise

2014-05-07 Thread Ben Widawsky

With the ring mask we now have an easy way to know the number of rings
in the system, and therefore can accurately predict the number of dwords
to emit for semaphore signalling. This was not possible (easily)
previously.

There should be no functional impact, simply fewer instructions emitted.

While we're here, simply do the round up to 2 instead of the fancier
rounding we did before, which rounding up per mbox, ie 4. This also
allows us to drop the unnecessary MI_NOOP, so not really 4, 3.

v2: Use 3 dwords instead of 4 (Ville)
Do the proper calculation to get the number of dwords to emit (Ville)
Conditionally set .sync_to when semaphores are enabled (Ville)

v3: Rebased on VCS2
Replace hweight_long with hweight32 (Ville)

v4: Pull out the accidentally squashed hunk from the next patch after
rebase (Daniel).

Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com (v1)
Signed-off-by: Ben Widawsky b...@bwidawsk.net
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 40a7aa4..d211fbc 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -666,24 +666,19 @@ static void render_ring_cleanup(struct intel_ring_buffer 
*ring)
 static int gen6_signal(struct intel_ring_buffer *signaller,
   unsigned int num_dwords)
 {
+#define MBOX_UPDATE_DWORDS 3
struct drm_device *dev = signaller-dev;
struct drm_i915_private *dev_priv = dev-dev_private;
struct intel_ring_buffer *useless;
-   int i, ret;
+   int i, ret, num_rings;
 
-   /* NB: In order to be able to do semaphore MBOX updates for varying
-* number of rings, it's easiest if we round up each individual update
-* to a multiple of 2 (since ring updates must always be a multiple of
-* 2) even though the actual update only requires 3 dwords.
-*/
-#define MBOX_UPDATE_DWORDS 4
-   if (i915_semaphore_is_enabled(dev))
-   num_dwords += ((I915_NUM_RINGS-1) * MBOX_UPDATE_DWORDS);
+   num_rings = hweight32(INTEL_INFO(dev)-ring_mask);
+   num_dwords += round_up((num_rings-1) * MBOX_UPDATE_DWORDS, 2);
+#undef MBOX_UPDATE_DWORDS
 
ret = intel_ring_begin(signaller, num_dwords);
if (ret)
return ret;
-#undef MBOX_UPDATE_DWORDS
 
for_each_ring(useless, dev_priv, i) {
u32 mbox_reg = signaller-semaphore.mbox.signal[i];
@@ -691,15 +686,13 @@ static int gen6_signal(struct intel_ring_buffer 
*signaller,
intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
intel_ring_emit(signaller, mbox_reg);
intel_ring_emit(signaller, 
signaller-outstanding_lazy_seqno);
-   intel_ring_emit(signaller, MI_NOOP);
-   } else {
-   intel_ring_emit(signaller, MI_NOOP);
-   intel_ring_emit(signaller, MI_NOOP);
-   intel_ring_emit(signaller, MI_NOOP);
-   intel_ring_emit(signaller, MI_NOOP);
}
}
 
+   /* If num_dwords was rounded, make sure the tail pointer is correct */
+   if (num_rings % 2 == 0)
+   intel_ring_emit(signaller, MI_NOOP);
+
return 0;
 }
 
-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [RFC] drm/i915 : Reduce the shmem page allocation time by using blitter engines for clearing pages.

2014-05-07 Thread Eric Anholt

Gupta, Sourab sourab.gu...@intel.com writes:

 On Tue, 2014-05-06 at 17:56 +, Eric Anholt wrote:
 sourab.gu...@intel.com writes:
 
  From: Sourab Gupta sourab.gu...@intel.com
 
  This patch is in continuation of and is dependent on earlier patch
  series to 'reduce the time for which device mutex is kept locked'.
  (http://lists.freedesktop.org/archives/intel-gfx/2014-May/044596.html)
 
 One of userspace's assumptions is that when you allocate a new BO, you
 can map it and start writing data into it without needing to wait on the
 GPU.  I expect this patch to mostly hurt performance on apps (and I note
 that the patch doesn't come with any actual performance data) that get
 more stalls as a result.
 
 Hi Eric,
 Yes, it may hurt the performance on apps, in case of small buffers and 
 if blitter engine is busy as there is a synchronous wait for rendering 
 in the gem_fault handler. If that is the case, we can drop this from the 
 gem_fault routine and employ it only in the do_execbuffer routine. Its 
 useful there because there is no synchronous wait required in sw, due 
 to cross ring synchronization.
 We'll gather the numbers to quantify the performance benefit we have
 while using blitter engines in this way for different buffer sizes.

 More importantly, though, it breaks existing userspace that relies on
 buffers being idle on allocation, for the unsychronized maps used in
 intel_bufferobj_subdata() and
 intel_bufferobj_map_range(GL_INVALIDATE_BUFFER_BIT |
 GL_UNSYNCHRONIZED_BIT)

 Sorry, I miss your point here. It may not break this assumption due to
 the fact that we employ this method only in case of the preallocate
 routine, which will be called in the first page fault of the object
 (gem_fault handler) resulting in fresh allocation of pages. 


 So, in case of unsynchronized maps, there may be a wait involved in the
 first page fault. Also, that wait time may be lesser than the time
 required for CPU memset (resulting in no performance hit).
 There won't be any subsequent waits afterwards for that buffer object.

 Though, we'll have performance hit in the case when blitter engine is
 already busy and may not be available to immediately start the memset of
 freshly allocated mmaped buffers.

 Am I missing something here? Does the userspace requirement for
 unsynchronized mapped objects involve complete idleness of object on gpu
 even when object page faults for the first time?

Oh, I mised how this works.  So at pagefault time, you're firing off the
blit, then immediately stalling on it?  This sounds even less like a
possible performance win than I was initially thinking.



pgp1fWzwjC48O.pgp
Description: PGP signature
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 1/2] drm/i915: Improve fallback ring waiting

2014-05-07 Thread Volkin, Bradley D

On Mon, May 05, 2014 at 01:07:32AM -0700, Chris Wilson wrote:
 A few improvements to the fallback method for waiting upon ring space:
 
 1. Fix the start/end wait tracepoints to always be paired.
 2. Increase responsiveness of checking
 3. Mark the process as waiting upon io
 4. Check for signal interruptions
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk

These seem like reasonable improvements.
Reviewed-by: Brad Volkin bradley.d.vol...@intel.com

 ---
  drivers/gpu/drm/i915/intel_ringbuffer.c | 24 +---
  1 file changed, 17 insertions(+), 7 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
 b/drivers/gpu/drm/i915/intel_ringbuffer.c
 index e0c7bf27eafd..d6b7b884adf9 100644
 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
 +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
 @@ -1546,7 +1546,6 @@ static int ring_wait_for_space(struct intel_ring_buffer 
 *ring, int n)
   /* force the tail write in case we have been skipping them */
   __intel_ring_advance(ring);
  
 - trace_i915_ring_wait_begin(ring);
   /* With GEM the hangcheck timer should kick us out of the loop,
* leaving it early runs the risk of corrupting GEM state (due
* to running on almost untested codepaths). But on resume
 @@ -1554,12 +1553,13 @@ static int ring_wait_for_space(struct 
 intel_ring_buffer *ring, int n)
* case by choosing an insanely large timeout. */
   end = jiffies + 60 * HZ;
  
 + trace_i915_ring_wait_begin(ring);
   do {
   ring-head = I915_READ_HEAD(ring);
   ring-space = ring_space(ring);
   if (ring-space = n) {
 - trace_i915_ring_wait_end(ring);
 - return 0;
 + ret = 0;
 + break;
   }
  
   if (!drm_core_check_feature(dev, DRIVER_MODESET) 
 @@ -1569,15 +1569,25 @@ static int ring_wait_for_space(struct 
 intel_ring_buffer *ring, int n)
   master_priv-sarea_priv-perf_boxes |= 
 I915_BOX_WAIT;
   }
  
 - msleep(1);
 + io_schedule_timeout(1);
 +
 + if (dev_priv-mm.interruptible  signal_pending(current)) {
 + ret = -ERESTARTSYS;
 + break;
 + }
  
   ret = i915_gem_check_wedge(dev_priv-gpu_error,
  dev_priv-mm.interruptible);
   if (ret)
 - return ret;
 - } while (!time_after(jiffies, end));
 + break;
 +
 + if (time_after(jiffies, end)) {
 + ret = -EBUSY;
 + break;
 + }
 + } while (1);
   trace_i915_ring_wait_end(ring);
 - return -EBUSY;
 + return ret;
  }
  
  static int intel_wrap_ring_buffer(struct intel_ring_buffer *ring)
 -- 
 2.0.0.rc0
 
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 1/2] rendercopy/bdw: Enable hw-generated binding tables

2014-05-07 Thread Abdiel Janulgue

On Wednesday, May 07, 2014 02:49:31 PM Ville Syrjälä wrote:
 I quickly cobbled together a hsw version of this and gave it a whirl on
 one machine. Seems to work just fine here, and no lockups when switching
 between hw and sw binding tables. Did you get the lockups on hsw even
 with rendercopy?
 
 Here's my hsw version:
 
 
 +static void
 +gen7_hw_binding_table(struct intel_batchbuffer *batch, bool enable)
 +{
 + if (!enable) {
 + OUT_BATCH(MI_RS_CONTROL | 0x0);
 +
 + OUT_BATCH(HSW_3DSTATE_BINDING_TABLE_POOL_ALLOC | (3 - 2));
 + /* binding table pool base address */

This bit I missed and the source of my troubles for the past few months.

 + OUT_BATCH(3  5);

Yep, I confirm toggling on HSW does work quite well now. I'll now update the 
patches to include HSW path on the kernel. I also take back my previous 
statement that RS is broken on HSW! :)

Thanks a lot Ville!


 + /* Upper bound */
 + OUT_BATCH(0);
 +
 + OUT_BATCH(GEN7_PIPE_CONTROL | (4 - 2));
 + OUT_BATCH(GEN7_PIPE_CONTROL_CS_STALL |
 GEN7_PIPE_CONTROL_STALL_AT_SCOREBOARD); + OUT_BATCH(0);
 + OUT_BATCH(0);
 +
 + OUT_BATCH(GEN7_PIPE_CONTROL | (4 - 2));
 + OUT_BATCH(GEN7_PIPE_CONTROL_SC_INVALIDATE);
 + OUT_BATCH(0);
 + OUT_BATCH(0);
 +
 + return;
 +}

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 2/2] drm/i915: Flush request queue when waiting for ring space

2014-05-07 Thread Volkin, Bradley D

On Mon, May 05, 2014 at 01:07:33AM -0700, Chris Wilson wrote:
 During the review of
 
 commit 1f70999f9052f5a1b0ce1a55aff3808f2ec9fe42
 Author: Chris Wilson ch...@chris-wilson.co.uk
 Date:   Mon Jan 27 22:43:07 2014 +
 
 drm/i915: Prevent recursion by retiring requests when the ring is full
 
 Ville raised the point that our interaction with request-tail was
 likely to foul up other uses elsewhere (such as hang check comparing
 ACTHD against requests).
 
 However, we also need to restore the implicit retire requests that certain
 test cases depend upon (e.g. igt/gem_exec_lut_handle), this raises the
 spectre that the ppgtt will randomly call i915_gpu_idle() and recurse
 back into intel_ring_begin().
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78023

There's one minor cleanup suggested below.

Overall I think the code is better after the patch. I don't really like 
reintroducing the potential recursion, but I suppose that's a separate
issue. With the cleanup...

Reviewed-by: Brad Volkin bradley.d.vol...@intel.com

 ---
  drivers/gpu/drm/i915/i915_drv.h |  1 +
  drivers/gpu/drm/i915/i915_gem.c |  3 +--
  drivers/gpu/drm/i915/intel_ringbuffer.c | 36 
 -
  3 files changed, 15 insertions(+), 25 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
 index 5f4f631aecef..69a4e161ff37 100644
 --- a/drivers/gpu/drm/i915/i915_drv.h
 +++ b/drivers/gpu/drm/i915/i915_drv.h
 @@ -2155,6 +2155,7 @@ struct drm_i915_gem_request *
  i915_gem_find_active_request(struct intel_ring_buffer *ring);
  
  bool i915_gem_retire_requests(struct drm_device *dev);
 +void i915_gem_retire_requests_ring(struct intel_ring_buffer *ring);
  int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
 bool interruptible);
  static inline bool i915_reset_in_progress(struct i915_gpu_error *error)
 diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
 index dae51c3701f3..2f2a668c01ac 100644
 --- a/drivers/gpu/drm/i915/i915_gem.c
 +++ b/drivers/gpu/drm/i915/i915_gem.c
 @@ -64,7 +64,6 @@ static unsigned long i915_gem_inactive_scan(struct shrinker 
 *shrinker,
  static unsigned long i915_gem_purge(struct drm_i915_private *dev_priv, long 
 target);
  static unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv);
  static void i915_gem_object_truncate(struct drm_i915_gem_object *obj);
 -static void i915_gem_retire_requests_ring(struct intel_ring_buffer *ring);
  
  static bool cpu_cache_is_coherent(struct drm_device *dev,
 enum i915_cache_level level)
 @@ -2448,7 +2447,7 @@ void i915_gem_reset(struct drm_device *dev)
  /**
   * This function clears the request list as sequence numbers are passed.
   */
 -static void
 +void
  i915_gem_retire_requests_ring(struct intel_ring_buffer *ring)
  {
   uint32_t seqno;
 diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
 b/drivers/gpu/drm/i915/intel_ringbuffer.c
 index d6b7b884adf9..9dce15cbce73 100644
 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
 +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
 @@ -40,14 +40,19 @@
   */
  #define CACHELINE_BYTES 64
  
 -static inline int ring_space(struct intel_ring_buffer *ring)
 +static inline int __ring_space(int head, int tail, int size)
  {
 - int space = (ring-head  HEAD_ADDR) - (ring-tail + 
 I915_RING_FREE_SPACE);
 + int space = head - (tail + I915_RING_FREE_SPACE);
   if (space  0)
 - space += ring-size;
 + space += size;
   return space;
  }
  
 +static inline int ring_space(struct intel_ring_buffer *ring)
 +{
 + return __ring_space(ring-head  HEAD_ADDR, ring-tail, ring-size);
 +}
 +
  static bool intel_ring_stopped(struct intel_ring_buffer *ring)
  {
   struct drm_i915_private *dev_priv = ring-dev-dev_private;
 @@ -1495,26 +1500,11 @@ static int intel_ring_wait_request(struct 
 intel_ring_buffer *ring, int n)
   }
  
   list_for_each_entry(request, ring-request_list, list) {
 - int space;
 -
 - if (request-tail == -1)
 - continue;
 -
 - space = request-tail - (ring-tail + I915_RING_FREE_SPACE);
 - if (space  0)
 - space += ring-size;
 - if (space = n) {
 + if (__ring_space(request-tail, ring-tail, ring-size) = n) {
   seqno = request-seqno;
   tail = request-tail;

We don't actually use the local 'tail' anymore.

   break;
   }
 -
 - /* Consume this request in case we need more space than
 -  * is available and so need to prevent a race between
 -  * updating last_retired_head and direct reads of
 -  * I915_RING_HEAD. It also provides a nice sanity check.
 -  */
 - request-tail

Re: [Intel-gfx] [PATCH 1/2] drm/i915: Improve fallback ring waiting

2014-05-07 Thread Daniel Vetter

On Mon, May 05, 2014 at 09:07:32AM +0100, Chris Wilson wrote:
 A few improvements to the fallback method for waiting upon ring space:
 
 1. Fix the start/end wait tracepoints to always be paired.
 2. Increase responsiveness of checking
 3. Mark the process as waiting upon io
 4. Check for signal interruptions
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 ---
  drivers/gpu/drm/i915/intel_ringbuffer.c | 24 +---
  1 file changed, 17 insertions(+), 7 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
 b/drivers/gpu/drm/i915/intel_ringbuffer.c
 index e0c7bf27eafd..d6b7b884adf9 100644
 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
 +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
 @@ -1546,7 +1546,6 @@ static int ring_wait_for_space(struct intel_ring_buffer 
 *ring, int n)
   /* force the tail write in case we have been skipping them */
   __intel_ring_advance(ring);
  
 - trace_i915_ring_wait_begin(ring);
   /* With GEM the hangcheck timer should kick us out of the loop,
* leaving it early runs the risk of corrupting GEM state (due
* to running on almost untested codepaths). But on resume
 @@ -1554,12 +1553,13 @@ static int ring_wait_for_space(struct 
 intel_ring_buffer *ring, int n)
* case by choosing an insanely large timeout. */
   end = jiffies + 60 * HZ;
  
 + trace_i915_ring_wait_begin(ring);
   do {
   ring-head = I915_READ_HEAD(ring);
   ring-space = ring_space(ring);
   if (ring-space = n) {
 - trace_i915_ring_wait_end(ring);
 - return 0;
 + ret = 0;
 + break;
   }
  
   if (!drm_core_check_feature(dev, DRIVER_MODESET) 
 @@ -1569,15 +1569,25 @@ static int ring_wait_for_space(struct 
 intel_ring_buffer *ring, int n)
   master_priv-sarea_priv-perf_boxes |= 
 I915_BOX_WAIT;
   }
  
 - msleep(1);
 + io_schedule_timeout(1);

This isn't exported so doesn't compile too well. I've undone this change
for now. Otherwise both patches merged.
-Daniel

 +
 + if (dev_priv-mm.interruptible  signal_pending(current)) {
 + ret = -ERESTARTSYS;
 + break;
 + }
  
   ret = i915_gem_check_wedge(dev_priv-gpu_error,
  dev_priv-mm.interruptible);
   if (ret)
 - return ret;
 - } while (!time_after(jiffies, end));
 + break;
 +
 + if (time_after(jiffies, end)) {
 + ret = -EBUSY;
 + break;
 + }
 + } while (1);
   trace_i915_ring_wait_end(ring);
 - return -EBUSY;
 + return ret;
  }
  
  static int intel_wrap_ring_buffer(struct intel_ring_buffer *ring)
 -- 
 2.0.0.rc0
 
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915/bdw: Only use 2g GGTT for 32b platforms

2014-05-07 Thread Ben Widawsky

On Wed, May 07, 2014 at 09:42:57AM +0200, Daniel Vetter wrote:
 On Tue, May 06, 2014 at 09:58:59PM -0700, Ben Widawsky wrote:
  From: Ben Widawsky benjamin.widaw...@linux.intel.com
  
  Daniel requested in the bug that I use a 3GB fallback size. Since this
  is not in the spec as a valid size, I decided against it. We could
  potentially add a patch to bump it to 3GB on top of this one.
  
  This probably should be CC: stable - but I'll let the powers that be
  decide that one.
  
  v2: Change ifdef to 32b, instead of ifndef
  update comment
  
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76619
  Signed-off-by: Ben Widawsky b...@bwidawsk.net
  ---
   drivers/gpu/drm/i915/i915_gem_gtt.c | 8 
   1 file changed, 8 insertions(+)
  
  diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
  b/drivers/gpu/drm/i915/i915_gem_gtt.c
  index 846b6ee..d03a540 100644
  --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
  +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
  @@ -1759,6 +1759,14 @@ static inline unsigned int 
  gen8_get_total_gtt_size(u16 bdw_gmch_ctl)
  bdw_gmch_ctl = BDW_GMCH_GGMS_MASK;
  if (bdw_gmch_ctl)
  bdw_gmch_ctl = 1  bdw_gmch_ctl;
  +
  +#ifdef CONFIG_32BIT
  +   /* Limit 32B platforms to a 2GB GGTT
  +   4  20 / pte size * PAGE_SIZE */
  +   if (bdw_gmch_ctl  4)
  +   bdw_gmch_ctl = 4;
 
 Comment needs CodingStyle polish and we might as well return the right
 value directly instead of adjusting bdw_gmch_ctl. With that polish applied
 this is Reviewed-by: Daniel Vetter daniel.vet...@ffwll.cho
 
 And yes most definitely Cc: sta...@vger.kernel.org since it's a
 regression.
 
 Aside: Please add the regression tags when handling bugs, I need those for
 tracking and stats.
 -Daniel
 

I don't know what a regression tag is.

  +#endif
  +
  return bdw_gmch_ctl  20;
   }
   
  -- 
  1.9.2
  
  ___
  Intel-gfx mailing list
  Intel-gfx@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/intel-gfx
 
 -- 
 Daniel Vetter
 Software Engineer, Intel Corporation
 +41 (0) 79 365 57 48 - http://blog.ffwll.ch
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ben Widawsky, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH] [v3] drm/i915/bdw: Only use 2g GGTT for 32b platforms

2014-05-07 Thread Ben Widawsky

Daniel requested in the bug that I use a 3GB fallback size. Since this
is not in the spec as a valid size, I decided against it. We could
potentially add a patch to bump it to 3GB on top of this one.

This probably should be CC: stable - but I'll let the powers that be
decide that one.

Regression from a revert of the revert:
commit 7907f45bf9f67a1c5e5d4ae05bab428d7c2f43b2
Author: Ben Widawsky benjamin.widaw...@intel.com
Date:   Wed Feb 19 22:05:46 2014 -0800

Revert drm/i915/bdw: Limit GTT to 2GB

v2: Change ifdef to 32b, instead of ifndef
update comment

v3. Update comment to not wrap (Daniel).
Update commit message

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76619
Cc: sta...@vger.kernel.org
Reviewed-by: Daniel Vetter daniel.vet...@ffwll.ch
Signed-off-by: Ben Widawsky b...@bwidawsk.net
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 4d87bf2..8d6d531 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2640,6 +2640,13 @@ static inline unsigned int gen8_get_total_gtt_size(u16 
bdw_gmch_ctl)
bdw_gmch_ctl = BDW_GMCH_GGMS_MASK;
if (bdw_gmch_ctl)
bdw_gmch_ctl = 1  bdw_gmch_ctl;
+
+#ifdef CONFIG_32BIT
+   /* Limit 32B platforms to a 2GB GGTT: 4  20 / pte size * PAGE_SIZE */
+   if (bdw_gmch_ctl  4)
+   bdw_gmch_ctl = 4;
+#endif
+
return bdw_gmch_ctl  20;
 }
 
-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] UXA: Wait until a pageflip actually completes to report it.

2014-05-07 Thread Jamey Sharp

On Wed, May 07, 2014 at 04:55:18PM +0100, Chris Wilson wrote:
 On Mon, May 05, 2014 at 11:05:07PM -0700, Jamey Sharp wrote:
  UXA was reporting page-flip completion as soon as the flip was scheduled
  with the kernel, instead of waiting for the kernel to indicate that the
  flip had actually completed.
  
  Moving the DRI2SwapComplete call to the right place fixes all of our
  Piglit tests for OML_sync_control when run on xf86-video-intel/UXA,
  aside from a bit of difficult-to-reproduce flakiness when using a
  divisor  1.
 
 The violation is intentional, as it gives us triple buffering by
 default. It can be disabled.

As far as I can tell, this patch has no effect on triple-buffering. I
verified that by logging new_front-handle in intel_do_pageflip: It
rotates through three different BO's on successive flips.

Jamey


signature.asc
Description: Digital signature
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

53 matches

Mail list logo