Re: [Intel-gfx] [PATCH] drm/i915: Return -ENOENT for unknown contexts

2014-04-17 Thread Chris Wilson
On Wed, Apr 16, 2014 at 11:58:09AM +0300, Mika Kuoppala wrote:
 From: Mika Kuoppala mika.kuopp...@linux.intel.com
 
 If hw_contexts are disabled, we always return the per file
 descriptor default context stats. Make sure that the context
 is correctly given and fail accordingly if not.
 
 Testcase: igt/gem_reset_stats/params
 Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com

This code is obsolete, and the bug should be already fixed by

commit 691e6415c891b8b2b082a120b896b443531c4d45
Author: Chris Wilson ch...@chris-wilson.co.uk
Date:   Wed Apr 9 09:07:36 2014 +0100

drm/i915: Always use kref tracking for all contexts.

-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/fb-helper: Fix hpd vs. initial config races

2014-04-17 Thread Thierry Reding
On Wed, Apr 16, 2014 at 04:45:21PM +0200, Daniel Vetter wrote:
 Some drivers need to be able to have a perfect race-free fbcon setup.
 Current drivers only enable hotplug processing after the call to
 drm_fb_helper_initial_config which leaves a tiny but important race.
 
 This race is especially noticable on embedded platforms where the
 driver itself enables the voltage for the hdmi output, since only then
 will monitors (after a bit of delay, as usual) respond by asserting
 the hpd pin.
 
 Most of the infrastructure is already there with the split-out
 drm_fb_helper_init. And drm_fb_helper_initial_config already has all
 the required locking to handle concurrent hpd events since
 
 commit 53f1904bced78d7c00f5d874c662ec3ac85d0f9f
 Author: Daniel Vetter daniel.vet...@ffwll.ch
 Date:   Thu Mar 20 14:26:35 2014 +0100
 
 drm/fb-helper: improve drm_fb_helper_initial_config locking
 
 The only missing bit is making drm_fb_helper_hotplug_event save
 against concurrent calls of drm_fb_helper_initial_config. The only
 unprotected bit is the check for fb_helper-fb.
 
 With that drivers can first initialize the fb helper, then enabel
 hotplug processing and then set up the initial config all in a
 completely race-free manner. Update kerneldoc and convert i915 as a
 proof of concept.
 
 Feature requested by Thierry since his tegra driver atm reliably boots
 slowly enough to misses the hotplug event for an external hdmi screen,
 but also reliably boots to quickly for the hpd pin to be asserted when
 the fb helper calls into the hdmi -detect function.
 
 Cc: Thierry Reding tred...@nvidia.com
 Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch
 ---
  drivers/gpu/drm/drm_fb_helper.c | 11 +--
  drivers/gpu/drm/i915/i915_dma.c |  3 ---
  drivers/gpu/drm/i915/i915_drv.c |  2 --
  drivers/gpu/drm/i915/i915_drv.h |  1 -
  drivers/gpu/drm/i915/i915_irq.c |  4 
  5 files changed, 5 insertions(+), 16 deletions(-)

The FB helper parts:

Tested-by: Thierry Reding tred...@nvidia.com

But I have one comment below...

 diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
[...]
 - * Note that the driver must ensure that this is only called _after_ the fb 
 has
 - * been fully set up, i.e. after the call to drm_fb_helper_initial_config.
 + * Note that drivers may call this even before calling
 + * drm_fb_helper_initial_config but only aftert drm_fb_helper_init. This 
 allows

I don't think the requirement is that strict. To elaborate: on Tegra we
cannot call drm_fb_helper_init() because the number of CRTCs and
connectors isn't known this early. We determine that dynamically after
all sub-devices have been initialized. So instead of calling
drm_fb_helper_init() before drm_kms_helper_poll_init(), I did something
more minimal (see attached patch). It's kind of difficult to tell
because of the context, but tegra_drm_fb_prepare() sets up the mode
config and functions and allocate memory for the FB helper and sets the
FB helper functions.

This may not be enough for all drivers, but on Tegra the implementation
of .output_poll_changed() simply calls drm_fb_helper_hotplug_event(),
which will work fine with just that rudimentary initialization.

Thierry
From ea394150524c8b54ee4131ad830bf5beb6b1056e Mon Sep 17 00:00:00 2001
From: Thierry Reding tred...@nvidia.com
Date: Thu, 17 Apr 2014 10:02:17 +0200
Subject: [PATCH] drm/tegra: Implement race-free hotplug detection

A race condition currently exists on Tegra, where it can happen that a
monitor attached via HDMI isn't detected during the initial FB helper
setup, but the hotplug event happens too early to be processed by the
poll helpers because they haven't been initialized yet. This happens
because on some boards the HDMI driver can control the regulator that
supplies the +5V pin on the HDMI connector. Therefore depending on the
timing between the initialization of the HDMI driver and the rest of
DRM, it's possible that the monitor returns the hotplug signal right
within the window where we would miss it.

Unfortunately, drm_kms_helper_poll_init() will wreak havoc when called
before at least some parts of the FB helpers have been set up.

This commit fixes this by splitting out the minimum of initialization
required to make drm_kms_helper_poll_init() work into a separate
function that can be called early. It is then safe to move all of the
poll helper initialization to an earlier point in time (before the
HDMI output driver has a chance to enable the +5V supply). That way if
the hotplug signal is returned before the initial FB helper setup, the
monitor will be forcefully detected at that point, and if the hotplug
signal is returned after that it will be properly handled by the poll
helpers.

Signed-off-by: Thierry Reding tred...@nvidia.com
---
 drivers/gpu/drm/tegra/drm.c |  8 ++--
 drivers/gpu/drm/tegra/drm.h |  1 +
 drivers/gpu/drm/tegra/fb.c  | 50 ++---
 3 files changed, 40 insertions(+), 19 deletions(-)


Re: [Intel-gfx] [PATCH 5/5] drm/i915: Move buffer pinning and ring selection to intel_crtc_page_flip()

2014-04-17 Thread Chris Wilson
On Tue, Apr 15, 2014 at 09:41:38PM +0300, ville.syrj...@linux.intel.com wrote:
 From: Ville Syrjälä ville.syrj...@linux.intel.com
 
 All of the .queue_flip() callbacks duplicate the same code to pin the
 buffers and calculate the gtt_offset. Move that code to
 intel_crtc_page_flip(). In order to do that we must also move the ring
 selection logic there.
 
 Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com

I was a little uneasy about this given the fun-and-games we play with
ring selection and the historic issues we have had with bugs in
unpinning. However, this looks to be worth it, especially with the new
piece of common code,
Reviewed-by: Chris Wilson ch...@chris-wilson.co.uk
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Return -ENOENT for unknown contexts

2014-04-17 Thread Mika Kuoppala
Chris Wilson ch...@chris-wilson.co.uk writes:

 On Wed, Apr 16, 2014 at 11:58:09AM +0300, Mika Kuoppala wrote:
 From: Mika Kuoppala mika.kuopp...@linux.intel.com
 
 If hw_contexts are disabled, we always return the per file
 descriptor default context stats. Make sure that the context
 is correctly given and fail accordingly if not.
 
 Testcase: igt/gem_reset_stats/params
 Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com

 This code is obsolete, and the bug should be already fixed by

 commit 691e6415c891b8b2b082a120b896b443531c4d45
 Author: Chris Wilson ch...@chris-wilson.co.uk
 Date:   Wed Apr 9 09:07:36 2014 +0100

 drm/i915: Always use kref tracking for all contexts.

Agreed. I should have run it against fixes in addition to dinq.

-Mika

 -Chris

 -- 
 Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v2 00/25] vlv: add support for RPM

2014-04-17 Thread Ville Syrjälä
On Mon, Apr 14, 2014 at 08:24:21PM +0300, Imre Deak wrote:
 For a description of this patchset see the previous cover letter [1].
 
 Tested on HSW (non-ULT), VLV with igt/kms_flip and pm_pc8.
 
 v2:
 - addressed comments about getting the proper runtime PM references in
   debugfs (Daniel, Paulo, Ville)
 - disable RPM if RC6 is disabled for all platforms, not just VLV
   (Daniel)
 - refactored the runtime PM callbacks pulling platform independent
   teardown/re-init code to the generic runtime suspend/resume callbacks
   (Daniel)
 - fixed a couple of issues I bumped into while checking the RC6/RPS
   and the GPU reset error capturing path
 
 [1]
 http://lists.freedesktop.org/archives/intel-gfx/2014-April/043208.html

For patches 01-08, 12-13, 15-16, 18, 20, 22, 24:
Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com

23,25,26 I didn't really look at so far, and the rest had some small
issues I pointed out.

-- 
Ville Syrjälä
Intel OTC
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/2] drm/i915: remove unexplained vblank wait in the DP off code

2014-04-17 Thread Ville Syrjälä
On Fri, Apr 11, 2014 at 02:25:41PM -0700, Jesse Barnes wrote:
 I don't think this is necessary; at least it doesn't appear to be on my
 BYT.  Dropping it speeds up our shutdown code a little, in some cases
 resulting in faster init times.
 
 Signed-off-by: Jesse Barnes jbar...@virtuousgeek.org
 ---
  drivers/gpu/drm/i915/intel_dp.c | 3 ---
  1 file changed, 3 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
 index e48d47c..728a5db 100644
 --- a/drivers/gpu/drm/i915/intel_dp.c
 +++ b/drivers/gpu/drm/i915/intel_dp.c
 @@ -2756,9 +2756,6 @@ intel_dp_link_down(struct intel_dp *intel_dp)
   }
   POSTING_READ(intel_dp-output_reg);
  
 - /* We don't really know why we're doing this */
 - intel_wait_for_vblank(dev, intel_crtc-pipe);
 -

Maybe this was here to guarantee we send the magic five idle patterns
specified in the DP spec. But since we're going to be turning off the
port anyway I don't see why we switch to transmitting the idle pattern
at all.

I guess switching to the idle pattern might make sense for the IBX
transcoder select workaround to avoid sending some garbage on the main
link. Although we don't seem to be doing that workaround quite according
to spec. The spec says we should first disable the port, and then
re-enable it temporarily w/ transcoder A. What we do is switch the port
over to transcoder A while it's still enabled, and only then disable it.

So I guess killing the wait here is fine, but looks like the IBX
workaround stuff needs a better look. I can try to clean it up a bit.

Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com

   if (HAS_PCH_IBX(dev) 
   I915_READ(intel_dp-output_reg)  DP_PIPEB_SELECT) {
   struct drm_crtc *crtc = intel_dig_port-base.base.crtc;
 -- 
 1.8.4.2
 
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 4/4] drm/i915: Invalidate our pages under memory pressure

2014-04-17 Thread Jani Nikula
On Wed, 16 Apr 2014, Robert Beckett robert.beck...@intel.com wrote:
 On 25/03/2014 13:23, Chris Wilson wrote:
 Try to flush out dirty pages into the swapcache (and from there into the
 swapfile) when under memory pressure and forced to drop GEM objects from
 memory. In effect, this should just allow us to discard unused pages for
 memory reclaim and to start writeback earlier.

 v2: Hugh Dickins warned that explicitly starting writeback from
 shrink_slab was prone to deadlocks within shmemfs.

 Cc: Hugh Dickins hu...@google.com
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 ---
   drivers/gpu/drm/i915/i915_gem.c | 38 +++---
   1 file changed, 27 insertions(+), 11 deletions(-)

 diff --git a/drivers/gpu/drm/i915/i915_gem.c 
 b/drivers/gpu/drm/i915/i915_gem.c
 index 135ee8bd55f6..8287fd6701c6 100644
 --- a/drivers/gpu/drm/i915/i915_gem.c
 +++ b/drivers/gpu/drm/i915/i915_gem.c
 @@ -60,7 +60,6 @@ static unsigned long i915_gem_shrinker_scan(struct 
 shrinker *shrinker,
  struct shrink_control *sc);
   static unsigned long i915_gem_purge(struct drm_i915_private *dev_priv, 
 long target);
   static unsigned long i915_gem_shrink_all(struct drm_i915_private 
 *dev_priv);
 -static void i915_gem_object_truncate(struct drm_i915_gem_object *obj);
   static void i915_gem_retire_requests_ring(struct intel_ring_buffer *ring);

   static bool cpu_cache_is_coherent(struct drm_device *dev,
 @@ -1685,12 +1684,16 @@ i915_gem_mmap_gtt_ioctl(struct drm_device *dev, void 
 *data,
  return i915_gem_mmap_gtt(file, dev, args-handle, args-offset);
   }

 +static inline int
 +i915_gem_object_is_purgeable(struct drm_i915_gem_object *obj)
 +{
 +return obj-madv == I915_MADV_DONTNEED;
 +}
 +
   /* Immediately discard the backing storage */
   static void
   i915_gem_object_truncate(struct drm_i915_gem_object *obj)
   {
 -struct inode *inode;
 -
  i915_gem_object_free_mmap_offset(obj);

  if (obj-base.filp == NULL)
 @@ -1701,16 +1704,28 @@ i915_gem_object_truncate(struct drm_i915_gem_object 
 *obj)
   * To do this we must instruct the shmfs to drop all of its
   * backing pages, *now*.
   */
 -inode = file_inode(obj-base.filp);
 -shmem_truncate_range(inode, 0, (loff_t)-1);
 -
 +shmem_truncate_range(file_inode(obj-base.filp), 0, (loff_t)-1);
  obj-madv = __I915_MADV_PURGED;
   }

 -static inline int
 -i915_gem_object_is_purgeable(struct drm_i915_gem_object *obj)
 +/* Try to discard unwanted pages */
 +static void
 +i915_gem_object_invalidate(struct drm_i915_gem_object *obj)
   {
 -return obj-madv == I915_MADV_DONTNEED;
 +struct address_space *mapping;
 +
 +switch (obj-madv) {
 +case I915_MADV_DONTNEED:
 +i915_gem_object_truncate(obj);
 +case __I915_MADV_PURGED:
 +return;
 +}
 +
 +if (obj-base.filp == NULL)
 +return;
 +
 +mapping = file_inode(obj-base.filp)-i_mapping,
 +invalidate_mapping_pages(mapping, 0, (loff_t)-1);
   }

   static void
 @@ -1775,8 +1790,7 @@ i915_gem_object_put_pages(struct drm_i915_gem_object 
 *obj)
  ops-put_pages(obj);
  obj-pages = NULL;

 -if (i915_gem_object_is_purgeable(obj))
 -i915_gem_object_truncate(obj);
 +i915_gem_object_invalidate(obj);

  return 0;
   }
 @@ -4201,6 +4215,8 @@ void i915_gem_free_object(struct drm_gem_object 
 *gem_obj)

  if (WARN_ON(obj-pages_pin_count))
  obj-pages_pin_count = 0;
 +if (obj-madv != __I915_MADV_PURGED)
 +obj-madv = I915_MADV_DONTNEED;
  i915_gem_object_put_pages(obj);
  i915_gem_object_free_mmap_offset(obj);
  i915_gem_object_release_stolen(obj);


 Functionally it looks good to me.

 Though, you may want a /* fall-through */ comment (some people cant 
 mentally parse fallthroughs without being prompted) and a default: 
 break; (to avoid any static code analysis complaints) in the switch in 
 i915_gem_object_invalidate.

Or just two if statements.

Jani.




 Reviewed-by: Robert Beckett robert.beck...@intel.com
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Jani Nikula, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 3/3] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2014-04-17 Thread Volkin, Bradley D
On Mon, Mar 17, 2014 at 05:21:56AM -0700, Chris Wilson wrote:
 By exporting the ability to map user address and inserting PTEs
 representing their backing pages into the GTT, we can exploit UMA in order
 to utilize normal application data as a texture source or even as a
 render target (depending upon the capabilities of the chipset). This has
 a number of uses, with zero-copy downloads to the GPU and efficient
 readback making the intermixed streaming of CPU and GPU operations
 fairly efficient. This ability has many widespread implications from
 faster rendering of client-side software rasterisers (chromium),
 mitigation of stalls due to read back (firefox) and to faster pipelining
 of texture data (such as pixel buffer objects in GL or data blobs in CL).
 
 v2: Compile with CONFIG_MMU_NOTIFIER
 v3: We can sleep while performing invalidate-range, which we can utilise
 to drop our page references prior to the kernel manipulating the vma
 (for either discard or cloning) and so protect normal users.
 v4: Only run the invalidate notifier if the range intercepts the bo.
 v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
 v6: Recheck after reacquire mutex for lost mmu.
 v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
 v8: Fix rebasing error after forwarding porting the back port.
 v9: Limit the userptr to page aligned entries. We now expect userspace
 to handle all the offset-in-page adjustments itself.
 v10: Prevent vma from being copied across fork to avoid issues with cow.
 v11: Drop vma behaviour changes -- locking is nigh on impossible.
  Use a worker to load user pages to avoid lock inversions.
 v12: Use get_task_mm()/mmput() for correct refcounting of mm.
 v13: Use a worker to release the mmu_notifier to avoid lock inversion
 v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
  with its own locking and tree of objects for each mm/mmu_notifier.
 v15: Prevent overlapping userptr objects, and invalidate all objects
  within the mmu_notifier range
 v16: Fix a typo for iterating over multiple objects in the range and
  rearrange error path to destroy the mmu_notifier locklessly.
  Also close a race between invalidate_range and the get_pages_worker.
 v17: Close a race between get_pages_worker/invalidate_range and fresh
  allocations of the same userptr range - and notice that
  struct_mutex was presumed to be held when during creation it wasn't.
 v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
  for the struct sg_table and to clear it before reporting an error.
 v19: Always error out on read-only userptr requests as we don't have the
  hardware infrastructure to support them at the moment.
 v20: Refuse to implement read-only support until we have the required
  infrastructure - but reserve the bit in flags for future use.
 v21: use_mm() is not required for get_user_pages(). It is only meant to
  be used to fix up the kernel thread's current-mm for use with
  copy_user().
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 Cc: Tvrtko Ursulin tvrtko.ursu...@linux.intel.com
 Cc: Gong, Zhipeng zhipeng.g...@intel.com
 Cc: Akash Goel akash.g...@intel.com
 Cc: Volkin, Bradley D bradley.d.vol...@intel.com
 Reviewed-by: Tvrtko Ursulin tvrtko.ursu...@linux.intel.com
 ---
  drivers/gpu/drm/i915/Kconfig|   1 +
  drivers/gpu/drm/i915/Makefile   |   1 +
  drivers/gpu/drm/i915/i915_dma.c |   1 +
  drivers/gpu/drm/i915/i915_drv.h |  24 +-
  drivers/gpu/drm/i915/i915_gem.c |   4 +
  drivers/gpu/drm/i915/i915_gem_dmabuf.c  |   5 +
  drivers/gpu/drm/i915/i915_gem_userptr.c | 679 
 
  drivers/gpu/drm/i915/i915_gpu_error.c   |   2 +
  include/uapi/drm/i915_drm.h |  16 +
  9 files changed, 732 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/i915/i915_gem_userptr.c
 
 diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
 index 73ed59eff139..9940baee10c2 100644
 --- a/drivers/gpu/drm/i915/Kconfig
 +++ b/drivers/gpu/drm/i915/Kconfig
 @@ -5,6 +5,7 @@ config DRM_I915
   depends on (AGP || AGP=n)
   select INTEL_GTT
   select AGP_INTEL if AGP
 + select INTERVAL_TREE
   # we need shmfs for the swappable backing store, and in particular
   # the shmem_readpage() which depends upon tmpfs
   select SHMEM
 diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
 index 2e02137a0d6e..1856494439e9 100644
 --- a/drivers/gpu/drm/i915/Makefile
 +++ b/drivers/gpu/drm/i915/Makefile
 @@ -27,6 +27,7 @@ i915-y += i915_cmd_parser.o \
 i915_gem.o \
 i915_gem_stolen.o \
 i915_gem_tiling.o \
 +   i915_gem_userptr.o \
 i915_gpu_error.o \
 i915_irq.o \
 i915_trace_points.o \
 diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
 index 

Re: [Intel-gfx] [PATCH 2/3] drm/i915: Do not call retire_requests from wait_for_rendering

2014-04-17 Thread Volkin, Bradley D
Hi Chris, just want to bring this one back to your attention while
I'm going through the rest of the series.
Thanks,
Brad

On Fri, Mar 28, 2014 at 03:58:25PM -0700, Volkin, Bradley D wrote:
 On Mon, Mar 17, 2014 at 05:21:55AM -0700, Chris Wilson wrote:
  A common issue we have is that retiring requests causes recursion
  through GTT manipulation or page table manipulation which we can only
  handle at very specific points. However, to maintain internal
  consistency (enforced through our sanity checks on write_domain at
  various points in the GEM object lifecycle) we do need to retire the
  object prior to marking it with a new write_domain, and also clear the
  write_domain for the implicit flush following a batch.
  
  Note that this then allows the unbound objects to still be on the active
  lists, and so care must be taken when removing objects from unbound lists
  (similar to the caveats we face processing the bound lists).
  
  v2: Fix i915_gem_shrink_all() to handle updated object lifetime rules,
  by refactoring it to call into __i915_gem_shrink().
  
  v3: Missed an object-retire prior to changing cache domains in
  i915_gem_object_set_cache_leve()
  
  v4: Rebase
  
  Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
  Tested-by: Ville Syrjälä ville.syrj...@linux.intel.com
  ---
   drivers/gpu/drm/i915/i915_gem.c| 112 
  -
   drivers/gpu/drm/i915/i915_gem_execbuffer.c |   3 +
   2 files changed, 66 insertions(+), 49 deletions(-)
  
  diff --git a/drivers/gpu/drm/i915/i915_gem.c 
  b/drivers/gpu/drm/i915/i915_gem.c
  index 58704ce62e3e..5cf4d80de867 100644
  --- a/drivers/gpu/drm/i915/i915_gem.c
  +++ b/drivers/gpu/drm/i915/i915_gem.c
  @@ -44,6 +44,9 @@ static void i915_gem_object_flush_cpu_write_domain(struct 
  drm_i915_gem_object *o
   static __must_check int
   i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 bool readonly);
  +static void
  +i915_gem_object_retire(struct drm_i915_gem_object *obj);
  +
   static int i915_gem_phys_pwrite(struct drm_device *dev,
  struct drm_i915_gem_object *obj,
  struct drm_i915_gem_pwrite *args,
  @@ -502,6 +505,8 @@ int i915_gem_obj_prepare_shmem_read(struct 
  drm_i915_gem_object *obj,
  ret = i915_gem_object_wait_rendering(obj, true);
  if (ret)
  return ret;
  +
  +   i915_gem_object_retire(obj);
  }
   
  ret = i915_gem_object_get_pages(obj);
  @@ -917,6 +922,8 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
  ret = i915_gem_object_wait_rendering(obj, false);
  if (ret)
  return ret;
  +
  +   i915_gem_object_retire(obj);
  }
  /* Same trick applies to invalidate partially written cachelines read
   * before writing. */
  @@ -1304,7 +1311,8 @@ static int
   i915_gem_object_wait_rendering__tail(struct drm_i915_gem_object *obj,
   struct intel_ring_buffer *ring)
   {
  -   i915_gem_retire_requests_ring(ring);
  +   if (!obj-active)
  +   return 0;
   
  /* Manually manage the write flush as we may have not yet
   * retired the buffer.
  @@ -1314,7 +1322,6 @@ i915_gem_object_wait_rendering__tail(struct 
  drm_i915_gem_object *obj,
   * we know we have passed the last write.
   */
  obj-last_write_seqno = 0;
  -   obj-base.write_domain = ~I915_GEM_GPU_DOMAINS;
   
  return 0;
   }
  @@ -1949,58 +1956,58 @@ static unsigned long
   __i915_gem_shrink(struct drm_i915_private *dev_priv, long target,
bool purgeable_only)
   {
  -   struct list_head still_bound_list;
  -   struct drm_i915_gem_object *obj, *next;
  +   struct list_head still_in_list;
  +   struct drm_i915_gem_object *obj;
  unsigned long count = 0;
   
  -   list_for_each_entry_safe(obj, next,
  -dev_priv-mm.unbound_list,
  -global_list) {
  -   if ((i915_gem_object_is_purgeable(obj) || !purgeable_only) 
  -   i915_gem_object_put_pages(obj) == 0) {
  -   count += obj-base.size  PAGE_SHIFT;
  -   if (count = target)
  -   return count;
  -   }
  -   }
  -
  /*
  -* As we may completely rewrite the bound list whilst unbinding
  +* As we may completely rewrite the (un)bound list whilst unbinding
   * (due to retiring requests) we have to strictly process only
   * one element of the list at the time, and recheck the list
   * on every iteration.
 
 Is it still true that we could retire requests on this path? I see that
 currently we will retire requests via:
 i915_vma_unbind - i915_gem_object_finish_gpu - 
 i915_gem_object_wait_rendering.
 
 But we've taken the explicit request retirement out of the wait_rendering 
 path.
 Have I missed somewhere that it could still happen?
 
 Thanks,
 Brad
 
 

Re: [Intel-gfx] [PATCH 3/3] tests/gem_userptr_benchmark: Benchmarking userptr surfaces and impact

2014-04-17 Thread Volkin, Bradley D
On Wed, Mar 19, 2014 at 04:13:06AM -0700, Tvrtko Ursulin wrote:
 From: Tvrtko Ursulin tvrtko.ursu...@intel.com
 
 This adds a small benchmark for the new userptr functionality.
 
 Apart from basic surface creation and destruction, also tested is the
 impact of having userptr surfaces in the process address space. Reason
 for that is the impact of MMU notifiers on common address space
 operations like munmap() which is per process.
 
 v2:
   * Moved to benchmarks.
   * Added pointer read/write tests.
   * Changed output to say iterations per second instead of
 operations per second.
   * Multiply result by batch size for multi-create* tests
 for a more comparable number with create-destroy test.
 
 v3:
   * Fixed ioctl detection on kernels without MMU_NOTIFIERs.
 
 Signed-off-by: Tvrtko Ursulin tvrtko.ursu...@intel.com
 Cc: Chris Wilson ch...@chris-wilson.co.uk
 ---
  Android.mk |   3 +-
  benchmarks/.gitignore  |   1 +
  benchmarks/Android.mk  |  36 +++
  benchmarks/Makefile.am |   7 +-
  benchmarks/Makefile.sources|   6 +
  benchmarks/gem_userptr_benchmark.c | 513 
 +
  6 files changed, 558 insertions(+), 8 deletions(-)
  create mode 100644 benchmarks/Android.mk
  create mode 100644 benchmarks/Makefile.sources
  create mode 100644 benchmarks/gem_userptr_benchmark.c
 
 diff --git a/Android.mk b/Android.mk
 index 8aeb2d4..0c969b8 100644
 --- a/Android.mk
 +++ b/Android.mk
 @@ -1,2 +1 @@
 -include $(call all-named-subdir-makefiles, lib tests tools)
 -
 +include $(call all-named-subdir-makefiles, lib tests tools benchmarks)
 diff --git a/benchmarks/.gitignore b/benchmarks/.gitignore
 index ddea6f7..09e5bd8 100644
 --- a/benchmarks/.gitignore
 +++ b/benchmarks/.gitignore
 @@ -1,3 +1,4 @@
 +gem_userptr_benchmark
  intel_upload_blit_large
  intel_upload_blit_large_gtt
  intel_upload_blit_large_map
 diff --git a/benchmarks/Android.mk b/benchmarks/Android.mk
 new file mode 100644
 index 000..5bb8ef5
 --- /dev/null
 +++ b/benchmarks/Android.mk
 @@ -0,0 +1,36 @@
 +LOCAL_PATH := $(call my-dir)
 +
 +include $(LOCAL_PATH)/Makefile.sources
 +
 +##
 +
 +define add_benchmark
 +include $(CLEAR_VARS)
 +
 +LOCAL_SRC_FILES := $1.c
 +
 +LOCAL_CFLAGS += -DHAVE_STRUCT_SYSINFO_TOTALRAM
 +LOCAL_CFLAGS += -DANDROID -UNDEBUG -include check-ndebug.h
 +LOCAL_CFLAGS += -std=c99
 +# FIXME: drop once Bionic correctly annotates noreturn on pthread_exit
 +LOCAL_CFLAGS += -Wno-error=return-type
 +# Excessive complaining for established cases. Rely on the Linux version 
 warnings.
 +LOCAL_CFLAGS += -Wno-sign-compare
 +
 +LOCAL_MODULE := $1
 +LOCAL_MODULE_TAGS := optional
 +
 +LOCAL_STATIC_LIBRARIES := libintel_gpu_tools
 +
 +LOCAL_SHARED_LIBRARIES := libpciaccess  \
 +  libdrm\
 +  libdrm_intel
 +
 +include $(BUILD_EXECUTABLE)
 +endef
 +
 +##
 +
 +benchmark_list := $(bin_PROGRAMS)
 +
 +$(foreach item,$(benchmark_list),$(eval $(call add_benchmark,$(item
 diff --git a/benchmarks/Makefile.am b/benchmarks/Makefile.am
 index e2ad784..d173bf4 100644
 --- a/benchmarks/Makefile.am
 +++ b/benchmarks/Makefile.am
 @@ -1,9 +1,4 @@
 -
 -bin_PROGRAMS =   \
 - intel_upload_blit_large \
 - intel_upload_blit_large_gtt \
 - intel_upload_blit_large_map \
 - intel_upload_blit_small
 +include Makefile.sources
  
  AM_CPPFLAGS = -I$(top_srcdir) -I$(top_srcdir)/lib
  AM_CFLAGS = $(DRM_CFLAGS) $(CWARNFLAGS) $(CAIRO_CFLAGS)
 diff --git a/benchmarks/Makefile.sources b/benchmarks/Makefile.sources
 new file mode 100644
 index 000..fd6c107
 --- /dev/null
 +++ b/benchmarks/Makefile.sources
 @@ -0,0 +1,6 @@
 +bin_PROGRAMS =  \
 +intel_upload_blit_large \
 +intel_upload_blit_large_gtt \
 +intel_upload_blit_large_map \
 +intel_upload_blit_small \
 +gem_userptr_benchmark

You might split the makefile cleanup aspect of this into a separate
patch, but I'm fine either way.

 diff --git a/benchmarks/gem_userptr_benchmark.c 
 b/benchmarks/gem_userptr_benchmark.c
 new file mode 100644
 index 000..218f6f1
 --- /dev/null
 +++ b/benchmarks/gem_userptr_benchmark.c
 @@ -0,0 +1,513 @@
 +/*
 + * Copyright © 2014 Intel Corporation
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the Software),
 + * to deal in the Software without restriction, including without limitation
 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the 

Re: [Intel-gfx] [PATCH 2/3] tests/gem_vmap_blits: Remove obsolete test case

2014-04-17 Thread Volkin, Bradley D
On Wed, Mar 19, 2014 at 04:13:05AM -0700, Tvrtko Ursulin wrote:
 From: Tvrtko Ursulin tvrtko.ursu...@intel.com
 
 No need for the old test case once the new one was added.
 
 Signed-off-by: Tvrtko Ursulin tvrtko.ursu...@intel.com

Reviewed-by: Brad Volkin bradley.d.vol...@intel.com

 ---
  tests/.gitignore   |   1 -
  tests/Makefile.sources |   1 -
  tests/gem_vmap_blits.c | 344 
 -
  3 files changed, 346 deletions(-)
  delete mode 100644 tests/gem_vmap_blits.c
 
 diff --git a/tests/.gitignore b/tests/.gitignore
 index ff1eff5..c600823 100644
 --- a/tests/.gitignore
 +++ b/tests/.gitignore
 @@ -93,7 +93,6 @@ gem_tiled_swapping
  gem_tiling_max_stride
  gem_unfence_active_buffers
  gem_unref_active_buffers
 -gem_vmap_blits
  gem_userptr_blits
  gem_wait_render_timeout
  gem_write_read_ring_switch
 diff --git a/tests/Makefile.sources b/tests/Makefile.sources
 index 240bd99..610bbf5 100644
 --- a/tests/Makefile.sources
 +++ b/tests/Makefile.sources
 @@ -118,7 +118,6 @@ TESTS_progs = \
   gem_tiling_max_stride \
   gem_unfence_active_buffers \
   gem_unref_active_buffers \
 - gem_vmap_blits \
   gem_userptr_blits \
   gem_wait_render_timeout \
   gen3_mixed_blits \
 diff --git a/tests/gem_vmap_blits.c b/tests/gem_vmap_blits.c
 deleted file mode 100644
 index 48297af..000
 --- a/tests/gem_vmap_blits.c
 +++ /dev/null
 @@ -1,344 +0,0 @@
 -/*
 - * Copyright © 2009,2011 Intel Corporation
 - *
 - * Permission is hereby granted, free of charge, to any person obtaining a
 - * copy of this software and associated documentation files (the Software),
 - * to deal in the Software without restriction, including without limitation
 - * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 - * and/or sell copies of the Software, and to permit persons to whom the
 - * Software is furnished to do so, subject to the following conditions:
 - *
 - * The above copyright notice and this permission notice (including the next
 - * paragraph) shall be included in all copies or substantial portions of the
 - * Software.
 - *
 - * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
 DEALINGS
 - * IN THE SOFTWARE.
 - *
 - * Authors:
 - *Eric Anholt e...@anholt.net
 - *Chris Wilson ch...@chris-wilson.co.uk
 - *
 - */
 -
 -/** @file gem_vmap_blits.c
 - *
 - * This is a test of doing many blits using a mixture of normal system pages
 - * and uncached linear buffers, with a working set larger than the
 - * aperture size.
 - *
 - * The goal is to simply ensure the basics work.
 - */
 -
 -#include stdlib.h
 -#include stdio.h
 -#include string.h
 -#include fcntl.h
 -#include inttypes.h
 -#include errno.h
 -#include sys/stat.h
 -#include sys/time.h
 -#include drm.h
 -#include i915_drm.h
 -#include drmtest.h
 -#include intel_bufmgr.h
 -#include intel_batchbuffer.h
 -#include intel_gpu_tools.h
 -
 -#if !defined(I915_PARAM_HAS_VMAP)
 -#pragma message(No vmap support in drm, skipping)
 -int main(int argc, char **argv)
 -{
 - fprintf(stderr, No vmap support in drm.\n);
 - return 77;
 -}
 -#else
 -
 -#define WIDTH 512
 -#define HEIGHT 512
 -
 -static uint32_t linear[WIDTH*HEIGHT];
 -
 -static uint32_t gem_vmap(int fd, void *ptr, int size, int read_only)
 -{
 - struct drm_i915_gem_vmap vmap;
 -
 - vmap.user_ptr = (uintptr_t)ptr;
 - vmap.user_size = size;
 - vmap.flags = 0;
 - if (read_only)
 - vmap.flags |= I915_VMAP_READ_ONLY;
 -
 - if (drmIoctl(fd, DRM_IOCTL_I915_GEM_VMAP, vmap))
 - return 0;
 -
 - return vmap.handle;
 -}
 -
 -
 -static void gem_vmap_sync(int fd, uint32_t handle)
 -{
 - gem_set_domain(fd, handle, I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU);
 -}
 -
 -static void
 -copy(int fd, uint32_t dst, uint32_t src)
 -{
 - uint32_t batch[10];
 - struct drm_i915_gem_relocation_entry reloc[2];
 - struct drm_i915_gem_exec_object2 obj[3];
 - struct drm_i915_gem_execbuffer2 exec;
 - uint32_t handle;
 - int ret;
 -
 - batch[0] = XY_SRC_COPY_BLT_CMD |
 -   XY_SRC_COPY_BLT_WRITE_ALPHA |
 -   XY_SRC_COPY_BLT_WRITE_RGB | 6;
 - batch[1] = (3  24) | /* 32 bits */
 -   (0xcc  16) | /* copy ROP */
 -   WIDTH*4;
 - batch[2] = 0; /* dst x1,y1 */
 - batch[3] = (HEIGHT  16) | WIDTH; /* dst x2,y2 */
 - batch[4] = 0; /* dst reloc */
 - batch[5] = 0; /* src x1,y1 */
 - batch[6] = WIDTH*4;
 - batch[7] = 0; /* src reloc */
 - batch[8] = MI_BATCH_BUFFER_END;
 - batch[9] = MI_NOOP;
 -
 -   

Re: [Intel-gfx] [PATCH 28/71] drm/i915/chv: Added CHV specific register read and write

2014-04-17 Thread Ben Widawsky
On Wed, Apr 09, 2014 at 01:28:26PM +0300, ville.syrj...@linux.intel.com wrote:
 From: Deepak S deepa...@intel.com
 
 Support to individually control Media/Render well based on the register 
 access.
 Add CHV specific write function to habdle difference between registers
 that are sadowed vs those that need forcewake even for writes.
 
 v2: Drop write FIFO for CHV and add comman well forcewake (Ville)
 
 Signed-off-by: Deepak S deepa...@intel.com
 [vsyrjala: Move the register range macros into intel_uncore.c]
 Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
 ---
  drivers/gpu/drm/i915/intel_uncore.c | 139 
 +---
  1 file changed, 131 insertions(+), 8 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
 b/drivers/gpu/drm/i915/intel_uncore.c
 index 823d699..8e3c686 100644
 --- a/drivers/gpu/drm/i915/intel_uncore.c
 +++ b/drivers/gpu/drm/i915/intel_uncore.c
 @@ -495,6 +495,31 @@ void assert_force_wake_inactive(struct drm_i915_private 
 *dev_priv)
   ((reg) = 0x22000  (reg)  0x24000) ||\
   ((reg) = 0x3  (reg)  0x4))
  
 +#define FORCEWAKE_CHV_RENDER_RANGE_OFFSET(reg) \
 + (((reg) = 0x2000  (reg)  0x4000) ||\
 + ((reg) = 0x5000  (reg)  0x8000) ||\
 + ((reg) = 0x8300  (reg)  0x8500) ||\
 + ((reg) = 0xB000  (reg)  0xC000) ||\
 + ((reg) = 0xE000  (reg)  0xE800))
 +
 +#define FORCEWAKE_CHV_MEDIA_RANGE_OFFSET(reg)\
 + (((reg) = 0x8800  (reg)  0x8900) ||\
 + ((reg) = 0xD000  (reg)  0xD800) ||\
 + ((reg) = 0x12000  (reg)  0x14000) ||\
 + ((reg) = 0x1A000  (reg)  0x1C000) ||\
 + ((reg) = 0x1E800  (reg)  0x1EA00) ||\
 + ((reg) = 0x3  (reg)  0x4))
 +
 +#define FORCEWAKE_CHV_COMMON_RANGE_OFFSET(reg)\
 + (((reg) = 0x4000  (reg)  0x5000) ||\
 + ((reg) = 0x8000  (reg)  0x8300) ||\
 + ((reg) = 0x8500  (reg)  0x8600) ||\
 + ((reg) = 0x9000  (reg)  0xB000) ||\
 + ((reg) = 0xC000  (reg)  0xc800) ||\
 + ((reg) = 0xF000  (reg)  0x1) ||\
 + ((reg) = 0x14000  (reg)  0x14400) ||\
 + ((reg) = 0x22000  (reg)  0x24000))
 +

To satisfy both Chris, and Ville, how about:
#define REG_RANGE(reg, start, end) ((reg) = (start)  (reg)  0x5000)
REG_RANGE(reg, 0x4000, 0x5000) || \
REG_RANGE(reg, 0x8000, 0x8300) || \
...

By the way, I spent my due diligence trying to find where these ranges
come from, and have been unable. Doc name? I should have all the docs
from Ville.

I can't speak for the code generated, either.

  static void
  ilk_dummy_write(struct drm_i915_private *dev_priv)
  {
 @@ -587,7 +612,48 @@ vlv_read##x(struct drm_i915_private *dev_priv, off_t 
 reg, bool trace) { \
   REG_READ_FOOTER; \
  }
  
 +#define __chv_read(x) \
 +static u##x \
 +chv_read##x(struct drm_i915_private *dev_priv, off_t reg, bool trace) { \
 + unsigned fwengine = 0; \
 + REG_READ_HEADER(x); \
 + if (FORCEWAKE_CHV_RENDER_RANGE_OFFSET(reg)) { \
 + fwengine = FORCEWAKE_RENDER; \
 + } \
 + else if (FORCEWAKE_CHV_MEDIA_RANGE_OFFSET(reg)) { \
 + fwengine = FORCEWAKE_MEDIA; \
 + } \
 + else if (FORCEWAKE_CHV_COMMON_RANGE_OFFSET(reg)) { \
 + fwengine = FORCEWAKE_ALL; \
 + } \
 + if (FORCEWAKE_RENDER  fwengine) { \
 + if (dev_priv-uncore.fw_rendercount++ == 0) \
 + (dev_priv)-uncore.funcs.force_wake_get(dev_priv, \
 + fwengine); \
 + } \
 + if (FORCEWAKE_MEDIA  fwengine) { \
 + if (dev_priv-uncore.fw_mediacount++ == 0) \
 + (dev_priv)-uncore.funcs.force_wake_get(dev_priv, \
 + fwengine); \
 + } \
 + val = __raw_i915_read##x(dev_priv, reg); \
 + if (FORCEWAKE_RENDER  fwengine) { \
 + if (dev_priv-uncore.fw_rendercount++ == 0) \
 + (dev_priv)-uncore.funcs.force_wake_put(dev_priv, \
 + fwengine); \
 + } \
 + if (FORCEWAKE_MEDIA  fwengine) { \
 + if (dev_priv-uncore.fw_mediacount++ == 0) \
 + (dev_priv)-uncore.funcs.force_wake_put(dev_priv, \
 + fwengine); \
 + } \
 + REG_READ_FOOTER; \
 +}
  
 +__chv_read(8)
 +__chv_read(16)
 +__chv_read(32)
 +__chv_read(64)
  __vlv_read(8)
  __vlv_read(16)
  __vlv_read(32)
 @@ -605,6 +671,7 @@ __gen4_read(16)
  __gen4_read(32)
  __gen4_read(64)
  
 +#undef __chv_read
  #undef __vlv_read
  #undef __gen6_read
  #undef __gen5_read
 @@ -709,6 +776,49 @@ gen8_write##x(struct drm_i915_private *dev_priv, off_t 
 reg, u##x val, bool trace
   REG_WRITE_FOOTER; \
  }
  
 +#define __chv_write(x) \
 +static void \
 +chv_write##x(struct drm_i915_private *dev_priv, off_t reg, u##x val, bool 
 trace) { \
 + unsigned fwengine = 0; \
 + bool __needs_put = !is_gen8_shadowed(dev_priv, reg);