Re: [Intel-gfx] [PATCH] drm/i915: Return -ENOENT for unknown contexts
On Wed, Apr 16, 2014 at 11:58:09AM +0300, Mika Kuoppala wrote: From: Mika Kuoppala mika.kuopp...@linux.intel.com If hw_contexts are disabled, we always return the per file descriptor default context stats. Make sure that the context is correctly given and fail accordingly if not. Testcase: igt/gem_reset_stats/params Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com This code is obsolete, and the bug should be already fixed by commit 691e6415c891b8b2b082a120b896b443531c4d45 Author: Chris Wilson ch...@chris-wilson.co.uk Date: Wed Apr 9 09:07:36 2014 +0100 drm/i915: Always use kref tracking for all contexts. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/fb-helper: Fix hpd vs. initial config races
On Wed, Apr 16, 2014 at 04:45:21PM +0200, Daniel Vetter wrote: Some drivers need to be able to have a perfect race-free fbcon setup. Current drivers only enable hotplug processing after the call to drm_fb_helper_initial_config which leaves a tiny but important race. This race is especially noticable on embedded platforms where the driver itself enables the voltage for the hdmi output, since only then will monitors (after a bit of delay, as usual) respond by asserting the hpd pin. Most of the infrastructure is already there with the split-out drm_fb_helper_init. And drm_fb_helper_initial_config already has all the required locking to handle concurrent hpd events since commit 53f1904bced78d7c00f5d874c662ec3ac85d0f9f Author: Daniel Vetter daniel.vet...@ffwll.ch Date: Thu Mar 20 14:26:35 2014 +0100 drm/fb-helper: improve drm_fb_helper_initial_config locking The only missing bit is making drm_fb_helper_hotplug_event save against concurrent calls of drm_fb_helper_initial_config. The only unprotected bit is the check for fb_helper-fb. With that drivers can first initialize the fb helper, then enabel hotplug processing and then set up the initial config all in a completely race-free manner. Update kerneldoc and convert i915 as a proof of concept. Feature requested by Thierry since his tegra driver atm reliably boots slowly enough to misses the hotplug event for an external hdmi screen, but also reliably boots to quickly for the hpd pin to be asserted when the fb helper calls into the hdmi -detect function. Cc: Thierry Reding tred...@nvidia.com Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch --- drivers/gpu/drm/drm_fb_helper.c | 11 +-- drivers/gpu/drm/i915/i915_dma.c | 3 --- drivers/gpu/drm/i915/i915_drv.c | 2 -- drivers/gpu/drm/i915/i915_drv.h | 1 - drivers/gpu/drm/i915/i915_irq.c | 4 5 files changed, 5 insertions(+), 16 deletions(-) The FB helper parts: Tested-by: Thierry Reding tred...@nvidia.com But I have one comment below... diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c [...] - * Note that the driver must ensure that this is only called _after_ the fb has - * been fully set up, i.e. after the call to drm_fb_helper_initial_config. + * Note that drivers may call this even before calling + * drm_fb_helper_initial_config but only aftert drm_fb_helper_init. This allows I don't think the requirement is that strict. To elaborate: on Tegra we cannot call drm_fb_helper_init() because the number of CRTCs and connectors isn't known this early. We determine that dynamically after all sub-devices have been initialized. So instead of calling drm_fb_helper_init() before drm_kms_helper_poll_init(), I did something more minimal (see attached patch). It's kind of difficult to tell because of the context, but tegra_drm_fb_prepare() sets up the mode config and functions and allocate memory for the FB helper and sets the FB helper functions. This may not be enough for all drivers, but on Tegra the implementation of .output_poll_changed() simply calls drm_fb_helper_hotplug_event(), which will work fine with just that rudimentary initialization. Thierry From ea394150524c8b54ee4131ad830bf5beb6b1056e Mon Sep 17 00:00:00 2001 From: Thierry Reding tred...@nvidia.com Date: Thu, 17 Apr 2014 10:02:17 +0200 Subject: [PATCH] drm/tegra: Implement race-free hotplug detection A race condition currently exists on Tegra, where it can happen that a monitor attached via HDMI isn't detected during the initial FB helper setup, but the hotplug event happens too early to be processed by the poll helpers because they haven't been initialized yet. This happens because on some boards the HDMI driver can control the regulator that supplies the +5V pin on the HDMI connector. Therefore depending on the timing between the initialization of the HDMI driver and the rest of DRM, it's possible that the monitor returns the hotplug signal right within the window where we would miss it. Unfortunately, drm_kms_helper_poll_init() will wreak havoc when called before at least some parts of the FB helpers have been set up. This commit fixes this by splitting out the minimum of initialization required to make drm_kms_helper_poll_init() work into a separate function that can be called early. It is then safe to move all of the poll helper initialization to an earlier point in time (before the HDMI output driver has a chance to enable the +5V supply). That way if the hotplug signal is returned before the initial FB helper setup, the monitor will be forcefully detected at that point, and if the hotplug signal is returned after that it will be properly handled by the poll helpers. Signed-off-by: Thierry Reding tred...@nvidia.com --- drivers/gpu/drm/tegra/drm.c | 8 ++-- drivers/gpu/drm/tegra/drm.h | 1 + drivers/gpu/drm/tegra/fb.c | 50 ++--- 3 files changed, 40 insertions(+), 19 deletions(-)
Re: [Intel-gfx] [PATCH 5/5] drm/i915: Move buffer pinning and ring selection to intel_crtc_page_flip()
On Tue, Apr 15, 2014 at 09:41:38PM +0300, ville.syrj...@linux.intel.com wrote: From: Ville Syrjälä ville.syrj...@linux.intel.com All of the .queue_flip() callbacks duplicate the same code to pin the buffers and calculate the gtt_offset. Move that code to intel_crtc_page_flip(). In order to do that we must also move the ring selection logic there. Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com I was a little uneasy about this given the fun-and-games we play with ring selection and the historic issues we have had with bugs in unpinning. However, this looks to be worth it, especially with the new piece of common code, Reviewed-by: Chris Wilson ch...@chris-wilson.co.uk -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Return -ENOENT for unknown contexts
Chris Wilson ch...@chris-wilson.co.uk writes: On Wed, Apr 16, 2014 at 11:58:09AM +0300, Mika Kuoppala wrote: From: Mika Kuoppala mika.kuopp...@linux.intel.com If hw_contexts are disabled, we always return the per file descriptor default context stats. Make sure that the context is correctly given and fail accordingly if not. Testcase: igt/gem_reset_stats/params Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com This code is obsolete, and the bug should be already fixed by commit 691e6415c891b8b2b082a120b896b443531c4d45 Author: Chris Wilson ch...@chris-wilson.co.uk Date: Wed Apr 9 09:07:36 2014 +0100 drm/i915: Always use kref tracking for all contexts. Agreed. I should have run it against fixes in addition to dinq. -Mika -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v2 00/25] vlv: add support for RPM
On Mon, Apr 14, 2014 at 08:24:21PM +0300, Imre Deak wrote: For a description of this patchset see the previous cover letter [1]. Tested on HSW (non-ULT), VLV with igt/kms_flip and pm_pc8. v2: - addressed comments about getting the proper runtime PM references in debugfs (Daniel, Paulo, Ville) - disable RPM if RC6 is disabled for all platforms, not just VLV (Daniel) - refactored the runtime PM callbacks pulling platform independent teardown/re-init code to the generic runtime suspend/resume callbacks (Daniel) - fixed a couple of issues I bumped into while checking the RC6/RPS and the GPU reset error capturing path [1] http://lists.freedesktop.org/archives/intel-gfx/2014-April/043208.html For patches 01-08, 12-13, 15-16, 18, 20, 22, 24: Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com 23,25,26 I didn't really look at so far, and the rest had some small issues I pointed out. -- Ville Syrjälä Intel OTC ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 1/2] drm/i915: remove unexplained vblank wait in the DP off code
On Fri, Apr 11, 2014 at 02:25:41PM -0700, Jesse Barnes wrote: I don't think this is necessary; at least it doesn't appear to be on my BYT. Dropping it speeds up our shutdown code a little, in some cases resulting in faster init times. Signed-off-by: Jesse Barnes jbar...@virtuousgeek.org --- drivers/gpu/drm/i915/intel_dp.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c index e48d47c..728a5db 100644 --- a/drivers/gpu/drm/i915/intel_dp.c +++ b/drivers/gpu/drm/i915/intel_dp.c @@ -2756,9 +2756,6 @@ intel_dp_link_down(struct intel_dp *intel_dp) } POSTING_READ(intel_dp-output_reg); - /* We don't really know why we're doing this */ - intel_wait_for_vblank(dev, intel_crtc-pipe); - Maybe this was here to guarantee we send the magic five idle patterns specified in the DP spec. But since we're going to be turning off the port anyway I don't see why we switch to transmitting the idle pattern at all. I guess switching to the idle pattern might make sense for the IBX transcoder select workaround to avoid sending some garbage on the main link. Although we don't seem to be doing that workaround quite according to spec. The spec says we should first disable the port, and then re-enable it temporarily w/ transcoder A. What we do is switch the port over to transcoder A while it's still enabled, and only then disable it. So I guess killing the wait here is fine, but looks like the IBX workaround stuff needs a better look. I can try to clean it up a bit. Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com if (HAS_PCH_IBX(dev) I915_READ(intel_dp-output_reg) DP_PIPEB_SELECT) { struct drm_crtc *crtc = intel_dig_port-base.base.crtc; -- 1.8.4.2 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Ville Syrjälä Intel OTC ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 4/4] drm/i915: Invalidate our pages under memory pressure
On Wed, 16 Apr 2014, Robert Beckett robert.beck...@intel.com wrote: On 25/03/2014 13:23, Chris Wilson wrote: Try to flush out dirty pages into the swapcache (and from there into the swapfile) when under memory pressure and forced to drop GEM objects from memory. In effect, this should just allow us to discard unused pages for memory reclaim and to start writeback earlier. v2: Hugh Dickins warned that explicitly starting writeback from shrink_slab was prone to deadlocks within shmemfs. Cc: Hugh Dickins hu...@google.com Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk --- drivers/gpu/drm/i915/i915_gem.c | 38 +++--- 1 file changed, 27 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 135ee8bd55f6..8287fd6701c6 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -60,7 +60,6 @@ static unsigned long i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc); static unsigned long i915_gem_purge(struct drm_i915_private *dev_priv, long target); static unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv); -static void i915_gem_object_truncate(struct drm_i915_gem_object *obj); static void i915_gem_retire_requests_ring(struct intel_ring_buffer *ring); static bool cpu_cache_is_coherent(struct drm_device *dev, @@ -1685,12 +1684,16 @@ i915_gem_mmap_gtt_ioctl(struct drm_device *dev, void *data, return i915_gem_mmap_gtt(file, dev, args-handle, args-offset); } +static inline int +i915_gem_object_is_purgeable(struct drm_i915_gem_object *obj) +{ +return obj-madv == I915_MADV_DONTNEED; +} + /* Immediately discard the backing storage */ static void i915_gem_object_truncate(struct drm_i915_gem_object *obj) { -struct inode *inode; - i915_gem_object_free_mmap_offset(obj); if (obj-base.filp == NULL) @@ -1701,16 +1704,28 @@ i915_gem_object_truncate(struct drm_i915_gem_object *obj) * To do this we must instruct the shmfs to drop all of its * backing pages, *now*. */ -inode = file_inode(obj-base.filp); -shmem_truncate_range(inode, 0, (loff_t)-1); - +shmem_truncate_range(file_inode(obj-base.filp), 0, (loff_t)-1); obj-madv = __I915_MADV_PURGED; } -static inline int -i915_gem_object_is_purgeable(struct drm_i915_gem_object *obj) +/* Try to discard unwanted pages */ +static void +i915_gem_object_invalidate(struct drm_i915_gem_object *obj) { -return obj-madv == I915_MADV_DONTNEED; +struct address_space *mapping; + +switch (obj-madv) { +case I915_MADV_DONTNEED: +i915_gem_object_truncate(obj); +case __I915_MADV_PURGED: +return; +} + +if (obj-base.filp == NULL) +return; + +mapping = file_inode(obj-base.filp)-i_mapping, +invalidate_mapping_pages(mapping, 0, (loff_t)-1); } static void @@ -1775,8 +1790,7 @@ i915_gem_object_put_pages(struct drm_i915_gem_object *obj) ops-put_pages(obj); obj-pages = NULL; -if (i915_gem_object_is_purgeable(obj)) -i915_gem_object_truncate(obj); +i915_gem_object_invalidate(obj); return 0; } @@ -4201,6 +4215,8 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj) if (WARN_ON(obj-pages_pin_count)) obj-pages_pin_count = 0; +if (obj-madv != __I915_MADV_PURGED) +obj-madv = I915_MADV_DONTNEED; i915_gem_object_put_pages(obj); i915_gem_object_free_mmap_offset(obj); i915_gem_object_release_stolen(obj); Functionally it looks good to me. Though, you may want a /* fall-through */ comment (some people cant mentally parse fallthroughs without being prompted) and a default: break; (to avoid any static code analysis complaints) in the switch in i915_gem_object_invalidate. Or just two if statements. Jani. Reviewed-by: Robert Beckett robert.beck...@intel.com ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Jani Nikula, Intel Open Source Technology Center ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 3/3] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl
On Mon, Mar 17, 2014 at 05:21:56AM -0700, Chris Wilson wrote: By exporting the ability to map user address and inserting PTEs representing their backing pages into the GTT, we can exploit UMA in order to utilize normal application data as a texture source or even as a render target (depending upon the capabilities of the chipset). This has a number of uses, with zero-copy downloads to the GPU and efficient readback making the intermixed streaming of CPU and GPU operations fairly efficient. This ability has many widespread implications from faster rendering of client-side software rasterisers (chromium), mitigation of stalls due to read back (firefox) and to faster pipelining of texture data (such as pixel buffer objects in GL or data blobs in CL). v2: Compile with CONFIG_MMU_NOTIFIER v3: We can sleep while performing invalidate-range, which we can utilise to drop our page references prior to the kernel manipulating the vma (for either discard or cloning) and so protect normal users. v4: Only run the invalidate notifier if the range intercepts the bo. v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers v6: Recheck after reacquire mutex for lost mmu. v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary. v8: Fix rebasing error after forwarding porting the back port. v9: Limit the userptr to page aligned entries. We now expect userspace to handle all the offset-in-page adjustments itself. v10: Prevent vma from being copied across fork to avoid issues with cow. v11: Drop vma behaviour changes -- locking is nigh on impossible. Use a worker to load user pages to avoid lock inversions. v12: Use get_task_mm()/mmput() for correct refcounting of mm. v13: Use a worker to release the mmu_notifier to avoid lock inversion v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer with its own locking and tree of objects for each mm/mmu_notifier. v15: Prevent overlapping userptr objects, and invalidate all objects within the mmu_notifier range v16: Fix a typo for iterating over multiple objects in the range and rearrange error path to destroy the mmu_notifier locklessly. Also close a race between invalidate_range and the get_pages_worker. v17: Close a race between get_pages_worker/invalidate_range and fresh allocations of the same userptr range - and notice that struct_mutex was presumed to be held when during creation it wasn't. v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory for the struct sg_table and to clear it before reporting an error. v19: Always error out on read-only userptr requests as we don't have the hardware infrastructure to support them at the moment. v20: Refuse to implement read-only support until we have the required infrastructure - but reserve the bit in flags for future use. v21: use_mm() is not required for get_user_pages(). It is only meant to be used to fix up the kernel thread's current-mm for use with copy_user(). Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Cc: Tvrtko Ursulin tvrtko.ursu...@linux.intel.com Cc: Gong, Zhipeng zhipeng.g...@intel.com Cc: Akash Goel akash.g...@intel.com Cc: Volkin, Bradley D bradley.d.vol...@intel.com Reviewed-by: Tvrtko Ursulin tvrtko.ursu...@linux.intel.com --- drivers/gpu/drm/i915/Kconfig| 1 + drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/i915_dma.c | 1 + drivers/gpu/drm/i915/i915_drv.h | 24 +- drivers/gpu/drm/i915/i915_gem.c | 4 + drivers/gpu/drm/i915/i915_gem_dmabuf.c | 5 + drivers/gpu/drm/i915/i915_gem_userptr.c | 679 drivers/gpu/drm/i915/i915_gpu_error.c | 2 + include/uapi/drm/i915_drm.h | 16 + 9 files changed, 732 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/i915/i915_gem_userptr.c diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig index 73ed59eff139..9940baee10c2 100644 --- a/drivers/gpu/drm/i915/Kconfig +++ b/drivers/gpu/drm/i915/Kconfig @@ -5,6 +5,7 @@ config DRM_I915 depends on (AGP || AGP=n) select INTEL_GTT select AGP_INTEL if AGP + select INTERVAL_TREE # we need shmfs for the swappable backing store, and in particular # the shmem_readpage() which depends upon tmpfs select SHMEM diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 2e02137a0d6e..1856494439e9 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -27,6 +27,7 @@ i915-y += i915_cmd_parser.o \ i915_gem.o \ i915_gem_stolen.o \ i915_gem_tiling.o \ + i915_gem_userptr.o \ i915_gpu_error.o \ i915_irq.o \ i915_trace_points.o \ diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index
Re: [Intel-gfx] [PATCH 2/3] drm/i915: Do not call retire_requests from wait_for_rendering
Hi Chris, just want to bring this one back to your attention while I'm going through the rest of the series. Thanks, Brad On Fri, Mar 28, 2014 at 03:58:25PM -0700, Volkin, Bradley D wrote: On Mon, Mar 17, 2014 at 05:21:55AM -0700, Chris Wilson wrote: A common issue we have is that retiring requests causes recursion through GTT manipulation or page table manipulation which we can only handle at very specific points. However, to maintain internal consistency (enforced through our sanity checks on write_domain at various points in the GEM object lifecycle) we do need to retire the object prior to marking it with a new write_domain, and also clear the write_domain for the implicit flush following a batch. Note that this then allows the unbound objects to still be on the active lists, and so care must be taken when removing objects from unbound lists (similar to the caveats we face processing the bound lists). v2: Fix i915_gem_shrink_all() to handle updated object lifetime rules, by refactoring it to call into __i915_gem_shrink(). v3: Missed an object-retire prior to changing cache domains in i915_gem_object_set_cache_leve() v4: Rebase Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk Tested-by: Ville Syrjälä ville.syrj...@linux.intel.com --- drivers/gpu/drm/i915/i915_gem.c| 112 - drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 + 2 files changed, 66 insertions(+), 49 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 58704ce62e3e..5cf4d80de867 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -44,6 +44,9 @@ static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *o static __must_check int i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj, bool readonly); +static void +i915_gem_object_retire(struct drm_i915_gem_object *obj); + static int i915_gem_phys_pwrite(struct drm_device *dev, struct drm_i915_gem_object *obj, struct drm_i915_gem_pwrite *args, @@ -502,6 +505,8 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj, ret = i915_gem_object_wait_rendering(obj, true); if (ret) return ret; + + i915_gem_object_retire(obj); } ret = i915_gem_object_get_pages(obj); @@ -917,6 +922,8 @@ i915_gem_shmem_pwrite(struct drm_device *dev, ret = i915_gem_object_wait_rendering(obj, false); if (ret) return ret; + + i915_gem_object_retire(obj); } /* Same trick applies to invalidate partially written cachelines read * before writing. */ @@ -1304,7 +1311,8 @@ static int i915_gem_object_wait_rendering__tail(struct drm_i915_gem_object *obj, struct intel_ring_buffer *ring) { - i915_gem_retire_requests_ring(ring); + if (!obj-active) + return 0; /* Manually manage the write flush as we may have not yet * retired the buffer. @@ -1314,7 +1322,6 @@ i915_gem_object_wait_rendering__tail(struct drm_i915_gem_object *obj, * we know we have passed the last write. */ obj-last_write_seqno = 0; - obj-base.write_domain = ~I915_GEM_GPU_DOMAINS; return 0; } @@ -1949,58 +1956,58 @@ static unsigned long __i915_gem_shrink(struct drm_i915_private *dev_priv, long target, bool purgeable_only) { - struct list_head still_bound_list; - struct drm_i915_gem_object *obj, *next; + struct list_head still_in_list; + struct drm_i915_gem_object *obj; unsigned long count = 0; - list_for_each_entry_safe(obj, next, -dev_priv-mm.unbound_list, -global_list) { - if ((i915_gem_object_is_purgeable(obj) || !purgeable_only) - i915_gem_object_put_pages(obj) == 0) { - count += obj-base.size PAGE_SHIFT; - if (count = target) - return count; - } - } - /* -* As we may completely rewrite the bound list whilst unbinding +* As we may completely rewrite the (un)bound list whilst unbinding * (due to retiring requests) we have to strictly process only * one element of the list at the time, and recheck the list * on every iteration. Is it still true that we could retire requests on this path? I see that currently we will retire requests via: i915_vma_unbind - i915_gem_object_finish_gpu - i915_gem_object_wait_rendering. But we've taken the explicit request retirement out of the wait_rendering path. Have I missed somewhere that it could still happen? Thanks, Brad
Re: [Intel-gfx] [PATCH 3/3] tests/gem_userptr_benchmark: Benchmarking userptr surfaces and impact
On Wed, Mar 19, 2014 at 04:13:06AM -0700, Tvrtko Ursulin wrote: From: Tvrtko Ursulin tvrtko.ursu...@intel.com This adds a small benchmark for the new userptr functionality. Apart from basic surface creation and destruction, also tested is the impact of having userptr surfaces in the process address space. Reason for that is the impact of MMU notifiers on common address space operations like munmap() which is per process. v2: * Moved to benchmarks. * Added pointer read/write tests. * Changed output to say iterations per second instead of operations per second. * Multiply result by batch size for multi-create* tests for a more comparable number with create-destroy test. v3: * Fixed ioctl detection on kernels without MMU_NOTIFIERs. Signed-off-by: Tvrtko Ursulin tvrtko.ursu...@intel.com Cc: Chris Wilson ch...@chris-wilson.co.uk --- Android.mk | 3 +- benchmarks/.gitignore | 1 + benchmarks/Android.mk | 36 +++ benchmarks/Makefile.am | 7 +- benchmarks/Makefile.sources| 6 + benchmarks/gem_userptr_benchmark.c | 513 + 6 files changed, 558 insertions(+), 8 deletions(-) create mode 100644 benchmarks/Android.mk create mode 100644 benchmarks/Makefile.sources create mode 100644 benchmarks/gem_userptr_benchmark.c diff --git a/Android.mk b/Android.mk index 8aeb2d4..0c969b8 100644 --- a/Android.mk +++ b/Android.mk @@ -1,2 +1 @@ -include $(call all-named-subdir-makefiles, lib tests tools) - +include $(call all-named-subdir-makefiles, lib tests tools benchmarks) diff --git a/benchmarks/.gitignore b/benchmarks/.gitignore index ddea6f7..09e5bd8 100644 --- a/benchmarks/.gitignore +++ b/benchmarks/.gitignore @@ -1,3 +1,4 @@ +gem_userptr_benchmark intel_upload_blit_large intel_upload_blit_large_gtt intel_upload_blit_large_map diff --git a/benchmarks/Android.mk b/benchmarks/Android.mk new file mode 100644 index 000..5bb8ef5 --- /dev/null +++ b/benchmarks/Android.mk @@ -0,0 +1,36 @@ +LOCAL_PATH := $(call my-dir) + +include $(LOCAL_PATH)/Makefile.sources + +## + +define add_benchmark +include $(CLEAR_VARS) + +LOCAL_SRC_FILES := $1.c + +LOCAL_CFLAGS += -DHAVE_STRUCT_SYSINFO_TOTALRAM +LOCAL_CFLAGS += -DANDROID -UNDEBUG -include check-ndebug.h +LOCAL_CFLAGS += -std=c99 +# FIXME: drop once Bionic correctly annotates noreturn on pthread_exit +LOCAL_CFLAGS += -Wno-error=return-type +# Excessive complaining for established cases. Rely on the Linux version warnings. +LOCAL_CFLAGS += -Wno-sign-compare + +LOCAL_MODULE := $1 +LOCAL_MODULE_TAGS := optional + +LOCAL_STATIC_LIBRARIES := libintel_gpu_tools + +LOCAL_SHARED_LIBRARIES := libpciaccess \ + libdrm\ + libdrm_intel + +include $(BUILD_EXECUTABLE) +endef + +## + +benchmark_list := $(bin_PROGRAMS) + +$(foreach item,$(benchmark_list),$(eval $(call add_benchmark,$(item diff --git a/benchmarks/Makefile.am b/benchmarks/Makefile.am index e2ad784..d173bf4 100644 --- a/benchmarks/Makefile.am +++ b/benchmarks/Makefile.am @@ -1,9 +1,4 @@ - -bin_PROGRAMS = \ - intel_upload_blit_large \ - intel_upload_blit_large_gtt \ - intel_upload_blit_large_map \ - intel_upload_blit_small +include Makefile.sources AM_CPPFLAGS = -I$(top_srcdir) -I$(top_srcdir)/lib AM_CFLAGS = $(DRM_CFLAGS) $(CWARNFLAGS) $(CAIRO_CFLAGS) diff --git a/benchmarks/Makefile.sources b/benchmarks/Makefile.sources new file mode 100644 index 000..fd6c107 --- /dev/null +++ b/benchmarks/Makefile.sources @@ -0,0 +1,6 @@ +bin_PROGRAMS = \ +intel_upload_blit_large \ +intel_upload_blit_large_gtt \ +intel_upload_blit_large_map \ +intel_upload_blit_small \ +gem_userptr_benchmark You might split the makefile cleanup aspect of this into a separate patch, but I'm fine either way. diff --git a/benchmarks/gem_userptr_benchmark.c b/benchmarks/gem_userptr_benchmark.c new file mode 100644 index 000..218f6f1 --- /dev/null +++ b/benchmarks/gem_userptr_benchmark.c @@ -0,0 +1,513 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the
Re: [Intel-gfx] [PATCH 2/3] tests/gem_vmap_blits: Remove obsolete test case
On Wed, Mar 19, 2014 at 04:13:05AM -0700, Tvrtko Ursulin wrote: From: Tvrtko Ursulin tvrtko.ursu...@intel.com No need for the old test case once the new one was added. Signed-off-by: Tvrtko Ursulin tvrtko.ursu...@intel.com Reviewed-by: Brad Volkin bradley.d.vol...@intel.com --- tests/.gitignore | 1 - tests/Makefile.sources | 1 - tests/gem_vmap_blits.c | 344 - 3 files changed, 346 deletions(-) delete mode 100644 tests/gem_vmap_blits.c diff --git a/tests/.gitignore b/tests/.gitignore index ff1eff5..c600823 100644 --- a/tests/.gitignore +++ b/tests/.gitignore @@ -93,7 +93,6 @@ gem_tiled_swapping gem_tiling_max_stride gem_unfence_active_buffers gem_unref_active_buffers -gem_vmap_blits gem_userptr_blits gem_wait_render_timeout gem_write_read_ring_switch diff --git a/tests/Makefile.sources b/tests/Makefile.sources index 240bd99..610bbf5 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -118,7 +118,6 @@ TESTS_progs = \ gem_tiling_max_stride \ gem_unfence_active_buffers \ gem_unref_active_buffers \ - gem_vmap_blits \ gem_userptr_blits \ gem_wait_render_timeout \ gen3_mixed_blits \ diff --git a/tests/gem_vmap_blits.c b/tests/gem_vmap_blits.c deleted file mode 100644 index 48297af..000 --- a/tests/gem_vmap_blits.c +++ /dev/null @@ -1,344 +0,0 @@ -/* - * Copyright © 2009,2011 Intel Corporation - * - * Permission is hereby granted, free of charge, to any person obtaining a - * copy of this software and associated documentation files (the Software), - * to deal in the Software without restriction, including without limitation - * the rights to use, copy, modify, merge, publish, distribute, sublicense, - * and/or sell copies of the Software, and to permit persons to whom the - * Software is furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice (including the next - * paragraph) shall be included in all copies or substantial portions of the - * Software. - * - * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS - * IN THE SOFTWARE. - * - * Authors: - *Eric Anholt e...@anholt.net - *Chris Wilson ch...@chris-wilson.co.uk - * - */ - -/** @file gem_vmap_blits.c - * - * This is a test of doing many blits using a mixture of normal system pages - * and uncached linear buffers, with a working set larger than the - * aperture size. - * - * The goal is to simply ensure the basics work. - */ - -#include stdlib.h -#include stdio.h -#include string.h -#include fcntl.h -#include inttypes.h -#include errno.h -#include sys/stat.h -#include sys/time.h -#include drm.h -#include i915_drm.h -#include drmtest.h -#include intel_bufmgr.h -#include intel_batchbuffer.h -#include intel_gpu_tools.h - -#if !defined(I915_PARAM_HAS_VMAP) -#pragma message(No vmap support in drm, skipping) -int main(int argc, char **argv) -{ - fprintf(stderr, No vmap support in drm.\n); - return 77; -} -#else - -#define WIDTH 512 -#define HEIGHT 512 - -static uint32_t linear[WIDTH*HEIGHT]; - -static uint32_t gem_vmap(int fd, void *ptr, int size, int read_only) -{ - struct drm_i915_gem_vmap vmap; - - vmap.user_ptr = (uintptr_t)ptr; - vmap.user_size = size; - vmap.flags = 0; - if (read_only) - vmap.flags |= I915_VMAP_READ_ONLY; - - if (drmIoctl(fd, DRM_IOCTL_I915_GEM_VMAP, vmap)) - return 0; - - return vmap.handle; -} - - -static void gem_vmap_sync(int fd, uint32_t handle) -{ - gem_set_domain(fd, handle, I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU); -} - -static void -copy(int fd, uint32_t dst, uint32_t src) -{ - uint32_t batch[10]; - struct drm_i915_gem_relocation_entry reloc[2]; - struct drm_i915_gem_exec_object2 obj[3]; - struct drm_i915_gem_execbuffer2 exec; - uint32_t handle; - int ret; - - batch[0] = XY_SRC_COPY_BLT_CMD | - XY_SRC_COPY_BLT_WRITE_ALPHA | - XY_SRC_COPY_BLT_WRITE_RGB | 6; - batch[1] = (3 24) | /* 32 bits */ - (0xcc 16) | /* copy ROP */ - WIDTH*4; - batch[2] = 0; /* dst x1,y1 */ - batch[3] = (HEIGHT 16) | WIDTH; /* dst x2,y2 */ - batch[4] = 0; /* dst reloc */ - batch[5] = 0; /* src x1,y1 */ - batch[6] = WIDTH*4; - batch[7] = 0; /* src reloc */ - batch[8] = MI_BATCH_BUFFER_END; - batch[9] = MI_NOOP; - -
Re: [Intel-gfx] [PATCH 28/71] drm/i915/chv: Added CHV specific register read and write
On Wed, Apr 09, 2014 at 01:28:26PM +0300, ville.syrj...@linux.intel.com wrote: From: Deepak S deepa...@intel.com Support to individually control Media/Render well based on the register access. Add CHV specific write function to habdle difference between registers that are sadowed vs those that need forcewake even for writes. v2: Drop write FIFO for CHV and add comman well forcewake (Ville) Signed-off-by: Deepak S deepa...@intel.com [vsyrjala: Move the register range macros into intel_uncore.c] Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com --- drivers/gpu/drm/i915/intel_uncore.c | 139 +--- 1 file changed, 131 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 823d699..8e3c686 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -495,6 +495,31 @@ void assert_force_wake_inactive(struct drm_i915_private *dev_priv) ((reg) = 0x22000 (reg) 0x24000) ||\ ((reg) = 0x3 (reg) 0x4)) +#define FORCEWAKE_CHV_RENDER_RANGE_OFFSET(reg) \ + (((reg) = 0x2000 (reg) 0x4000) ||\ + ((reg) = 0x5000 (reg) 0x8000) ||\ + ((reg) = 0x8300 (reg) 0x8500) ||\ + ((reg) = 0xB000 (reg) 0xC000) ||\ + ((reg) = 0xE000 (reg) 0xE800)) + +#define FORCEWAKE_CHV_MEDIA_RANGE_OFFSET(reg)\ + (((reg) = 0x8800 (reg) 0x8900) ||\ + ((reg) = 0xD000 (reg) 0xD800) ||\ + ((reg) = 0x12000 (reg) 0x14000) ||\ + ((reg) = 0x1A000 (reg) 0x1C000) ||\ + ((reg) = 0x1E800 (reg) 0x1EA00) ||\ + ((reg) = 0x3 (reg) 0x4)) + +#define FORCEWAKE_CHV_COMMON_RANGE_OFFSET(reg)\ + (((reg) = 0x4000 (reg) 0x5000) ||\ + ((reg) = 0x8000 (reg) 0x8300) ||\ + ((reg) = 0x8500 (reg) 0x8600) ||\ + ((reg) = 0x9000 (reg) 0xB000) ||\ + ((reg) = 0xC000 (reg) 0xc800) ||\ + ((reg) = 0xF000 (reg) 0x1) ||\ + ((reg) = 0x14000 (reg) 0x14400) ||\ + ((reg) = 0x22000 (reg) 0x24000)) + To satisfy both Chris, and Ville, how about: #define REG_RANGE(reg, start, end) ((reg) = (start) (reg) 0x5000) REG_RANGE(reg, 0x4000, 0x5000) || \ REG_RANGE(reg, 0x8000, 0x8300) || \ ... By the way, I spent my due diligence trying to find where these ranges come from, and have been unable. Doc name? I should have all the docs from Ville. I can't speak for the code generated, either. static void ilk_dummy_write(struct drm_i915_private *dev_priv) { @@ -587,7 +612,48 @@ vlv_read##x(struct drm_i915_private *dev_priv, off_t reg, bool trace) { \ REG_READ_FOOTER; \ } +#define __chv_read(x) \ +static u##x \ +chv_read##x(struct drm_i915_private *dev_priv, off_t reg, bool trace) { \ + unsigned fwengine = 0; \ + REG_READ_HEADER(x); \ + if (FORCEWAKE_CHV_RENDER_RANGE_OFFSET(reg)) { \ + fwengine = FORCEWAKE_RENDER; \ + } \ + else if (FORCEWAKE_CHV_MEDIA_RANGE_OFFSET(reg)) { \ + fwengine = FORCEWAKE_MEDIA; \ + } \ + else if (FORCEWAKE_CHV_COMMON_RANGE_OFFSET(reg)) { \ + fwengine = FORCEWAKE_ALL; \ + } \ + if (FORCEWAKE_RENDER fwengine) { \ + if (dev_priv-uncore.fw_rendercount++ == 0) \ + (dev_priv)-uncore.funcs.force_wake_get(dev_priv, \ + fwengine); \ + } \ + if (FORCEWAKE_MEDIA fwengine) { \ + if (dev_priv-uncore.fw_mediacount++ == 0) \ + (dev_priv)-uncore.funcs.force_wake_get(dev_priv, \ + fwengine); \ + } \ + val = __raw_i915_read##x(dev_priv, reg); \ + if (FORCEWAKE_RENDER fwengine) { \ + if (dev_priv-uncore.fw_rendercount++ == 0) \ + (dev_priv)-uncore.funcs.force_wake_put(dev_priv, \ + fwengine); \ + } \ + if (FORCEWAKE_MEDIA fwengine) { \ + if (dev_priv-uncore.fw_mediacount++ == 0) \ + (dev_priv)-uncore.funcs.force_wake_put(dev_priv, \ + fwengine); \ + } \ + REG_READ_FOOTER; \ +} +__chv_read(8) +__chv_read(16) +__chv_read(32) +__chv_read(64) __vlv_read(8) __vlv_read(16) __vlv_read(32) @@ -605,6 +671,7 @@ __gen4_read(16) __gen4_read(32) __gen4_read(64) +#undef __chv_read #undef __vlv_read #undef __gen6_read #undef __gen5_read @@ -709,6 +776,49 @@ gen8_write##x(struct drm_i915_private *dev_priv, off_t reg, u##x val, bool trace REG_WRITE_FOOTER; \ } +#define __chv_write(x) \ +static void \ +chv_write##x(struct drm_i915_private *dev_priv, off_t reg, u##x val, bool trace) { \ + unsigned fwengine = 0; \ + bool __needs_put = !is_gen8_shadowed(dev_priv, reg);