Re: [Intel-gfx] [PATCH] drm/i915: Reset request handling for gen8+
On Thu, Jun 18, 2015 at 12:18:39PM +0100, Tomas Elf wrote: My point was more along the lines of bailing out if the reset request fails and not returning an error, but simply keeping track of the number of times we've attempted the reset request. By not returning an error we would allow further hang detections to happen (since the hang is still there), which would end up in the same reset request in the future. If the reset request failed again we would simply increment the counter, and at some point we would decide that we've had too many unsuccessful reset request attempts and go ahead with the reset anyway. If that reset then failed we would return an error at that point, which would result in a terminally wedged state. But, yeah, I can see why we shouldn't do this.

Skipping to the middle! I understand the merit in trying the reset a few times before giving up; it would just need a bit of restructuring to try the reset before clearing gem state (trivial) and requeueing the hangcheck. I am just wary of feature creep before we get stuck into TDR, which promises to change how we think about resets entirely. I am trying not to block your work by doing "it would be nice if" tasks first! :) -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
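The retry scheme Tomas sketches can be captured in a few lines. This is a hypothetical userspace model, not i915 code: the names, the threshold, and the policy below are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative threshold; the mail does not name a value. */
#define MAX_RESET_REQUEST_ATTEMPTS 3

struct reset_state {
	int failed_requests;	/* failed reset requests so far */
	bool wedged;
};

/*
 * Called when a reset request fails.  Returns true when the caller
 * should stop swallowing the failure and force the full reset anyway
 * (which, if it also fails, terminally wedges the GPU).
 */
static bool reset_request_failed(struct reset_state *rs)
{
	if (++rs->failed_requests < MAX_RESET_REQUEST_ATTEMPTS)
		return false;	/* no error returned; hangcheck re-detects */
	return true;		/* too many attempts: reset unconditionally */
}
```

The point of not returning an error on early failures is exactly what the mail describes: the hang persists, so hangcheck fires again and funnels back into the same reset-request path.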
Re: [Intel-gfx] [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
On 17/06/15 13:02, Daniel Vetter wrote: On Wed, Jun 17, 2015 at 08:23:40AM +0100, Dave Gordon wrote: On 15/06/15 21:09, Chris Wilson wrote: On Mon, Jun 15, 2015 at 07:36:19PM +0100, Dave Gordon wrote: From: Alex Dai yu@intel.com i915_gem_object_write() is a generic function to copy data from a plain linear buffer to a paged gem object. We will need this for the microcontroller firmware loading support code. Issue: VIZ-4884 Signed-off-by: Alex Dai yu@intel.com Signed-off-by: Dave Gordon david.s.gor...@intel.com --- drivers/gpu/drm/i915/i915_drv.h |2 ++ drivers/gpu/drm/i915/i915_gem.c | 28 2 files changed, 30 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 611fbd8..9094c06 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2713,6 +2713,8 @@ void *i915_gem_object_alloc(struct drm_device *dev); void i915_gem_object_free(struct drm_i915_gem_object *obj); void i915_gem_object_init(struct drm_i915_gem_object *obj, const struct drm_i915_gem_object_ops *ops); +int i915_gem_object_write(struct drm_i915_gem_object *obj, +const void *data, size_t size); struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev, size_t size); void i915_init_vm(struct drm_i915_private *dev_priv, diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index be35f04..75d63c2 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -5392,3 +5392,31 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj) return false; } +/* Fill the @obj with the @size amount of @data */ +int i915_gem_object_write(struct drm_i915_gem_object *obj, + const void *data, size_t size) +{ + struct sg_table *sg; + size_t bytes; + int ret; + + ret = i915_gem_object_get_pages(obj); + if (ret) + return ret; + + i915_gem_object_pin_pages(obj); You don't set the object into the CPU domain, or instead manually handle the domain flushing. 
You don't handle objects that cannot be written directly by the CPU, nor do you handle objects whose representation in memory is not linear. -Chris

No, we don't handle just any random gem object, but we do return an error code for any types not supported. However, as we don't really need the full generality of writing into a gem object of any type, I will replace this function with one that combines the allocation of a new object (which will therefore definitely be of the correct type, in the correct domain, etc.) and filling it with the data to be preserved. The usage pattern for the particular case is going to be: Once-only: Allocate, Fill. Then each time the GuC is (re-)initialised: Map to GTT, DMA-read from buffer into GuC private memory, Unmap. Only on unload: Dispose. So our object is write-once by the CPU (and that's always the first operation), thereafter read-occasionally by the GuC's DMA engine.

Domain handling is required for all gem objects, and the resulting bugs if you skip it for one-off objects are absolutely no fun to track down. -Daniel

Is it not the case that the new object returned by i915_gem_alloc_object() is (a) of a type that can be mapped into the GTT, and (b) initially in the CPU domain for both reading and writing? So AFAICS the allocate-and-fill function I'm describing (to appear in the next respin of the patch series) doesn't need any further domain handling. .Dave.
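The allocate-and-fill helper Dave proposes can be sketched in userspace. This models only the shape of the operation, with made-up types standing in for the real gem object and sg-list walk: allocate a paged object, then copy a linear buffer into it page by page.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FAKE_PAGE_SIZE 4096	/* stands in for PAGE_SIZE */

/* Toy stand-in for a paged gem object; not the i915 structure. */
struct fake_gem_object {
	size_t size;
	size_t npages;
	unsigned char **pages;
};

/* Combined allocate-and-fill, as described in the mail: the object is
 * created and immediately written once by the CPU. */
static struct fake_gem_object *gem_alloc_and_fill(const void *data, size_t size)
{
	struct fake_gem_object *obj = calloc(1, sizeof(*obj));
	size_t copied = 0;

	if (!obj)
		return NULL;
	obj->size = size;
	obj->npages = (size + FAKE_PAGE_SIZE - 1) / FAKE_PAGE_SIZE;
	obj->pages = calloc(obj->npages, sizeof(*obj->pages));

	for (size_t i = 0; i < obj->npages; i++) {
		size_t chunk = size - copied < FAKE_PAGE_SIZE ?
			       size - copied : FAKE_PAGE_SIZE;

		obj->pages[i] = calloc(1, FAKE_PAGE_SIZE);
		memcpy(obj->pages[i], (const char *)data + copied, chunk);
		copied += chunk;
	}
	return obj;
}
```

After this point the object is only read (by the GuC's DMA engine in the real driver), matching the write-once usage pattern described above.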
Re: [Intel-gfx] Fwd: [PATCH] drm/i915: Fix IPS related flicker
On Thu, 18 Jun 2015, Jani Nikula jani.nik...@linux.intel.com wrote: On Thu, 18 Jun 2015, Ander Conselvan De Oliveira conselv...@gmail.com wrote: On Fri, 2015-06-05 at 12:11 +0300, Ville Syrjälä wrote: On Fri, Jun 05, 2015 at 11:51:42AM +0300, Jani Nikula wrote: On Thu, 04 Jun 2015, Rodrigo Vivi rodrigo.v...@gmail.com wrote: I just noticed that I had forgotten to reply-all... Jani, would you consider merging this fix with the explanation above related to Ville's question? Or do you want/need any action here?

Regarding Ville's question, I'd like Ville's ack on it.

It's good enough for me. This part of the driver is quite a mess currently anyway, so it doesn't matter too much what we stick in there.

Ping. Seems like this still isn't merged. Does it need more work or did it just fall through the cracks?

It fell between the cracks. I know the world isn't black and white, but it doesn't help the maintainers when review is some shade of grey. I've pushed this to drm-intel-next-fixes for now, but it has missed the train for both the v4.1 release and the main drm-next feature pull request for the v4.2 merge window. I expect this to land upstream in v4.2-rc2, unless there's an additional drm-next pull request during the merge window. I've added cc: stable. Thanks for the patch, and I guess the review was, uh, good enough for me now... :p

Argh, I'll take that back. This conflicts with dinq, and while doing so also confuses git rerere enough to uncover a previous, much bigger conflict that I have no intention of resolving again before the weekend. I'll return to it next week. Sorry. BR, Jani.

BR, Jani. Thanks, Ander BR, Jani. Thanks, Rodrigo.
-- Forwarded message -- From: Rodrigo Vivi rodrigo.v...@gmail.com Date: Fri, May 29, 2015 at 9:45 AM Subject: Re: [Intel-gfx] [PATCH] drm/i915: Fix IPS related flicker To: Ville Syrjälä ville.syrj...@linux.intel.com On Fri, May 29, 2015 at 1:47 AM, Ville Syrjälä ville.syrj...@linux.intel.com wrote: On Thu, May 28, 2015 at 11:07:11AM -0700, Rodrigo Vivi wrote: We cannot leave IPS enabled with no plane on the pipe. BSpec: "IPS cannot be enabled until after at least one plane has been enabled for at least one vertical blank" and "IPS must be disabled while there is still at least one plane enabled on the same pipe as IPS." This restriction applies to HSW and BDW. However a shortcut path in the update primary plane function, which makes the primary plane invisible by setting DSPCNTR to 0, was letting IPS stay enabled while there was no other plane enabled on the pipe, causing flickering that we had believed was caused by that other restriction, where IPS cannot be used when the pixel rate is greater than 95% of cdclk. v2: Don't mess with Atomic path as pointed out by Ville. Reference: https://bugs.freedesktop.org/show_bug.cgi?id=85583 Cc: Ville Syrjälä ville.syrj...@linux.intel.com Cc: Paulo Zanoni paulo.r.zan...@intel.com Signed-off-by: Rodrigo Vivi rodrigo.v...@intel.com --- drivers/gpu/drm/i915/intel_display.c | 13 + drivers/gpu/drm/i915/intel_drv.h | 1 + 2 files changed, 14 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 4e3f302..5a6b17b 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -13309,6 +13309,16 @@ intel_check_primary_plane(struct drm_plane *plane, intel_crtc->atomic.wait_vblank = true; } + /* + * FIXME: Actually if we will still have any other plane enabled + * on the pipe we could let IPS enabled still, but for + * now lets consider that when we make primary invisible + * by setting DSPCNTR to 0 on update_primary_plane function + * IPS needs to be disabled.
+ */ + if (!state->visible || !fb) + intel_crtc->atomic.disable_ips = true; + How could it be visible without an fb? I don't like this !fb here as well, but I just tried to keep exactly the same if statement that does I915_WRITE(DSPCNTR, 0) in the update primary plane func... intel_crtc->atomic.fb_bits |= INTEL_FRONTBUFFER_PRIMARY(intel_crtc->pipe); @@ -13406,6 +13416,9 @@ static void intel_begin_crtc_commit(struct drm_crtc *crtc) if (intel_crtc->atomic.disable_fbc) intel_fbc_disable(dev); + if (intel_crtc->atomic.disable_ips) + hsw_disable_ips(intel_crtc); + if (intel_crtc->atomic.pre_disable_primary) intel_pre_disable_primary(crtc); intel_pre_disable_primary() would already disable IPS. Except no one sets .pre_disable_primary=true.
Re: [Intel-gfx] [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
On 17/06/15 13:05, Daniel Vetter wrote: On Mon, Jun 15, 2015 at 07:36:20PM +0100, Dave Gordon wrote: Current devices may contain one or more programmable microcontrollers that need to have a firmware image (aka binary blob) loaded from an external medium and transferred to the device's memory. This file provides generic support functions for doing this; they can then be used by each uC-specific loader, thus reducing code duplication and testing effort. Signed-off-by: Dave Gordon david.s.gor...@intel.com Signed-off-by: Alex Dai yu@intel.com Given that I'm just shredding the synchronization used by the dmc loader I'm not convinced this is a good idea. Abstraction has cost, and a bit of copy-paste for similar sounding but slightly different things doesn't sound awful to me. And the critical bit in all the firmware loading I've seen thus far is in synchronizing the loading with other operations, hiding that isn't a good idea. Worse if we enforce stuff like requiring dev-struct_mutex. -Daniel It's precisely because it's in some sense trivial-but-tricky that we should write it once, get it right, and use it everywhere. Copypaste /does/ sound awful; I've seen how the code this was derived from had already been cloned into three flavours, all different and all wrong. It's a very simple abstraction: one early call to kick things off as early as possible, no locking required. One late call with the struct_mutex held to complete the synchronisation and actually do the work, thus guaranteeing that the transfer to the target uC is done in a controlled fashion, at a time of the caller's choice, and by the driver's mainline thread, NOT by an asynchronous thread racing with other activity (which was one of the things wrong with the original version). 
We should convert the DMC loader to use this too, so there need be only one bit of code in the whole driver that needs to understand how to use completions to get correct handover from a free-running no-locks-held thread to the properly disciplined environment of driver mainline for purposes of programming the h/w. .Dave. --- drivers/gpu/drm/i915/Makefile |3 + drivers/gpu/drm/i915/intel_uc_loader.c | 312 drivers/gpu/drm/i915/intel_uc_loader.h | 82 + 3 files changed, 397 insertions(+) create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.c create mode 100644 drivers/gpu/drm/i915/intel_uc_loader.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index b7ddf48..607fa2a 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -38,6 +38,9 @@ i915-y += i915_cmd_parser.o \ intel_ringbuffer.o \ intel_uncore.o +# generic ancilliary microcontroller support +i915-y += intel_uc_loader.o + # autogenerated null render state i915-y += intel_renderstate_gen6.o \ intel_renderstate_gen7.o \ diff --git a/drivers/gpu/drm/i915/intel_uc_loader.c b/drivers/gpu/drm/i915/intel_uc_loader.c new file mode 100644 index 000..26f0fbe --- /dev/null +++ b/drivers/gpu/drm/i915/intel_uc_loader.c @@ -0,0 +1,312 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. 
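The two-phase handover Dave describes — a free-running fetch kicked off early, then a synchronisation point under the driver lock — can be modelled in userspace with a thread standing in for the asynchronous firmware fetch. All names here are illustrative, not the proposed i915 API; the real code would use the kernel's asynchronous firmware-loading interface and a completion rather than a pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Toy model of a two-phase uC firmware load; not the i915 structures. */
struct uc_fw {
	pthread_t fetch_thread;
	bool fetched;	/* set by the async fetch */
	bool loaded;	/* set by the locked finish phase */
};

static void *fetch_fw(void *arg)
{
	struct uc_fw *fw = arg;

	fw->fetched = true;	/* stands in for the firmware fetch completing */
	return NULL;
}

/* Phase 1: called early, no locks held - just start the fetch. */
static void uc_fw_init_early(struct uc_fw *fw)
{
	pthread_create(&fw->fetch_thread, NULL, fetch_fw, fw);
}

/* Phase 2: called later with the driver lock held - synchronise with
 * the fetch, then do the transfer from the driver's own thread, not
 * from an asynchronous thread racing with other activity. */
static int uc_fw_finish(struct uc_fw *fw)
{
	pthread_join(fw->fetch_thread, NULL);	/* the completion handover */
	if (!fw->fetched)
		return -1;
	fw->loaded = true;	/* the DMA to the uC would happen here */
	return 0;
}
```

The design point is that only phase 2 touches the hardware, so the transfer always happens at a time of the caller's choosing and under the driver's normal locking discipline.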
+ * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Author: + * Dave Gordon david.s.gor...@intel.com + */ +#include linux/firmware.h +#include i915_drv.h +#include intel_uc_loader.h + +/** + * DOC: Generic embedded microcontroller (uC) firmware loading support + * + * The functions in this file provide a generic way to load the firmware that + * may be required by an embedded microcontroller (uC). + * + * The function intel_uc_fw_init() should be called early, and will initiate + * an asynchronous request to fetch the firmware image (aka binary blob). + * When the image has been fetched into memory, the kernel will call back to + *
Re: [Intel-gfx] [PATCH 25/55] drm/i915: Update i915_gem_object_sync() to take a request structure
On Thu, Jun 18, 2015 at 01:14:56PM +0100, john.c.harri...@intel.com wrote: From: John Harrison john.c.harri...@intel.com The plan is to pass requests around as the basic submission tracking structure rather than rings and contexts. This patch updates the i915_gem_object_sync() code path. v2: Much more complex patch to share a single request between the sync and the page flip. The _sync() function now supports lazy allocation of the request structure. That is, if one is passed in then that will be used. If one is not, then a request will be allocated and passed back out. Note that the _sync() code does not necessarily require a request. Thus one will only be created in certain situations. The reason the lazy allocation must be done within the _sync() code itself is because the decision to need one or not is not really something that code above can second guess (except in the case where one is definitely not required because no ring is passed in). The call chains above _sync() now support passing a request through, with most callers passing in NULL and assuming that no request will be required (because they also pass in NULL for the ring and therefore can't be generating any ring code). The exception is intel_crtc_page_flip(), which now supports having a request returned from _sync(). If one is, then that request is shared by the page flip (if the page flip is of a type to need a request). If _sync() does not generate a request but the page flip does need one, then the page flip path will create its own request. v3: Updated comment description to be clearer about 'to_req' parameter (Tomas Elf review request). Rebased onto newer tree that significantly changed the synchronisation code.
v4: Updated comments from review feedback (Tomas Elf) For: VIZ-5115 Signed-off-by: John Harrison john.c.harri...@intel.com Reviewed-by: Tomas Elf tomas@intel.com --- drivers/gpu/drm/i915/i915_drv.h|4 ++- drivers/gpu/drm/i915/i915_gem.c| 48 +--- drivers/gpu/drm/i915/i915_gem_execbuffer.c |2 +- drivers/gpu/drm/i915/intel_display.c | 17 +++--- drivers/gpu/drm/i915/intel_drv.h |3 +- drivers/gpu/drm/i915/intel_fbdev.c |2 +- drivers/gpu/drm/i915/intel_lrc.c |2 +- drivers/gpu/drm/i915/intel_overlay.c |2 +- 8 files changed, 57 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 64a10fa..f69e9cb 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2778,7 +2778,8 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj) int __must_check i915_mutex_lock_interruptible(struct drm_device *dev); int i915_gem_object_sync(struct drm_i915_gem_object *obj, - struct intel_engine_cs *to); + struct intel_engine_cs *to, + struct drm_i915_gem_request **to_req); Nope. Did you forget to reorder the code to ensure that the request is allocated along with the context switch at the start of execbuf? -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] drm/i915: Per-DDI I_boost override
An OEM may request increased I_boost beyond the recommended values by specifying an I_boost value to be applied to all swing entries for a port. These override values are specified in VBT. Issue: VIZ-5676 Signed-off-by: Antti Koskipaa antti.koski...@linux.intel.com --- drivers/gpu/drm/i915/i915_drv.h | 3 +++ drivers/gpu/drm/i915/intel_bios.c | 21 + drivers/gpu/drm/i915/intel_bios.h | 9 + drivers/gpu/drm/i915/intel_ddi.c | 39 +++ 4 files changed, 64 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 09a57a5..e17fd56 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1402,6 +1402,9 @@ struct ddi_vbt_port_info { uint8_t supports_dvi:1; uint8_t supports_hdmi:1; uint8_t supports_dp:1; + + uint8_t dp_boost_level; + uint8_t hdmi_boost_level; }; enum psr_lines_to_wait { diff --git a/drivers/gpu/drm/i915/intel_bios.c b/drivers/gpu/drm/i915/intel_bios.c index 198fc3c..06b5dc3 100644 --- a/drivers/gpu/drm/i915/intel_bios.c +++ b/drivers/gpu/drm/i915/intel_bios.c @@ -946,6 +946,17 @@ err: memset(dev_priv->vbt.dsi.sequence, 0, sizeof(dev_priv->vbt.dsi.sequence)); } +static u8 translate_iboost(u8 val) +{ + static const u8 mapping[] = { 1, 3, 7 }; /* See VBT spec */ + + if (val >= ARRAY_SIZE(mapping)) { + DRM_DEBUG_KMS("Unsupported I_boost value found in VBT (%d), display may not work properly\n", val); + return 0; + } + return mapping[val]; +} + static void parse_ddi_port(struct drm_i915_private *dev_priv, enum port port, const struct bdb_header *bdb) { @@ -1046,6 +1057,16 @@ static void parse_ddi_port(struct drm_i915_private *dev_priv, enum port port, hdmi_level_shift); info->hdmi_level_shift = hdmi_level_shift; } + + /* Parse the I_boost config for SKL and above */ + if (bdb->version >= 196 && (child->common.flags_1 & IBOOST_ENABLE)) { + info->dp_boost_level = translate_iboost(child->common.iboost_level & 0xF); + DRM_DEBUG_KMS("VBT (e)DP boost level for port %c: %d\n", + port_name(port),
info->dp_boost_level); + info->hdmi_boost_level = translate_iboost(child->common.iboost_level >> 4); + DRM_DEBUG_KMS("VBT HDMI boost level for port %c: %d\n", + port_name(port), info->hdmi_boost_level); + } } static void parse_ddi_ports(struct drm_i915_private *dev_priv, diff --git a/drivers/gpu/drm/i915/intel_bios.h b/drivers/gpu/drm/i915/intel_bios.h index af0b476..8edd75c 100644 --- a/drivers/gpu/drm/i915/intel_bios.h +++ b/drivers/gpu/drm/i915/intel_bios.h @@ -231,6 +231,10 @@ struct old_child_dev_config { /* This one contains field offsets that are known to be common for all BDB * versions. Notice that the meaning of the contents may still change, * but at least the offsets are consistent. */ + +/* Definitions for flags_1 */ +#define IBOOST_ENABLE (1 << 3) + struct common_child_dev_config { u16 handle; u16 device_type; u8 not_common2[2]; u8 ddc_pin; u16 edid_ptr; + u8 obsolete; + u8 flags_1; + u8 not_common3[13]; + u8 iboost_level; } __packed; + /* This field changes depending on the BDB version, so the most reliable way to * read it is by checking the BDB version and reading the raw pointer.
*/ union child_device_config { diff --git a/drivers/gpu/drm/i915/intel_ddi.c b/drivers/gpu/drm/i915/intel_ddi.c index 3abcb43..8e5e94c 100644 --- a/drivers/gpu/drm/i915/intel_ddi.c +++ b/drivers/gpu/drm/i915/intel_ddi.c @@ -434,6 +434,7 @@ static void intel_prepare_ddi_buffers(struct drm_device *dev, enum port port, { struct drm_i915_private *dev_priv = dev->dev_private; u32 reg; + u32 iboost_bit = 0; int i, n_hdmi_entries, n_dp_entries, n_edp_entries, hdmi_default_entry, size; int hdmi_level = dev_priv->vbt.ddi_port_info[port].hdmi_level_shift; @@ -459,6 +460,10 @@ static void intel_prepare_ddi_buffers(struct drm_device *dev, enum port port, ddi_translations_hdmi = skl_get_buf_trans_hdmi(dev, &n_hdmi_entries); hdmi_default_entry = 8; + /* If we're boosting the current, set bit 31 of trans1 */ + if (dev_priv->vbt.ddi_port_info[port].hdmi_boost_level || + dev_priv->vbt.ddi_port_info[port].dp_boost_level) + iboost_bit = 1 << 31; } else if (IS_BROADWELL(dev)) { ddi_translations_fdi = bdw_ddi_translations_fdi; ddi_translations_dp = bdw_ddi_translations_dp; @@ -519,7 +524,7 @@ static void intel_prepare_ddi_buffers(struct drm_device
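The bit layout being parsed in the patch above is easy to misread in a flattened diff: the low nibble of the VBT iboost_level byte selects the (e)DP boost index and the high nibble the HDMI boost index, each translated through the { 1, 3, 7 } table from the VBT spec. A small userspace restatement follows; the types and the parse_iboost helper are local to this sketch, not driver API.

```c
#include <assert.h>

typedef unsigned char u8;

/* Index-to-boost translation, as in the patch; out-of-range indices
 * map to 0 (the driver also logs a warning in that case). */
static u8 translate_iboost(u8 val)
{
	static const u8 mapping[] = { 1, 3, 7 };	/* See VBT spec */

	if (val >= sizeof(mapping))
		return 0;
	return mapping[val];
}

/* Split the packed VBT byte into the two per-port boost levels. */
static void parse_iboost(u8 iboost_level, u8 *dp_level, u8 *hdmi_level)
{
	*dp_level = translate_iboost(iboost_level & 0xF);
	*hdmi_level = translate_iboost(iboost_level >> 4);
}
```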
[Intel-gfx] [PATCH 25/55] drm/i915: Update i915_gem_object_sync() to take a request structure
From: John Harrison john.c.harri...@intel.com The plan is to pass requests around as the basic submission tracking structure rather than rings and contexts. This patch updates the i915_gem_object_sync() code path. v2: Much more complex patch to share a single request between the sync and the page flip. The _sync() function now supports lazy allocation of the request structure. That is, if one is passed in then that will be used. If one is not, then a request will be allocated and passed back out. Note that the _sync() code does not necessarily require a request. Thus one will only be created in certain situations. The reason the lazy allocation must be done within the _sync() code itself is because the decision to need one or not is not really something that code above can second guess (except in the case where one is definitely not required because no ring is passed in). The call chains above _sync() now support passing a request through, with most callers passing in NULL and assuming that no request will be required (because they also pass in NULL for the ring and therefore can't be generating any ring code). The exception is intel_crtc_page_flip(), which now supports having a request returned from _sync(). If one is, then that request is shared by the page flip (if the page flip is of a type to need a request). If _sync() does not generate a request but the page flip does need one, then the page flip path will create its own request. v3: Updated comment description to be clearer about 'to_req' parameter (Tomas Elf review request). Rebased onto newer tree that significantly changed the synchronisation code.
v4: Updated comments from review feedback (Tomas Elf) For: VIZ-5115 Signed-off-by: John Harrison john.c.harri...@intel.com Reviewed-by: Tomas Elf tomas@intel.com --- drivers/gpu/drm/i915/i915_drv.h|4 ++- drivers/gpu/drm/i915/i915_gem.c| 48 +--- drivers/gpu/drm/i915/i915_gem_execbuffer.c |2 +- drivers/gpu/drm/i915/intel_display.c | 17 +++--- drivers/gpu/drm/i915/intel_drv.h |3 +- drivers/gpu/drm/i915/intel_fbdev.c |2 +- drivers/gpu/drm/i915/intel_lrc.c |2 +- drivers/gpu/drm/i915/intel_overlay.c |2 +- 8 files changed, 57 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 64a10fa..f69e9cb 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2778,7 +2778,8 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj) int __must_check i915_mutex_lock_interruptible(struct drm_device *dev); int i915_gem_object_sync(struct drm_i915_gem_object *obj, -struct intel_engine_cs *to); +struct intel_engine_cs *to, +struct drm_i915_gem_request **to_req); void i915_vma_move_to_active(struct i915_vma *vma, struct intel_engine_cs *ring); int i915_gem_dumb_create(struct drm_file *file_priv, @@ -2889,6 +2890,7 @@ int __must_check i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, u32 alignment, struct intel_engine_cs *pipelined, +struct drm_i915_gem_request **pipelined_request, const struct i915_ggtt_view *view); void i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj, const struct i915_ggtt_view *view); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index e59369a..d7c7127 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3095,25 +3095,26 @@ out: static int __i915_gem_object_sync(struct drm_i915_gem_object *obj, struct intel_engine_cs *to, - struct drm_i915_gem_request *req) + struct drm_i915_gem_request *from_req, + struct drm_i915_gem_request **to_req) { 
struct intel_engine_cs *from; int ret; - from = i915_gem_request_get_ring(req); + from = i915_gem_request_get_ring(from_req); if (to == from) return 0; - if (i915_gem_request_completed(req, true)) + if (i915_gem_request_completed(from_req, true)) return 0; - ret = i915_gem_check_olr(req); + ret = i915_gem_check_olr(from_req); if (ret) return ret; if (!i915_semaphore_is_enabled(obj->base.dev)) { struct drm_i915_private *i915 = to_i915(obj->base.dev); - ret = __i915_wait_request(req, + ret = __i915_wait_request(from_req, atomic_read(&i915->gpu_error.reset_counter),
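The lazy-allocation contract described in the commit message — use the caller's request if one is passed in, otherwise allocate only when actually needed and hand it back through the same pointer — can be restated in a few lines of userspace C. The types and the trigger condition below are illustrative, not the i915 implementation:

```c
#include <assert.h>
#include <stdlib.h>

struct fake_request {
	int ring_id;
};

/*
 * Model of the **to_req convention: on entry *to_req may already hold
 * a request (shared with, e.g., a page flip); if it is NULL and the
 * sync path decides it must emit commands, a request is allocated and
 * returned to the caller through the same pointer.
 */
static int fake_object_sync(int to_ring, struct fake_request **to_req)
{
	if (*to_req == NULL) {
		*to_req = malloc(sizeof(**to_req));
		if (!*to_req)
			return -1;	/* would be -ENOMEM */
		(*to_req)->ring_id = to_ring;
	}
	/* ... the semaphore/wait would be emitted on *to_req here ... */
	return 0;
}
```

This is why the allocation cannot live in the callers: only the sync path itself knows whether it will emit anything, so only it can decide whether a request is needed at all.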
Re: [Intel-gfx] [PATCH v5] drm/i915 : Added Programming of the MOCS
On Thu, Jun 18, 2015 at 01:29:45PM +0100, Peter Antoine wrote: @@ -1379,6 +1380,13 @@ static int gen8_init_rcs_context(struct intel_engine_cs *ring, if (ret) return ret; + /* + * Failing to program the MOCS is non-fatal. The system will not + * run at peak performance. So generate a warning and carry on. + */ + if (intel_rcs_context_init_mocs(ring, ctx) != 0) + DRM_ERROR("MOCS failed to program: expect performance issues."); You said to expect display corruption as well if this failed. Fortunately, if this fails, we have severe driver issues... +/** + * emit_mocs_l3cc_table() - emit the mocs control table + * @ringbuf: DRM device. + * @table: The values to program into the control regs. + * + * This function simply emits a MI_LOAD_REGISTER_IMM command for the + * given table starting at the given address. This register set is programmed + * in pairs. + * + * Return: Nothing. + */ +static void emit_mocs_l3cc_table(struct intel_ringbuffer *ringbuf, + struct drm_i915_mocs_table *table) { + unsigned int count; + unsigned int i; + u32 value; + u32 filler = (table->table[0].l3cc_value & 0xffff) | + ((table->table[0].l3cc_value & 0xffff) << 16); l3cc_value is only u16, so the & 0xffff is just noise, and without it you don't need the parentheses. +int intel_rcs_context_init_mocs(struct intel_engine_cs *ring, + struct intel_context *ctx) +{ + int ret = 0; + + struct drm_i915_mocs_table t; + struct drm_device *dev = ring->dev; + struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf; + + if (get_mocs_settings(dev, &t)) { + u32 table_size; + + /* + * OK. For each supported ring: + * number of mocs entries * 2 dwords for each control_value + * plus number of mocs entries / 2 dwords for l3cc values. + * + * Plus 1 for the load command and 1 for the NOOP per ring + * and the l3cc programming. + * + * With 5 rings and 63 mocs entries, this gives 715 + * dwords.
+ */ + table_size = GEN9_NUM_MOCS_RINGS * + ((2 * GEN9_NUM_MOCS_ENTRIES) + 2) + + GEN9_NUM_MOCS_ENTRIES + 2; If you pushed the ring_begin into each function, not only would it be easier to verify, you then don't need an explanation that starts with "This looks like a mistake". Validation of ring_begin/ring_advance is by review, so it has to be easy to review. + ret = intel_logical_ring_begin(ringbuf, ctx, table_size); + if (ret) { + DRM_DEBUG("intel_logical_ring_begin failed %d\n", ret); + return ret; + } -- Chris Wilson, Intel Open Source Technology Centre
Re: [Intel-gfx] [PATCH 48/55] drm/i915: Add *_ring_begin() to request allocation
On Thu, Jun 18, 2015 at 1:21 PM, John Harrison john.c.harri...@intel.com wrote: I'm still confused by what you are saying in the above referenced email. Part of it is about the sanity checks failing to handle the wrapping case correctly, which has been fixed in the base reserve space patch (patch 2 in the series). The rest is either saying that you think we are potentially wrapping too early and wasting a few bytes of the ring buffer, or that something is actually broken?

Yeah, I didn't realize that this change was meant to fix the ring->reserved_tail check, since I didn't make that connection. It is correct with that change, but the problem I see is that the correctness of that debug aid isn't assured locally: no, we need both that check _and_ the correct handling of the reservation tracking at wrap-around. If the check just handles wrapping it'll robustly stay in working shape even when the wrapping behaviour changes.

Point 2: 100 bytes of reserve, 160 bytes of execbuf and 200 bytes remaining. You seem to think this will fail somehow? Why? The wait_for_space(160) in the execbuf code will cause a wrap because the 100 bytes for the add_request reservation are added on, so the wait is actually done for 260 bytes. So yes, we wrap earlier than would otherwise have been necessary, but that is the only way to absolutely guarantee that the add_request() call cannot fail when trying to do the wrap itself.

There's no problem except that it's wasteful. And I tried to explain that unconditionally force-wrapping for the entire reservation is actually not needed, since the additional space needed to account for the eventual wrapping is bounded by a factor of 2. It's much less in practice since we split up the final request bits into multiple smaller intel_ring_begin calls. And it feels a bit wasteful to throw that space away (and make the gpu eat through MI_NOP) just because it makes caring for the worst case harder.
And with GuC the 160 dwords is actually a fairly substantial part of the ring. Even more so when we completely switch to a transaction model for requests, where we only need to wrap for individual commands and hence could place intel_ring_begin per-cmd (which is mostly what we do already anyway).

As Chris says, if the driver is attempting to create a single request that fills the entire ringbuffer then that is a bug that should be caught as soon as possible. Even with a GuC, the ring buffer is not small compared to the size of requests the driver currently produces. Part of the scheduler work is to limit the number of batch buffers that a given application/context can have outstanding in the ring buffer at any given time, in order to prevent starvation of the rest of the system by one badly behaved app. Thus completely filling a large ring buffer becomes impossible anyway - the application will be blocked before it gets that far.

My proposal for this reservation wrapping business would have been:
- Increase the reservation by 31 dwords (to account for the worst-case wrap in pc_render_add_request).
- Rework the reservation overflow WARN_ON in reserve_space_end to work correctly even when wrapping while the reservation has been in use.
- Move the addition of reserved_space below the point where we wrap the ring and only check against total free space, neglecting wrapping.
- Remove all other complications you've added.

Result is no forced wrapping for the reservation and a debug check which should even survive random changes by monkeys, since the logic for that check is fully contained within reserve_space_end. And for the check we should be able to reuse __intel_free_space. If I'm reading things correctly this shouldn't have any effect outside of patch 2 and shouldn't cause any conflicts.
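The accounting Daniel proposes can be made concrete with a toy ring model: keep a standing reservation for add_request(), and admit a new request only when total free space covers the request plus the reservation, without force-wrapping on behalf of the reserved part. Sizes, field names, and function names below are invented for illustration:

```c
#include <assert.h>

struct toy_ring {
	int size;	/* total ring size in bytes */
	int head;	/* consumption point (GPU) */
	int tail;	/* production point (CPU) */
	int reserved;	/* standing add_request() reservation */
};

/* Total free space, wrap-agnostic; one byte is kept empty so a full
 * ring can be told apart from an empty one. */
static int toy_free_space(const struct toy_ring *r)
{
	int space = r->head - r->tail;

	if (space <= 0)
		space += r->size;
	return space - 1;
}

/* A request fits if total free space covers it plus the reservation -
 * no forced wrap for the reserved bytes themselves. */
static int toy_request_fits(const struct toy_ring *r, int bytes)
{
	return toy_free_space(r) >= bytes + r->reserved;
}
```

With 100 bytes reserved and roughly 300 bytes free, a 160-byte execbuf is admitted without any forced wrap; only an individual command that does not fit in the space up to the end of the ring would actually wrap (and pad with MI_NOOP).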
-Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Fwd: [PATCH] drm/i915: Fix IPS related flicker
On Thu, 18 Jun 2015, Ander Conselvan De Oliveira conselv...@gmail.com wrote: On Fri, 2015-06-05 at 12:11 +0300, Ville Syrjälä wrote: On Fri, Jun 05, 2015 at 11:51:42AM +0300, Jani Nikula wrote: On Thu, 04 Jun 2015, Rodrigo Vivi rodrigo.v...@gmail.com wrote: I just noticed that I had forgotten to reply-all... Jani, would you consider merging this fix with the explanation above related to Ville's question? Or do you want/need any action here? As for Ville's question, I'd like Ville's ack on it. It's good enough for me. This part of the driver is quite a mess anyway currently, so it doesn't matter too much what we stick in there. Ping. Seems like this still isn't merged. Does it need more work or did it just fall through the cracks? It fell between the cracks. I know the world isn't black and white, but it doesn't help the maintainers when review is some shade of grey. I've pushed this to drm-intel-next-fixes for now, but it has missed the train for both the v4.1 release and the main drm-next feature pull request for the v4.2 merge window. I expect this to land upstream in v4.2-rc2, unless there's an additional drm-next pull request during the merge window. I've added cc: stable. Thanks for the patch, and I guess the review was, uh, good enough for me now... :p BR, Jani. Thanks, Ander BR, Jani. Thanks, Rodrigo. -- Forwarded message -- From: Rodrigo Vivi rodrigo.v...@gmail.com Date: Fri, May 29, 2015 at 9:45 AM Subject: Re: [Intel-gfx] [PATCH] drm/i915: Fix IPS related flicker To: Ville Syrjälä ville.syrj...@linux.intel.com On Fri, May 29, 2015 at 1:47 AM, Ville Syrjälä ville.syrj...@linux.intel.com wrote: On Thu, May 28, 2015 at 11:07:11AM -0700, Rodrigo Vivi wrote: We cannot leave IPS enabled with no plane on the pipe: BSpec: IPS cannot be enabled until after at least one plane has been enabled for at least one vertical blank. and IPS must be disabled while there is still at least one plane enabled on the same pipe as IPS. This restriction applies to HSW and BDW.
However, a shortcut path in the update-primary-plane function, which makes the primary plane invisible by setting DSPCNTR to 0, was leaving IPS enabled while no other plane was enabled on the pipe, causing flickering that we had been attributing to that other restriction, where IPS cannot be used when the pixel rate is greater than 95% of cdclk. v2: Don't mess with Atomic path as pointed out by Ville. Reference: https://bugs.freedesktop.org/show_bug.cgi?id=85583 Cc: Ville Syrjälä ville.syrj...@linux.intel.com Cc: Paulo Zanoni paulo.r.zan...@intel.com Signed-off-by: Rodrigo Vivi rodrigo.v...@intel.com --- drivers/gpu/drm/i915/intel_display.c | 13 + drivers/gpu/drm/i915/intel_drv.h | 1 + 2 files changed, 14 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 4e3f302..5a6b17b 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -13309,6 +13309,16 @@ intel_check_primary_plane(struct drm_plane *plane, intel_crtc->atomic.wait_vblank = true; } + /* + * FIXME: Actually if we will still have any other plane enabled + * on the pipe we could let IPS enabled still, but for + * now lets consider that when we make primary invisible + * by setting DSPCNTR to 0 on update_primary_plane function + * IPS needs to be disabled. + */ + if (!state->visible || !fb) + intel_crtc->atomic.disable_ips = true; + How could it be visible without an fb? I don't like this !fb here as well, but I just tried to keep exactly the same if statement that does I915_WRITE(DSPCNTR, 0) in the update primary plane func...
intel_crtc->atomic.fb_bits |= INTEL_FRONTBUFFER_PRIMARY(intel_crtc->pipe); @@ -13406,6 +13416,9 @@ static void intel_begin_crtc_commit(struct drm_crtc *crtc) if (intel_crtc->atomic.disable_fbc) intel_fbc_disable(dev); + if (intel_crtc->atomic.disable_ips) + hsw_disable_ips(intel_crtc); + if (intel_crtc->atomic.pre_disable_primary) intel_pre_disable_primary(crtc); intel_pre_disable_primary() would already disable IPS. Except no one sets .pre_disable_primary=true. OTOH that thing mostly seems to do stuff that has nothing to do with the primary plane (cxsr disable, fifo underrun reporting disable on gen2), so I don't think we want to use that. In any case we should really have the IPS state as part of the crtc state. These global disable_foo things should just be killed IMO. Hmm,
Re: [Intel-gfx] [PATCH 48/55] drm/i915: Add *_ring_begin() to request allocation
On 17/06/2015 16:52, Chris Wilson wrote: On Wed, Jun 17, 2015 at 04:54:42PM +0200, Daniel Vetter wrote: On Wed, Jun 17, 2015 at 03:27:08PM +0100, Chris Wilson wrote: On Wed, Jun 17, 2015 at 03:31:59PM +0200, Daniel Vetter wrote: On Fri, May 29, 2015 at 05:44:09PM +0100, john.c.harri...@intel.com wrote: From: John Harrison john.c.harri...@intel.com Now that the *_ring_begin() functions no longer call the request allocation code, it is finally safe for the request allocation code to call *_ring_begin(). This is important to guarantee that the space reserved for the subsequent i915_add_request() call does actually get reserved. v2: Renamed functions according to review feedback (Tomas Elf). For: VIZ-5115 Signed-off-by: John Harrison john.c.harri...@intel.com Still has my question open from the previous round: http://mid.gmane.org/20150323091030.GL1349@phenom.ffwll.local Note that this isn't all that unlikely with GuC mode since there the ringbuffer is substantially smaller (due to firmware limitations) than what we allocate ourselves right now. Looking at this patch, I am still fundamentally opposed to reserving space for the request. Detecting a request that wraps and cancelling that request (after the appropriate WARN for the overflow) is trivial and such a rare case (as it is a programming error) that it should only be handled in the slow path. I thought the entire point here was that we don't have requests half-committed because the final request ringcmds didn't fit in. And that does require that we reserve a bit of space for that postamble. I guess if it's too much (atm it's super-pessimistic due to ilk) we can make per-platform reservation limits to be really minimal. Maybe we could go towards a rollback model longterm of rewinding the ringbuffer. But if there's no clear need I'd like to avoid that complexity.
Even if you didn't like the rollback model which helps handle the partial state from context switches and what not, if you run out of ringspace you can set the GPU as wedged. Issuing a request that fills the entire ringbuffer is a programming bug that needs to be caught very early in development. -Chris I'm still confused by what you are saying in the above referenced email. Part of it is about the sanity checks failing to handle the wrapping case correctly which has been fixed in the base reserve space patch (patch 2 in the series). The rest is either saying that you think we are potentially wrapping too early and wasting a few bytes of the ring buffer or that something is actually broken? Point 2: 100 bytes of reserve, 160 bytes of execbuf and 200 bytes remaining. You seem to think this will fail somehow? Why? The wait_for_space(160) in the execbuf code will cause a wrap because the 100 bytes for the add_request reservation is added on and the wait is actually being done for 260 bytes. So yes, we wrap earlier than would otherwise have been necessary, but that is the only way to absolutely guarantee that the add_request() call cannot fail when trying to do the wrap itself. As Chris says, if the driver is attempting to create a single request that fills the entire ringbuffer then that is a bug that should be caught as soon as possible. Even with a GuC, the ring buffer is not small compared to the size of requests the driver currently produces. Part of the scheduler work is to limit the number of batch buffers that a given application/context can have outstanding in the ring buffer at any given time in order to prevent starvation of the rest of the system by one badly behaved app. Thus completely filling a large ring buffer becomes impossible anyway - the application will be blocked before it gets that far. Note that with the removal of the OLR, all requests now have a definite start and a definite end.
Thus the scheme could be extended to provide rollback of the ring buffer. Each new request takes a note of the ring pointers at creation time. If the request is cancelled it can reset the pointers to where they were before. Thus all half submitted work is discarded. That is a much bigger semantic change however, so I would really like to get the bare minimum anti-OLR patch set in first before trying to do fancy extra features. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
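The rollback scheme described above (each request notes the ring pointers at creation; a cancelled request rewinds them) is small in principle. A hedged illustration with invented names, not the driver's actual types:

```c
#include <stdint.h>

/* Hypothetical sketch of the rollback idea from the thread: names
 * and types are made up for illustration only. */
struct rb_ring { uint32_t tail; };

struct rb_request {
	struct rb_ring *ring;
	uint32_t saved_tail;     /* ring tail when the request was created */
};

static void rb_request_init(struct rb_request *req, struct rb_ring *ring)
{
	req->ring = ring;
	req->saved_tail = ring->tail;
}

/* Cancelling discards everything emitted since the request began,
 * so no half-submitted work is left in the ring. */
static void rb_request_cancel(struct rb_request *req)
{
	req->ring->tail = req->saved_tail;
}
```

The real complication is not this bookkeeping but the semantics: commands already emitted may have side effects (context switches, flushes) that a simple tail rewind does not undo, which is why the thread treats it as a bigger change.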
Re: [Intel-gfx] [PATCH 55/55] drm/i915: Rename the somewhat reduced i915_gem_object_flush_active()
On Thu, Jun 18, 2015 at 12:03:12PM +0100, John Harrison wrote: On 17/06/2015 15:21, Chris Wilson wrote: On Wed, Jun 17, 2015 at 04:06:05PM +0200, Daniel Vetter wrote: On Fri, May 29, 2015 at 05:44:16PM +0100, john.c.harri...@intel.com wrote: From: John Harrison john.c.harri...@intel.com The i915_gem_object_flush_active() call used to do lots. Over time it has done less and less. Now all it does is check the various associated requests to see if they can be retired. Hence this patch renames the function and updates the comments around it to match the current operation. For: VIZ-5115 Signed-off-by: John Harrison john.c.harri...@intel.com When rebasing patches, and especially like here when also renaming them a bit, please leave some indication of what you've changed. Took me a while to figure out where one of my pending comments from the previous round went, too. And please don't just say v2: rebase, but please add some indicators against what it conflicted if it's obvious. This function doesn't do an unconditional retire - the new name is much worse since it is inconsistent with how requests retire. In my make GEM umpteen times faster patches, I repurposed this function for reporting the object's current activeness and called it bool i915_gem_object_active() - though that is probably better as i915_gem_object_is_active(). -Chris Retiring is generally not an unconditional operation. In the code, I use object_retire to perform the retiring operation on that object. I can rename i915_gem_retire_requests if that makes you happier, but I don't think it needs to since retire_requests does not imply to me that all requests are retired, just some indefinite value (though positive indefinite at least!). -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC 03/11] drm/i915: Add reset stats entry point for per-engine reset.
On 16/06/15 14:54, Chris Wilson wrote: On Tue, Jun 16, 2015 at 03:48:09PM +0200, Daniel Vetter wrote: On Mon, Jun 08, 2015 at 06:33:59PM +0100, Chris Wilson wrote: On Mon, Jun 08, 2015 at 06:03:21PM +0100, Tomas Elf wrote: In preparation for per-engine reset, add a way for setting context reset stats. OPEN QUESTIONS: 1. How do we deal with get_reset_stats and the GL robustness interface when introducing per-engine resets? a. Do we mark contexts that cause per-engine resets as guilty? If so, how does this affect context banning? Yes. If the reset works quicker, then we can set a higher threshold for DoS detection, but we still do need DoS detection? b. Do we extend the publicly available reset stats to also contain per-engine reset statistics? If so, would this break the ABI? No. The get_reset_stats is targeted at the GL API and describes it in terms of whether my context is guilty or has been affected. That is orthogonal to whether the reset was on a single ring or the entire GPU - the question is how broad we want the affected set to be. Ideally a per-context reset wouldn't necessarily impact others, except for the surfaces shared between them... gl computes sharing sets itself, the kernel only tells it whether a given context has been victimized, i.e. one of its batches was not properly executed due to reset after a hang. So you don't think we should delete all pending requests that depend upon state from the hung request? -Chris John Harrison and I discussed this yesterday; he's against doing so (even though the scheduler is ideally placed to do it, if that were actually the preferred policy). The primary argument (as I see it) is that you actually don't and can't know the nature of an apparent dependency between batches that share a buffer object. There are at least three cases: 1. tightly-coupled: the dependent batch is going to rely on data produced by the earlier batch. In this case, GIGO applies and the results will be undefined, possibly including a further hang.
Subsequent batches presumably belong to the same or a closely-related (co-operating) task, and killing them might be a reasonable strategy here. 2. loosely-coupled: the dependent batch is going to access the data, but not in any way that depends on the content (for example, blitting a rectangle into a composition buffer). The result will be wrong, but only in a limited way (e.g. window belonging to the faulty application will appear corrupted). The dependent batches may well belong to unrelated system tasks (e.g. X or surfaceflinger) and killing them is probably not justified. 3. uncoupled: the dependent batch wants the /buffer/, not the data in it (most likely a framebuffer or similar object). Any incorrect data in the buffer is irrelevant. Killing off subsequent batches would be wrong. Buffer access mode (readonly, read/write, writeonly) might allow us to distinguish these somewhat, but probably not enough to help make the right decision. So the default must be *not* to kill off dependants automatically, but if the failure does propagate in such a way as to cause further consequent hangs, then the context-banning mechanism should eventually catch and block all the downstream effects. .Dave. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
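The three cases above can be summarised as a classification the kernel would have to guess at from buffer access mode. The sketch below is purely illustrative (not driver code, names invented) and shows both the heuristic and why the text concludes it is too weak to justify killing dependants:

```c
/* Illustrative only: map a dependant batch's access mode on a shared
 * buffer to a guessed coupling class.  As argued in the thread, this
 * guess is not reliable -- a read/write access can be loosely-coupled
 * composition (case 2) just as easily as tight data reuse (case 1),
 * which is why the default policy is NOT to kill dependants. */
enum coupling { COUPLING_TIGHT, COUPLING_LOOSE, COUPLING_UNCOUPLED };
enum access_mode { ACCESS_READONLY, ACCESS_READWRITE, ACCESS_WRITEONLY };

static enum coupling guess_coupling(enum access_mode mode)
{
	switch (mode) {
	case ACCESS_WRITEONLY:
		return COUPLING_UNCOUPLED; /* wants the buffer, not its data */
	case ACCESS_READONLY:
		return COUPLING_LOOSE;     /* reads data, maybe content-blind */
	default:
		return COUPLING_TIGHT;     /* pessimistic worst-case guess    */
	}
}
```

Even with such a classifier, case 2 and case 1 are indistinguishable from access mode alone, so the context-banning mechanism remains the backstop for cascading hangs.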
Re: [Intel-gfx] [PATCH 55/55] drm/i915: Rename the somewhat reduced i915_gem_object_flush_active()
On 18/06/2015 12:10, Chris Wilson wrote: On Thu, Jun 18, 2015 at 12:03:12PM +0100, John Harrison wrote: On 17/06/2015 15:21, Chris Wilson wrote: On Wed, Jun 17, 2015 at 04:06:05PM +0200, Daniel Vetter wrote: On Fri, May 29, 2015 at 05:44:16PM +0100, john.c.harri...@intel.com wrote: From: John Harrison john.c.harri...@intel.com The i915_gem_object_flush_active() call used to do lots. Over time it has done less and less. Now all it does is check the various associated requests to see if they can be retired. Hence this patch renames the function and updates the comments around it to match the current operation. For: VIZ-5115 Signed-off-by: John Harrison john.c.harri...@intel.com When rebasing patches, and especially like here when also renaming them a bit, please leave some indication of what you've changed. Took me a while to figure out where one of my pending comments from the previous round went, too. And please don't just say v2: rebase, but please add some indicators against what it conflicted if it's obvious. This function doesn't do an unconditional retire - the new name is much worse since it is inconsistent with how requests retire. In my make GEM umpteen times faster patches, I repurposed this function for reporting the object's current activeness and called it bool i915_gem_object_active() - though that is probably better as i915_gem_object_is_active(). -Chris Retiring is generally not an unconditional operation. In the code, I use object_retire to perform the retiring operation on that object. I can rename i915_gem_retire_requests if that makes you happier, but I don't think it needs to since retire_requests does not imply to me that all requests are retired, just some indefinite value (though positive indefinite at least!). -Chris Fair enough. I guess I'm still thinking of the driver as it was when I first wrote the patch series, which was before your re-write for read/read optimisations.
Like I said, the exact new name isn't as important as at least giving it a new name. The old name is definitely not valid any more. Feel free to suggest something better. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v5] drm/i915 : Added Programming of the MOCS
On Thu, Jun 18, 2015 at 01:29:45PM +0100, Peter Antoine wrote: @@ -1379,6 +1380,13 @@ static int gen8_init_rcs_context(struct intel_engine_cs *ring, if (ret) return ret; + /* + * Failing to program the MOCS is non-fatal. The system will not + * run at peak performance. So generate a warning and carry on. + */ + if (intel_rcs_context_init_mocs(ring, ctx) != 0) + DRM_ERROR("MOCS failed to program: expect performance issues."); + Missing a '\n'. +static const struct drm_i915_mocs_entry skylake_mocs_table[] = { + /* {0x0009, 0x0010} */ + {(MOCS_CACHEABILITY(EDRAM_UC) | MOCS_TGT_CACHE(LLC_ELLC) | + MOCS_LRUM(0) | MOCS_AOM(0) | MOCS_LECC_ESC(0) | MOCS_SCC(0) | + MOC_PFM(0) | MOCS_SCF(0)), + (MOCS_ESC(0) | MOCS_SCC(0) | MOCS_L3_CACHEABILITY(L3_UC))}, + /* {0x003b, 0x0030} */ We're still missing the usage hints for those configuration entries. That'd help user space a lot, which would help this patch land quicker as well. +int intel_rcs_context_init_mocs(struct intel_engine_cs *ring, + struct intel_context *ctx) +{ + int ret = 0; + + struct drm_i915_mocs_table t; + struct drm_device *dev = ring->dev; + struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf; + + if (get_mocs_settings(dev, &t)) { + u32 table_size; + + /* + * OK. For each supported ring: + * number of mocs entries * 2 dwords for each control_value + * plus number of mocs entries / 2 dwords for l3cc values. + * + * Plus 1 for the load command and 1 for the NOOP per ring + * and the l3cc programming.
+ */ + table_size = GEN9_NUM_MOCS_RINGS * + ((2 * GEN9_NUM_MOCS_ENTRIES) + 2) + + GEN9_NUM_MOCS_ENTRIES + 2; + ret = intel_logical_ring_begin(ringbuf, ctx, table_size); + if (ret) { + DRM_DEBUG("intel_logical_ring_begin failed %d\n", ret); + return ret; + } + + /* program the control registers */ + emit_mocs_control_table(ringbuf, &t, GEN9_GFX_MOCS_0); + emit_mocs_control_table(ringbuf, &t, GEN9_MFX0_MOCS_0); + emit_mocs_control_table(ringbuf, &t, GEN9_MFX1_MOCS_0); + emit_mocs_control_table(ringbuf, &t, GEN9_VEBOX_MOCS_0); + emit_mocs_control_table(ringbuf, &t, GEN9_BLT_MOCS_0); So, if I'm not mistaken, I think this only works because we fully initialize the default context at start/reset time through: + i915_gem_init_hw() + i915_gem_context_enable() + cycle through all the rings and call ring->init_context() + gen8_init_rcs_context() + intel_rcs_context_init_mocs() (initialize ALL the MOCS!) So, initializing the other (non-render) MOCS in gen8_init_rcs_context() isn't the most logical thing to do I'm afraid. What happens if we suddenly decide that we don't want to fully initialize the default context at startup but initialize each ring on-demand for that context as well? We can end up in a situation where we use the blitter first and we wouldn't have the blitter MOCS initialized. In that sense, that code makes an assumption about how we do things in a completely different part of the driver and that's always a potential source of bugs. Chris, how far am I? :p One way to solve this (if that's indeed the issue pointed at by Chris) would be to decouple the render MOCS from the others: still keep the render ones in there as they need to be emitted from the ring, but put the other writes (which could be done through MMIO as well) higher in the chain. It could probably make sense in i915_gem_context_enable()? (which, by the way, is awfully named, should have an _init somewhere?). It could also be a per-ring vfunc I suppose.
For similar reasons, I think the GuC MOCS should be part of the GuC init as well so we don't couple too hard different part of the code. Now, is that really a blocker? I'd say no if we had userspace ready and could commit that today, because we really want it. Still something to look at, I could be totally wrong. The separate header for a single function isn't something we usually do either, but that can always be folded in later. -- Damien ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 25/55] drm/i915: Update i915_gem_object_sync() to take a request structure
On 18/06/2015 13:21, Chris Wilson wrote: On Thu, Jun 18, 2015 at 01:14:56PM +0100, john.c.harri...@intel.com wrote: From: John Harrison john.c.harri...@intel.com The plan is to pass requests around as the basic submission tracking structure rather than rings and contexts. This patch updates the i915_gem_object_sync() code path. v2: Much more complex patch to share a single request between the sync and the page flip. The _sync() function now supports lazy allocation of the request structure. That is, if one is passed in then that will be used. If one is not, then a request will be allocated and passed back out. Note that the _sync() code does not necessarily require a request. Thus one will only be created in certain situations. The reason the lazy allocation must be done within the _sync() code itself is because the decision to need one or not is not really something that code above can second guess (except in the case where one is definitely not required because no ring is passed in). The call chains above _sync() now support passing a request through, with most callers passing in NULL and assuming that no request will be required (because they also pass in NULL for the ring and therefore can't be generating any ring code). The exception is intel_crtc_page_flip() which now supports having a request returned from _sync(). If one is, then that request is shared by the page flip (if the page flip is of a type to need a request). If _sync() does not generate a request but the page flip does need one, then the page flip path will create its own request. v3: Updated comment description to be clearer about 'to_req' parameter (Tomas Elf review request). Rebased onto newer tree that significantly changed the synchronisation code.
v4: Updated comments from review feedback (Tomas Elf) For: VIZ-5115 Signed-off-by: John Harrison john.c.harri...@intel.com Reviewed-by: Tomas Elf tomas@intel.com --- drivers/gpu/drm/i915/i915_drv.h|4 ++- drivers/gpu/drm/i915/i915_gem.c| 48 +--- drivers/gpu/drm/i915/i915_gem_execbuffer.c |2 +- drivers/gpu/drm/i915/intel_display.c | 17 +++--- drivers/gpu/drm/i915/intel_drv.h |3 +- drivers/gpu/drm/i915/intel_fbdev.c |2 +- drivers/gpu/drm/i915/intel_lrc.c |2 +- drivers/gpu/drm/i915/intel_overlay.c |2 +- 8 files changed, 57 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 64a10fa..f69e9cb 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2778,7 +2778,8 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj) int __must_check i915_mutex_lock_interruptible(struct drm_device *dev); int i915_gem_object_sync(struct drm_i915_gem_object *obj, -struct intel_engine_cs *to); +struct intel_engine_cs *to, +struct drm_i915_gem_request **to_req); Nope. Did you forget to reorder the code to ensure that the request is allocated along with the context switch at the start of execbuf? -Chris Not sure what you are objecting to? If you mean the lazily allocated request then that is for the page flip code, not the execbuf code. If we get here from an execbuf call then the request will definitely have been allocated and will be passed in. Whereas the page flip code may or may not require a request (depending on whether MMIO or ring flips are in use). Likewise the sync code may or may not require a request (depending on whether there is anything to sync to or not). There is no point allocating and submitting an empty request in the MMIO/idle case. Hence the sync code needs to be able to use an existing request or create one if none already exists. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
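The lazy-allocation contract described above can be sketched in miniature: if the caller passes an existing request through the out-parameter it is reused; if it passes NULL and a request turns out to be needed, one is allocated and handed back for sharing (e.g. by the page-flip path). All types and names below are invented for illustration, not the driver's:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request object. */
struct fake_request { int used; };

/* Sketch of the _sync() lazy-allocation pattern: *to_req is both an
 * input (an existing request to reuse) and an output (a request
 * created on demand).  No request is created if no sync is needed. */
static int object_sync(int needs_sync, struct fake_request **to_req)
{
	if (!needs_sync)
		return 0;               /* nothing to do: no request created */

	if (*to_req == NULL) {          /* lazily allocate on first need */
		*to_req = calloc(1, sizeof(**to_req));
		if (*to_req == NULL)
			return -1;
	}
	(*to_req)->used = 1;            /* "emit" sync work into the request */
	return 0;
}
```

The caller that ends up owning a non-NULL request after the call is responsible for submitting (or freeing) it, which mirrors how the page flip either shares the sync's request or creates its own.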
Re: [Intel-gfx] [PATCH] drm/i915: Reset request handling for gen8+
On 18/06/2015 11:36, Chris Wilson wrote: On Thu, Jun 18, 2015 at 11:11:55AM +0100, Tomas Elf wrote: On 18/06/2015 10:51, Mika Kuoppala wrote: In order for gen8+ hardware to guarantee that no context switch takes place during engine reset and that current context is properly saved, the driver needs to notify and query hw before commencing with reset. There are gpu hangs where the engine gets so stuck that it never will report to be ready for reset. We could proceed with reset anyway, but with some hangs with skl, the forced gpu reset will result in a system hang. By inspecting the unreadiness for reset seems to correlate with the probable system hang. We will only proceed with reset if all engines report that they are ready for reset. If root cause for system hang is found and can be worked around with another means, we can reconsider if we can reinstate full reset for unreadiness case. v2: -EIO, Recovery, gen8 (Chris, Tomas, Daniel) v3: updated commit msg v4: timeout_ms, simpler error path (Chris) References: https://bugs.freedesktop.org/show_bug.cgi?id=89959 References: https://bugs.freedesktop.org/show_bug.cgi?id=90854 Testcase: igt/gem_concurrent_blit --r prw-blt-overwrite-source-read-rcs-forked Testcase: igt/gem_concurrent_blit --r gtt-blt-overwrite-source-read-rcs-forked Cc: Chris Wilson ch...@chris-wilson.co.uk Cc: Daniel Vetter daniel.vet...@ffwll.ch Cc: Tomas Elf tomas@intel.com Signed-off-by: Mika Kuoppala mika.kuopp...@intel.com --- drivers/gpu/drm/i915/i915_reg.h | 3 +++ drivers/gpu/drm/i915/intel_uncore.c | 43 - 2 files changed, 45 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 0b979ad..3684f92 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -1461,6 +1461,9 @@ enum skl_disp_power_wells { #define RING_MAX_IDLE(base) ((base)+0x54) #define RING_HWS_PGA(base) ((base)+0x80) #define RING_HWS_PGA_GEN6(base) ((base)+0x2080) +#define RING_RESET_CTL(base) 
((base)+0xd0) +#define RESET_CTL_REQUEST_RESET (1 << 0) +#define RESET_CTL_READY_TO_RESET (1 << 1) #define HSW_GTT_CACHE_EN 0x4024 #define GTT_CACHE_EN_ALL 0xF0007FFF diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 4a86cf0..160a47a 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1455,9 +1455,50 @@ static int gen6_do_reset(struct drm_device *dev) return ret; } +static int wait_for_register(struct drm_i915_private *dev_priv, + const u32 reg, + const u32 mask, + const u32 value, + const unsigned long timeout_ms) +{ + return wait_for((I915_READ(reg) & mask) == value, timeout_ms); +} + +static int gen8_do_reset(struct drm_device *dev) +{ + struct drm_i915_private *dev_priv = dev->dev_private; + struct intel_engine_cs *engine; + int i; + + for_each_ring(engine, dev_priv, i) { + I915_WRITE(RING_RESET_CTL(engine->mmio_base), + _MASKED_BIT_ENABLE(RESET_CTL_REQUEST_RESET)); + + if (wait_for_register(dev_priv, +RING_RESET_CTL(engine->mmio_base), +RESET_CTL_READY_TO_RESET, +RESET_CTL_READY_TO_RESET, +700)) { + DRM_ERROR("%s: reset request timeout\n", engine->name); + goto not_ready; + } So just to be clear here: If one or more of the reset control registers decide that they are at a point where they will never again be ready for reset we will simply not do a full GPU reset until reboot? Is there perhaps a case where you would want to try reset request once or twice or like five times or whatever but then simply go ahead with the full GPU reset regardless of what the reset control register tells you? After all, it's our only way out if the hardware is truly stuck. What happens is that we skip the reset, report an error and that marks the GPU as wedged. To get out of that state requires user intervention, either by rebooting or through use of debugfs/i915_wedged. That's a fair point, we will mark the GPU as terminally wedged. That's always been there as a final state where we simply give up.
I guess it might be better to actively mark the GPU as terminally wedged from the driver's point of view rather than plow ahead in a last ditch effort to reset the GPU, which may or may not succeed and which may irrecoverably hang the system in the worst case. I guess we at least protect the currently running context if we just mark the GPU as terminally wedged instead of putting it in a potentially undefined state. We can try to repeat the reset from a workqueue, but we should first
Re: [Intel-gfx] [PATCH v4] drm/i915 : Added Programming of the MOCS
On Thu, 2015-06-18 at 10:10 +0100, ch...@chris-wilson.co.uk wrote: On Thu, Jun 18, 2015 at 08:45:10AM +0000, Antoine, Peter wrote: On Thu, 2015-06-18 at 08:49 +0100, ch...@chris-wilson.co.uk wrote: On Thu, Jun 18, 2015 at 07:36:41AM +0000, Antoine, Peter wrote: On Wed, 2015-06-17 at 17:33 +0100, Chris Wilson wrote: On Wed, Jun 17, 2015 at 04:19:22PM +0100, Peter Antoine wrote: This change adds the programming of the MOCS registers to the gen 9+ platforms. This change set programs the MOCS register values to a set of values that are defined to be optimal. It creates a fixed register set that is programmed across the different engines so that all engines have the same table. This is done as the main RCS context only holds the registers for itself and the shared L3 values. By trying to keep the registers consistent across the different engines it should make the programming for the registers consistent. v2: -'static const' for private data structures and style changes.(Matt Turner) v3: - Make the tables slightly more readable. (Damien Lespiau) - Updated tables fix performance regression. v4: - Code formatting. (Chris Wilson) - re-privatised mocs code. (Daniel Vetter) Being really picky now, but reading your comments impressed upon me the importance of reinforcing one particular point... + /* +* Failing to program the MOCS is non-fatal. The system will not +* run at peak performance. So generate a warning and carry on. +*/ + if (gen9_program_mocs(ring, ctx) != 0) I think this is better as intel_rcs_context_init_mocs(). To me it is important that you emphasize this is to be run once during very early initialisation to set up the first context prior to anything else. i.e. All subsequent execution state must be derived from this. Renaming it as intel_rcs_context_init_mocs(): 1 - indicates you have written it to handle all generations, this is important as you are otherwise passing in gen8 into a gen9 function.
2 - it is only called during RCS->init_context() and must not be called at any other time - this avoids the issue of modifying registers used by other rings at runtime, which is the trap you led me into last time. No problem with that. But adding rcs to the original name suggests that it is only setting up the rcs engine and not all the engines. If any of the other context engines have their contexts extended then we may need to call the function from other ring initialise functions. intel_rcs_context is the object, init_mocs is the verb, with init being a fairly well defined phase of context operations. My suggestion is that it is only run during RCS context init. The comments tell us that it affects all rings - and so we must emphasize that the RCS context init *must* be run before the other rings are enabled for submission. If we have contexts being initialised on other rings, then one would not think of calling intel_rcs_context_init* but instead think of how we would need to interact with concurrent engine initialisation. Being specific here should stop someone simply calling the function and hoping for the best. I'll change it to intel_context_emit_mocs() as this does say what it does on the tin, it only emits the mocs to the context and does not program them. That misses the point I am trying to make. I don't get your point, the original seemed good to me. Changing the name to what you want as this needs to get in. My point is that it is not a generic function and must be called at a certain phase of context construction and lrc initialisation. I am trying to suggest a name that encapsulates that to avoid possible misuse.
+ if (IS_SKYLAKE(dev)) { + table->size = ARRAY_SIZE(skylake_mocs_table); + table->table = skylake_mocs_table; + result = true; + } else if (IS_BROXTON(dev)) { + table->size = ARRAY_SIZE(broxton_mocs_table); + table->table = broxton_mocs_table; + result = true; + } else { + /* Platform that should have a MOCS table does not */ + WARN_ON(INTEL_INFO(dev)->gen >= 9); result = false; here would be fewer lines of code today and tomorrow. :) Fail safe return value. Makes no difference here, but golden in larger functions. Actually I don't see why you can't encode the ARRAY_SIZE into the static const tables, then the return value is just the appropriate table. If you don't set a default value, then you get a compiler warning telling you missed adding it your new
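The review suggestion above — baking ARRAY_SIZE into the static const tables so the lookup has a single, fail-safe return value — can be sketched in plain C roughly as follows. All names here are illustrative stand-ins, not the actual i915 symbols, and the table entries are dummy values:

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative platform ids, not the real INTEL_INFO() machinery. */
enum platform { PLATFORM_SKYLAKE, PLATFORM_BROXTON, PLATFORM_OTHER };

struct mocs_table {
	size_t size;
	const unsigned int *entries;
};

/* Dummy control values, purely for illustration. */
static const unsigned int skylake_entries[] = { 0x00000009, 0x00000038, 0x0000003b };
static const unsigned int broxton_entries[] = { 0x00000009, 0x00000038 };

/* The size is baked into the static tables, so the lookup has one
 * return value: the matching table, or NULL as the fail-safe default. */
static const struct mocs_table skylake_mocs = { ARRAY_SIZE(skylake_entries), skylake_entries };
static const struct mocs_table broxton_mocs = { ARRAY_SIZE(broxton_entries), broxton_entries };

static const struct mocs_table *get_mocs_table(enum platform p)
{
	switch (p) {
	case PLATFORM_SKYLAKE: return &skylake_mocs;
	case PLATFORM_BROXTON: return &broxton_mocs;
	default:               return NULL; /* fail safe */
	}
}
```

With this shape there is no separate `result` flag to forget, and an unhandled platform falls through to NULL rather than to uninitialised state.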
[Intel-gfx] [PATCH v5] drm/i915 : Added Programming of the MOCS
This change adds the programming of the MOCS registers to the gen 9+ platforms. This change set programs the MOCS register values to a set of values that are defined to be optimal. It creates a fixed register set that is programmed across the different engines so that all engines have the same table. This is done as the main RCS context only holds the registers for itself and the shared L3 values. By trying to keep the registers consistent across the different engines it should make the programming for the registers consistent. v2: -'static const' for private data structures and style changes.(Matt Turner) v3: - Make the tables slightly more readable. (Damien Lespiau) - Updated tables fix performance regression. v4: - Code formatting. (Chris Wilson) - re-privatised mocs code. (Daniel Vetter) v5: - Changed the name of a function. (Chris Wilson) Signed-off-by: Peter Antoine peter.anto...@intel.com --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/i915_reg.h | 9 + drivers/gpu/drm/i915/intel_lrc.c | 10 +- drivers/gpu/drm/i915/intel_lrc.h | 4 + drivers/gpu/drm/i915/intel_mocs.c | 370 ++ drivers/gpu/drm/i915/intel_mocs.h | 61 +++ 6 files changed, 454 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/i915/intel_mocs.c create mode 100644 drivers/gpu/drm/i915/intel_mocs.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index b7ddf48..c781e19 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -35,6 +35,7 @@ i915-y += i915_cmd_parser.o \ i915_irq.o \ i915_trace_points.o \ intel_lrc.o \ + intel_mocs.o \ intel_ringbuffer.o \ intel_uncore.o diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 7213224..3a435b5 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -7829,4 +7829,13 @@ enum skl_disp_power_wells { #define _PALETTE_A (dev_priv-info.display_mmio_offset + 0xa000) #define _PALETTE_B (dev_priv-info.display_mmio_offset + 0xa800) 
+/* MOCS (Memory Object Control State) registers */ +#define GEN9_LNCFCMOCS0(0xB020)/* L3 Cache Control base */ + +#define GEN9_GFX_MOCS_0(0xc800)/* Graphics MOCS base register*/ +#define GEN9_MFX0_MOCS_0 (0xc900)/* Media 0 MOCS base register*/ +#define GEN9_MFX1_MOCS_0 (0xcA00)/* Media 1 MOCS base register*/ +#define GEN9_VEBOX_MOCS_0 (0xcB00)/* Video MOCS base register*/ +#define GEN9_BLT_MOCS_0(0xcc00)/* Blitter MOCS base register*/ + #endif /* _I915_REG_H_ */ diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 9f5485d..dd01caf 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -135,6 +135,7 @@ #include drm/drmP.h #include drm/i915_drm.h #include i915_drv.h +#include intel_mocs.h #define GEN9_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE) #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE) @@ -796,7 +797,7 @@ static int logical_ring_prepare(struct intel_ringbuffer *ringbuf, * * Return: non-zero if the ringbuffer is not ready to be written to. */ -static int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, +int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, struct intel_context *ctx, int num_dwords) { struct intel_engine_cs *ring = ringbuf-ring; @@ -1379,6 +1380,13 @@ static int gen8_init_rcs_context(struct intel_engine_cs *ring, if (ret) return ret; + /* +* Failing to program the MOCS is non-fatal.The system will not +* run at peak performance. So generate a warning and carry on. 
+*/ + if (intel_rcs_context_init_mocs(ring, ctx) != 0) + DRM_ERROR(MOCS failed to program: expect performance issues.); + return intel_lr_context_render_state_init(ring, ctx); } diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h index 04d3a6d..dbbd6af 100644 --- a/drivers/gpu/drm/i915/intel_lrc.h +++ b/drivers/gpu/drm/i915/intel_lrc.h @@ -44,6 +44,10 @@ int intel_logical_rings_init(struct drm_device *dev); int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf, struct intel_context *ctx); + +int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, + struct intel_context *ctx, int num_dwords); + /** * intel_logical_ring_advance() - advance the ringbuffer tail * @ringbuf: Ringbuffer to advance. diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c new file mode 100644 index 000..1651379e --- /dev/null +++ b/drivers/gpu/drm/i915/intel_mocs.c @@ -0,0 +1,370 @@ +/* + * Copyright (c) 2015 Intel Corporation + * + * Permission is hereby
Re: [Intel-gfx] [PATCH] drm/i915: Per-DDI I_boost override
Just FYI, this patch depends on David Weinehall's Buffer translation improvements patch from earlier today. -- - Antti ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 02/55] drm/i915: Reserve ring buffer space for i915_add_request() commands
From: John Harrison john.c.harri...@intel.com It is a bad idea for i915_add_request() to fail. The work will already have been sent to the ring and will be processed, but there will not be any tracking or management of that work. The only way the add request call can fail is if it can't write its epilogue commands to the ring (cache flushing, seqno updates, interrupt signalling). The reasons for that are mostly down to running out of ring buffer space and the problems associated with trying to get some more. This patch prevents that situation from happening in the first place. When a request is created, it marks sufficient space as reserved for the epilogue commands. Thus guaranteeing that by the time the epilogue is written, there will be plenty of space for it. Note that a ring_begin() call is required to actually reserve the space (and do any potential waiting). However, that is not currently done at request creation time. This is because the ring_begin() code can allocate a request. Hence calling begin() from the request allocation code would lead to infinite recursion! Later patches in this series remove the need for begin() to do the allocate. At that point, it becomes safe for the allocate to call begin() and really reserve the space. Until then, there is a potential for insufficient space to be available at the point of calling i915_add_request(). However, that would only be in the case where the request was created and immediately submitted without ever calling ring_begin() and adding any work to that request, which should never happen. And even if it does, and if that request happens to fall into the tiny window of opportunity for failing due to being out of ring space, then does it really matter because the request wasn't doing anything in the first place? v2: Updated the 'reserved space too small' warning to include the offending sizes. Added a 'cancel' operation to clean up when a request is abandoned.
Added re-initialisation of tracking state after a buffer wrap to keep the sanity checks accurate. v3: Incremented the reserved size to accommodate Ironlake (after finally managing to run on an ILK system). Also fixed missing wrap code in LRC mode. v4: Added extra comment and removed duplicate WARN (feedback from Tomas). For: VIZ-5115 CC: Tomas Elf tomas@intel.com Signed-off-by: John Harrison john.c.harri...@intel.com --- drivers/gpu/drm/i915/i915_drv.h |1 + drivers/gpu/drm/i915/i915_gem.c | 37 drivers/gpu/drm/i915/intel_lrc.c| 21 + drivers/gpu/drm/i915/intel_ringbuffer.c | 71 ++- drivers/gpu/drm/i915/intel_ringbuffer.h | 25 +++ 5 files changed, 153 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 0347eb9..eba1857 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2187,6 +2187,7 @@ struct drm_i915_gem_request { int i915_gem_request_alloc(struct intel_engine_cs *ring, struct intel_context *ctx); +void i915_gem_request_cancel(struct drm_i915_gem_request *req); void i915_gem_request_free(struct kref *req_ref); static inline uint32_t diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 81f3512..85fa27b 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2485,6 +2485,13 @@ int __i915_add_request(struct intel_engine_cs *ring, } else ringbuf = ring-buffer; + /* +* To ensure that this call will not fail, space for its emissions +* should already have been reserved in the ring buffer. Let the ring +* know that it is time to use that space up. +*/ + intel_ring_reserved_space_use(ringbuf); + request_start = intel_ring_get_tail(ringbuf); /* * Emit any outstanding flushes - execbuf can fail to emit the flush @@ -2567,6 +2574,9 @@ int __i915_add_request(struct intel_engine_cs *ring, round_jiffies_up_relative(HZ)); intel_mark_busy(dev_priv-dev); + /* Sanity check that the reserved size was large enough. 
*/ + intel_ring_reserved_space_end(ringbuf); + return 0; } @@ -2666,6 +2676,26 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring, if (ret) goto err; + /* +* Reserve space in the ring buffer for all the commands required to +* eventually emit this request. This is to guarantee that the +* i915_add_request() call can't fail. Note that the reserve may need +* to be redone if the request is not actually submitted straight +* away, e.g. because a GPU scheduler has deferred it. +* +* Note further that this call merely notes the reserve request. A +* subsequent call to *_ring_begin() is required to actually ensure +* that the reservation is
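The reserve/use/end scheme this patch introduces can be modelled with a toy ring-space tracker. This is a hedged sketch of the invariant only — ordinary emission may never eat into the reserve, the epilogue draws from it after `_use()`, and a final sanity check confirms the reserve was large enough — not the driver's actual implementation; all names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the intel_ring_reserved_space_{reserve,use,end} idea. */
struct ring {
	int space;           /* free dwords in the ring */
	int reserved;        /* dwords set aside for the request epilogue */
	bool reserved_in_use;
	int epilogue_dwords; /* dwords emitted since reserved_use() */
};

static bool ring_reserve(struct ring *r, int dwords)
{
	if (r->space < dwords)
		return false; /* caller would have to wait for space */
	r->reserved = dwords;
	return true;
}

static bool ring_emit(struct ring *r, int dwords)
{
	/* before the epilogue begins, the reserve is off limits */
	int avail = r->space - (r->reserved_in_use ? 0 : r->reserved);

	if (avail < dwords)
		return false;
	r->space -= dwords;
	if (r->reserved_in_use)
		r->epilogue_dwords += dwords;
	return true;
}

static void ring_reserved_use(struct ring *r)
{
	r->reserved_in_use = true;
	r->epilogue_dwords = 0;
}

/* sanity check: returns false if the epilogue overran its reservation */
static bool ring_reserved_end(struct ring *r)
{
	bool ok = r->epilogue_dwords <= r->reserved;

	r->reserved = 0;
	r->reserved_in_use = false;
	return ok;
}
```

The point of the model: once `ring_reserve()` succeeds at request-creation time, the epilogue emitted between `_use()` and `_end()` cannot fail for lack of space, which is exactly why `i915_add_request()` can then be made unfailable.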
[Intel-gfx] [PATCH 15/55] drm/i915: Split i915_ppgtt_init_hw() in half - generic and per ring
From: John Harrison john.c.harri...@intel.com The i915_gem_init_hw() function calls a bunch of smaller initialisation functions. Multiple of which have generic sections and per ring sections. This means multiple passes are done over the rings. Each pass writes data to the ring which floats around in that ring's OLR until some random point in the future when an add_request() is done by some random other piece of code. This patch breaks i915_ppgtt_init_hw() in two with the per ring initialisation now being done in i915_ppgtt_init_ring(). The ring looping is now done at the top level in i915_gem_init_hw(). v2: Fix dumb loop variable re-use. For: VIZ-5115 Signed-off-by: John Harrison john.c.harri...@intel.com Reviewed-by: Tomas Elf tomas@intel.com --- drivers/gpu/drm/i915/i915_gem.c | 27 --- drivers/gpu/drm/i915/i915_gem_gtt.c | 28 +++- drivers/gpu/drm/i915/i915_gem_gtt.h |1 + 3 files changed, 36 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index ac893e3..dff21bd 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -5016,7 +5016,7 @@ i915_gem_init_hw(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev-dev_private; struct intel_engine_cs *ring; - int ret, i; + int ret, i, j; if (INTEL_INFO(dev)-gen 6 !intel_enable_gtt()) return -EIO; @@ -5053,19 +5053,32 @@ i915_gem_init_hw(struct drm_device *dev) */ init_unused_rings(dev); + ret = i915_ppgtt_init_hw(dev); + if (ret) { + DRM_ERROR(PPGTT enable HW failed %d\n, ret); + goto out; + } + + /* Need to do basic initialisation of all rings first: */ for_each_ring(ring, dev_priv, i) { ret = ring-init_hw(ring); if (ret) goto out; } - for (i = 0; i NUM_L3_SLICES(dev); i++) - i915_gem_l3_remap(dev_priv-ring[RCS], i); + /* Now it is safe to go back round and do everything else: */ + for_each_ring(ring, dev_priv, i) { + if (ring-id == RCS) { + for (j = 0; j NUM_L3_SLICES(dev); j++) + i915_gem_l3_remap(ring, j); + } - ret = 
i915_ppgtt_init_hw(dev); - if (ret ret != -EIO) { - DRM_ERROR(PPGTT enable failed %d\n, ret); - i915_gem_cleanup_ringbuffer(dev); + ret = i915_ppgtt_init_ring(ring); + if (ret ret != -EIO) { + DRM_ERROR(PPGTT enable ring #%d failed %d\n, i, ret); + i915_gem_cleanup_ringbuffer(dev); + goto out; + } } ret = i915_gem_context_enable(dev_priv); diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 17b7df0..b14ae63 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1543,11 +1543,6 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt) int i915_ppgtt_init_hw(struct drm_device *dev) { - struct drm_i915_private *dev_priv = dev-dev_private; - struct intel_engine_cs *ring; - struct i915_hw_ppgtt *ppgtt = dev_priv-mm.aliasing_ppgtt; - int i, ret = 0; - /* In the case of execlists, PPGTT is enabled by the context descriptor * and the PDPs are contained within the context itself. We don't * need to do anything here. 
*/ @@ -1566,16 +1561,23 @@ int i915_ppgtt_init_hw(struct drm_device *dev) else MISSING_CASE(INTEL_INFO(dev)-gen); - if (ppgtt) { - for_each_ring(ring, dev_priv, i) { - ret = ppgtt-switch_mm(ppgtt, ring); - if (ret != 0) - return ret; - } - } + return 0; +} - return ret; +int i915_ppgtt_init_ring(struct intel_engine_cs *ring) +{ + struct drm_i915_private *dev_priv = ring-dev-dev_private; + struct i915_hw_ppgtt *ppgtt = dev_priv-mm.aliasing_ppgtt; + + if (i915.enable_execlists) + return 0; + + if (!ppgtt) + return 0; + + return ppgtt-switch_mm(ppgtt, ring); } + struct i915_hw_ppgtt * i915_ppgtt_create(struct drm_device *dev, struct drm_i915_file_private *fpriv) { diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h index 0d46dd2..0caa9eb 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.h +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h @@ -475,6 +475,7 @@ void i915_global_gtt_cleanup(struct drm_device *dev); int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt); int i915_ppgtt_init_hw(struct drm_device *dev); +int i915_ppgtt_init_ring(struct intel_engine_cs *ring); void
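The restructured flow — generic PPGTT setup done once, then a single top-level loop doing every per-ring step — can be illustrated with a small trace-based sketch. The function names are placeholders for the real i915 ones and the trace values are arbitrary markers:

```c
#include <assert.h>

#define NUM_RINGS 3

/* Toy trace of the restructured init flow: one generic PPGTT pass,
 * then per-ring basic init, then per-ring PPGTT work, all driven from
 * a single top-level function as in the patch above. */
static int trace[1 + 2 * NUM_RINGS];
static int trace_len;

static void ppgtt_init_hw(void)       { trace[trace_len++] = 100; }       /* generic */
static void ring_init_hw(int ring)    { trace[trace_len++] = 200 + ring; }
static void ppgtt_init_ring(int ring) { trace[trace_len++] = 300 + ring; }

static void gem_init_hw(void)
{
	int i;

	ppgtt_init_hw();                 /* generic part, exactly once */
	for (i = 0; i < NUM_RINGS; i++)  /* basic ring init first */
		ring_init_hw(i);
	for (i = 0; i < NUM_RINGS; i++)  /* then the per-ring PPGTT work */
		ppgtt_init_ring(i);
}
```

The benefit the commit message describes falls out of this shape: each helper is either purely generic or purely per-ring, so the ring looping lives in one place and no helper does its own hidden pass over all rings.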
Re: [Intel-gfx] [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
On Thu, Jun 18, 2015 at 12:49:55PM +0100, Dave Gordon wrote: On 17/06/15 13:02, Daniel Vetter wrote: Domain handling is required for all gem objects, and the resulting bugs if you don't for one-off objects are absolutely no fun to track down. Is it not the case that the new object returned by i915_gem_alloc_object() is (a) of a type that can be mapped into the GTT, and (b) initially in the CPU domain for both reading and writing? So AFAICS the allocate-and-fill function I'm describing (to appear in next patch series respin) doesn't need any further domain handling. An i915_gem_object_create_from_data() is a reasonable addition, and I suspect it will make the code a bit more succinct. Whilst your statement is true today, calling set_domain is then a no-op, and helps document how you use the object and so reduces the likelihood of us introducing bugs in the future. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH] igt: remove deprecated reg access tools in favor of intel_reg
intel_iosf_sb_read, intel_iosf_sb_write, intel_reg_dumper, intel_reg_read, intel_reg_snapshot, intel_reg_write, intel_vga_read, and intel_vga_write have been deprecated in favor of intel_reg. Remove the deprecated tools. intel_reg does everything they do, and more. Signed-off-by: Jani Nikula jani.nik...@intel.com --- man/Makefile.am |3 - man/intel_reg_dumper.man| 33 - man/intel_reg_read.man | 15 - man/intel_reg_snapshot.man | 15 - man/intel_reg_write.man | 16 - tools/Makefile.sources |8 - tools/intel_iosf_sb_read.c | 153 --- tools/intel_iosf_sb_write.c | 140 -- tools/intel_reg_dumper.c| 3020 --- tools/intel_reg_read.c | 145 --- tools/intel_reg_snapshot.c | 56 - tools/intel_reg_write.c | 60 - tools/intel_vga_read.c | 97 -- tools/intel_vga_write.c | 97 -- 14 files changed, 3858 deletions(-) delete mode 100644 man/intel_reg_dumper.man delete mode 100644 man/intel_reg_read.man delete mode 100644 man/intel_reg_snapshot.man delete mode 100644 man/intel_reg_write.man delete mode 100644 tools/intel_iosf_sb_read.c delete mode 100644 tools/intel_iosf_sb_write.c delete mode 100644 tools/intel_reg_dumper.c delete mode 100644 tools/intel_reg_read.c delete mode 100644 tools/intel_reg_snapshot.c delete mode 100644 tools/intel_reg_write.c delete mode 100644 tools/intel_vga_read.c delete mode 100644 tools/intel_vga_write.c diff --git a/man/Makefile.am b/man/Makefile.am index ee09156c934e..c42a91beb09b 100644 --- a/man/Makefile.am +++ b/man/Makefile.am @@ -10,9 +10,6 @@ appman_PRE = \ intel_infoframes.man\ intel_lid.man \ intel_panel_fitter.man \ - intel_reg_dumper.man\ - intel_reg_read.man \ - intel_reg_write.man \ intel_stepping.man \ intel_upload_blit_large.man \ intel_upload_blit_large_gtt.man \ diff --git a/man/intel_reg_dumper.man b/man/intel_reg_dumper.man deleted file mode 100644 index 89f6b9f96072.. --- a/man/intel_reg_dumper.man +++ /dev/null @@ -1,33 +0,0 @@ -.\ shorthand for double quote that works everywhere. 
-.ds q \N'34' -.TH intel_reg_dumper __appmansuffix__ __xorgversion__ -.SH NAME -intel_reg_dumper \- Decode a bunch of Intel GPU registers for debugging -.SH SYNOPSIS -.B intel_reg_dumper [ options ] [ file ] -.SH DESCRIPTION -.B intel_reg_dumper -is a tool to read and decode the values of many Intel GPU registers. It is -commonly used in debugging video mode setting issues. If the -.B file -argument is present, the registers will be decoded from the given file -instead of the current registers. Use the -.B intel_reg_snapshot -tool to generate such files. - -When the -.B file -argument is present and the -.B -d -argument is not present, -.B intel_reg_dumper -will assume the file was generated on an Ironlake machine. -.SH OPTIONS -.TP -.B -d id -when a dump file is used, use 'id' as device id (in hex) -.TP -.B -h -prints a help message -.SH SEE ALSO -.BR intel_reg_snapshot(1) diff --git a/man/intel_reg_read.man b/man/intel_reg_read.man deleted file mode 100644 index cc2bf612eb35.. --- a/man/intel_reg_read.man +++ /dev/null @@ -1,15 +0,0 @@ -.\ shorthand for double quote that works everywhere. -.ds q \N'34' -.TH intel_reg_read __appmansuffix__ __xorgversion__ -.SH NAME -intel_reg_read \- Reads an Intel GPU register value -.SH SYNOPSIS -.B intel_reg_read \fIregister\fR -.SH DESCRIPTION -.B intel_reg_read -is a tool to read Intel GPU registers, for use in debugging. The -\fIregister\fR argument is given as hexadecimal. -.SH EXAMPLES -.TP -intel_reg_read 0x61230 -Shows the register value for the first internal panel fitter. diff --git a/man/intel_reg_snapshot.man b/man/intel_reg_snapshot.man deleted file mode 100644 index 1930f613fb26.. --- a/man/intel_reg_snapshot.man +++ /dev/null @@ -1,15 +0,0 @@ -.\ shorthand for double quote that works everywhere. 
-.ds q \N'34' -.TH intel_reg_snapshot __appmansuffix__ __xorgversion__ -.SH NAME -intel_reg_snapshot \- Take a GPU register snapshot -.SH SYNOPSIS -.B intel_reg_snapshot -.SH DESCRIPTION -.B intel_reg_snapshot -takes a snapshot of the registers of an Intel GPU, and writes it to standard -output. These files can be inspected later with the -.B intel_reg_dumper -tool. -.SH SEE ALSO -.BR intel_reg_dumper(1) diff --git a/man/intel_reg_write.man b/man/intel_reg_write.man deleted file mode 100644 index cb1731c6f04b.. --- a/man/intel_reg_write.man +++ /dev/null @@ -1,16 +0,0 @@ -.\ shorthand for double quote that works everywhere. -.ds q \N'34' -.TH intel_reg_write __appmansuffix__ __xorgversion__ -.SH NAME -intel_reg_write \- Set an Intel GPU register to a value -.SH SYNOPSIS -.B intel_reg_write \fIregister\fR \fIvalue\fR -.SH DESCRIPTION -.B intel_reg_write -is a tool to set Intel GPU
[Intel-gfx] [PATCH v5 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround
In Indirect context w/a batch buffer, WaClearSlmSpaceAtContextSwitch v2: s/PIPE_CONTROL_FLUSH_RO_CACHES/PIPE_CONTROL_FLUSH_L3 (Ville) Signed-off-by: Rafael Barbalho rafael.barba...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_reg.h | 1 + drivers/gpu/drm/i915/intel_lrc.c | 16 2 files changed, 17 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index d14ad20..7637e64 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -410,6 +410,7 @@ #define DISPLAY_PLANE_A (0<<20) #define DISPLAY_PLANE_B (1<<20) #define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2)) +#define PIPE_CONTROL_FLUSH_L3 (1<<27) #define PIPE_CONTROL_GLOBAL_GTT_IVB (1<<24) /* gen7+ */ #define PIPE_CONTROL_MMIO_WRITE (1<<23) #define PIPE_CONTROL_STORE_DATA_INDEX (1<<21) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index dff8303..792d559 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1106,6 +1106,7 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring, uint32_t *num_dwords) { uint32_t index; + uint32_t scratch_addr; uint32_t *batch = *wa_ctx_batch; index = offset; @@ -1136,6 +1137,21 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring, wa_ctx_emit(batch, l3sqc4_flush & ~GEN8_LQSC_FLUSH_COHERENT_LINES); } + /* WaClearSlmSpaceAtContextSwitch:bdw,chv */ + /* Actual scratch location is at 128 bytes offset */ + scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES; + scratch_addr |= PIPE_CONTROL_GLOBAL_GTT; + + wa_ctx_emit(batch, GFX_OP_PIPE_CONTROL(6)); + wa_ctx_emit(batch, (PIPE_CONTROL_FLUSH_L3 | + PIPE_CONTROL_GLOBAL_GTT_IVB | + PIPE_CONTROL_CS_STALL | + PIPE_CONTROL_QW_WRITE)); + wa_ctx_emit(batch, scratch_addr); + wa_ctx_emit(batch, 0); + wa_ctx_emit(batch, 0); + wa_ctx_emit(batch, 0); + + /* padding */ while (((unsigned long) (batch + index) % CACHELINE_BYTES) != 0)
wa_ctx_emit(batch, MI_NOOP); -- 2.3.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
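A standalone sketch of the 6-dword PIPE_CONTROL this patch adds. The bit positions mirror the i915_reg.h defines quoted in the patch, but the function and its names are illustrative rather than the driver's exact code:

```c
#include <assert.h>
#include <stdint.h>

/* Command/flag encodings as quoted in the patch above. */
#define GFX_OP_PIPE_CONTROL(len)    ((0x3 << 29) | (0x3 << 27) | (0x2 << 24) | ((len) - 2))
#define PIPE_CONTROL_FLUSH_L3       (1 << 27)
#define PIPE_CONTROL_GLOBAL_GTT_IVB (1 << 24)
#define PIPE_CONTROL_CS_STALL       (1 << 20)
#define PIPE_CONTROL_QW_WRITE       (1 << 14)
#define PIPE_CONTROL_GLOBAL_GTT     (1 << 2)
#define CACHELINE_BYTES             64

/* Emit the WaClearSlmSpaceAtContextSwitch PIPE_CONTROL into a plain
 * dword array; returns the number of dwords written (always 6). */
static int emit_clear_slm_wa(uint32_t *batch, uint32_t scratch_gtt_offset)
{
	int index = 0;
	/* actual scratch location is at 128 bytes offset */
	uint32_t scratch_addr = (scratch_gtt_offset + 2 * CACHELINE_BYTES) |
				PIPE_CONTROL_GLOBAL_GTT;

	batch[index++] = GFX_OP_PIPE_CONTROL(6);
	batch[index++] = PIPE_CONTROL_FLUSH_L3 | PIPE_CONTROL_GLOBAL_GTT_IVB |
			 PIPE_CONTROL_CS_STALL | PIPE_CONTROL_QW_WRITE;
	batch[index++] = scratch_addr; /* qword write lands here */
	batch[index++] = 0;            /* upper address dword */
	batch[index++] = 0;            /* immediate data, unused */
	batch[index++] = 0;
	return index;
}
```

Note the CS_STALL + QW_WRITE pairing: the stall makes the flush synchronous with the context switch, and the dummy qword write gives the command a valid post-sync operation target.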
[Intel-gfx] [PATCH v5 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround
In Indirect context w/a batch buffer, +WaFlushCoherentL3CacheLinesAtContextSwitch:bdw v2: Add LRI commands to set/reset bit that invalidates coherent lines, update WA to include programming restrictions and exclude CHV as it is not required (Ville) v3: Avoid unnecessary read when it can be done by reading register once (Chris). Signed-off-by: Rafael Barbalho rafael.barba...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_reg.h | 2 ++ drivers/gpu/drm/i915/intel_lrc.c | 23 +++ 2 files changed, 25 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 84af255..d14ad20 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -426,6 +426,7 @@ #define PIPE_CONTROL_INDIRECT_STATE_DISABLE (1<<9) #define PIPE_CONTROL_NOTIFY (1<<8) #define PIPE_CONTROL_FLUSH_ENABLE (1<<7) /* gen7+ */ +#define PIPE_CONTROL_DC_FLUSH_ENABLE (1<<5) #define PIPE_CONTROL_VF_CACHE_INVALIDATE (1<<4) #define PIPE_CONTROL_CONST_CACHE_INVALIDATE (1<<3) #define PIPE_CONTROL_STATE_CACHE_INVALIDATE (1<<2) @@ -5788,6 +5789,7 @@ enum skl_disp_power_wells { #define GEN8_L3SQCREG4 0xb118 #define GEN8_LQSC_RO_PERF_DIS (1<<27) +#define GEN8_LQSC_FLUSH_COHERENT_LINES (1<<21) /* GEN8 chicken */ #define HDC_CHICKEN0 0x7300 diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 8d5932a..dff8303 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1113,6 +1113,29 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring, /* WaDisableCtxRestoreArbitration:bdw,chv */ wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_DISABLE); + /* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw */ + if (IS_BROADWELL(ring->dev)) { + struct drm_i915_private *dev_priv = to_i915(ring->dev); + uint32_t l3sqc4_flush = (I915_READ(GEN8_L3SQCREG4) | +GEN8_LQSC_FLUSH_COHERENT_LINES); + + wa_ctx_emit(batch, MI_LOAD_REGISTER_IMM(1)); + wa_ctx_emit(batch, GEN8_L3SQCREG4); + 
wa_ctx_emit(batch, l3sqc4_flush); + + wa_ctx_emit(batch, GFX_OP_PIPE_CONTROL(6)); + wa_ctx_emit(batch, (PIPE_CONTROL_CS_STALL | + PIPE_CONTROL_DC_FLUSH_ENABLE)); + wa_ctx_emit(batch, 0); + wa_ctx_emit(batch, 0); + wa_ctx_emit(batch, 0); + wa_ctx_emit(batch, 0); + + wa_ctx_emit(batch, MI_LOAD_REGISTER_IMM(1)); + wa_ctx_emit(batch, GEN8_L3SQCREG4); + wa_ctx_emit(batch, l3sqc4_flush & ~GEN8_LQSC_FLUSH_COHERENT_LINES); + } + /* padding */ while (((unsigned long) (batch + index) % CACHELINE_BYTES) != 0) wa_ctx_emit(batch, MI_NOOP); -- 2.3.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
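The set-bit / flush / restore bracket this patch builds can be sketched as a standalone emitter. The encodings follow the defines quoted in the patch, but the function itself is illustrative, and the current register value is taken as a parameter instead of an MMIO read:

```c
#include <assert.h>
#include <stdint.h>

/* Command/flag encodings as quoted in the patch above. */
#define MI_LOAD_REGISTER_IMM(n)        ((0x22 << 23) | (2 * (n) - 1))
#define GFX_OP_PIPE_CONTROL(len)       ((0x3 << 29) | (0x3 << 27) | (0x2 << 24) | ((len) - 2))
#define PIPE_CONTROL_CS_STALL          (1 << 20)
#define PIPE_CONTROL_DC_FLUSH_ENABLE   (1 << 5)
#define GEN8_L3SQCREG4                 0xb118
#define GEN8_LQSC_FLUSH_COHERENT_LINES (1 << 21)

/* Emit: LRI setting the flush bit, a stalling DC-flush PIPE_CONTROL,
 * then an LRI clearing the bit again.  Returns dwords written. */
static int emit_l3sqc4_bracket(uint32_t *batch, uint32_t l3sqc4_current)
{
	int i = 0;
	uint32_t set = l3sqc4_current | GEN8_LQSC_FLUSH_COHERENT_LINES;

	/* set the coherent-line flush bit */
	batch[i++] = MI_LOAD_REGISTER_IMM(1);
	batch[i++] = GEN8_L3SQCREG4;
	batch[i++] = set;

	/* force the flush to actually happen before continuing */
	batch[i++] = GFX_OP_PIPE_CONTROL(6);
	batch[i++] = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_DC_FLUSH_ENABLE;
	batch[i++] = 0;
	batch[i++] = 0;
	batch[i++] = 0;
	batch[i++] = 0;

	/* restore the register to its original value */
	batch[i++] = MI_LOAD_REGISTER_IMM(1);
	batch[i++] = GEN8_L3SQCREG4;
	batch[i++] = set & ~GEN8_LQSC_FLUSH_COHERENT_LINES;
	return i;
}
```

Reading the register once and reusing `l3sqc4_current` for both the set and clear writes is exactly the v3 simplification Chris asked for: one read, two derived immediates.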
[Intel-gfx] [PATCH v5 0/6] Add Per-context WA using WA batch buffers
From Gen8+ we have some workarounds that are applied Per context and they are applied using special batch buffers called as WA batch buffers. HW executes them at specific stages during context save/restore. The patches in this series adds this framework to i915. I did some basic testing on BDW by running glmark2 and didn't see any issues. These WA are mainly required when preemption is enabled. All of the previous comments are addressed in latest revision v5 [v1] http://lists.freedesktop.org/archives/intel-gfx/2015-February/060707.html [v2] http://www.spinics.net/lists/intel-gfx/msg67804.html [v3] In v2, two separate ring_buffer objects were used to load WA instructions and they were part of every context which is not really required. Chris suggested a better approach of adding a page to context itself and using it for this purpose. Since GuC is also planning to do the same it can probably be shared with GuC. But after discussions it is agreed to use an independent page as GuC area might grow in future. Independent page also makes sense because these WA are only initialized once and not changed afterwards so we can share them across all contexts. [v4] Changes in this revision, In the previous version the size of batch buffers are fixed during initialization which is not a good idea. This is corrected by updating the functions that load WA to return the number of dwords written and caller updates the size once all WA are initialized. The functions now also accept offset field which allows us to have multiple batches so that required batch can be selected based on a criteria. This is not a requirement at this point but could be useful in future. WaFlushCoherentL3CacheLinesAtContextSwitch implementation was incomplete which is fixed and programming restrictions correctly applied. http://www.spinics.net/lists/intel-gfx/msg68947.html [v5] No major changes in this revision but switched to new revision as changes affected all patches. 
Introduced macro to add commands which also checks for page overflow. Moved code around to simplify, indentation fixes and other improvements suggested by Chris. Since we don't know the number of WA applied upfront, Chris suggested a two-pass approach but that brings additional complexity which is not necessary. Discussed with Chris and agreed upon on single page setup as simpler code wins and also single page is sufficient for our requirement. Please see the patches for more details. Arun Siluvery (6): drm/i915/gen8: Add infrastructure to initialize WA batch buffers drm/i915/gen8: Re-order init pipe_control in lrc mode drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround drivers/gpu/drm/i915/i915_reg.h | 32 +++- drivers/gpu/drm/i915/intel_lrc.c| 298 +++- drivers/gpu/drm/i915/intel_ringbuffer.h | 18 ++ 3 files changed, 341 insertions(+), 7 deletions(-) -- 2.3.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
Some of the WA are to be applied during context save but before restore and some at the end of context save/restore but before executing the instructions in the ring. WA batch buffers are created for this purpose and these WA cannot be applied using normal means. Each context has two registers to load the offsets of these batch buffers. If they are non-zero, HW understands that it needs to execute these batches. v1: In this version two separate ring_buffer objects were used to load WA instructions for indirect and per context batch buffers and they were part of every context. v2: Chris suggested to include an additional page in the context and use it to load these WA instead of creating separate objects. This will simplify a lot of things as we need not explicitly pin/unpin them. Thomas Daniel further pointed out that GuC is planning to use a similar setup to share data between GuC and driver and WA batch buffers can probably share that page. However, after discussions with Dave who is implementing GuC changes, he suggested to use an independent page for these reasons - GuC area might grow, and these WA are initialized only once and are not changed afterwards, so we can share them across all contexts. The page is updated with WA during render ring init. This has an advantage of not adding more special cases to default_context. We don't know upfront the number of WA we will be applying using these batch buffers. For this reason the size was fixed earlier but it is not a good idea. To fix this, the functions that load instructions are modified to report the number of commands inserted and the size is now calculated after the batch is updated. A macro is introduced to add commands to these batch buffers which also checks for overflow and returns an error. We have a full page dedicated for these WA so that should be sufficient for a good number of WA, anything more means we have major issues.
The list for Gen8 is small, same for Gen9 also; maybe a few more get added going forward, but not close to filling the entire page. Chris suggested a two-pass approach but we agreed to go with single page setup as it is a one-off routine and simpler code wins. Moved around functions to simplify it further, add comments. One additional option is the offset field which is helpful if we would like to have multiple batches at different offsets within the page and select them based on some criteria. This is not a requirement at this point but could help in future (Dave). (Thanks to Chris, Dave and Thomas for their reviews and inputs) Signed-off-by: Rafael Barbalho rafael.barba...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/intel_lrc.c| 204 +++- drivers/gpu/drm/i915/intel_ringbuffer.h | 18 +++ 2 files changed, 218 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 0413b8f..ad0b189 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -211,6 +211,7 @@ enum { FAULT_AND_CONTINUE /* Unsupported */ }; #define GEN8_CTX_ID_SHIFT 32 +#define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT 0x17 static int intel_lr_context_pin(struct intel_engine_cs *ring, struct intel_context *ctx); @@ -1077,6 +1078,173 @@ static int intel_logical_ring_workarounds_emit(struct intel_engine_cs *ring, return 0; } +#define wa_ctx_emit(batch, cmd) { \ + if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) { \ + return -ENOSPC; \ + } \ + batch[index++] = (cmd); \ + } +/** + * gen8_init_indirectctx_bb() - initialize indirect ctx batch with WA + * + * @ring: only applicable for RCS + * @wa_ctx_batch: page in which WA are loaded + * @offset: This is for future use in case if we would like to have multiple + * batches at different offsets and select them based on a criteria. 
+ * @num_dwords: The number of WA applied is not known at the beginning; it returns + * the number of DWORDS written. This batch does not contain MI_BATCH_BUFFER_END, + * so it adds padding to make it cacheline aligned. MI_BATCH_BUFFER_END will be + * added to the perctx batch, and both of them together make a complete batch buffer. + * + * Return: non-zero if we exceed the PAGE_SIZE limit. + */ + +static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring, + uint32_t **wa_ctx_batch, + uint32_t offset, + uint32_t *num_dwords) +{ + uint32_t index; + uint32_t *batch = *wa_ctx_batch; + + index = offset; + + /* FIXME: fill one cacheline with NOOPs. + * Replace these instructions with WA + */ + while (index < (offset + 16)) + wa_ctx_emit(batch, MI_NOOP);
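The overflow check in the wa_ctx_emit() macro above can be illustrated with a plain C helper. This is a sketch only — the function name, the page-size constant, and the error value are stand-ins for the driver's macro, which additionally wraps the check in WARN_ON and returns from the enclosing function:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define WA_PAGE_SIZE 4096   /* one full page dedicated to WA batches */
#define ENOSPC_ERR   28     /* stand-in for the kernel's ENOSPC */

/* Append one command dword to the WA batch page, refusing to overflow.
 * This is the check that turns "we don't know the number of WA upfront"
 * into a reported error rather than a buffer overrun. */
static int wa_emit(uint32_t *batch, uint32_t *index, uint32_t cmd)
{
    if (*index >= WA_PAGE_SIZE / sizeof(uint32_t))
        return -ENOSPC_ERR;
    batch[(*index)++] = cmd;
    return 0;
}
```

With 1024 dwords available in the page, hitting the error path would indicate a pathological number of workarounds, matching the commit message's reasoning.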
[Intel-gfx] [PATCH v5 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround
In the Indirect and Per context w/a batch buffers, apply WaDisableCtxRestoreArbitration. Signed-off-by: Rafael Barbalho rafael.barba...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/intel_lrc.c | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 1d31eb5..8d5932a 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1110,10 +1110,11 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring, index = offset; - /* FIXME: fill one cacheline with NOOPs. - * Replace these instructions with WA - */ - while (index < (offset + 16)) + /* WaDisableCtxRestoreArbitration:bdw,chv */ + wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_DISABLE); + + /* padding */ + while (((unsigned long)(batch + index) % CACHELINE_BYTES) != 0) wa_ctx_emit(batch, MI_NOOP); /* @@ -1143,13 +1144,10 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring, index = offset; - /* FIXME: fill one cacheline with NOOPs. - * Replace these instructions with WA - */ - while (index < (offset + 16)) - wa_ctx_emit(batch, MI_NOOP); + /* WaDisableCtxRestoreArbitration:bdw,chv */ + wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE); - batch[index - 1] = MI_BATCH_BUFFER_END; + wa_ctx_emit(batch, MI_BATCH_BUFFER_END); *num_dwords = index - offset; -- 2.3.0
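The padding loop introduced above (NOOPs until the batch is cacheline aligned) can be sketched in isolation. CACHELINE_BYTES = 64 and the dword-index bookkeeping are assumptions carried over from the patch; the helper name is illustrative:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define CACHELINE_BYTES 64
#define MI_NOOP         0u

/* Emit MI_NOOP dwords until `index` lands on a cacheline boundary, so the
 * next section of the batch starts in its own cacheline. Returns the new
 * index. */
static uint32_t pad_to_cacheline(uint32_t *batch, uint32_t index)
{
    while ((index * sizeof(uint32_t)) % CACHELINE_BYTES != 0)
        batch[index++] = MI_NOOP;
    return index;
}
```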
[Intel-gfx] [PATCH v5 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
In the Per context w/a batch buffer, apply WaRsRestoreWithPerCtxtBb. v2: This patch modifies the definitions of MI_LOAD_REGISTER_MEM and MI_LOAD_REGISTER_REG; add GEN8-specific defines for these instructions so as not to break any future users of the existing definitions (Michel). v3: The length in the current definitions of the LRM and LRR instructions was specified as 0. It seems to be a common convention for instructions whose length varies between platforms. This was not an issue so far because they are not used anywhere except the command parser; now that we use them in this patch, update them with the correct length and also move them out of the command parser placeholder to the appropriate place. Remove unnecessary padding and follow the WA programming sequence exactly as mentioned in the spec, which is essential for this WA (Dave). Signed-off-by: Rafael Barbalho rafael.barba...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/i915_reg.h | 29 +++-- drivers/gpu/drm/i915/intel_lrc.c | 54 2 files changed, 81 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 7637e64..208620d 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -347,6 +347,31 @@ #define MI_INVALIDATE_BSD (1<<7) #define MI_FLUSH_DW_USE_GTT (1<<2) #define MI_FLUSH_DW_USE_PPGTT (0<<2) +#define MI_LOAD_REGISTER_MEM MI_INSTR(0x29, 1) +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2) +#define MI_LRM_USE_GLOBAL_GTT (1<<22) +#define MI_LRM_ASYNC_MODE_ENABLE (1<<21) +#define MI_LOAD_REGISTER_REG MI_INSTR(0x2A, 1) +#define MI_ATOMIC(len) MI_INSTR(0x2F, (len-2)) +#define MI_ATOMIC_MEMORY_TYPE_GGTT (1<<22) +#define MI_ATOMIC_INLINE_DATA (1<<18) +#define MI_ATOMIC_CS_STALL (1<<17) +#define MI_ATOMIC_RETURN_DATA_CTL (1<<16) +#define MI_ATOMIC_OP_MASK(op) ((op) << 8) +#define MI_ATOMIC_AND MI_ATOMIC_OP_MASK(0x01) +#define MI_ATOMIC_OR MI_ATOMIC_OP_MASK(0x02) +#define MI_ATOMIC_XOR MI_ATOMIC_OP_MASK(0x03) +#define MI_ATOMIC_MOVE MI_ATOMIC_OP_MASK(0x04) +#define MI_ATOMIC_INC MI_ATOMIC_OP_MASK(0x05) +#define MI_ATOMIC_DEC MI_ATOMIC_OP_MASK(0x06) +#define MI_ATOMIC_ADD MI_ATOMIC_OP_MASK(0x07) +#define MI_ATOMIC_SUB MI_ATOMIC_OP_MASK(0x08) +#define MI_ATOMIC_RSUB MI_ATOMIC_OP_MASK(0x09) +#define MI_ATOMIC_IMAX MI_ATOMIC_OP_MASK(0x0A) +#define MI_ATOMIC_IMIN MI_ATOMIC_OP_MASK(0x0B) +#define MI_ATOMIC_UMAX MI_ATOMIC_OP_MASK(0x0C) +#define MI_ATOMIC_UMIN MI_ATOMIC_OP_MASK(0x0D) + #define MI_BATCH_BUFFER MI_INSTR(0x30, 1) #define MI_BATCH_NON_SECURE (1) /* for snb/ivb/vlv this also means batch in ppgtt when ppgtt is enabled. */ @@ -451,8 +476,6 @@ #define MI_CLFLUSH MI_INSTR(0x27, 0) #define MI_REPORT_PERF_COUNT MI_INSTR(0x28, 0) #define MI_REPORT_PERF_COUNT_GGTT (1<<0) -#define MI_LOAD_REGISTER_MEM MI_INSTR(0x29, 0) -#define MI_LOAD_REGISTER_REG MI_INSTR(0x2A, 0) #define MI_RS_STORE_DATA_IMM MI_INSTR(0x2B, 0) #define MI_LOAD_URB_MEM MI_INSTR(0x2C, 0) #define MI_STORE_URB_MEM MI_INSTR(0x2D, 0) @@ -1799,6 +1822,8 @@ enum skl_disp_power_wells { #define GEN8_RC_SEMA_IDLE_MSG_DISABLE (1 << 12) #define GEN8_FF_DOP_CLOCK_GATE_DISABLE (1<<10) +#define GEN8_RS_PREEMPT_STATUS 0x215C + /* Fuse readout registers for GT */ #define CHV_FUSE_GT (VLV_DISPLAY_BASE + 0x2168) #define CHV_FGT_DISABLE_SS0 (1 << 10) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 792d559..19a3460 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1179,13 +1179,67 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring, uint32_t *num_dwords) { uint32_t index; + uint32_t scratch_addr; uint32_t *batch = *wa_ctx_batch; index = offset; + /* Actual scratch location is at 128 bytes offset */ + scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES; + scratch_addr |= PIPE_CONTROL_GLOBAL_GTT; + /* WaDisableCtxRestoreArbitration:bdw,chv */ wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE); + /* + * As per Bspec, to work around a known HW issue, SW must perform the + * below programming sequence prior to programming MI_BATCH_BUFFER_END. + * + * This is only applicable for Gen8. + */ + + /* WaRsRestoreWithPerCtxtBb:bdw,chv */ + wa_ctx_emit(batch, MI_LOAD_REGISTER_IMM(1)); + wa_ctx_emit(batch, INSTPM); + wa_ctx_emit(batch, _MASKED_BIT_DISABLE(INSTPM_FORCE_ORDERING)); + + wa_ctx_emit(batch, (MI_ATOMIC(5) | + MI_ATOMIC_MEMORY_TYPE_GGTT | + MI_ATOMIC_INLINE_DATA | + MI_ATOMIC_CS_STALL | +
[Intel-gfx] [PATCH v5 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode
Some of the WA applied using WA batch buffers perform writes to scratch page. In the current flow WA are initialized before scratch obj is allocated. This patch reorders intel_init_pipe_control() to have a valid scratch obj before we initialize WA. Signed-off-by: Michel Thierry michel.thie...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/intel_lrc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index ad0b189..1d31eb5 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1641,7 +1641,8 @@ static int logical_render_ring_init(struct drm_device *dev) ring-emit_bb_start = gen8_emit_bb_start; ring-dev = dev; - ret = logical_ring_init(dev, ring); + + ret = intel_init_pipe_control(ring); if (ret) return ret; @@ -1653,7 +1654,7 @@ static int logical_render_ring_init(struct drm_device *dev) } } - ret = intel_init_pipe_control(ring); + ret = logical_ring_init(dev, ring); if (ret) { if (ring-wa_ctx.obj) lrc_destroy_wa_ctx_obj(ring); -- 2.3.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Reset request handling for gen8+
On Thu, Jun 18, 2015 at 01:22:36PM +0300, Mika Kuoppala wrote: Chris Wilson ch...@chris-wilson.co.uk writes: On Thu, Jun 18, 2015 at 12:51:40PM +0300, Mika Kuoppala wrote: In order for gen8+ hardware to guarantee that no context switch takes place during engine reset and that current context is properly saved, the driver needs to notify and query hw before commencing with reset. There are gpu hangs where the engine gets so stuck that it never will report to be ready for reset. We could proceed with reset anyway, but with some hangs with skl, the forced gpu reset will result in a system hang. By inspecting the unreadiness for reset seems to correlate with the probable system hang. We will only proceed with reset if all engines report that they are ready for reset. If root cause for system hang is found and can be worked around with another means, we can reconsider if we can reinstate full reset for unreadiness case. v2: -EIO, Recovery, gen8 (Chris, Tomas, Daniel) v3: updated commit msg v4: timeout_ms, simpler error path (Chris) References: https://bugs.freedesktop.org/show_bug.cgi?id=89959 References: https://bugs.freedesktop.org/show_bug.cgi?id=90854 Testcase: igt/gem_concurrent_blit --r prw-blt-overwrite-source-read-rcs-forked Testcase: igt/gem_concurrent_blit --r gtt-blt-overwrite-source-read-rcs-forked Is this the new format for subtests? No. It is me cutpasting from scripts. Daniel could you please fix while merging. Done and queued for -next, thanks for the patch. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
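The "only proceed with the reset if the engine reports it is ready" logic discussed in this patch boils down to a bounded poll. Here is a toy model — the readiness callback and the tick-based timeout are illustrative stand-ins; the driver polls a hardware reset-control register against a real timeout_ms instead:

```c
#include <assert.h>

#define ETIMEDOUT_ERR 110   /* stand-in for the kernel's ETIMEDOUT */

/* Poll ready_fn() up to timeout_ticks times; return 0 once it reports
 * ready, or a negative error on timeout, in which case the caller refuses
 * to force the reset (avoiding the correlated system hang). */
static int wait_for_ready(int (*ready_fn)(void *), void *ctx, int timeout_ticks)
{
    for (int t = 0; t < timeout_ticks; t++)
        if (ready_fn(ctx))
            return 0;
    return -ETIMEDOUT_ERR;
}

/* Example readiness check: reports ready after a fixed number of polls. */
static int ready_after(void *ctx)
{
    int *polls_left = ctx;
    return --(*polls_left) <= 0;
}
```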
Re: [Intel-gfx] [PATCH] drm/i915: Factor out p2 divider selection for pre-ilk platforms
On to, 2015-06-18 at 13:47 +0300, ville.syrj...@linux.intel.com wrote: From: Ville Syrjälä ville.syrj...@linux.intel.com The same dpll p2 divider selection is repeated three times in the gen2-4 .find_dpll() functions. Factor it out. Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com Looks ok to me: Reviewed-by: Imre Deak imre.d...@intel.com --- drivers/gpu/drm/i915/intel_display.c | 78 ++-- 1 file changed, 30 insertions(+), 48 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 2fa81ed..2cc8ae7 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -643,16 +643,12 @@ static bool intel_PLL_is_valid(struct drm_device *dev, return true; } -static bool -i9xx_find_best_dpll(const intel_limit_t *limit, - struct intel_crtc_state *crtc_state, - int target, int refclk, intel_clock_t *match_clock, - intel_clock_t *best_clock) +static int +i9xx_select_p2_div(const intel_limit_t *limit, +const struct intel_crtc_state *crtc_state, +int target) { - struct intel_crtc *crtc = to_intel_crtc(crtc_state-base.crtc); - struct drm_device *dev = crtc-base.dev; - intel_clock_t clock; - int err = target; + struct drm_device *dev = crtc_state-base.crtc-dev; if (intel_pipe_will_have_type(crtc_state, INTEL_OUTPUT_LVDS)) { /* @@ -661,18 +657,31 @@ i9xx_find_best_dpll(const intel_limit_t *limit, * single/dual channel state, if we even can. 
*/ if (intel_is_dual_link_lvds(dev)) - clock.p2 = limit-p2.p2_fast; + return limit-p2.p2_fast; else - clock.p2 = limit-p2.p2_slow; + return limit-p2.p2_slow; } else { if (target limit-p2.dot_limit) - clock.p2 = limit-p2.p2_slow; + return limit-p2.p2_slow; else - clock.p2 = limit-p2.p2_fast; + return limit-p2.p2_fast; } +} + +static bool +i9xx_find_best_dpll(const intel_limit_t *limit, + struct intel_crtc_state *crtc_state, + int target, int refclk, intel_clock_t *match_clock, + intel_clock_t *best_clock) +{ + struct drm_device *dev = crtc_state-base.crtc-dev; + intel_clock_t clock; + int err = target; memset(best_clock, 0, sizeof(*best_clock)); + clock.p2 = i9xx_select_p2_div(limit, crtc_state, target); + for (clock.m1 = limit-m1.min; clock.m1 = limit-m1.max; clock.m1++) { for (clock.m2 = limit-m2.min; @@ -712,30 +721,14 @@ pnv_find_best_dpll(const intel_limit_t *limit, int target, int refclk, intel_clock_t *match_clock, intel_clock_t *best_clock) { - struct intel_crtc *crtc = to_intel_crtc(crtc_state-base.crtc); - struct drm_device *dev = crtc-base.dev; + struct drm_device *dev = crtc_state-base.crtc-dev; intel_clock_t clock; int err = target; - if (intel_pipe_will_have_type(crtc_state, INTEL_OUTPUT_LVDS)) { - /* - * For LVDS just rely on its current settings for dual-channel. - * We haven't figured out how to reliably set up different - * single/dual channel state, if we even can. 
- */ - if (intel_is_dual_link_lvds(dev)) - clock.p2 = limit-p2.p2_fast; - else - clock.p2 = limit-p2.p2_slow; - } else { - if (target limit-p2.dot_limit) - clock.p2 = limit-p2.p2_slow; - else - clock.p2 = limit-p2.p2_fast; - } - memset(best_clock, 0, sizeof(*best_clock)); + clock.p2 = i9xx_select_p2_div(limit, crtc_state, target); + for (clock.m1 = limit-m1.min; clock.m1 = limit-m1.max; clock.m1++) { for (clock.m2 = limit-m2.min; @@ -773,28 +766,17 @@ g4x_find_best_dpll(const intel_limit_t *limit, int target, int refclk, intel_clock_t *match_clock, intel_clock_t *best_clock) { - struct intel_crtc *crtc = to_intel_crtc(crtc_state-base.crtc); - struct drm_device *dev = crtc-base.dev; + struct drm_device *dev = crtc_state-base.crtc-dev; intel_clock_t clock; int max_n; - bool found; + bool found = false; /* approximately equals target * 0.00585 */ int err_most = (target 8) + (target 9); - found = false; - - if (intel_pipe_will_have_type(crtc_state, INTEL_OUTPUT_LVDS)) { - if (intel_is_dual_link_lvds(dev)) - clock.p2 = limit-p2.p2_fast; -
Re: [Intel-gfx] [PATCH v5] drm/i915 : Added Programming of the MOCS
On Thu, Jun 18, 2015 at 04:25:47PM +0100, Damien Lespiau wrote: On Thu, Jun 18, 2015 at 03:45:44PM +0100, Antoine, Peter wrote: So, intializing the other (non-render) MOCS in gen8_init_rcs_context() isn't the most logical thing to do I'm afraid. What happens if we suddenly decide that we don't want to fully initialize the default context at startup but initialize each ring on-demand for that context as well? We can end up in a situation where we use the blitter first and we wouldn't have the blitter MOCS initialized. In that sense, that code makes an assumption about how we do things in a completely different part of the driver and that's always a potential source of bugs. Yes, but this is the same with the golden context and the workarounds (as I understand it) so all this code would have to be moved. Ah, but the workarounds in that function are only for registers in the render context, not other rings/engine. Yes, but it just so happens that we initialise the default context before userspace so that we know that context is pristine before sending batches to the GPU. This is the reason why I think it is important to mark this function as being executed at that stage, so that all parties can be sure that the execution is before real use of the GPU and so we can use the RCS to initialise the other rings. At the moment, I am happy with baking that assumption into the code, we can readdress it later if there are non-RCS operations that must be performed at context init and conflict with the RCS programming. If you can think of a suitable comment to forewarn us in future about potential conflicts in adding xcs-init_context(), be my guest. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
On Thu, Jun 18, 2015 at 01:11:34PM +0100, Dave Gordon wrote: On 17/06/15 13:05, Daniel Vetter wrote: On Mon, Jun 15, 2015 at 07:36:20PM +0100, Dave Gordon wrote: Current devices may contain one or more programmable microcontrollers that need to have a firmware image (aka binary blob) loaded from an external medium and transferred to the device's memory. This file provides generic support functions for doing this; they can then be used by each uC-specific loader, thus reducing code duplication and testing effort. Signed-off-by: Dave Gordon david.s.gor...@intel.com Signed-off-by: Alex Dai yu@intel.com Given that I'm just shredding the synchronization used by the dmc loader I'm not convinced this is a good idea. Abstraction has cost, and a bit of copy-paste for similar sounding but slightly different things doesn't sound awful to me. And the critical bit in all the firmware loading I've seen thus far is in synchronizing the loading with other operations, hiding that isn't a good idea. Worse if we enforce stuff like requiring dev-struct_mutex. -Daniel It's precisely because it's in some sense trivial-but-tricky that we should write it once, get it right, and use it everywhere. Copypaste /does/ sound awful; I've seen how the code this was derived from had already been cloned into three flavours, all different and all wrong. It's a very simple abstraction: one early call to kick things off as early as possible, no locking required. One late call with the struct_mutex held to complete the synchronisation and actually do the work, thus guaranteeing that the transfer to the target uC is done in a controlled fashion, at a time of the caller's choice, and by the driver's mainline thread, NOT by an asynchronous thread racing with other activity (which was one of the things wrong with the original version). Yeah I've seen the origins of this in the display code, and that code gets the syncing wrong. 
The only thing that one has to do is grab a runtime pm reference for the appropriate power well to prevent dc5 entry, and release it when the firmware is loaded and initialized. Which means any kind of firmware loader which requires/uses dev->struct_mutex gets stuff wrong and is not appropriate everywhere. We should convert the DMC loader to use this too, so there need be only one bit of code in the whole driver that needs to understand how to use completions to get correct handover from a free-running no-locks-held thread to the properly disciplined environment of driver mainline for purposes of programming the h/w. Nack on using this for dmc, since I want them to convert it to the above synchronization, since that's how all the other async power initialization is done. Guc is different since we really must have it ready for execbuf, and for that usecase a completion at drm_open time sounds like the right thing. As a rule of thumb for refactoring and shared infrastructure we use the following recipe in drm: - first driver implements things as straightforward as possible - 2nd user copypastes - 3rd one has the duty to figure out whether some refactoring is in order or not. Imo that approach leads to a really good balance between avoiding overengineering and having maintainable code. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH v2 04/10] drm: Add Gamma correction structure
On 14 June 2015 at 10:02, Sharma, Shashank shashank.sha...@intel.com wrote: Hi, Emil Velikov The reason behind a zero-sized array is that we want to use the same variable for the various color corrections possible across various drivers. Due to the current blob implementation, it doesn’t look very efficient to have another pointer in the structure, so we are left with this option only. Can you elaborate (or suggest any reading material) about those inefficiencies? I guess as long as we are using gcc (which is the case for all Linux distributions), we are good. The size of the zero-sized array will be zero, so no alignment errors as such. Note that most of the DRM subsystem code is dual-licensed. As such it is used in other OSes - Solaris, *BSD, not to mention the work (in progress) about using clang/LLVM to build the kernel. In the former case not everyone uses GCC. Thanks Emil
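The portability concern raised here is concrete: a `data[0]` trailing array is a GNU extension, while the C99 flexible array member `data[]` gives the same layout and is accepted by clang and other non-GCC compilers too. A minimal sketch — the struct and field names are illustrative, not the actual DRM property-blob layout:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* C99 flexible array member: contributes nothing to sizeof, and the
 * payload is allocated in a single malloc together with the header. */
struct color_blob {
    uint32_t count;
    uint32_t data[];
};

static struct color_blob *color_blob_alloc(uint32_t count)
{
    struct color_blob *b = malloc(sizeof(*b) + count * sizeof(b->data[0]));
    if (b) {
        b->count = count;
        memset(b->data, 0, count * sizeof(b->data[0]));
    }
    return b;
}
```

The GNU `uint32_t data[0];` form has the same size and alignment behavior on GCC, but only the `data[]` spelling is standard C.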
[Intel-gfx] [PATCH 2/5] drm/i915/bxt: add missing DDI PLL registers to the state checking
Although we have a fixed setting for the PLL9 and EBB4 registers, it still makes sense to check them together with the rest of PLL registers. While at it also remove a redundant comment about 10 bit clock enabling. Signed-off-by: Imre Deak imre.d...@intel.com --- drivers/gpu/drm/i915/i915_drv.h | 3 ++- drivers/gpu/drm/i915/i915_reg.h | 3 ++- drivers/gpu/drm/i915/intel_ddi.c | 16 +--- drivers/gpu/drm/i915/intel_display.c | 6 -- 4 files changed, 21 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 491ef0c..bf235ff 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -366,7 +366,8 @@ struct intel_dpll_hw_state { uint32_t cfgcr1, cfgcr2; /* bxt */ - uint32_t ebb0, pll0, pll1, pll2, pll3, pll6, pll8, pll10, pcsdw12; + uint32_t ebb0, ebb4, pll0, pll1, pll2, pll3, pll6, pll8, pll9, pll10, +pcsdw12; }; struct intel_shared_dpll_config { diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 4bbc85a..bba0691 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -1207,7 +1207,8 @@ enum skl_disp_power_wells { /* PORT_PLL_8_A */ #define PORT_PLL_TARGET_CNT_MASK 0x3FF /* PORT_PLL_9_A */ -#define PORT_PLL_LOCK_THRESHOLD_MASK 0xe +#define PORT_PLL_LOCK_THRESHOLD_SHIFT 1 +#define PORT_PLL_LOCK_THRESHOLD_MASK (0x7 PORT_PLL_LOCK_THRESHOLD_SHIFT) /* PORT_PLL_10_A */ #define PORT_PLL_DCO_AMP_OVR_EN_H (127) #define PORT_PLL_DCO_AMP_MASK 0x3c00 diff --git a/drivers/gpu/drm/i915/intel_ddi.c b/drivers/gpu/drm/i915/intel_ddi.c index bdc5677..ca970ba 100644 --- a/drivers/gpu/drm/i915/intel_ddi.c +++ b/drivers/gpu/drm/i915/intel_ddi.c @@ -1476,11 +1476,15 @@ bxt_ddi_pll_select(struct intel_crtc *intel_crtc, crtc_state-dpll_hw_state.pll8 = targ_cnt; + crtc_state-dpll_hw_state.pll9 = 5 PORT_PLL_LOCK_THRESHOLD_SHIFT; + if (dcoampovr_en_h) crtc_state-dpll_hw_state.pll10 = PORT_PLL_DCO_AMP_OVR_EN_H; crtc_state-dpll_hw_state.pll10 |= 
PORT_PLL_DCO_AMP(dco_amp); + crtc_state-dpll_hw_state.ebb4 = PORT_PLL_10BIT_CLK_ENABLE; + crtc_state-dpll_hw_state.pcsdw12 = LANESTAGGER_STRAP_OVRD | lanestagger; @@ -2414,7 +2418,7 @@ static void bxt_ddi_pll_enable(struct drm_i915_private *dev_priv, temp = I915_READ(BXT_PORT_PLL(port, 9)); temp = ~PORT_PLL_LOCK_THRESHOLD_MASK; - temp |= (5 1); + temp |= pll-config.hw_state.pll9; I915_WRITE(BXT_PORT_PLL(port, 9), temp); temp = I915_READ(BXT_PORT_PLL(port, 10)); @@ -2427,8 +2431,8 @@ static void bxt_ddi_pll_enable(struct drm_i915_private *dev_priv, temp = I915_READ(BXT_PORT_PLL_EBB_4(port)); temp |= PORT_PLL_RECALIBRATE; I915_WRITE(BXT_PORT_PLL_EBB_4(port), temp); - /* Enable 10 bit clock */ - temp |= PORT_PLL_10BIT_CLK_ENABLE; + temp = ~PORT_PLL_10BIT_CLK_ENABLE; + temp |= pll-config.hw_state.ebb4; I915_WRITE(BXT_PORT_PLL_EBB_4(port), temp); /* Enable PLL */ @@ -2481,6 +2485,9 @@ static bool bxt_ddi_pll_get_hw_state(struct drm_i915_private *dev_priv, hw_state-ebb0 = I915_READ(BXT_PORT_PLL_EBB_0(port)); hw_state-ebb0 = PORT_PLL_P1_MASK | PORT_PLL_P2_MASK; + hw_state-ebb4 = I915_READ(BXT_PORT_PLL_EBB_4(port)); + hw_state-ebb4 = PORT_PLL_10BIT_CLK_ENABLE; + hw_state-pll0 = I915_READ(BXT_PORT_PLL(port, 0)); hw_state-pll0 = PORT_PLL_M2_MASK; @@ -2501,6 +2508,9 @@ static bool bxt_ddi_pll_get_hw_state(struct drm_i915_private *dev_priv, hw_state-pll8 = I915_READ(BXT_PORT_PLL(port, 8)); hw_state-pll8 = PORT_PLL_TARGET_CNT_MASK; + hw_state-pll9 = I915_READ(BXT_PORT_PLL(port, 9)); + hw_state-pll9 = PORT_PLL_LOCK_THRESHOLD_MASK; + hw_state-pll10 = I915_READ(BXT_PORT_PLL(port, 10)); hw_state-pll10 = PORT_PLL_DCO_AMP_OVR_EN_H | PORT_PLL_DCO_AMP_MASK; diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 9149410..6f79680 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -11905,17 +11905,19 @@ static void intel_dump_pipe_config(struct intel_crtc *crtc, DRM_DEBUG_KMS(double wide: %i\n, 
pipe_config-double_wide); if (IS_BROXTON(dev)) { - DRM_DEBUG_KMS(ddi_pll_sel: %u; dpll_hw_state: ebb0: 0x%x, + DRM_DEBUG_KMS(ddi_pll_sel: %u; dpll_hw_state: ebb0: 0x%x, ebb4: 0x%x, pll0: 0x%x, pll1: 0x%x, pll2: 0x%x, pll3: 0x%x, - pll6: 0x%x, pll8: 0x%x, pcsdw12: 0x%x\n, + pll6: 0x%x, pll8: 0x%x, pll9: 0x%x, pcsdw12: 0x%x\n, pipe_config-ddi_pll_sel,
[Intel-gfx] [PATCH 5/5] drm/i915/bxt: add DDI port HW readout support
Add support for reading out the HW state for DDI ports. Since the actual programming is very similar to the CHV/VLV DPIO PLL programming we can reuse much of the logic from there. This fixes the state checker failures I saw on my BXT with HDMI output. Signed-off-by: Imre Deak imre.d...@intel.com --- drivers/gpu/drm/i915/i915_reg.h | 15 +-- drivers/gpu/drm/i915/intel_ddi.c | 22 -- 2 files changed, 29 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index bba0691..fcf6ad5 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -1169,10 +1169,12 @@ enum skl_disp_power_wells { #define _PORT_PLL_EBB_0_A 0x162034 #define _PORT_PLL_EBB_0_B 0x6C034 #define _PORT_PLL_EBB_0_C 0x6C340 -#define PORT_PLL_P1_MASK (0x07 13) -#define PORT_PLL_P1(x) ((x) 13) -#define PORT_PLL_P2_MASK (0x1f 8) -#define PORT_PLL_P2(x) ((x) 8) +#define PORT_PLL_P1_SHIFT13 +#define PORT_PLL_P1_MASK (0x07 PORT_PLL_P1_SHIFT) +#define PORT_PLL_P1(x) ((x) PORT_PLL_P1_SHIFT) +#define PORT_PLL_P2_SHIFT8 +#define PORT_PLL_P2_MASK (0x1f PORT_PLL_P2_SHIFT) +#define PORT_PLL_P2(x) ((x) PORT_PLL_P2_SHIFT) #define BXT_PORT_PLL_EBB_0(port) _PORT3(port, _PORT_PLL_EBB_0_A, \ _PORT_PLL_EBB_0_B, \ _PORT_PLL_EBB_0_C) @@ -1192,8 +1194,9 @@ enum skl_disp_power_wells { /* PORT_PLL_0_A */ #define PORT_PLL_M2_MASK 0xFF /* PORT_PLL_1_A */ -#define PORT_PLL_N_MASK (0x0F 8) -#define PORT_PLL_N(x)((x) 8) +#define PORT_PLL_N_SHIFT 8 +#define PORT_PLL_N_MASK (0x0F PORT_PLL_N_SHIFT) +#define PORT_PLL_N(x)((x) PORT_PLL_N_SHIFT) /* PORT_PLL_2_A */ #define PORT_PLL_M2_FRAC_MASK0x3F /* PORT_PLL_3_A */ diff --git a/drivers/gpu/drm/i915/intel_ddi.c b/drivers/gpu/drm/i915/intel_ddi.c index ca970ba..6859068 100644 --- a/drivers/gpu/drm/i915/intel_ddi.c +++ b/drivers/gpu/drm/i915/intel_ddi.c @@ -971,8 +971,26 @@ static void hsw_ddi_clock_get(struct intel_encoder *encoder, static int bxt_calc_pll_link(struct drm_i915_private *dev_priv, enum 
intel_dpll_id dpll) { - /* FIXME formula not available in bspec */ - return 0; + struct intel_shared_dpll *pll; + struct intel_dpll_hw_state *state; + intel_clock_t clock; + + /* For DDI ports we always use a shared PLL. */ + if (WARN_ON(dpll == DPLL_ID_PRIVATE)) + return 0; + + pll = &dev_priv->shared_dplls[dpll]; + state = &pll->config.hw_state; + + clock.m1 = 2; + clock.m2 = (state->pll0 & PORT_PLL_M2_MASK) << 22; + if (state->pll3 & PORT_PLL_M2_FRAC_ENABLE) + clock.m2 |= state->pll2 & PORT_PLL_M2_FRAC_MASK; + clock.n = (state->pll1 & PORT_PLL_N_MASK) >> PORT_PLL_N_SHIFT; + clock.p1 = (state->ebb0 & PORT_PLL_P1_MASK) >> PORT_PLL_P1_SHIFT; + clock.p2 = (state->ebb0 & PORT_PLL_P2_MASK) >> PORT_PLL_P2_SHIFT; + + return vlv_calc_port_clock(10, &clock); } static void bxt_ddi_clock_get(struct intel_encoder *encoder, -- 2.1.4
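The readout math added here reconstructs the port clock from the divider fields: dot = refclk * (m1 * m2) / (n * p1 * p2), and the port (symbol) clock is dot / 5, since clock.dot is the 5x "fast clock" as noted in the vlv_calc_port_clock helper. A simplified integer sketch that ignores the fractional-M2 path — so it is not the driver's exact arithmetic, just the back-of-envelope version:

```c
#include <assert.h>
#include <stdint.h>

/* Integer-only port clock reconstruction from PLL dividers, in kHz.
 * Fractional M2 is omitted; the real driver feeds HW-read divider fields
 * into vlv_calc_port_clock instead. */
static int calc_port_clock_khz(int refclk_khz, int m1, int m2,
                               int n, int p1, int p2)
{
    int64_t dot = (int64_t)refclk_khz * m1 * m2 / ((int64_t)n * p1 * p2);
    return (int)(dot / 5);  /* dot is the 5x "fast clock" */
}
```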
[Intel-gfx] [PATCH 1/5] drm/i915/bxt: mask off the DPLL state checker bits we don't program
For the purpose of state checking we only care about the DPLL HW flags that we actually program, so mask off the ones that we don't. This fixes one set of DPLL state check failures. Signed-off-by: Imre Deak imre.d...@intel.com --- drivers/gpu/drm/i915/intel_ddi.c | 20 1 file changed, 20 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_ddi.c b/drivers/gpu/drm/i915/intel_ddi.c index 9ae297a..bdc5677 100644 --- a/drivers/gpu/drm/i915/intel_ddi.c +++ b/drivers/gpu/drm/i915/intel_ddi.c @@ -2479,13 +2479,32 @@ static bool bxt_ddi_pll_get_hw_state(struct drm_i915_private *dev_priv, return false; hw_state-ebb0 = I915_READ(BXT_PORT_PLL_EBB_0(port)); + hw_state-ebb0 = PORT_PLL_P1_MASK | PORT_PLL_P2_MASK; + hw_state-pll0 = I915_READ(BXT_PORT_PLL(port, 0)); + hw_state-pll0 = PORT_PLL_M2_MASK; + hw_state-pll1 = I915_READ(BXT_PORT_PLL(port, 1)); + hw_state-pll1 = PORT_PLL_N_MASK; + hw_state-pll2 = I915_READ(BXT_PORT_PLL(port, 2)); + hw_state-pll2 = PORT_PLL_M2_FRAC_MASK; + hw_state-pll3 = I915_READ(BXT_PORT_PLL(port, 3)); + hw_state-pll3 = PORT_PLL_M2_FRAC_ENABLE; + hw_state-pll6 = I915_READ(BXT_PORT_PLL(port, 6)); + hw_state-pll6 = PORT_PLL_PROP_COEFF_MASK | + PORT_PLL_INT_COEFF_MASK | + PORT_PLL_GAIN_CTL_MASK; + hw_state-pll8 = I915_READ(BXT_PORT_PLL(port, 8)); + hw_state-pll8 = PORT_PLL_TARGET_CNT_MASK; + hw_state-pll10 = I915_READ(BXT_PORT_PLL(port, 10)); + hw_state-pll10 = PORT_PLL_DCO_AMP_OVR_EN_H | + PORT_PLL_DCO_AMP_MASK; + /* * While we write to the group register to program all lanes at once we * can read only lane registers. 
We configure all lanes the same way, so @@ -2496,6 +2515,7 @@ static bool bxt_ddi_pll_get_hw_state(struct drm_i915_private *dev_priv, DRM_DEBUG_DRIVER(lane stagger config different for lane 01 (%08x) and 23 (%08x)\n, hw_state-pcsdw12, I915_READ(BXT_PORT_PCS_DW12_LN23(port))); + hw_state-pcsdw12 = LANE_STAGGER_MASK | LANESTAGGER_STRAP_OVRD; return true; } -- 2.1.4 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
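The point of the masking in this patch can be shown in isolation: both the HW readout and the SW state are reduced to the bits the driver actually programs before comparing, so reserved or unprogrammed bits in the register can never produce spurious state-check failures. The register defines reuse the patch's names; the compare helper itself is illustrative:

```c
#include <assert.h>
#include <stdint.h>

#define PORT_PLL_LOCK_THRESHOLD_SHIFT 1
#define PORT_PLL_LOCK_THRESHOLD_MASK  (0x7u << PORT_PLL_LOCK_THRESHOLD_SHIFT)

/* Compare only the lock-threshold field of PLL9; all other bits are
 * ignored because the driver never programs them. */
static int pll9_state_matches(uint32_t hw_readout, uint32_t sw_state)
{
    return (hw_readout & PORT_PLL_LOCK_THRESHOLD_MASK) ==
           (sw_state & PORT_PLL_LOCK_THRESHOLD_MASK);
}
```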
[Intel-gfx] [PATCH 4/5] drm/i915/vlv: factor out vlv_calc_port_clock
This functionality will be needed by the next patch adding HW readout support for DDI ports on BXT, so factor it out. No functional change. Signed-off-by: Imre Deak imre.d...@intel.com --- drivers/gpu/drm/i915/intel_display.c | 18 ++ drivers/gpu/drm/i915/intel_drv.h | 2 ++ 2 files changed, 12 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 0e5c613..6cf2a15 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -7993,6 +7993,14 @@ static void i9xx_get_pfit_config(struct intel_crtc *crtc, I915_READ(LVDS) LVDS_BORDER_ENABLE; } +int vlv_calc_port_clock(int refclk, intel_clock_t *pll_clock) +{ + chv_clock(refclk, pll_clock); + + /* clock.dot is the fast clock */ + return pll_clock-dot / 5; +} + static void vlv_crtc_clock_get(struct intel_crtc *crtc, struct intel_crtc_state *pipe_config) { @@ -8017,10 +8025,7 @@ static void vlv_crtc_clock_get(struct intel_crtc *crtc, clock.p1 = (mdiv DPIO_P1_SHIFT) 7; clock.p2 = (mdiv DPIO_P2_SHIFT) 0x1f; - vlv_clock(refclk, clock); - - /* clock.dot is the fast clock */ - pipe_config-port_clock = clock.dot / 5; + pipe_config-port_clock = vlv_calc_port_clock(refclk, clock); } static void @@ -8116,10 +8121,7 @@ static void chv_crtc_clock_get(struct intel_crtc *crtc, clock.p1 = (cmn_dw13 DPIO_CHV_P1_DIV_SHIFT) 0x7; clock.p2 = (cmn_dw13 DPIO_CHV_P2_DIV_SHIFT) 0x1f; - chv_clock(refclk, clock); - - /* clock.dot is the fast clock */ - pipe_config-port_clock = clock.dot / 5; + pipe_config-port_clock = vlv_calc_port_clock(refclk, clock); } static bool i9xx_get_pipe_config(struct intel_crtc *crtc, diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h index bcafefc..95e14bb 100644 --- a/drivers/gpu/drm/i915/intel_drv.h +++ b/drivers/gpu/drm/i915/intel_drv.h @@ -1139,6 +1139,8 @@ ironlake_check_encoder_dotclock(const struct intel_crtc_state *pipe_config, int dotclock); bool bxt_find_best_dpll(struct 
intel_crtc_state *crtc_state, int target_clock, intel_clock_t *best_clock); +int vlv_calc_port_clock(int refclk, intel_clock_t *pll_clock); + bool intel_crtc_active(struct drm_crtc *crtc); void hsw_enable_ips(struct intel_crtc *crtc); void hsw_disable_ips(struct intel_crtc *crtc); -- 2.1.4 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 3/5] drm/i915/bxt: add PLL10 to the PLL state dumper
Signed-off-by: Imre Deak imre.d...@intel.com
---
 drivers/gpu/drm/i915/intel_display.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 6f79680..0e5c613 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11907,7 +11907,7 @@ static void intel_dump_pipe_config(struct intel_crtc *crtc,
 	if (IS_BROXTON(dev)) {
 		DRM_DEBUG_KMS("ddi_pll_sel: %u; dpll_hw_state: ebb0: 0x%x, ebb4: 0x%x, "
 			      "pll0: 0x%x, pll1: 0x%x, pll2: 0x%x, pll3: 0x%x, "
-			      "pll6: 0x%x, pll8: 0x%x, pll9: 0x%x, pcsdw12: 0x%x\n",
+			      "pll6: 0x%x, pll8: 0x%x, pll9: 0x%x, pll10: 0x%x, pcsdw12: 0x%x\n",
 			      pipe_config->ddi_pll_sel,
 			      pipe_config->dpll_hw_state.ebb0,
 			      pipe_config->dpll_hw_state.ebb4,
@@ -11918,6 +11918,7 @@ static void intel_dump_pipe_config(struct intel_crtc *crtc,
 			      pipe_config->dpll_hw_state.pll6,
 			      pipe_config->dpll_hw_state.pll8,
 			      pipe_config->dpll_hw_state.pll9,
+			      pipe_config->dpll_hw_state.pll10,
 			      pipe_config->dpll_hw_state.pcsdw12);
 	} else if (IS_SKYLAKE(dev)) {
 		DRM_DEBUG_KMS("ddi_pll_sel: %u; dpll_hw_state: "
-- 
2.1.4
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Initialize HWS page address after GPU reset
On Thu, Jun 18, 2015 at 03:05:12PM +0100, Siluvery, Arun wrote: On 15/06/2015 06:20, Daniel Vetter wrote: On Wed, Jun 3, 2015 at 6:14 PM, Ville Syrjälä ville.syrj...@linux.intel.com wrote: I was going to suggest removing the same thing from lrc_setup_hardware_status_page(), but after another look it seems we sometimes call .init_hw() before the context setup. It would be nice to have a more consistent sequence for init and reset. But anyway the patch looks OK to me. I verified that we indeed lose this register on GPU reset. Yep, this is a mess. And historically _any_ difference between driver load and GPU reset (or resume, fwiw) has led to hilarious bugs, so this difference is really troubling to me. Arun, can you please work on a patch to unify the setup sequence here, so that both driver load and GPU resets work the same way? By the time we're calling gem_init_hw the default context should have been created already, and hence we should be able to write to HWS_PGA in ring->init_hw only.

Hi Daniel, I think the problem in this case was that the code to init the HWS page after reset was missing for Gen8+. For Gen7 we are doing this as part of ring->init_hw:

Gen7:
i915_reset()
 +-- i915_gem_init_hw()
      +-- ring->init_hw(), which is init_render_ring()
           +-- init_ring_common()
                +-- intel_ring_setup_status_page()

Gen8:
i915_reset()
 +-- i915_gem_init_hw()
      +-- ring->init_hw(), which is gen8_init_render_ring()
           +-- gen8_init_common_ring() <- I added changes in this function.

We could probably use intel_ring_setup_status_page() for both cases; does it have to be Gen7 specific?

My concern isn't that we have two functions doing HWS setup. My concern is that we now have two callsites for execlist mode doing HWS setup, with slight differences between reset/driver load and resume. I want one, unconditional call to set up the HWS page at exactly the right place in the setup sequence.
That might require some refactoring, I haven't looked that closely at intel_lrc.c. The usual approach is that gem_init does exclusively software setup, and gem_init_hw does all the register writes and actual enabling. I think the HWS setup in the driver load code is currently called from gem_init(), which is the wrong place. Also I wonder about resume, where's the HWS_PGA restore for that case?

It is covered:
i915_drm_resume()
 +-- i915_gem_init_hw()

Ok, so it should be covered with whatever fix we have for GPU reset. Thanks, Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Reset request handling for gen8+
On Thu, Jun 18, 2015 at 12:42:55PM +0100, Chris Wilson wrote: On Thu, Jun 18, 2015 at 12:18:39PM +0100, Tomas Elf wrote: My point was more along the lines of bailing out if the reset request fails and not returning an error, but simply keeping track of the number of times we've attempted the reset request. By not returning an error we would allow more subsequent hang detections to happen (since the hang is still there), which would end up in the same reset request in the future. If the reset request kept failing we would simply increment the counter, and at one point we would decide that we've had too many unsuccessful reset request attempts and simply go ahead with the reset anyway. If the reset then failed we would return an error at that point in time, which would result in a terminally wedged state. But, yeah, I can see why we shouldn't do this. Skipping to the middle! I understand the merit in trying the reset a few times before giving up; it would just need a bit of restructuring to try the reset before clearing gem state (trivial) and requeueing the hangcheck. I am just wary of feature creep before we get stuck into TDR, which promises to change how we think about resets entirely. My maintainer concern here is always that we should err on the side of not killing the machine. If the reset failed, or if the GPU reinit failed, then marking the GPU as wedged has historically been the safe option. The system will still run, display mostly works and there's a reasonable chance you can gather debug data. We do have i915.reset to disable the reset for these cases, but it's always a nuisance to have to resort to that. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
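To make the retry policy under discussion concrete, here is a minimal user-space sketch of "swallow a bounded number of failed reset requests so hangcheck can fire again, then force the reset and only then wedge". All names and the attempt limit are hypothetical illustrations, not driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bound on swallowed reset-request failures. */
#define MAX_RESET_REQUEST_ATTEMPTS 3

struct reset_state {
	int failed_requests;	/* failed reset requests seen so far */
	bool wedged;		/* terminal state once reset itself fails */
};

/*
 * Called from a hangcheck-like path. request_ok models whether the
 * reset request handshake succeeded, reset_ok whether the actual
 * reset would succeed. Returns 0 to keep going, -1 once wedged.
 */
static int handle_hang(struct reset_state *rs, bool request_ok, bool reset_ok)
{
	if (!request_ok && rs->failed_requests < MAX_RESET_REQUEST_ATTEMPTS) {
		/* swallow the failure; the hang persists, so hangcheck
		 * will detect it again and we retry later */
		rs->failed_requests++;
		return 0;
	}

	/* request succeeded, or we ran out of patience: reset anyway */
	if (!reset_ok) {
		rs->wedged = true;	/* terminally wedged, as in the thread */
		return -1;
	}
	rs->failed_requests = 0;
	return 0;
}
```

The point of the counter is that intermediate failures never surface as errors; only the final forced reset can wedge the GPU.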
Re: [Intel-gfx] [PATCH] drm/i915/skl: Buffer translation improvements
On Thu, Jun 18, 2015 at 12:50:33PM +0300, David Weinehall wrote:
@@ -3520,6 +3545,9 @@ intel_dp_set_signal_levels(struct intel_dp *intel_dp, uint32_t *DP)
 	} else if (HAS_DDI(dev)) {
 		signal_levels = hsw_signal_levels(train_set);
 		mask = DDI_BUF_EMP_MASK;
+
+		if (IS_SKYLAKE(dev))
+			skl_set_iboost(intel_dp);

Imo this should be put into hsw_signal_levels, and then hsw_signal_levels be moved into intel_ddi.c - that way everything related to low-level ddi DP signal levels lives in intel_ddi.c. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v5] drm/i915 : Added Programming of the MOCS
On Thu, Jun 18, 2015 at 03:45:44PM +0100, Antoine, Peter wrote: So, initializing the other (non-render) MOCS in gen8_init_rcs_context() isn't the most logical thing to do I'm afraid. What happens if we suddenly decide that we don't want to fully initialize the default context at startup but initialize each ring on-demand for that context as well? We can end up in a situation where we use the blitter first and we wouldn't have the blitter MOCS initialized. In that sense, that code makes an assumption about how we do things in a completely different part of the driver and that's always a potential source of bugs. Yes, but this is the same with the golden context and the workarounds (as I understand it) so all this code would have to be moved. Ah, but the workarounds in that function are only for registers in the render context, not other rings/engines. -- Damien ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 25/55] drm/i915: Update i915_gem_object_sync() to take a request structure
On Thu, Jun 18, 2015 at 04:24:53PM +0200, Daniel Vetter wrote: On Thu, Jun 18, 2015 at 01:59:13PM +0100, John Harrison wrote: On 18/06/2015 13:21, Chris Wilson wrote: On Thu, Jun 18, 2015 at 01:14:56PM +0100, john.c.harri...@intel.com wrote: From: John Harrison john.c.harri...@intel.com The plan is to pass requests around as the basic submission tracking structure rather than rings and contexts. This patch updates the i915_gem_object_sync() code path. v2: Much more complex patch to share a single request between the sync and the page flip. The _sync() function now supports lazy allocation of the request structure. That is, if one is passed in then that will be used. If one is not, then a request will be allocated and passed back out. Note that the _sync() code does not necessarily require a request. Thus one will only be created in certain situations. The reason the lazy allocation must be done within the _sync() code itself is because the decision to need one or not is not really something that code above can second guess (except in the case where one is definitely not required because no ring is passed in). The call chains above _sync() now support passing a request through, with most callers passing in NULL and assuming that no request will be required (because they also pass in NULL for the ring and therefore can't be generating any ring code). The exception is intel_crtc_page_flip(), which now supports having a request returned from _sync(). If one is, then that request is shared by the page flip (if the page flip is of a type to need a request). If _sync() does not generate a request but the page flip does need one, then the page flip path will create its own request. v3: Updated comment description to be clearer about 'to_req' parameter (Tomas Elf review request). Rebased onto newer tree that significantly changed the synchronisation code.
v4: Updated comments from review feedback (Tomas Elf)

For: VIZ-5115
Signed-off-by: John Harrison john.c.harri...@intel.com
Reviewed-by: Tomas Elf tomas@intel.com
---
 drivers/gpu/drm/i915/i915_drv.h            |  4 ++-
 drivers/gpu/drm/i915/i915_gem.c            | 48 +---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/intel_display.c       | 17 +++---
 drivers/gpu/drm/i915/intel_drv.h           |  3 +-
 drivers/gpu/drm/i915/intel_fbdev.c         |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c           |  2 +-
 drivers/gpu/drm/i915/intel_overlay.c       |  2 +-
 8 files changed, 57 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 64a10fa..f69e9cb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2778,7 +2778,8 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
 int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
 int i915_gem_object_sync(struct drm_i915_gem_object *obj,
-			 struct intel_engine_cs *to);
+			 struct intel_engine_cs *to,
+			 struct drm_i915_gem_request **to_req);

Nope. Did you forget to reorder the code to ensure that the request is allocated along with the context switch at the start of execbuf? -Chris

Not sure what you are objecting to? If you mean the lazily allocated request then that is for the page flip code, not the execbuff code. If we get here from an execbuff call then the request will definitely have been allocated and will be passed in. Whereas the page flip code may or may not require a request (depending on whether MMIO or ring flips are in use). Likewise the sync code may or may not require a request (depending on whether there is anything to sync to or not). There is no point allocating and submitting an empty request in the MMIO/idle case. Hence the sync code needs to be able to use an existing request or create one if none already exists.

I guess Chris' comment was that if you have a non-NULL to, then you better have a non-NULL to_req.
And since we link up requests to the engine they'll run on, the former shouldn't be required any more. So either that's true and we can remove the 'to', or we don't understand something yet (and perhaps that should be done as a follow-up). I am sure I sent a patch that outlined in great detail how we need only the request parameter in i915_gem_object_sync(), for handling both execbuffer, pipelined pin_and_fence and synchronous pin_and_fence. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
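The lazy-allocation contract being debated above can be sketched in a few lines. This is a user-space illustration of the pattern only - `struct request`, `object_sync()` and the fields used are hypothetical stand-ins, not the driver's types or its actual `i915_gem_object_sync()`:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for drm_i915_gem_request. */
struct request {
	int ring_id;
};

/*
 * Sketch of the contract: the caller may pass an existing request via
 * *to_req (execbuf path); if *to_req is NULL and work is actually
 * needed, the request is allocated here and handed back so the caller
 * (e.g. the page-flip path) can share it. If no work is needed, no
 * request is ever created.
 */
static int object_sync(int needs_work, int ring_id, struct request **to_req)
{
	if (!needs_work)
		return 0;	/* nothing to sync to: no request required */

	if (!*to_req) {
		*to_req = malloc(sizeof(**to_req));	/* lazy allocation */
		if (!*to_req)
			return -1;
		(*to_req)->ring_id = ring_id;
	}
	/* ... would emit semaphore/flush commands against *to_req ... */
	return 0;
}
```

The key property is that a caller holding a request always gets it reused, while a caller passing NULL only pays for an allocation when the sync genuinely needs one.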
Re: [Intel-gfx] [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
On Thu, Jun 18, 2015 at 04:49:49PM +0200, Daniel Vetter wrote: Guc is different since we really must have it ready for execbuf, and for that usecase a completion at drm_open time sounds like the right thing. But do we? It would be nice if we had a definite answer that the hw was ready before we started using it in anger, but I don't see any reason why we would have to delay userspace for a slow microcode update... (This presupposes that userspace batches are unaffected by GuC/execlist setup, which for userspace sanity I hope they are - or at least using predicate registers and conditional execution.) -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm: Avoid the double clflush on the last cache line in drm_clflush_virt_range()
On Thu, Jun 18, 2015 at 06:31:18PM +0300, Imre Deak wrote: On to, 2015-06-11 at 09:33 +0100, Chris Wilson wrote: On Thu, Jun 11, 2015 at 09:25:16AM +0100, Dave Gordon wrote: On 10/06/15 15:58, Chris Wilson wrote: As the clflush operates on cache lines, and we can flush any byte address, in order to flush all bytes given in the range we issue an extra clflush on the last byte to ensure the last cacheline is flushed. We can convert the iteration to be over the actual cache lines to avoid this double clflush on the last byte.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Imre Deak imre.d...@intel.com
---
 drivers/gpu/drm/drm_cache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 9a62d7a53553..6743ff7dccfa 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -130,11 +130,12 @@ drm_clflush_virt_range(void *addr, unsigned long length)
 {
 #if defined(CONFIG_X86)
 	if (cpu_has_clflush) {
+		const int size = boot_cpu_data.x86_clflush_size;
 		void *end = addr + length;
+		addr = (void *)(((unsigned long)addr) & -size);

Should this cast be to uintptr_t?

The kernel has a strict equivalence between sizeof(unsigned long) and sizeof(pointer). You will see unsigned long used universally to pass along pointers to functions and as closures.

Or intptr_t, as size has somewhat strangely been defined as signed? To complete the mix, x86_clflush_size is 'u16'! So maybe we should write

+		const size_t size = boot_cpu_data.x86_clflush_size;
+		const size_t mask = ~(size - 1);
 		void *end = addr + length;
+		addr = (void *)(((uintptr_t)addr) & mask);

No. size_t has a very poor definition inside the kernel - what does the maximum size of a userspace allocation have to do with kernel internals? Let's keep userspace types in userspace, or else we end up with i915_gem_gtt.c.
I also think using unsigned long for virtual addresses is standard in the kernel and I can't see how using int would lead to problems given the expected range of x86_clflush_size, so this looks ok to me: Reviewed-by: Imre Deak imre.d...@intel.com Applied to drm-misc, thanks. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
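The arithmetic of the fix under review is easy to check in isolation: align the start address down to a cache-line boundary and iterate line by line, instead of flushing byte ranges plus one trailing clflush. The sketch below replaces the actual `clflush` with a counter so the loop can run in plain user space; it is an illustration of the alignment logic, not the kernel function:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count how many cache-line flushes the aligned loop would issue for
 * the byte range [addr, addr + length). The counter stands in for
 * clflush((void *)p).
 */
static unsigned long count_flushes(uintptr_t addr, unsigned long length,
				   unsigned int clflush_size)
{
	const uintptr_t mask = ~((uintptr_t)clflush_size - 1);
	uintptr_t p = addr & mask;	/* round down to the cache line */
	const uintptr_t end = addr + length;
	unsigned long n = 0;

	for (; p < end; p += clflush_size)
		n++;	/* one flush per cache line touched, never more */
	return n;
}
```

With this form a 64-byte range starting exactly on a 64-byte line is flushed once, whereas the old "range plus extra last-byte clflush" scheme could hit the final line twice.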
Re: [Intel-gfx] [PATCH] drm/i915: Initialize HWS page address after GPU reset
On 15/06/2015 06:20, Daniel Vetter wrote: On Wed, Jun 3, 2015 at 6:14 PM, Ville Syrjälä ville.syrj...@linux.intel.com wrote: I was going to suggest removing the same thing from lrc_setup_hardware_status_page(), but after another look it seems we sometimes call .init_hw() before the context setup. It would be nice to have a more consistent sequence for init and reset. But anyway the patch looks OK to me. I verified that we indeed lose this register on GPU reset. Yep, this is a mess. And historically _any_ difference between driver load and GPU reset (or resume, fwiw) has led to hilarious bugs, so this difference is really troubling to me. Arun, can you please work on a patch to unify the setup sequence here, so that both driver load and GPU resets work the same way? By the time we're calling gem_init_hw the default context should have been created already, and hence we should be able to write to HWS_PGA in ring->init_hw only.

Hi Daniel, I think the problem in this case was that the code to init the HWS page after reset was missing for Gen8+. For Gen7 we are doing this as part of ring->init_hw:

Gen7:
i915_reset()
 +-- i915_gem_init_hw()
      +-- ring->init_hw(), which is init_render_ring()
           +-- init_ring_common()
                +-- intel_ring_setup_status_page()

Gen8:
i915_reset()
 +-- i915_gem_init_hw()
      +-- ring->init_hw(), which is gen8_init_render_ring()
           +-- gen8_init_common_ring() <- I added changes in this function.

We could probably use intel_ring_setup_status_page() for both cases; does it have to be Gen7 specific? Also I wonder about resume, where's the HWS_PGA restore for that case?

It is covered:
i915_drm_resume()
 +-- i915_gem_init_hw()

regards Arun -Daniel ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v5] drm/i915 : Added Programming of the MOCS
-----Original Message----- From: Lespiau, Damien Sent: Thursday, June 18, 2015 2:51 PM To: Antoine, Peter Cc: intel-gfx@lists.freedesktop.org; daniel.vetter.intel@irsmsx102.ger.corp.intel.com; ch...@chris-wilson.co.uk; matts...@gmail.com Subject: Re: [PATCH v5] drm/i915 : Added Programming of the MOCS

On Thu, Jun 18, 2015 at 01:29:45PM +0100, Peter Antoine wrote:
@@ -1379,6 +1380,13 @@ static int gen8_init_rcs_context(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
+	/*
+	 * Failing to program the MOCS is non-fatal. The system will not
+	 * run at peak performance. So generate a warning and carry on.
+	 */
+	if (intel_rcs_context_init_mocs(ring, ctx) != 0)
+		DRM_ERROR("MOCS failed to program: expect performance issues.");

Missing a '\n'. Will fix.

+static const struct drm_i915_mocs_entry skylake_mocs_table[] = {
+	/* {0x0009, 0x0010} */
+	{(MOCS_CACHEABILITY(EDRAM_UC) | MOCS_TGT_CACHE(LLC_ELLC) |
+	  MOCS_LRUM(0) | MOCS_AOM(0) | MOCS_LECC_ESC(0) | MOCS_SCC(0) |
+	  MOC_PFM(0) | MOCS_SCF(0)),
+	 (MOCS_ESC(0) | MOCS_SCC(0) | MOCS_L3_CACHEABILITY(L3_UC))},
+	/* {0x003b, 0x0030} */

We're still missing the usage hints for those configuration entries. That'd help user space a lot, which means making this patch land quicker as well.

These are boiled down from 250+ requirements from different use cases (OpenCL, media, etc...), I can't really generate any more usage hints.

+int intel_rcs_context_init_mocs(struct intel_engine_cs *ring,
+				struct intel_context *ctx)
+{
+	int ret = 0;
+
+	struct drm_i915_mocs_table t;
+	struct drm_device *dev = ring->dev;
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+
+	if (get_mocs_settings(dev, &t)) {
+		u32 table_size;
+
+		/*
+		 * OK. For each supported ring:
+		 * number of mocs entries * 2 dwords for each control_value
+		 * plus number of mocs entries / 2 dwords for l3cc values.
+		 *
+		 * Plus 1 for the load command and 1 for the NOOP per ring
+		 * and the l3cc programming.
+		 */
+		table_size = GEN9_NUM_MOCS_RINGS *
+				((2 * GEN9_NUM_MOCS_ENTRIES) + 2) +
+				GEN9_NUM_MOCS_ENTRIES + 2;
+		ret = intel_logical_ring_begin(ringbuf, ctx, table_size);
+		if (ret) {
+			DRM_DEBUG("intel_logical_ring_begin failed %d\n", ret);
+			return ret;
+		}
+
+		/* program the control registers */
+		emit_mocs_control_table(ringbuf, &t, GEN9_GFX_MOCS_0);
+		emit_mocs_control_table(ringbuf, &t, GEN9_MFX0_MOCS_0);
+		emit_mocs_control_table(ringbuf, &t, GEN9_MFX1_MOCS_0);
+		emit_mocs_control_table(ringbuf, &t, GEN9_VEBOX_MOCS_0);
+		emit_mocs_control_table(ringbuf, &t, GEN9_BLT_MOCS_0);

So, if I'm not mistaken, I think this only works because we fully initialize the default context at start/reset time through:

i915_gem_init_hw()
 +-- i915_gem_context_enable()
      +-- cycle through all the rings and call ring->init_context()
           +-- gen8_init_rcs_context()
                +-- intel_rcs_context_init_mocs() (initialize ALL the MOCS!)

Yes.

So, initializing the other (non-render) MOCS in gen8_init_rcs_context() isn't the most logical thing to do I'm afraid. What happens if we suddenly decide that we don't want to fully initialize the default context at startup but initialize each ring on-demand for that context as well? We can end up in a situation where we use the blitter first and we wouldn't have the blitter MOCS initialized. In that sense, that code makes an assumption about how we do things in a completely different part of the driver and that's always a potential source of bugs.

Yes, but this is the same with the golden context and the workarounds (as I understand it) so all this code would have to be moved.

Chris, how far am I? :p

One way to solve this (if that's indeed the issue pointed at by Chris) would be to decouple the render MOCS from the others: still keep the render ones in there as they need to be emitted from the ring, but put the other writes (which could be done through MMIO as well) higher in the chain; it could probably make sense in i915_gem_context_enable()?
(which, by the way, is awfully named, should have an _init somewhere?). It could also be a per-ring vfunc I suppose. For similar reasons, I think the GuC MOCS should be part of the GuC init as well, so we don't couple different parts of the code too hard. Now, is that really a blocker? I'd say no if we had userspace ready and could commit that today, because we really want it. Still something to look at, I could be totally wrong. Not a blocker. It gets a little
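The ring-buffer sizing comment quoted in this thread can be checked with a little arithmetic. The sketch below reproduces the dword count as a plain function; the constants passed in below are illustrative parameters, not necessarily the driver's actual `GEN9_NUM_MOCS_RINGS`/`GEN9_NUM_MOCS_ENTRIES` values:

```c
#include <assert.h>

/*
 * Per ring: each MOCS entry is programmed as an offset+value pair
 * (2 dwords), plus one dword for the load-register command and one
 * NOOP. The shared l3cc table packs two entries per register, so
 * entries/2 registers * 2 dwords = entries dwords, again plus the
 * load command and a NOOP.
 */
static int mocs_table_dwords(int num_rings, int num_entries)
{
	int per_ring = 2 * num_entries + 2;	/* pairs + LRI + NOOP */
	int l3cc = num_entries + 2;		/* packed l3cc + LRI + NOOP */

	return num_rings * per_ring + l3cc;
}
```

This matches the quoted expression `GEN9_NUM_MOCS_RINGS * ((2 * GEN9_NUM_MOCS_ENTRIES) + 2) + GEN9_NUM_MOCS_ENTRIES + 2` term by term.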
Re: [Intel-gfx] [PATCH v3 15/19] drm/i915: atomic plane updates in a nutshell
On Mon, Jun 15, 2015 at 12:33:52PM +0200, Maarten Lankhorst wrote: Now that all planes are added during a modeset we can use the calculated changes before disabling a plane, and then either commit or force disable a plane before disabling the crtc. The code is shared with atomic_begin/flush, except watermark updating and vblank evasion are not used. This is needed for proper atomic suspend/resume support. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90868 Signed-off-by: Maarten Lankhorst maarten.lankho...@linux.intel.com --- drivers/gpu/drm/i915/intel_display.c | 103 --- drivers/gpu/drm/i915/intel_sprite.c | 4 +- 2 files changed, 23 insertions(+), 84 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index cc4ca4970716..beb69281f45c 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -2217,28 +2217,6 @@ static void intel_disable_pipe(struct intel_crtc *crtc) intel_wait_for_pipe_off(crtc); } -/** - * intel_enable_primary_hw_plane - enable the primary plane on a given pipe - * @plane: plane to be enabled - * @crtc: crtc for the plane - * - * Enable @plane on @crtc, making sure that the pipe is running first. 
- */ -static void intel_enable_primary_hw_plane(struct drm_plane *plane, - struct drm_crtc *crtc) -{ - struct drm_device *dev = plane-dev; - struct drm_i915_private *dev_priv = dev-dev_private; - struct intel_crtc *intel_crtc = to_intel_crtc(crtc); - - /* If the pipe isn't enabled, we can't pump pixels and may hang */ - assert_pipe_enabled(dev_priv, intel_crtc-pipe); - to_intel_plane_state(plane-state)-visible = true; - - dev_priv-display.update_primary_plane(crtc, plane-fb, -crtc-x, crtc-y); -} - static bool need_vtd_wa(struct drm_device *dev) { #ifdef CONFIG_INTEL_IOMMU @@ -4508,20 +4486,6 @@ static void ironlake_pfit_enable(struct intel_crtc *crtc) } } -static void intel_enable_sprite_planes(struct drm_crtc *crtc) -{ - struct drm_device *dev = crtc-dev; - enum pipe pipe = to_intel_crtc(crtc)-pipe; - struct drm_plane *plane; - struct intel_plane *intel_plane; - - drm_for_each_legacy_plane(plane, dev-mode_config.plane_list) { - intel_plane = to_intel_plane(plane); - if (intel_plane-pipe == pipe) - intel_plane_restore(intel_plane-base); - } -} - void hsw_enable_ips(struct intel_crtc *crtc) { struct drm_device *dev = crtc-base.dev; @@ -4817,27 +4781,6 @@ static void intel_pre_plane_update(struct intel_crtc *crtc) intel_pre_disable_primary(crtc-base); } -static void intel_crtc_enable_planes(struct drm_crtc *crtc) -{ - struct drm_device *dev = crtc-dev; - struct intel_crtc *intel_crtc = to_intel_crtc(crtc); - int pipe = intel_crtc-pipe; - - intel_enable_primary_hw_plane(crtc-primary, crtc); - intel_enable_sprite_planes(crtc); - if (to_intel_plane_state(crtc-cursor-state)-visible) - intel_crtc_update_cursor(crtc, true); - - intel_post_enable_primary(crtc); - - /* - * FIXME: Once we grow proper nuclear flip support out of this we need - * to compute the mask of flip planes precisely. For the time being - * consider this a flip to a NULL plane. 
- */ - intel_frontbuffer_flip(dev, INTEL_FRONTBUFFER_ALL_MASK(pipe)); -} - static void intel_crtc_disable_planes(struct drm_crtc *crtc, unsigned plane_mask) { struct drm_device *dev = crtc-dev; @@ -4845,10 +4788,6 @@ static void intel_crtc_disable_planes(struct drm_crtc *crtc, unsigned plane_mask struct drm_plane *p; int pipe = intel_crtc-pipe; - intel_crtc_wait_for_pending_flips(crtc); - - intel_pre_disable_primary(crtc); - intel_crtc_dpms_overlay_disable(intel_crtc); drm_for_each_plane_mask(p, dev, plane_mask) @@ -6270,6 +6209,11 @@ static void intel_crtc_disable_noatomic(struct drm_crtc *crtc) if (!intel_crtc-active) return; + if (to_intel_plane_state(crtc-primary-state)-visible) { + intel_crtc_wait_for_pending_flips(crtc); + intel_pre_disable_primary(crtc); + } + intel_crtc_disable_planes(crtc, crtc-state-plane_mask); dev_priv-display.crtc_disable(crtc); @@ -11783,10 +11727,6 @@ int intel_plane_atomic_calc_changes(struct drm_crtc_state *crtc_state, if (old_plane_state-base.fb !fb) intel_crtc-atomic.disabled_planes |= 1 i; - /* don't run rest during modeset yet */ - if (!intel_crtc-active || mode_changed) - return 0; - was_visible = old_plane_state-visible; visible = to_intel_plane_state(plane_state)-visible; @@
Re: [Intel-gfx] [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
On Thu, Jun 18, 2015 at 04:27:52PM +0100, Chris Wilson wrote: On Thu, Jun 18, 2015 at 04:49:49PM +0200, Daniel Vetter wrote: Guc is different since we really must have it ready for execbuf, and for that usecase a completion at drm_open time sounds like the right thing. But do we? It would be nice if we had a definite answer that the hw was ready before we started using it in anger, but I don't see any reason why we would have to delay userspace for a slow microcode update... (This presupposes that userspace batches are unaffected by GuC/execlist setup, which for userspace sanity I hope they are - or at least using predicate registers and conditional execution.) Well I figured a wait_completion or flush_work unconditionally in execbuf is not to your liking, and it's better to keep that in open. But I think we should be able to get away with this at execbuf time. Might even be better since this wouldn't block sw-rendered boot splashes. But either way should be suitable I think. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
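The trade-off Chris and Daniel are discussing - gate firmware readiness at submit (execbuf) time rather than at open time - can be modelled in a toy form. Everything below is a hypothetical single-threaded stand-in for the real completion/flush_work machinery, purely to show where the wait lands:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical firmware-load state. */
struct fw_state {
	bool load_done;
};

/* open never blocks on the firmware load. */
static int guc_open(struct fw_state *fw)
{
	(void)fw;
	return 0;
}

/* Stand-in for flush_work()/wait_for_completion() on the loader. */
static void fw_flush_load(struct fw_state *fw)
{
	fw->load_done = true;
}

/*
 * The first real submission pays the wait, so clients that never
 * submit (e.g. a software-rendered boot splash) are never blocked
 * by a slow microcode update.
 */
static int guc_execbuf(struct fw_state *fw)
{
	if (!fw->load_done)
		fw_flush_load(fw);
	return 0;
}
```

The observable property is simply that `guc_open()` leaves the load pending while the first `guc_execbuf()` forces it to complete.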
Re: [Intel-gfx] [PATCH] drm: Avoid the double clflush on the last cache line in drm_clflush_virt_range()
On to, 2015-06-11 at 09:33 +0100, Chris Wilson wrote: On Thu, Jun 11, 2015 at 09:25:16AM +0100, Dave Gordon wrote: On 10/06/15 15:58, Chris Wilson wrote: As the clflush operates on cache lines, and we can flush any byte address, in order to flush all bytes given in the range we issue an extra clflush on the last byte to ensure the last cacheline is flushed. We can convert the iteration to be over the actual cache lines to avoid this double clflush on the last byte.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Imre Deak imre.d...@intel.com
---
 drivers/gpu/drm/drm_cache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 9a62d7a53553..6743ff7dccfa 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -130,11 +130,12 @@ drm_clflush_virt_range(void *addr, unsigned long length)
 {
 #if defined(CONFIG_X86)
 	if (cpu_has_clflush) {
+		const int size = boot_cpu_data.x86_clflush_size;
 		void *end = addr + length;
+		addr = (void *)(((unsigned long)addr) & -size);

Should this cast be to uintptr_t?

The kernel has a strict equivalence between sizeof(unsigned long) and sizeof(pointer). You will see unsigned long used universally to pass along pointers to functions and as closures.

Or intptr_t, as size has somewhat strangely been defined as signed? To complete the mix, x86_clflush_size is 'u16'! So maybe we should write

+		const size_t size = boot_cpu_data.x86_clflush_size;
+		const size_t mask = ~(size - 1);
 		void *end = addr + length;
+		addr = (void *)(((uintptr_t)addr) & mask);

No. size_t has a very poor definition inside the kernel - what does the maximum size of a userspace allocation have to do with kernel internals? Let's keep userspace types in userspace, or else we end up with i915_gem_gtt.c.
I also think using unsigned long for virtual addresses is standard in the kernel and I can't see how using int would lead to problems given the expected range of x86_clflush_size, so this looks ok to me: Reviewed-by: Imre Deak imre.d...@intel.com -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
On Thu, Jun 18, 2015 at 12:49:55PM +0100, Dave Gordon wrote: On 17/06/15 13:02, Daniel Vetter wrote: On Wed, Jun 17, 2015 at 08:23:40AM +0100, Dave Gordon wrote: On 15/06/15 21:09, Chris Wilson wrote: On Mon, Jun 15, 2015 at 07:36:19PM +0100, Dave Gordon wrote: From: Alex Dai yu@intel.com i915_gem_object_write() is a generic function to copy data from a plain linear buffer to a paged gem object. We will need this for the microcontroller firmware loading support code. Issue: VIZ-4884 Signed-off-by: Alex Dai yu@intel.com Signed-off-by: Dave Gordon david.s.gor...@intel.com --- drivers/gpu/drm/i915/i915_drv.h |2 ++ drivers/gpu/drm/i915/i915_gem.c | 28 2 files changed, 30 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 611fbd8..9094c06 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2713,6 +2713,8 @@ void *i915_gem_object_alloc(struct drm_device *dev); void i915_gem_object_free(struct drm_i915_gem_object *obj); void i915_gem_object_init(struct drm_i915_gem_object *obj, const struct drm_i915_gem_object_ops *ops); +int i915_gem_object_write(struct drm_i915_gem_object *obj, + const void *data, size_t size); struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev, size_t size); void i915_init_vm(struct drm_i915_private *dev_priv, diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index be35f04..75d63c2 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -5392,3 +5392,31 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj) return false; } +/* Fill the @obj with the @size amount of @data */ +int i915_gem_object_write(struct drm_i915_gem_object *obj, +const void *data, size_t size) +{ +struct sg_table *sg; +size_t bytes; +int ret; + +ret = i915_gem_object_get_pages(obj); +if (ret) +return ret; + +i915_gem_object_pin_pages(obj); You don't set the object into the CPU domain, or instead manually handle 
the domain flushing. You don't handle objects that cannot be written directly by the CPU, nor do you handle objects whose representation in memory is not linear. -Chris

No, we don't handle just any random gem object, but we do return an error code for any types not supported. However, as we don't really need the full generality of writing into a gem object of any type, I will replace this function with one that combines the allocation of a new object (which will therefore definitely be of the correct type, in the correct domain, etc) and filling it with the data to be preserved. The usage pattern for the particular case is going to be:

Once-only:
    Allocate
    Fill

Then each time GuC is (re-)initialised:
    Map to GTT
    DMA-read from buffer into GuC private memory
    Unmap

Only on unload:
    Dispose

So our object is write-once by the CPU (and that's always the first operation), thereafter read-occasionally by the GuC's DMA engine.

Yup. The problem is more that on Atom platforms the objects aren't coherent by default and generally you need to do something. Hence we either have
- an explicit set_caching call to document that this is a gpu object which is always coherent (so also on chv/bxt), even when that's a no-op on big core,
- or wrap everything in set_domain calls, even when those are no-ops too.

If either of those is lacking, reviews tend to freak out preemptively and the reptile brain takes over ;-) Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
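The review point - a CPU write into a GEM-like object must be bracketed by an explicit domain transition so it is coherent even on platforms where objects are not coherent by default - can be modelled minimally. All structures and names below are invented stand-ins to show the bracketing discipline, not the driver's API:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy object: a small page buffer plus tracked coherency state. */
struct fake_obj {
	char pages[64];
	bool in_cpu_domain;	/* tracked instead of real cache state */
};

/* Stand-in for the set_domain/set_caching step reviewers ask for. */
static int obj_set_cpu_domain(struct fake_obj *obj, bool write)
{
	obj->in_cpu_domain = write;
	return 0;
}

/*
 * Stand-in for a fill helper like the i915_gem_object_write() under
 * review: refuse to write unless the caller did the domain transition
 * first, mirroring the "wrap everything in set_domain calls" rule.
 */
static int obj_write(struct fake_obj *obj, const void *data, size_t size)
{
	if (!obj->in_cpu_domain)
		return -1;	/* caller forgot the set_domain step */
	if (size > sizeof(obj->pages))
		return -1;
	memcpy(obj->pages, data, size);
	return 0;
}
```

The point is purely structural: making the transition a precondition of the write turns the coherency rule into something the code enforces rather than something review has to remember.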
Re: [Intel-gfx] [PATCH v3 18/19] drm/i915: Remove transitional references from intel_plane_atomic_check.
On Mon, Jun 15, 2015 at 12:33:55PM +0200, Maarten Lankhorst wrote: All transitional plane helpers are gone, party! Signed-off-by: Maarten Lankhorst maarten.lankho...@linux.intel.com There's also a reference in skylake_update_primary_plane() that I assume can be removed? Matt --- drivers/gpu/drm/i915/intel_atomic_plane.c | 19 ++- 1 file changed, 6 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_atomic_plane.c b/drivers/gpu/drm/i915/intel_atomic_plane.c index 10a8ecedc942..f1ab8e4b9c11 100644 --- a/drivers/gpu/drm/i915/intel_atomic_plane.c +++ b/drivers/gpu/drm/i915/intel_atomic_plane.c @@ -115,6 +115,7 @@ static int intel_plane_atomic_check(struct drm_plane *plane, struct intel_crtc_state *crtc_state; struct intel_plane *intel_plane = to_intel_plane(plane); struct intel_plane_state *intel_state = to_intel_plane_state(state); + struct drm_crtc_state *drm_crtc_state; int ret; crtc = crtc ? crtc : plane-state-crtc; @@ -129,19 +130,11 @@ static int intel_plane_atomic_check(struct drm_plane *plane, if (!crtc) return 0; - /* FIXME: temporary hack necessary while we still use the plane update - * helper. 
*/ - if (state-state) { - struct drm_crtc_state *drm_crtc_state = - drm_atomic_get_existing_crtc_state(state-state, crtc); + drm_crtc_state = drm_atomic_get_existing_crtc_state(state-state, crtc); + if (WARN_ON(!drm_crtc_state)) + return -EINVAL; - if (WARN_ON(!drm_crtc_state)) - return -EINVAL; - - crtc_state = to_intel_crtc_state(drm_crtc_state); - } else { - crtc_state = intel_crtc-config; - } + crtc_state = to_intel_crtc_state(drm_crtc_state); /* * The original src/dest coordinates are stored in state-base, but @@ -191,7 +184,7 @@ static int intel_plane_atomic_check(struct drm_plane *plane, intel_state-visible = false; ret = intel_plane-check_plane(plane, crtc_state, intel_state); - if (ret || !state-state) + if (ret) return ret; return intel_plane_atomic_calc_changes(crtc_state-base, state); -- 2.1.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Matt Roper Graphics Software Engineer IoTG Platform Enabling Development Intel Corporation (916) 356-2795 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 25/55] drm/i915: Update i915_gem_object_sync() to take a request structure
On Thu, Jun 18, 2015 at 01:59:13PM +0100, John Harrison wrote: On 18/06/2015 13:21, Chris Wilson wrote: On Thu, Jun 18, 2015 at 01:14:56PM +0100, john.c.harri...@intel.com wrote: From: John Harrison john.c.harri...@intel.com The plan is to pass requests around as the basic submission tracking structure rather than rings and contexts. This patch updates the i915_gem_object_sync() code path. v2: Much more complex patch to share a single request between the sync and the page flip. The _sync() function now supports lazy allocation of the request structure. That is, if one is passed in then that will be used. If one is not, then a request will be allocated and passed back out. Note that the _sync() code does not necessarily require a request. Thus one will only be created in certain situations. The reason the lazy allocation must be done within the _sync() code itself is because the decision to need one or not is not really something that the code above can second guess (except in the case where one is definitely not required because no ring is passed in). The call chains above _sync() now support passing a request through, with most callers passing in NULL and assuming that no request will be required (because they also pass in NULL for the ring and therefore can't be generating any ring code). The exception is intel_crtc_page_flip() which now supports having a request returned from _sync(). If one is returned, then that request is shared by the page flip (if the page flip is of a type to need a request). If _sync() does not generate a request but the page flip does need one, then the page flip path will create its own request. v3: Updated comment description to be clearer about 'to_req' parameter (Tomas Elf review request). Rebased onto newer tree that significantly changed the synchronisation code. 
v4: Updated comments from review feedback (Tomas Elf) For: VIZ-5115 Signed-off-by: John Harrison john.c.harri...@intel.com Reviewed-by: Tomas Elf tomas@intel.com --- drivers/gpu/drm/i915/i915_drv.h|4 ++- drivers/gpu/drm/i915/i915_gem.c| 48 +--- drivers/gpu/drm/i915/i915_gem_execbuffer.c |2 +- drivers/gpu/drm/i915/intel_display.c | 17 +++--- drivers/gpu/drm/i915/intel_drv.h |3 +- drivers/gpu/drm/i915/intel_fbdev.c |2 +- drivers/gpu/drm/i915/intel_lrc.c |2 +- drivers/gpu/drm/i915/intel_overlay.c |2 +- 8 files changed, 57 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 64a10fa..f69e9cb 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2778,7 +2778,8 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj) int __must_check i915_mutex_lock_interruptible(struct drm_device *dev); int i915_gem_object_sync(struct drm_i915_gem_object *obj, -struct intel_engine_cs *to); +struct intel_engine_cs *to, +struct drm_i915_gem_request **to_req); Nope. Did you forget to reorder the code to ensure that the request is allocated along with the context switch at the start of execbuf? -Chris Not sure what you are objecting to? If you mean the lazily allocated request then that is for page flip code not execbuff code. If we get here from an execbuff call then the request will definitely have been allocated and will be passed in. Whereas the page flip code may or may not require a request (depending on whether MMIO or ring flips are in use. Likewise the sync code may or may not require a request (depending on whether there is anything to sync to or not). There is no point allocating and submitting an empty request in the MMIO/idle case. Hence the sync code needs to be able to use an existing request or create one if none already exists. I guess Chris' comment was that if you have a non-NULL to, then you better have a non-NULL to_req. 
And since we link up requests to the engine they'll run on, the former shouldn't be required any more. So either that's true and we can remove the 'to' parameter, or we don't understand something yet (and perhaps that should be done as a follow-up). -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
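The lazy-allocation contract being discussed, use the caller's request if one was passed in, otherwise create one and pass it back out through the same pointer, reduces to this pattern. This is a userspace sketch with an invented struct request, not the driver's actual i915_gem_request API.

```c
#include <assert.h>
#include <stdlib.h>

struct request { int id; };	/* illustrative stand-in */

static int next_id;

/* Return the caller's request if *out_req is non-NULL; otherwise
 * allocate one lazily and hand it back through the same pointer,
 * as _sync() does for the page-flip path. */
static struct request *get_request(struct request **out_req)
{
	if (*out_req == NULL) {
		*out_req = malloc(sizeof(**out_req));
		if (*out_req)
			(*out_req)->id = ++next_id;
	}
	return *out_req;
}
```

The point of the in-out pointer is that a caller which never needed a request (the MMIO-flip case) pays nothing, while a caller that did trigger an allocation can reuse the same request for its own submission afterwards.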
Re: [Intel-gfx] [PATCH v3 15/19] drm/i915: atomic plane updates in a nutshell
On Mon, Jun 15, 2015 at 12:33:52PM +0200, Maarten Lankhorst wrote: Now that all planes are added during a modeset we can use the calculated changes before disabling a plane, and then either commit or force disable a plane before disabling the crtc. The code is shared with atomic_begin/flush, except watermark updating and vblank evasion are not used. This is needed for proper atomic suspend/resume support. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90868 Signed-off-by: Maarten Lankhorst maarten.lankho...@linux.intel.com --- drivers/gpu/drm/i915/intel_display.c | 103 --- drivers/gpu/drm/i915/intel_sprite.c | 4 +- 2 files changed, 23 insertions(+), 84 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index cc4ca4970716..beb69281f45c 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -2217,28 +2217,6 @@ static void intel_disable_pipe(struct intel_crtc *crtc) intel_wait_for_pipe_off(crtc); } -/** - * intel_enable_primary_hw_plane - enable the primary plane on a given pipe - * @plane: plane to be enabled - * @crtc: crtc for the plane - * - * Enable @plane on @crtc, making sure that the pipe is running first. 
- */ -static void intel_enable_primary_hw_plane(struct drm_plane *plane, - struct drm_crtc *crtc) -{ - struct drm_device *dev = plane-dev; - struct drm_i915_private *dev_priv = dev-dev_private; - struct intel_crtc *intel_crtc = to_intel_crtc(crtc); - - /* If the pipe isn't enabled, we can't pump pixels and may hang */ - assert_pipe_enabled(dev_priv, intel_crtc-pipe); - to_intel_plane_state(plane-state)-visible = true; - - dev_priv-display.update_primary_plane(crtc, plane-fb, -crtc-x, crtc-y); -} - static bool need_vtd_wa(struct drm_device *dev) { #ifdef CONFIG_INTEL_IOMMU @@ -4508,20 +4486,6 @@ static void ironlake_pfit_enable(struct intel_crtc *crtc) } } -static void intel_enable_sprite_planes(struct drm_crtc *crtc) -{ - struct drm_device *dev = crtc-dev; - enum pipe pipe = to_intel_crtc(crtc)-pipe; - struct drm_plane *plane; - struct intel_plane *intel_plane; - - drm_for_each_legacy_plane(plane, dev-mode_config.plane_list) { - intel_plane = to_intel_plane(plane); - if (intel_plane-pipe == pipe) - intel_plane_restore(intel_plane-base); - } -} - void hsw_enable_ips(struct intel_crtc *crtc) { struct drm_device *dev = crtc-base.dev; @@ -4817,27 +4781,6 @@ static void intel_pre_plane_update(struct intel_crtc *crtc) intel_pre_disable_primary(crtc-base); } -static void intel_crtc_enable_planes(struct drm_crtc *crtc) -{ - struct drm_device *dev = crtc-dev; - struct intel_crtc *intel_crtc = to_intel_crtc(crtc); - int pipe = intel_crtc-pipe; - - intel_enable_primary_hw_plane(crtc-primary, crtc); - intel_enable_sprite_planes(crtc); - if (to_intel_plane_state(crtc-cursor-state)-visible) - intel_crtc_update_cursor(crtc, true); - - intel_post_enable_primary(crtc); - - /* - * FIXME: Once we grow proper nuclear flip support out of this we need - * to compute the mask of flip planes precisely. For the time being - * consider this a flip to a NULL plane. 
- */ - intel_frontbuffer_flip(dev, INTEL_FRONTBUFFER_ALL_MASK(pipe)); -} - static void intel_crtc_disable_planes(struct drm_crtc *crtc, unsigned plane_mask) { struct drm_device *dev = crtc-dev; @@ -4845,10 +4788,6 @@ static void intel_crtc_disable_planes(struct drm_crtc *crtc, unsigned plane_mask struct drm_plane *p; int pipe = intel_crtc-pipe; - intel_crtc_wait_for_pending_flips(crtc); - - intel_pre_disable_primary(crtc); - intel_crtc_dpms_overlay_disable(intel_crtc); drm_for_each_plane_mask(p, dev, plane_mask) @@ -6270,6 +6209,11 @@ static void intel_crtc_disable_noatomic(struct drm_crtc *crtc) if (!intel_crtc-active) return; + if (to_intel_plane_state(crtc-primary-state)-visible) { + intel_crtc_wait_for_pending_flips(crtc); + intel_pre_disable_primary(crtc); + } + intel_crtc_disable_planes(crtc, crtc-state-plane_mask); dev_priv-display.crtc_disable(crtc); @@ -11783,10 +11727,6 @@ int intel_plane_atomic_calc_changes(struct drm_crtc_state *crtc_state, if (old_plane_state-base.fb !fb) intel_crtc-atomic.disabled_planes |= 1 i; - /* don't run rest during modeset yet */ - if (!intel_crtc-active || mode_changed) - return 0; - was_visible = old_plane_state-visible; visible = to_intel_plane_state(plane_state)-visible; @@
Re: [Intel-gfx] [PATCH v3 17/19] drm/i915: Make setting color key atomic.
On Mon, Jun 15, 2015 at 12:33:54PM +0200, Maarten Lankhorst wrote: By making color key atomic there are no more transitional helpers. The plane check function will reject the color key when a scaler is active. Signed-off-by: Maarten Lankhorst maarten.lankho...@linux.intel.com --- drivers/gpu/drm/i915/intel_atomic_plane.c | 1 + drivers/gpu/drm/i915/intel_display.c | 7 ++- drivers/gpu/drm/i915/intel_drv.h | 6 +-- drivers/gpu/drm/i915/intel_sprite.c | 85 +++ 4 files changed, 46 insertions(+), 53 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_atomic_plane.c b/drivers/gpu/drm/i915/intel_atomic_plane.c index 91d53768df9d..10a8ecedc942 100644 --- a/drivers/gpu/drm/i915/intel_atomic_plane.c +++ b/drivers/gpu/drm/i915/intel_atomic_plane.c @@ -56,6 +56,7 @@ intel_create_plane_state(struct drm_plane *plane) state-base.plane = plane; state-base.rotation = BIT(DRM_ROTATE_0); + state-ckey.flags = I915_SET_COLORKEY_NONE; return state; } diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 5facd0501a34..746c73d2ab84 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -4401,9 +4401,9 @@ static int skl_update_scaler_plane(struct intel_crtc_state *crtc_state, return ret; /* check colorkey */ - if (WARN_ON(intel_plane-ckey.flags != I915_SET_COLORKEY_NONE)) { + if (plane_state-ckey.flags != I915_SET_COLORKEY_NONE) { DRM_DEBUG_KMS([PLANE:%d] scaling with color key not allowed, - intel_plane-base.base.id); + intel_plane-base.base.id); return -EINVAL; } @@ -13733,7 +13733,7 @@ intel_check_primary_plane(struct drm_plane *plane, /* use scaler when colorkey is not required */ if (INTEL_INFO(plane-dev)-gen = 9 - to_intel_plane(plane)-ckey.flags == I915_SET_COLORKEY_NONE) { + state-ckey.flags == I915_SET_COLORKEY_NONE) { min_scale = 1; max_scale = skl_max_scale(to_intel_crtc(crtc), crtc_state); can_position = true; @@ -13881,7 +13881,6 @@ static struct drm_plane *intel_primary_plane_create(struct 
drm_device *dev, primary-check_plane = intel_check_primary_plane; primary-commit_plane = intel_commit_primary_plane; primary-disable_plane = intel_disable_primary_plane; - primary-ckey.flags = I915_SET_COLORKEY_NONE; if (HAS_FBC(dev) INTEL_INFO(dev)-gen 4) primary-plane = !pipe; diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h index 93b9542ab8dc..3a2ac82b0970 100644 --- a/drivers/gpu/drm/i915/intel_drv.h +++ b/drivers/gpu/drm/i915/intel_drv.h @@ -274,6 +274,8 @@ struct intel_plane_state { * update_scaler_plane. */ int scaler_id; + + struct drm_intel_sprite_colorkey ckey; }; struct intel_initial_plane_config { @@ -588,9 +590,6 @@ struct intel_plane { bool can_scale; int max_downscale; - /* FIXME convert to properties */ - struct drm_intel_sprite_colorkey ckey; - /* Since we need to change the watermarks before/after * enabling/disabling the planes, we need to store the parameters here * as the other pieces of the struct may not reflect the values we want @@ -1390,7 +1389,6 @@ bool intel_sdvo_init(struct drm_device *dev, uint32_t sdvo_reg, bool is_sdvob); /* intel_sprite.c */ int intel_plane_init(struct drm_device *dev, enum pipe pipe, int plane); -int intel_plane_restore(struct drm_plane *plane); int intel_sprite_set_colorkey(struct drm_device *dev, void *data, struct drm_file *file_priv); bool intel_pipe_update_start(struct intel_crtc *crtc, diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c index 168f90f346c2..21d3f7882c4d 100644 --- a/drivers/gpu/drm/i915/intel_sprite.c +++ b/drivers/gpu/drm/i915/intel_sprite.c @@ -182,7 +182,8 @@ skl_update_plane(struct drm_plane *drm_plane, struct drm_crtc *crtc, const int plane = intel_plane-plane + 1; u32 plane_ctl, stride_div, stride; int pixel_size = drm_format_plane_cpp(fb-pixel_format, 0); - const struct drm_intel_sprite_colorkey *key = intel_plane-ckey; + const struct drm_intel_sprite_colorkey *key = + to_intel_plane_state(drm_plane-state)-ckey; unsigned 
long surf_addr; u32 tile_height, plane_offset, plane_size; unsigned int rotation; @@ -344,7 +345,8 @@ vlv_update_plane(struct drm_plane *dplane, struct drm_crtc *crtc, u32 sprctl; unsigned long sprsurf_offset, linear_offset; int pixel_size = drm_format_plane_cpp(fb-pixel_format, 0); - const struct drm_intel_sprite_colorkey *key = intel_plane-ckey; + const struct
Re: [Intel-gfx] [PATCH 25/55] drm/i915: Update i915_gem_object_sync() to take a request structure
On 18/06/2015 16:39, Chris Wilson wrote: On Thu, Jun 18, 2015 at 04:24:53PM +0200, Daniel Vetter wrote: On Thu, Jun 18, 2015 at 01:59:13PM +0100, John Harrison wrote: On 18/06/2015 13:21, Chris Wilson wrote: On Thu, Jun 18, 2015 at 01:14:56PM +0100, john.c.harri...@intel.com wrote: From: John Harrison john.c.harri...@intel.com The plan is to pass requests around as the basic submission tracking structure rather than rings and contexts. This patch updates the i915_gem_object_sync() code path. v2: Much more complex patch to share a single request between the sync and the page flip. The _sync() function now supports lazy allocation of the request structure. That is, if one is passed in then that will be used. If one is not, then a request will be allocated and passed back out. Note that the _sync() code does not necessarily require a request. Thus one will only be created until certain situations. The reason the lazy allocation must be done within the _sync() code itself is because the decision to need one or not is not really something that code above can second guess (except in the case where one is definitely not required because no ring is passed in). The call chains above _sync() now support passing a request through which most callers passing in NULL and assuming that no request will be required (because they also pass in NULL for the ring and therefore can't be generating any ring code). The exeception is intel_crtc_page_flip() which now supports having a request returned from _sync(). If one is, then that request is shared by the page flip (if the page flip is of a type to need a request). If _sync() does not generate a request but the page flip does need one, then the page flip path will create its own request. v3: Updated comment description to be clearer about 'to_req' parameter (Tomas Elf review request). Rebased onto newer tree that significantly changed the synchronisation code. 
v4: Updated comments from review feedback (Tomas Elf) For: VIZ-5115 Signed-off-by: John Harrison john.c.harri...@intel.com Reviewed-by: Tomas Elf tomas@intel.com --- drivers/gpu/drm/i915/i915_drv.h|4 ++- drivers/gpu/drm/i915/i915_gem.c| 48 +--- drivers/gpu/drm/i915/i915_gem_execbuffer.c |2 +- drivers/gpu/drm/i915/intel_display.c | 17 +++--- drivers/gpu/drm/i915/intel_drv.h |3 +- drivers/gpu/drm/i915/intel_fbdev.c |2 +- drivers/gpu/drm/i915/intel_lrc.c |2 +- drivers/gpu/drm/i915/intel_overlay.c |2 +- 8 files changed, 57 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 64a10fa..f69e9cb 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2778,7 +2778,8 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj) int __must_check i915_mutex_lock_interruptible(struct drm_device *dev); int i915_gem_object_sync(struct drm_i915_gem_object *obj, -struct intel_engine_cs *to); +struct intel_engine_cs *to, +struct drm_i915_gem_request **to_req); Nope. Did you forget to reorder the code to ensure that the request is allocated along with the context switch at the start of execbuf? -Chris Not sure what you are objecting to? If you mean the lazily allocated request then that is for page flip code not execbuff code. If we get here from an execbuff call then the request will definitely have been allocated and will be passed in. Whereas the page flip code may or may not require a request (depending on whether MMIO or ring flips are in use. Likewise the sync code may or may not require a request (depending on whether there is anything to sync to or not). There is no point allocating and submitting an empty request in the MMIO/idle case. Hence the sync code needs to be able to use an existing request or create one if none already exists. I guess Chris' comment was that if you have a non-NULL to, then you better have a non-NULL to_req. 
And since we link up reqeusts to the engine they'll run on the former shouldn't be required any more. So either that's true and we can remove the to or we don't understand something yet (and perhaps that should be done as a follow-up). I am sure I sent a patch that outlined in great detail how that we need only the request parameter in i915_gem_object_sync(), for handling both execbuffer, pipelined pin_and_fence and synchronous pin_and_fence. -Chris As the driver stands, the page flip code wants to synchronise with the framebuffer object but potentially without touching the ring and therefore without creating a request. If the synchronisation is a no-op (because there are no outstanding operations on the given object) then there is no need for a request anywhere in the call chain. Thus there is a need to pass in the ring together with an optional
[Intel-gfx] [PATCH] drm/i915: Add the ddi get cdclk code for BXT (v2)
From: Bob Paauwe bob.j.paa...@intel.com The registers and process differ from other platforms. v2(Matt): Return 19.2 MHz when DE PLL is disabled (Ville) Cc: Ville Syrjälä ville.syrj...@linux.intel.com Cc: Imre Deak imre.d...@intel.com Signed-off-by: Bob Paauwe bob.j.paa...@intel.com Signed-off-by: Matt Roper matthew.d.ro...@intel.com --- drivers/gpu/drm/i915/intel_display.c | 31 +++ 1 file changed, 31 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 3ee7dbc..294c4e4 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -6689,6 +6689,34 @@ static int skylake_get_display_clock_speed(struct drm_device *dev) return 24000; } +static int broxton_get_display_clock_speed(struct drm_device *dev) +{ + struct drm_i915_private *dev_priv = to_i915(dev); + uint32_t cdctl = I915_READ(CDCLK_CTL); + uint32_t pll_freq = I915_READ(BXT_DE_PLL_CTL) & BXT_DE_PLL_RATIO_MASK; + uint32_t pll_enab = I915_READ(BXT_DE_PLL_ENABLE); + + if (!(pll_enab & BXT_DE_PLL_PLL_ENABLE)) + return 19200; + + switch (cdctl & BXT_CDCLK_CD2X_DIV_SEL_MASK) { + case BXT_CDCLK_CD2X_DIV_SEL_1: + if (pll_freq == BXT_DE_PLL_RATIO(60)) /* PLL freq = 1152MHz */ + return 576000; + else /* PLL freq = 1248MHz */ + return 624000; + case BXT_CDCLK_CD2X_DIV_SEL_1_5: + return 384000; + case BXT_CDCLK_CD2X_DIV_SEL_2: + return 288000; + case BXT_CDCLK_CD2X_DIV_SEL_4: + return 144000; + } + + /* error case, assume higher PLL freq. 
*/ + return 624000; +} + static int broadwell_get_display_clock_speed(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; @@ -14715,6 +14743,9 @@ static void intel_init_display(struct drm_device *dev) if (IS_SKYLAKE(dev)) dev_priv->display.get_display_clock_speed = skylake_get_display_clock_speed; + else if (IS_BROXTON(dev)) + dev_priv->display.get_display_clock_speed = + broxton_get_display_clock_speed; else if (IS_BROADWELL(dev)) dev_priv->display.get_display_clock_speed = broadwell_get_display_clock_speed; -- 1.8.5.1
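The frequencies in the switch above are consistent with cdclk = DE PLL frequency / 2 / CD2X divider. A quick sanity check of that arithmetic (plain C, assuming the 1152 MHz and 1248 MHz PLL frequencies mentioned in the patch; the helper name is invented):

```c
#include <assert.h>

/* Broxton cdclk = DE PLL frequency / 2 / CD2X divider.
 * The divider can be 1, 1.5, 2 or 4, so take it in tenths
 * to keep the arithmetic integral. */
static int bxt_cdclk_khz(int pll_khz, int div_tenths)
{
	return pll_khz / 2 * 10 / div_tenths;
}
```

Each case in the patch's switch corresponds to one divider value against the 1152 MHz PLL, with the DIV_SEL_1 case also covering the 1248 MHz ratio.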
Re: [Intel-gfx] [PATCH 02/15] drm/i915: Embedded microcontroller (uC) firmware loading support
On Thu, Jun 18, 2015 at 05:35:29PM +0200, Daniel Vetter wrote: On Thu, Jun 18, 2015 at 04:27:52PM +0100, Chris Wilson wrote: On Thu, Jun 18, 2015 at 04:49:49PM +0200, Daniel Vetter wrote: Guc is different since we really must have it ready for execbuf, and for that usecase a completion at drm_open time sounds like the right thing. But do we? It would be nice if we had a definite answer that the hw was ready before we started using it in anger, but I don't see any reason why we would have to delay userspace for a slow microcode update... (This presupposes that userspace batches are unaffected by GuC/execlist setup, which for userspace sanity I hope they are - or at least using predicate registers and conditional execution.) Well I figured a wait_completion or flush_work unconditionally in execbuf is not to your liking, and it's better to keep that in open. But I think we should be able to get away with this at execbuf time. Might even be better since this wouldn't block sw-rendered boot-splashs. But either way should be suitable I think. I am optimistic that we can make the request interface robust enough to be able queue up not only the ring initialisation and ppgtt initialisation requests, but also userspace requests. If it all works out, we only need to truly worry about microcode completion in hangcheck. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
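The wait_completion-at-execbuf idea discussed above maps onto the kernel's completion primitive: the firmware loader signals once, and any path that needs the GuC ready blocks until then. A minimal userspace model with pthreads (the function names mirror the kernel API, but the implementation here is only a sketch):

```c
#include <assert.h>
#include <pthread.h>

/* Minimal model of a kernel "completion". */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	int             done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

/* Firmware loader signals readiness once; all waiters wake. */
static void complete_all(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* execbuf (or drm_open) blocks here until the GuC load has finished;
 * returns immediately if it already has. */
static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}
```

Placing the wait in execbuf rather than open is what keeps a software-rendered boot splash from blocking on a slow microcode load, since such a client never submits batches.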
Re: [Intel-gfx] [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
I'm pretty happy with the code; I was just confused by the series changing the setup halfway through. On Thu, Jun 18, 2015 at 02:07:30PM +0100, Arun Siluvery wrote: +static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring, + uint32_t **wa_ctx_batch, + uint32_t offset, + uint32_t *num_dwords) +{ + uint32_t index; + uint32_t *batch = *wa_ctx_batch; + + index = offset; + + /* FIXME: fill one cacheline with NOOPs. + * Replace these instructions with WA + */ + while (index < (offset + 16)) + wa_ctx_emit(batch, MI_NOOP); If this was /* Replace me with WA */ wa_ctx_emit(batch, MI_NOOP) /* Pad to end of cacheline */ while (index % 16) wa_ctx_emit(batch, MI_NOOP); You then don't need to alter the code when you add the real w/a. Note that using (unsigned long)batch as you do later for cacheline calculation is wrong, as that is a local physical CPU address (not the virtual address used by the cache in the GPU) and was page aligned anyway. Similarly, +static int gen8_init_perctx_bb(struct intel_engine_cs *ring, +uint32_t **wa_ctx_batch, +uint32_t offset, +uint32_t *num_dwords) +{ + uint32_t index; + uint32_t *batch = *wa_ctx_batch; + + index = offset; + If this just did wa_ctx_emit(batch, MI_BATCH_BUFFER_END); rather than insert a cacheline of noops, again you wouldn't need to touch this infrastructure as you added the w/a. As it stands, I was a little worried halfway through when the cache alignment suddenly disappeared - but this patch implied to me that it was necessary. -Chris -- Chris Wilson, Intel Open Source Technology Centre
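Chris's suggested restructuring, emit the workaround first and then pad with NOOPs only up to the next 16-dword (one cacheline, 64-byte) boundary, looks like this in isolation. This is a sketch: MI_NOOP's encoding is 0 and wa_ctx_emit is modelled as a plain array store.

```c
#include <assert.h>
#include <stdint.h>

#define MI_NOOP 0u

/* Pad the batch with NOOPs until index reaches a 16-dword
 * (one cacheline) boundary; a no-op if already aligned.
 * Returns the new index. */
static uint32_t pad_to_cacheline(uint32_t *batch, uint32_t index)
{
	while (index % 16)
		batch[index++] = MI_NOOP;
	return index;
}
```

Because the padding is keyed off `index % 16` rather than a fixed count from the start offset, inserting real workaround instructions before the padding never changes this code, which is exactly the maintainability point being made in the review.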
[Intel-gfx] 3.16 backlight kernel options
Hi, Which Linux kernel option is required to be able to adjust the display brightness? Regards, Steph
Re: [Intel-gfx] [PATCH v5] drm/i915 : Added Programming of the MOCS
On Thu, Jun 18, 2015 at 03:45:44PM +0100, Antoine, Peter wrote: Not a blocker. It gets a little more interesting, as the L3CC registers are shared across all engines but are only saved in the RCS context. They are, however, reset on the context switch when ELSP is set. So we would have to program them (i.e. via MMIO) and also set them in the batch start for the RCS. Each ring would have to have a proper init_context() and these registers programmed there. Hum, so yes, it's like you say. I think leaving a comment somewhere in the init path telling us we rely on the RCS init_context() for all the rings would be nice, but that's extra topping that can be done any time. -- Damien
Re: [Intel-gfx] [PATCH v3 15/19] drm/i915: atomic plane updates in a nutshell
Op 18-06-15 om 16:21 schreef Matt Roper: On Mon, Jun 15, 2015 at 12:33:52PM +0200, Maarten Lankhorst wrote: Now that all planes are added during a modeset we can use the calculated changes before disabling a plane, and then either commit or force disable a plane before disabling the crtc. The code is shared with atomic_begin/flush, except watermark updating and vblank evasion are not used. This is needed for proper atomic suspend/resume support. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90868 Signed-off-by: Maarten Lankhorst maarten.lankho...@linux.intel.com --- drivers/gpu/drm/i915/intel_display.c | 103 --- drivers/gpu/drm/i915/intel_sprite.c | 4 +- 2 files changed, 23 insertions(+), 84 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index cc4ca4970716..beb69281f45c 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -2217,28 +2217,6 @@ static void intel_disable_pipe(struct intel_crtc *crtc) intel_wait_for_pipe_off(crtc); } -/** - * intel_enable_primary_hw_plane - enable the primary plane on a given pipe - * @plane: plane to be enabled - * @crtc: crtc for the plane - * - * Enable @plane on @crtc, making sure that the pipe is running first. 
- */ -static void intel_enable_primary_hw_plane(struct drm_plane *plane, - struct drm_crtc *crtc) -{ -struct drm_device *dev = plane-dev; -struct drm_i915_private *dev_priv = dev-dev_private; -struct intel_crtc *intel_crtc = to_intel_crtc(crtc); - -/* If the pipe isn't enabled, we can't pump pixels and may hang */ -assert_pipe_enabled(dev_priv, intel_crtc-pipe); -to_intel_plane_state(plane-state)-visible = true; - -dev_priv-display.update_primary_plane(crtc, plane-fb, - crtc-x, crtc-y); -} - static bool need_vtd_wa(struct drm_device *dev) { #ifdef CONFIG_INTEL_IOMMU @@ -4508,20 +4486,6 @@ static void ironlake_pfit_enable(struct intel_crtc *crtc) } } -static void intel_enable_sprite_planes(struct drm_crtc *crtc) -{ -struct drm_device *dev = crtc-dev; -enum pipe pipe = to_intel_crtc(crtc)-pipe; -struct drm_plane *plane; -struct intel_plane *intel_plane; - -drm_for_each_legacy_plane(plane, dev-mode_config.plane_list) { -intel_plane = to_intel_plane(plane); -if (intel_plane-pipe == pipe) -intel_plane_restore(intel_plane-base); -} -} - void hsw_enable_ips(struct intel_crtc *crtc) { struct drm_device *dev = crtc-base.dev; @@ -4817,27 +4781,6 @@ static void intel_pre_plane_update(struct intel_crtc *crtc) intel_pre_disable_primary(crtc-base); } -static void intel_crtc_enable_planes(struct drm_crtc *crtc) -{ -struct drm_device *dev = crtc-dev; -struct intel_crtc *intel_crtc = to_intel_crtc(crtc); -int pipe = intel_crtc-pipe; - -intel_enable_primary_hw_plane(crtc-primary, crtc); -intel_enable_sprite_planes(crtc); -if (to_intel_plane_state(crtc-cursor-state)-visible) -intel_crtc_update_cursor(crtc, true); - -intel_post_enable_primary(crtc); - -/* - * FIXME: Once we grow proper nuclear flip support out of this we need - * to compute the mask of flip planes precisely. For the time being - * consider this a flip to a NULL plane. 
- */ -intel_frontbuffer_flip(dev, INTEL_FRONTBUFFER_ALL_MASK(pipe)); -} - static void intel_crtc_disable_planes(struct drm_crtc *crtc, unsigned plane_mask) { struct drm_device *dev = crtc-dev; @@ -4845,10 +4788,6 @@ static void intel_crtc_disable_planes(struct drm_crtc *crtc, unsigned plane_mask struct drm_plane *p; int pipe = intel_crtc-pipe; -intel_crtc_wait_for_pending_flips(crtc); - -intel_pre_disable_primary(crtc); - intel_crtc_dpms_overlay_disable(intel_crtc); drm_for_each_plane_mask(p, dev, plane_mask) @@ -6270,6 +6209,11 @@ static void intel_crtc_disable_noatomic(struct drm_crtc *crtc) if (!intel_crtc-active) return; +if (to_intel_plane_state(crtc-primary-state)-visible) { +intel_crtc_wait_for_pending_flips(crtc); +intel_pre_disable_primary(crtc); +} + intel_crtc_disable_planes(crtc, crtc-state-plane_mask); dev_priv-display.crtc_disable(crtc); @@ -11783,10 +11727,6 @@ int intel_plane_atomic_calc_changes(struct drm_crtc_state *crtc_state, if (old_plane_state-base.fb !fb) intel_crtc-atomic.disabled_planes |= 1 i; -/* don't run rest during modeset yet */ -if (!intel_crtc-active || mode_changed) -return 0; - was_visible = old_plane_state-visible; visible = to_intel_plane_state(plane_state)-visible; @@ -13255,15 +13195,18
Re: [Intel-gfx] [PATCH v3 17/19] drm/i915: Make setting color key atomic.
Op 18-06-15 om 16:21 schreef Matt Roper: On Mon, Jun 15, 2015 at 12:33:54PM +0200, Maarten Lankhorst wrote: By making color key atomic there are no more transitional helpers. The plane check function will reject the color key when a scaler is active. Signed-off-by: Maarten Lankhorst maarten.lankho...@linux.intel.com --- drivers/gpu/drm/i915/intel_atomic_plane.c | 1 + drivers/gpu/drm/i915/intel_display.c | 7 ++- drivers/gpu/drm/i915/intel_drv.h | 6 +-- drivers/gpu/drm/i915/intel_sprite.c | 85 +++ 4 files changed, 46 insertions(+), 53 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_atomic_plane.c b/drivers/gpu/drm/i915/intel_atomic_plane.c index 91d53768df9d..10a8ecedc942 100644 --- a/drivers/gpu/drm/i915/intel_atomic_plane.c +++ b/drivers/gpu/drm/i915/intel_atomic_plane.c @@ -56,6 +56,7 @@ intel_create_plane_state(struct drm_plane *plane) state-base.plane = plane; state-base.rotation = BIT(DRM_ROTATE_0); +state-ckey.flags = I915_SET_COLORKEY_NONE; return state; } diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 5facd0501a34..746c73d2ab84 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -4401,9 +4401,9 @@ static int skl_update_scaler_plane(struct intel_crtc_state *crtc_state, return ret; /* check colorkey */ -if (WARN_ON(intel_plane-ckey.flags != I915_SET_COLORKEY_NONE)) { +if (plane_state-ckey.flags != I915_SET_COLORKEY_NONE) { DRM_DEBUG_KMS([PLANE:%d] scaling with color key not allowed, -intel_plane-base.base.id); + intel_plane-base.base.id); return -EINVAL; } @@ -13733,7 +13733,7 @@ intel_check_primary_plane(struct drm_plane *plane, /* use scaler when colorkey is not required */ if (INTEL_INFO(plane-dev)-gen = 9 -to_intel_plane(plane)-ckey.flags == I915_SET_COLORKEY_NONE) { +state-ckey.flags == I915_SET_COLORKEY_NONE) { min_scale = 1; max_scale = skl_max_scale(to_intel_crtc(crtc), crtc_state); can_position = true; @@ -13881,7 +13881,6 @@ static struct drm_plane 
*intel_primary_plane_create(struct drm_device *dev, primary-check_plane = intel_check_primary_plane; primary-commit_plane = intel_commit_primary_plane; primary-disable_plane = intel_disable_primary_plane; -primary-ckey.flags = I915_SET_COLORKEY_NONE; if (HAS_FBC(dev) INTEL_INFO(dev)-gen 4) primary-plane = !pipe; diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h index 93b9542ab8dc..3a2ac82b0970 100644 --- a/drivers/gpu/drm/i915/intel_drv.h +++ b/drivers/gpu/drm/i915/intel_drv.h @@ -274,6 +274,8 @@ struct intel_plane_state { * update_scaler_plane. */ int scaler_id; + +struct drm_intel_sprite_colorkey ckey; }; struct intel_initial_plane_config { @@ -588,9 +590,6 @@ struct intel_plane { bool can_scale; int max_downscale; -/* FIXME convert to properties */ -struct drm_intel_sprite_colorkey ckey; - /* Since we need to change the watermarks before/after * enabling/disabling the planes, we need to store the parameters here * as the other pieces of the struct may not reflect the values we want @@ -1390,7 +1389,6 @@ bool intel_sdvo_init(struct drm_device *dev, uint32_t sdvo_reg, bool is_sdvob); /* intel_sprite.c */ int intel_plane_init(struct drm_device *dev, enum pipe pipe, int plane); -int intel_plane_restore(struct drm_plane *plane); int intel_sprite_set_colorkey(struct drm_device *dev, void *data, struct drm_file *file_priv); bool intel_pipe_update_start(struct intel_crtc *crtc, diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c index 168f90f346c2..21d3f7882c4d 100644 --- a/drivers/gpu/drm/i915/intel_sprite.c +++ b/drivers/gpu/drm/i915/intel_sprite.c @@ -182,7 +182,8 @@ skl_update_plane(struct drm_plane *drm_plane, struct drm_crtc *crtc, const int plane = intel_plane-plane + 1; u32 plane_ctl, stride_div, stride; int pixel_size = drm_format_plane_cpp(fb-pixel_format, 0); -const struct drm_intel_sprite_colorkey *key = intel_plane-ckey; +const struct drm_intel_sprite_colorkey *key = 
+to_intel_plane_state(drm_plane-state)-ckey; unsigned long surf_addr; u32 tile_height, plane_offset, plane_size; unsigned int rotation; @@ -344,7 +345,8 @@ vlv_update_plane(struct drm_plane *dplane, struct drm_crtc *crtc, u32 sprctl; unsigned long sprsurf_offset, linear_offset; int pixel_size = drm_format_plane_cpp(fb-pixel_format, 0); -const struct drm_intel_sprite_colorkey *key = intel_plane-ckey; +const struct
Re: [Intel-gfx] [PATCH v3 18/19] drm/i915: Remove transitional references from intel_plane_atomic_check.
Op 18-06-15 om 16:21 schreef Matt Roper: On Mon, Jun 15, 2015 at 12:33:55PM +0200, Maarten Lankhorst wrote: All transitional plane helpers are gone, party! Signed-off-by: Maarten Lankhorst maarten.lankho...@linux.intel.com There's also a reference in skylake_update_primary_plane() that I assume can be removed? Sure, I left it in because people will complain about unrelated changes otherwise. :P It should be a separate patch though.. ~Maarten ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [alsa-devel] DP MST audio support
-Original Message- From: Takashi Iwai [mailto:ti...@suse.de] Sent: Monday, May 18, 2015 5:21 PM At Thu, 14 May 2015 09:10:33 +1000, Dave Airlie wrote: On 12 May 2015 at 13:27, Dave Airlie airl...@gmail.com wrote: On 12 May 2015 at 11:50, Dave Airlie airl...@gmail.com wrote: Hi, So I have a branch that makes no sound, http://cgit.freedesktop.org/~airlied/linux/log/?h=dp-mst-audio and I'm not sure where I need to turn to next, The Intel docs I've read are kinda vague, assuming you know lots of things I clearly don't. so in theory my branch, sets up the SDP stream to the monitor in the payload creation, enables the codec in the intel GPU driver, and passes the ELD to the audio driver. The audio driver uses the device list to get the presence/valid bits per device, and manages to retrieve the ELD. I even create ELD files in /proc/asound/HDMI/ that have sensible values in them So it looks like I'm just missing some routing somewhere, most likely in the audio driver, then again I could be missing a lot more than that. Just looking for any ideas or knowledge people may have locked in their brains or inside their firewalls. Okay the branch now has audio on my test setup, I've had to hack out the intel_not_share_assigned_cvt function it appears to do bad things, I set pin 6 to connection 0 (pin 2), the later sets on pin 5/7 to connection 1 by that function seems to reprogram pin 6. I'm guessing the connection is assigned to a device not a pin in the new hw, and the same device is routed via pin 5/7 so I end up trashing it. my test setup is a Haswell Lenovo t440s + docking station + Dell U2410. ping audio guys? can someone from alsa please take a look or some interest in this? Sorry, I've been on vacation for last two weeks. Now still swimming in the flood of backlogs... Hi Artie, Sorry for the late reply. We don't have Haswell Lenovo t440s atm, so could you share more info? - Dell U2410 should support both HDMI and DP input. But I guess it cannot support DP MST, right? 
- Are you connecting this monitor with a DP cable? Which DDI port is used? DDI B, C or D?
- Does audio fail after i915 enables DP MST?
- Is the patch snd/hdmi: hack out haswell codec workaround the only change on the audio driver side?

The graphics side patches are fairly trivial, also it would be good to get a good explanation of how the hw works, from what I can see devices get connections not pins on this hw, and I notice that I don't always get 3 devices, so I'm not sure if devices are a dynamic thing we should be reprobing on some signal.

Do you mean 3 PCM devices here, like pcmC0D3p, pcmC0D7p, pcmC0D8p? Now the devices are not dynamic, a PCM device is created on each pin. It seems we need to revise this for DP MST, since a pin can be used to send up to 3 independent streams on an Intel GPU which has 3 display pipelines.

The intel_not_share_assigned_cvt() was needed for Haswell HDMI/DP as there was static routing between the pin and the converter widgets although the codec graph shows it's selectable. We need to check why this fails. Even if MST is enabled, the converters should also be selectable. Was the pin default config of pin 6 enabled by BIOS properly?

In any case, Intel people should have a better clue about this; it's always been a strange behavior that is tied with the graphics...

thanks, Takashi
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v3 15/19] drm/i915: atomic plane updates in a nutshell
Op 18-06-15 om 17:28 schreef Ville Syrjälä: On Mon, Jun 15, 2015 at 12:33:52PM +0200, Maarten Lankhorst wrote: Now that all planes are added during a modeset we can use the calculated changes before disabling a plane, and then either commit or force disable a plane before disabling the crtc. The code is shared with atomic_begin/flush, except watermark updating and vblank evasion are not used. This is needed for proper atomic suspend/resume support. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90868 Signed-off-by: Maarten Lankhorst maarten.lankho...@linux.intel.com --- drivers/gpu/drm/i915/intel_display.c | 103 --- drivers/gpu/drm/i915/intel_sprite.c | 4 +- 2 files changed, 23 insertions(+), 84 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index cc4ca4970716..beb69281f45c 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -2217,28 +2217,6 @@ static void intel_disable_pipe(struct intel_crtc *crtc) intel_wait_for_pipe_off(crtc); } -/** - * intel_enable_primary_hw_plane - enable the primary plane on a given pipe - * @plane: plane to be enabled - * @crtc: crtc for the plane - * - * Enable @plane on @crtc, making sure that the pipe is running first. 
- */ -static void intel_enable_primary_hw_plane(struct drm_plane *plane, - struct drm_crtc *crtc) -{ -struct drm_device *dev = plane-dev; -struct drm_i915_private *dev_priv = dev-dev_private; -struct intel_crtc *intel_crtc = to_intel_crtc(crtc); - -/* If the pipe isn't enabled, we can't pump pixels and may hang */ -assert_pipe_enabled(dev_priv, intel_crtc-pipe); -to_intel_plane_state(plane-state)-visible = true; - -dev_priv-display.update_primary_plane(crtc, plane-fb, - crtc-x, crtc-y); -} - static bool need_vtd_wa(struct drm_device *dev) { #ifdef CONFIG_INTEL_IOMMU @@ -4508,20 +4486,6 @@ static void ironlake_pfit_enable(struct intel_crtc *crtc) } } -static void intel_enable_sprite_planes(struct drm_crtc *crtc) -{ -struct drm_device *dev = crtc-dev; -enum pipe pipe = to_intel_crtc(crtc)-pipe; -struct drm_plane *plane; -struct intel_plane *intel_plane; - -drm_for_each_legacy_plane(plane, dev-mode_config.plane_list) { -intel_plane = to_intel_plane(plane); -if (intel_plane-pipe == pipe) -intel_plane_restore(intel_plane-base); -} -} - void hsw_enable_ips(struct intel_crtc *crtc) { struct drm_device *dev = crtc-base.dev; @@ -4817,27 +4781,6 @@ static void intel_pre_plane_update(struct intel_crtc *crtc) intel_pre_disable_primary(crtc-base); } -static void intel_crtc_enable_planes(struct drm_crtc *crtc) -{ -struct drm_device *dev = crtc-dev; -struct intel_crtc *intel_crtc = to_intel_crtc(crtc); -int pipe = intel_crtc-pipe; - -intel_enable_primary_hw_plane(crtc-primary, crtc); -intel_enable_sprite_planes(crtc); -if (to_intel_plane_state(crtc-cursor-state)-visible) -intel_crtc_update_cursor(crtc, true); - -intel_post_enable_primary(crtc); - -/* - * FIXME: Once we grow proper nuclear flip support out of this we need - * to compute the mask of flip planes precisely. For the time being - * consider this a flip to a NULL plane. 
- */ -intel_frontbuffer_flip(dev, INTEL_FRONTBUFFER_ALL_MASK(pipe)); -} - static void intel_crtc_disable_planes(struct drm_crtc *crtc, unsigned plane_mask) { struct drm_device *dev = crtc-dev; @@ -4845,10 +4788,6 @@ static void intel_crtc_disable_planes(struct drm_crtc *crtc, unsigned plane_mask struct drm_plane *p; int pipe = intel_crtc-pipe; -intel_crtc_wait_for_pending_flips(crtc); - -intel_pre_disable_primary(crtc); - intel_crtc_dpms_overlay_disable(intel_crtc); drm_for_each_plane_mask(p, dev, plane_mask) @@ -6270,6 +6209,11 @@ static void intel_crtc_disable_noatomic(struct drm_crtc *crtc) if (!intel_crtc-active) return; +if (to_intel_plane_state(crtc-primary-state)-visible) { +intel_crtc_wait_for_pending_flips(crtc); +intel_pre_disable_primary(crtc); +} + intel_crtc_disable_planes(crtc, crtc-state-plane_mask); dev_priv-display.crtc_disable(crtc); @@ -11783,10 +11727,6 @@ int intel_plane_atomic_calc_changes(struct drm_crtc_state *crtc_state, if (old_plane_state-base.fb !fb) intel_crtc-atomic.disabled_planes |= 1 i; -/* don't run rest during modeset yet */ -if (!intel_crtc-active || mode_changed) -return 0; - was_visible = old_plane_state-visible; visible = to_intel_plane_state(plane_state)-visible; @@ -13255,15
Re: [Intel-gfx] [PATCH] drm/atomic: Extract needs_modeset function
Op 18-06-15 om 11:25 schreef Daniel Vetter: We use the same check already in the atomic core, so might as well make this official. And it's also reused in e.g. i915. Motivated by Maarten's idea to extract a connector_changed state out of mode_changed. Cc: Maarten Lankhorst maarten.lankho...@linux.intel.com Signed-off-by: Daniel Vetter daniel.vet...@intel.com Reviewed-By: Maarten Lankhorst maarten.lankho...@linux.intel.com ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PULL] drm-intel-next-fixes
On 18 June 2015 at 16:04, Jani Nikula jani.nik...@intel.com wrote: Hi Dave, i915 fixes for drm-next/v4.2. BR, Jani. And my gcc says: /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/i915/intel_display.c: In function ‘__intel_set_mode’: /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/i915/intel_display.c:11850:14: warning: ‘crtc_state’ may be used uninitialized in this function [-Wmaybe-uninitialized] return state-mode_changed || state-active_changed; ^ /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/i915/intel_display.c:11860:25: note: ‘crtc_state’ was declared here struct drm_crtc_state *crtc_state; ^ /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/i915/intel_display.c:11874:6: warning: ‘crtc’ may be used uninitialized in this function [-Wmaybe-uninitialized] if (crtc != intel_encoder-base.crtc) ^ /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/i915/intel_display.c:11859:19: note: ‘crtc’ was declared here struct drm_crtc *crtc; ^ No idea if this is true, but I don't think I've seen it before now. gcc 5.1.1 on fedora 22 Dave. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH v5 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode
Some of the WA applied using WA batch buffers perform writes to scratch page. In the current flow WA are initialized before scratch obj is allocated. This patch reorders intel_init_pipe_control() to have a valid scratch obj before we initialize WA. Cc: Chris Wilson ch...@chris-wilson.co.uk Cc: Dave Gordon david.s.gor...@intel.com Signed-off-by: Michel Thierry michel.thie...@intel.com Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com --- drivers/gpu/drm/i915/intel_lrc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 8cc851dd..62486cd 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1636,7 +1636,8 @@ static int logical_render_ring_init(struct drm_device *dev) ring-emit_bb_start = gen8_emit_bb_start; ring-dev = dev; - ret = logical_ring_init(dev, ring); + + ret = intel_init_pipe_control(ring); if (ret) return ret; @@ -1648,7 +1649,7 @@ static int logical_render_ring_init(struct drm_device *dev) } } - ret = intel_init_pipe_control(ring); + ret = logical_ring_init(dev, ring); if (ret) { if (ring-wa_ctx.obj) lrc_destroy_wa_ctx_obj(ring); -- 2.3.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH v5 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround
In Indirect context w/a batch buffer, WaClearSlmSpaceAtContextSwitch

v2: s/PIPE_CONTROL_FLUSH_RO_CACHES/PIPE_CONTROL_FLUSH_L3 (Ville)

Cc: Chris Wilson ch...@chris-wilson.co.uk
Cc: Dave Gordon david.s.gor...@intel.com
Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
 drivers/gpu/drm/i915/i915_reg.h  |  1 +
 drivers/gpu/drm/i915/intel_lrc.c | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index d14ad20..7637e64 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -410,6 +410,7 @@
 #define   DISPLAY_PLANE_A		(0<<20)
 #define   DISPLAY_PLANE_B		(1<<20)
 #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
+#define   PIPE_CONTROL_FLUSH_L3			(1<<27)
 #define   PIPE_CONTROL_GLOBAL_GTT_IVB		(1<<24) /* gen7+ */
 #define   PIPE_CONTROL_MMIO_WRITE		(1<<23)
 #define   PIPE_CONTROL_STORE_DATA_INDEX		(1<<21)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3291ef4..b631390 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1106,6 +1106,7 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
 				    uint32_t *num_dwords)
 {
 	uint32_t index;
+	uint32_t scratch_addr;
 	uint32_t *batch = *wa_ctx_batch;

 	index = offset;
@@ -1136,6 +1137,21 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
 		wa_ctx_emit(batch, l3sqc4_flush & ~GEN8_LQSC_FLUSH_COHERENT_LINES);
 	}

+	/* WaClearSlmSpaceAtContextSwitch:bdw,chv */
+	/* Actual scratch location is at 128 bytes offset */
+	scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
+	wa_ctx_emit(batch, GFX_OP_PIPE_CONTROL(6));
+	wa_ctx_emit(batch, (PIPE_CONTROL_FLUSH_L3 |
+			    PIPE_CONTROL_GLOBAL_GTT_IVB |
+			    PIPE_CONTROL_CS_STALL |
+			    PIPE_CONTROL_QW_WRITE));
+	wa_ctx_emit(batch, scratch_addr);
+	wa_ctx_emit(batch, 0);
+	wa_ctx_emit(batch, 0);
+	wa_ctx_emit(batch, 0);
+
 	/* Pad to end of cacheline */
 	while (index % CACHELINE_DWORDS)
 		wa_ctx_emit(batch, MI_NOOP);
--
2.3.0
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 2/5] drm/i915: PSR: Remove Low Power HW tracking mask.
By spec we should only mask memup and hotplug detection for the hardware-tracking cases. However we have always also masked LPSP, which is there for low-power tracking support, because without it PSR was constantly exiting and never really getting activated. Now with runtime PM enabled by default, Matthew reported that he was seeing missed screen updates. So let's remove this undesirable mask and let HW tracking take care of cases like this where power-saving features are also running.

WARNING: With this patch PSR depends on audio and GPU runtime PM being properly enabled, working on auto. If either audio runtime PM or GPU runtime PM is not properly set, PSR will constantly exit and the performance counter will be 0. But the best thing about this patch is that with one more piece of HW tracking working, the risk of a missed/blank screen is minimized.

This affects just the core platforms where PSR exit is also helped by HW tracking: Haswell, Broadwell and Skylake for now.

Cc: Daniel Vetter daniel.vet...@ffwll.ch
Cc: Matthew Garrett mj...@srcf.ucam.org via codon.org.uk
Signed-off-by: Rodrigo Vivi rodrigo.v...@intel.com
---
 drivers/gpu/drm/i915/intel_psr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_psr.c b/drivers/gpu/drm/i915/intel_psr.c
index 5ee0fa5..6549d58 100644
--- a/drivers/gpu/drm/i915/intel_psr.c
+++ b/drivers/gpu/drm/i915/intel_psr.c
@@ -400,7 +400,7 @@ void intel_psr_enable(struct intel_dp *intel_dp)
 	/* Avoid continuous PSR exit by masking memup and hpd */
 	I915_WRITE(EDP_PSR_DEBUG_CTL(dev), EDP_PSR_DEBUG_MASK_MEMUP |
-		   EDP_PSR_DEBUG_MASK_HPD | EDP_PSR_DEBUG_MASK_LPSP);
+		   EDP_PSR_DEBUG_MASK_HPD);

 	/* Enable PSR on the panel */
 	hsw_psr_enable_sink(intel_dp);
--
2.1.0
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 3/5] drm/i915: Remove unused ring argument from frontbuffer invalidate and busy functions.
This patch doesn't make any functional change, but tidies up the frontbuffer invalidate and busy paths by removing the unnecessary ring argument from their signatures. It was unused in mark_fb_busy and only used in fb_obj_invalidate for the same ORIGIN_CS case. So let's clean it up a bit.

Cc: Daniel Vetter daniel.vet...@ffwll.ch
Signed-off-by: Rodrigo Vivi rodrigo.v...@intel.com
---
 drivers/gpu/drm/i915/i915_gem.c            | 10 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/intel_drv.h           |  1 -
 drivers/gpu/drm/i915/intel_fbdev.c         |  4 ++--
 drivers/gpu/drm/i915/intel_frontbuffer.c   | 14 +-
 5 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 248fd1a..49beca2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -350,7 +350,7 @@ i915_gem_phys_pwrite(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
-	intel_fb_obj_invalidate(obj, NULL, ORIGIN_CPU);
+	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
 	if (__copy_from_user_inatomic_nocache(vaddr, user_data, args->size)) {
 		unsigned long unwritten;
@@ -804,7 +804,7 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 	offset = i915_gem_obj_ggtt_offset(obj) + args->offset;
-	intel_fb_obj_invalidate(obj, NULL, ORIGIN_GTT);
+	intel_fb_obj_invalidate(obj, ORIGIN_GTT);
 	while (remain > 0) {
 		/* Operation in this page
@@ -948,7 +948,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 	if (ret)
 		return ret;
-	intel_fb_obj_invalidate(obj, NULL, ORIGIN_CPU);
+	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
 	i915_gem_object_pin_pages(obj);
@@ -3939,7 +3939,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	}
 	if (write)
-		intel_fb_obj_invalidate(obj, NULL, ORIGIN_GTT);
+		intel_fb_obj_invalidate(obj, ORIGIN_GTT);
 	trace_i915_gem_object_change_domain(obj,
 					    old_read_domains,
@@ -4212,7 +4212,7 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 	}
 	if (write)
-		intel_fb_obj_invalidate(obj, NULL,
ORIGIN_CPU); + intel_fb_obj_invalidate(obj, ORIGIN_CPU); trace_i915_gem_object_change_domain(obj, old_read_domains, diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 3336e1c..edb8c45 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1038,7 +1038,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas, obj-dirty = 1; i915_gem_request_assign(obj-last_write_req, req); - intel_fb_obj_invalidate(obj, ring, ORIGIN_CS); + intel_fb_obj_invalidate(obj, ORIGIN_CS); /* update for the implicit flush after a batch */ obj-base.write_domain = ~I915_GEM_GPU_DOMAINS; diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h index bcafefc..64fb9fe 100644 --- a/drivers/gpu/drm/i915/intel_drv.h +++ b/drivers/gpu/drm/i915/intel_drv.h @@ -955,7 +955,6 @@ void bxt_ddi_vswing_sequence(struct drm_device *dev, u32 level, /* intel_frontbuffer.c */ void intel_fb_obj_invalidate(struct drm_i915_gem_object *obj, -struct intel_engine_cs *ring, enum fb_op_origin origin); void intel_frontbuffer_flip_prepare(struct drm_device *dev, unsigned frontbuffer_bits); diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c index 6372cfc..8382146 100644 --- a/drivers/gpu/drm/i915/intel_fbdev.c +++ b/drivers/gpu/drm/i915/intel_fbdev.c @@ -89,7 +89,7 @@ static int intel_fbdev_blank(int blank, struct fb_info *info) * now until we solve this for real. */ mutex_lock(fb_helper-dev-struct_mutex); - intel_fb_obj_invalidate(ifbdev-fb-obj, NULL, ORIGIN_GTT); + intel_fb_obj_invalidate(ifbdev-fb-obj, ORIGIN_GTT); mutex_unlock(fb_helper-dev-struct_mutex); } @@ -115,7 +115,7 @@ static int intel_fbdev_pan_display(struct fb_var_screeninfo *var, * now until we solve this for real. 
*/ mutex_lock(fb_helper-dev-struct_mutex); - intel_fb_obj_invalidate(ifbdev-fb-obj, NULL, ORIGIN_GTT); + intel_fb_obj_invalidate(ifbdev-fb-obj, ORIGIN_GTT); mutex_unlock(fb_helper-dev-struct_mutex); } diff --git a/drivers/gpu/drm/i915/intel_frontbuffer.c b/drivers/gpu/drm/i915/intel_frontbuffer.c
[Intel-gfx] [PATCH 5/5] drm/i915: Enable PSR by default.
With reliable frontbuffer tracking and all instability corner cases solved, let's re-enable PSR by default on all supported platforms.

Signed-off-by: Rodrigo Vivi rodrigo.v...@intel.com
---
 drivers/gpu/drm/i915/i915_params.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 8ac5a1b..e864e67 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -37,7 +37,7 @@ struct i915_params i915 __read_mostly = {
 	.enable_execlists = -1,
 	.enable_hangcheck = true,
 	.enable_ppgtt = -1,
-	.enable_psr = 0,
+	.enable_psr = 1,
 	.preliminary_hw_support = IS_ENABLED(CONFIG_DRM_I915_PRELIMINARY_HW_SUPPORT),
 	.disable_power_well = 1,
 	.enable_ips = 1,
@@ -124,7 +124,7 @@ MODULE_PARM_DESC(enable_execlists,
 	"(-1=auto [default], 0=disabled, 1=enabled)");

 module_param_named(enable_psr, i915.enable_psr, int, 0600);
-MODULE_PARM_DESC(enable_psr, "Enable PSR (default: false)");
+MODULE_PARM_DESC(enable_psr, "Enable PSR (default: true)");

 module_param_named(preliminary_hw_support, i915.preliminary_hw_support, int, 0600);
 MODULE_PARM_DESC(preliminary_hw_support,
--
2.1.0
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 6/8] drivers/pwm: Add Crystalcove (CRC) PWM driver
On Fri, May 1, 2015 at 2:42 AM, Paul Bolle pebo...@tiscali.nl wrote: On Wed, 2015-04-29 at 19:30 +0530, Shobhit Kumar wrote: --- a/drivers/pwm/Kconfig +++ b/drivers/pwm/Kconfig +config PWM_CRC + bool Intel Crystalcove (CRC) PWM support + depends on X86 INTEL_SOC_PMIC + help + Generic PWM framework driver for Crystalcove (CRC) PMIC based PWM + control. --- a/drivers/pwm/Makefile +++ b/drivers/pwm/Makefile +obj-$(CONFIG_PWM_CRC)+= pwm-crc.o PWM_CRC is a bool symbol. So pwm-crc.o can never be part of a module. I actually started this as a module but later decided to make it as bool because INTEL_SOC_PMIC on which this depends is itself a bool as well. Still it is good to keep the module based initialization. Firstly because it causes no harm and even though some of the macros are pre-processed out, gives info about the driver. Secondly there were discussion on why INTEL_SOC_PMIC is bool (note this driver also has module based initialization even when bool). I am guessing because of some tricky module load order dependencies. If ever that becomes a module, this can mostly be unchanged to be loaded as a module. Regards Shobhit (If I'm wrong, and that object file can actually be part of a module, you can stop reading here.) --- /dev/null +++ b/drivers/pwm/pwm-crc.c +#include linux/module.h Perhaps this include is not needed. +static const struct pwm_ops crc_pwm_ops = { + .config = crc_pwm_config, + .enable = crc_pwm_enable, + .disable = crc_pwm_disable, + .owner = THIS_MODULE, For built-in only code THIS_MODULE is basically equivalent to NULL (see include/linux/export.h). So I guess this line can be dropped. 
+}; +static struct platform_driver crystalcove_pwm_driver = { + .probe = crystalcove_pwm_probe, + .remove = crystalcove_pwm_remove, + .driver = { + .name = crystal_cove_pwm, + }, +}; + +module_platform_driver(crystalcove_pwm_driver); Speaking from memory: for built-in only code this is equivalent to calling platform_driver_register(crystalcove_pwm_driver); from a wrapper, and marking that wrapper with device_initcall(). +MODULE_AUTHOR(Shobhit Kumar shobhit.ku...@intel.com); +MODULE_DESCRIPTION(Intel Crystal Cove PWM Driver); +MODULE_LICENSE(GPL v2); These macros will be effectively preprocessed away for built-in only code. Paul Bolle ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 05/15] drm/i915: GuC-specific firmware loader
On 06/15/2015 01:30 PM, Chris Wilson wrote: On Mon, Jun 15, 2015 at 07:36:23PM +0100, Dave Gordon wrote: snip + * Return true if get a success code from normal boot or RC6 boot + */ +static inline bool i915_guc_get_status(struct drm_i915_private *dev_priv, + u32 *status) +{ + *status = I915_READ(GUC_STATUS); + return (((*status) GS_UKERNEL_MASK) == GS_UKERNEL_READY || + ((*status) GS_UKERNEL_MASK) == GS_UKERNEL_LAPIC_DONE); Weird function. Does two things, only one of those is get_status. Maybe you would like to split this up better and use a switch when you mean a switch. Or rename it to reflect it's use only as a condition. Yes. It makes sense to change it to something like i915_guc_is_ucode_loaded(). +} + +/* Transfers the firmware image to RAM for execution by the microcontroller. + * + * GuC Firmware layout: + * +---+ + * | CSS header | 128B + * +---+ + * | uCode | + * +---+ + * | RSA signature | 256B + * +---+ + * | RSA public Key| 256B + * +---+ + * | Public key modulus |4B + * +---+ + * + * Architecturally, the DMA engine is bidirectional, and in can potentially + * even transfer between GTT locations. This functionality is left out of the + * API for now as there is no need for it. + * + * Be note that GuC need the CSS header plus uKernel code to be copied as one + * chunk of data. RSA sig data is loaded via MMIO. 
+ */ +static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv) +{ + struct intel_uc_fw *guc_fw = dev_priv-guc.guc_fw; + struct drm_i915_gem_object *fw_obj = guc_fw-uc_fw_obj; + unsigned long offset; + struct sg_table *sg = fw_obj-pages; + u32 status, ucode_size, rsa[UOS_RSA_SIG_SIZE / sizeof(u32)]; + int i, ret = 0; + + /* uCode size, also is where RSA signature starts */ + offset = ucode_size = guc_fw-uc_fw_size - UOS_CSS_SIGNING_SIZE; + + /* Copy RSA signature from the fw image to HW for verification */ + sg_pcopy_to_buffer(sg-sgl, sg-nents, rsa, UOS_RSA_SIG_SIZE, offset); + for (i = 0; i UOS_RSA_SIG_SIZE / sizeof(u32); i++) + I915_WRITE(UOS_RSA_SCRATCH_0 + i * sizeof(u32), rsa[i]); + + /* Set the source address for the new blob */ + offset = i915_gem_obj_ggtt_offset(fw_obj); Why would it even have a GGTT vma? There's no precondition here to assert that it should. It is pinned into GGTT inside gem_allocate_guc_obj. + I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset)); + I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) 0x); + + /* Set the destination. Current uCode expects an 8k stack starting from + * offset 0. */ + I915_WRITE(DMA_ADDR_1_LOW, 0x2000); + + /* XXX: The image is automatically transfered to SRAM after the RSA + * verification. This is why the address space is chosen as such. */ + I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM); + + I915_WRITE(DMA_COPY_SIZE, ucode_size); + + /* Finally start the DMA */ + I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA)); + Just assuming that the writes land and in the order you expect? A POSTING_READ of DMA_COPY_SIZE before issue the DMA is enough here? Or, POSTING_READ all those writes? -Alex ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 1/5] drm/i915: Enable runtime pm
I understand this patch is still under discussion. I just re-sent it to warn that the following one depends on it. Otherwise it is better to remove it and proceed with the last 3 patches of the series. Thanks

On Thu, Jun 18, 2015 at 11:43 AM, Rodrigo Vivi rodrigo.v...@intel.com wrote: From: Daniel Vetter daniel.vet...@ffwll.ch Like with every other feature that's not enabled by default, we break runtime pm support way too often by accident because the overall test coverage isn't great. And it's been almost 2 years since we enabled the power well code by default commit bf51d5e2cda5d36d98e4b46ac7fca9461e512c41 Author: Paulo Zanoni paulo.r.zan...@intel.com Date: Wed Jul 3 17:12:13 2013 -0300 drm/i915: switch disable_power_well default value to 1 It's really more than overdue for runtime pm itself to follow! Note that in practice this won't do a whole lot yet, since we're still gated on snd-hda-intel doing proper runtime pm. But I've discussed this with Liam and we agreed that this needs to be done. And the audio team is working to hold up their end of this bargain. And the justification for updating the autosuspend delay to 100ms: Quick measurement shows that we can do a full rpm cycle in about 5ms, which means the delay should still be really conservative from a power conservation pov. The only workload that would suffer from ping-pong is also only gpu/compute with all screens off. 100ms should cover any kind of latency with submitting follow-up batches.
Cc: Takashi Iwai ti...@suse.de
Cc: Liam Girdwood liam.r.girdw...@intel.com
Cc: Yang, Libin libin.y...@intel.com
Cc: Lin, Mengdong mengdong@intel.com
Cc: Li, Jocelyn jocelyn...@intel.com
Cc: Kaskinen, Tanu tanu.kaski...@intel.com
Cc: Zanoni, Paulo R paulo.r.zan...@intel.com
Signed-off-by: Daniel Vetter daniel.vet...@intel.com
Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang...@intel.com)
Signed-off-by: Rodrigo Vivi rodrigo.v...@intel.com
---
 drivers/gpu/drm/i915/intel_runtime_pm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 1a45385..2628b21 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -1831,9 +1831,10 @@ void intel_runtime_pm_enable(struct drm_i915_private *dev_priv)
 		return;
 	}
-	pm_runtime_set_autosuspend_delay(device, 10000); /* 10s */
+	pm_runtime_set_autosuspend_delay(device, 100);
 	pm_runtime_mark_last_busy(device);
 	pm_runtime_use_autosuspend(device);
+	pm_runtime_allow(device);
 	pm_runtime_put_autosuspend(device);
 }
--
2.1.0
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Rodrigo Vivi
Blog: http://blog.vivi.eng.br
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 1/5] drm/i915: Enable runtime pm
From: Daniel Vetter daniel.vet...@ffwll.ch

Like with every other feature that's not enabled by default, we break runtime pm support way too often by accident because the overall test coverage isn't great. And it's been almost 2 years since we enabled the power well code by default:

commit bf51d5e2cda5d36d98e4b46ac7fca9461e512c41
Author: Paulo Zanoni paulo.r.zan...@intel.com
Date: Wed Jul 3 17:12:13 2013 -0300

    drm/i915: switch disable_power_well default value to 1

It's really more than overdue for runtime pm itself to follow! Note that in practice this won't do a whole lot yet, since we're still gated on snd-hda-intel doing proper runtime pm. But I've discussed this with Liam and we agreed that this needs to be done, and the audio team is working to hold up their end of the bargain.

And the justification for updating the autosuspend delay to 100ms: a quick measurement shows that we can do a full rpm cycle in about 5ms, which means the delay should still be really conservative from a power-conservation pov. The only workload that would suffer from ping-pong is gpu/compute with all screens off, and 100ms should cover any kind of latency with submitting follow-up batches.
Cc: Takashi Iwai ti...@suse.de
Cc: Liam Girdwood liam.r.girdw...@intel.com
Cc: Yang, Libin libin.y...@intel.com
Cc: Lin, Mengdong mengdong@intel.com
Cc: Li, Jocelyn jocelyn...@intel.com
Cc: Kaskinen, Tanu tanu.kaski...@intel.com
Cc: Zanoni, Paulo R paulo.r.zan...@intel.com
Signed-off-by: Daniel Vetter daniel.vet...@intel.com
Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang...@intel.com)
Signed-off-by: Rodrigo Vivi rodrigo.v...@intel.com
---
 drivers/gpu/drm/i915/intel_runtime_pm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 1a45385..2628b21 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -1831,9 +1831,10 @@ void intel_runtime_pm_enable(struct drm_i915_private *dev_priv)
 		return;
 	}
 
-	pm_runtime_set_autosuspend_delay(device, 10000); /* 10s */
+	pm_runtime_set_autosuspend_delay(device, 100);
 	pm_runtime_mark_last_busy(device);
 	pm_runtime_use_autosuspend(device);
+	pm_runtime_allow(device);
 	pm_runtime_put_autosuspend(device);
 }
-- 
2.1.0
[Intel-gfx] [PATCH 4/5] drm/i915: Invalidate frontbuffer bits on FBDEV sync
Before this we had some duct tape to cover known cases where FBDEV would cause a frontbuffer flush, so we invalidated it again. However other cases appeared, like the boot splash screen doing a modeset and flushing it. So let's fix it for all cases.

FBDEV ops provide an fb_sync hook that was designed to wait for blit idle. We don't need to wait for blit idle for these operations, but we can use this hook to let frontbuffer tracking know that fbdev is about to do something. So this patch introduces a reliable way to know when fbdev is performing any operation.

I could have used ORIGIN_FBDEV to set the fbdev_running bool inside the invalidate function, but I decided to leave it in fbdev so we can use the single lock to know when we need to invalidate, minimizing the struct_mutex locks and the invalidates themselves. So the actual invalidate happens only on the first fbdev frontbuffer touch, or whenever needed: if the splash screen called a modeset during boot, fbdev will invalidate on the next screen drawn, so there is no risk of missing screen updates if PSR is enabled.

The fbdev_running unset happens in the frontbuffer tracking code when an async flip completes. Since fbdev has no reliable place to tell when it got paused, we can use this place, which will be reached if something else completed a modeset. The risk of a false positive exists but is minimal, since any real flip will go through this path. Also, a false positive while we don't get the proper modeset is better than the risk of missed screen updates.

Although fbdev presumes that all callbacks work from atomic context, I don't believe that any wait for idle is atomic, so I also removed the FIXME comments we had about using struct_mutex there in fb_ops.
Cc: Daniel Vetter daniel.vet...@ffwll.ch
Signed-off-by: Rodrigo Vivi rodrigo.v...@intel.com
---
 drivers/gpu/drm/i915/i915_debugfs.c      |   5 ++
 drivers/gpu/drm/i915/i915_drv.h          |   2 +
 drivers/gpu/drm/i915/intel_fbdev.c       | 106 ++-
 drivers/gpu/drm/i915/intel_frontbuffer.c |  19 ++
 4 files changed, 59 insertions(+), 73 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index c49fe2a..e3adddb 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2376,6 +2376,11 @@ static int i915_edp_psr_status(struct seq_file *m, void *data)
 	}
 	mutex_unlock(&dev_priv->psr.lock);
 
+	mutex_lock(&dev_priv->fb_tracking.lock);
+	seq_printf(m, "FBDEV running: %s\n",
+		   yesno(dev_priv->fb_tracking.fbdev_running));
+	mutex_unlock(&dev_priv->fb_tracking.lock);
+
 	intel_runtime_pm_put(dev_priv);
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 491ef0c..f1478f5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -888,6 +888,7 @@ enum fb_op_origin {
 	ORIGIN_CPU,
 	ORIGIN_CS,
 	ORIGIN_FLIP,
+	ORIGIN_FBDEV,
 };
 
 struct i915_fbc {
@@ -1627,6 +1628,7 @@ struct i915_frontbuffer_tracking {
 	 */
 	unsigned busy_bits;
 	unsigned flip_bits;
+	bool fbdev_running;
 };
 
 struct i915_wa_reg {
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 8382146..4a96c20 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -45,92 +45,52 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 
-static int intel_fbdev_set_par(struct fb_info *info)
-{
-	struct drm_fb_helper *fb_helper = info->par;
-	struct intel_fbdev *ifbdev =
-		container_of(fb_helper, struct intel_fbdev, helper);
-	int ret;
-
-	ret = drm_fb_helper_set_par(info);
-
-	if (ret == 0) {
-		/*
-		 * FIXME: fbdev presumes that all callbacks also work from
-		 * atomic contexts and relies on that for emergency oops
-		 * printing. KMS totally doesn't do that and the locking here is
-		 * by far not the only place this goes wrong. Ignore this for
-		 * now until we solve this for real.
-		 */
-		mutex_lock(&fb_helper->dev->struct_mutex);
-		ret = i915_gem_object_set_to_gtt_domain(ifbdev->fb->obj,
-							true);
-		mutex_unlock(&fb_helper->dev->struct_mutex);
-	}
-
-	return ret;
-}
-
-static int intel_fbdev_blank(int blank, struct fb_info *info)
-{
-	struct drm_fb_helper *fb_helper = info->par;
-	struct intel_fbdev *ifbdev =
-		container_of(fb_helper, struct intel_fbdev, helper);
-	int ret;
-
-	ret = drm_fb_helper_blank(blank, info);
-
-	if (ret == 0) {
-		/*
-		 * FIXME: fbdev presumes that all callbacks also work from
-		 * atomic contexts and relies on that for emergency oops
Re: [Intel-gfx] [PATCH 05/15] drm/i915: GuC-specific firmware loader
On 15/06/15 21:30, Chris Wilson wrote: On Mon, Jun 15, 2015 at 07:36:23PM +0100, Dave Gordon wrote:

+	/* We can't enable contexts until all firmware is loaded */
+	ret = intel_guc_ucode_load(dev, false);

Pardon? I know context initialisation is broken, but adding to that breakage is not pleasant.

Sorry, but that's just the way it works. If you want to use the GuC for batch submission, then you cannot submit any commands to any engine via the GuC before its firmware is loaded, nor can you submit anything at all directly to the ELSPs. However in /this/ patch the 'false' above should have been 'true' to give synchronous load semantics; and then ignoring the return is intentional, because either it's worked and we're going to use the GuC, or it hasn't and we're not (and it's already printed a message). Then there's a later patch that tries to decouple engine MMIO setup from engine setup using batches/contexts, at which point we can make use of the return code.

 	ret = i915_gem_context_enable(dev_priv);
 	if (ret && ret != -EIO) {
 		DRM_ERROR("Context enable failed %d\n", ret);

diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 82367c9..0b44265 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -166,4 +166,9 @@ struct intel_guc {
 #define GUC_WD_VECS_IER			0xC558
 #define GUC_PM_P24C_IER			0xC55C
 
+/* intel_guc_loader.c */
+extern void intel_guc_ucode_init(struct drm_device *dev);
+extern int intel_guc_ucode_load(struct drm_device *dev, bool wait);
+extern void intel_guc_ucode_fini(struct drm_device *dev);
+
 #endif
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
new file mode 100644
index 000..16eef4c
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -0,0 +1,416 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Vinit Azad vinit.a...@intel.com
+ *    Ben Widawsky b...@bwidawsk.net
+ *    Dave Gordon david.s.gor...@intel.com
+ *    Alex Dai yu@intel.com
+ */
+#include <linux/firmware.h>
+#include "i915_drv.h"
+#include "intel_guc.h"
+
+/**
+ * DOC: GuC
+ *
+ * intel_guc:
+ * Top level structure of the GuC. It handles firmware loading and manages the
+ * client pool and doorbells. intel_guc owns an i915_guc_client to replace the
+ * legacy ExecList submission.
+ *
+ * Firmware versioning:
+ * The firmware build process will generate a version header file with major
+ * and minor version defined. The versions are built into the CSS header of
+ * the firmware. The i915 kernel driver sets the minimum firmware version
+ * required per platform. The firmware installation package will install
+ * (symbolic link) the proper version of the firmware.
+ *
+ * GuC address space:
+ * The GuC does not allow any gfx GGTT address that falls into the range
+ * [0, WOPCM_TOP), which is reserved for Boot ROM, SRAM and WOPCM. Currently
+ * this top address is 512K. In order to exclude the 0-512K address space from
+ * the GGTT, all gfx objects used by the GuC are pinned with PIN_OFFSET_BIAS
+ * along with the size of the WOPCM.
+ *
+ * Firmware log:
+ * The firmware log is enabled by setting i915.guc_log_level to a non-negative
+ * level. Log data is printed out by reading the debugfs file
+ * i915_guc_log_dump. Reading from i915_guc_load_status will print out the
+ * firmware loading status and scratch register values.
+ */
+
+#define I915_SKL_GUC_UCODE "i915/skl_guc_ver3.bin"
+MODULE_FIRMWARE(I915_SKL_GUC_UCODE);
+
+static u32 get_gttype(struct drm_device *dev)
+{
+	/* XXX: GT type based on PCI device ID? field seems unused by fw */
+	return 0;
+}
+
+static u32 get_core_family(struct drm_device *dev)

For new code we really should be in the habit of passing around the
Re: [Intel-gfx] [PATCH 5/5] drm/i915: Enable PSR by default.
2015-06-18 15:43 GMT-03:00 Rodrigo Vivi rodrigo.v...@intel.com:

With reliable frontbuffer tracking and all the instability corner cases solved, let's re-enable PSR by default on all supported platforms.

Are we now passing all the PSR tests from kms_frontbuffer_tracking too?

Signed-off-by: Rodrigo Vivi rodrigo.v...@intel.com
---
 drivers/gpu/drm/i915/i915_params.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 8ac5a1b..e864e67 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -37,7 +37,7 @@ struct i915_params i915 __read_mostly = {
 	.enable_execlists = -1,
 	.enable_hangcheck = true,
 	.enable_ppgtt = -1,
-	.enable_psr = 0,
+	.enable_psr = 1,
 	.preliminary_hw_support = IS_ENABLED(CONFIG_DRM_I915_PRELIMINARY_HW_SUPPORT),
 	.disable_power_well = 1,
 	.enable_ips = 1,
@@ -124,7 +124,7 @@ MODULE_PARM_DESC(enable_execlists,
 	"(-1=auto [default], 0=disabled, 1=enabled)");
 
 module_param_named(enable_psr, i915.enable_psr, int, 0600);
-MODULE_PARM_DESC(enable_psr, "Enable PSR (default: false)");
+MODULE_PARM_DESC(enable_psr, "Enable PSR (default: true)");
 
 module_param_named(preliminary_hw_support, i915.preliminary_hw_support, int, 0600);
 MODULE_PARM_DESC(preliminary_hw_support,
-- 
2.1.0

-- 
Paulo Zanoni
[Intel-gfx] [PATCH v5 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround
In Indirect context w/a batch buffer, +WaFlushCoherentL3CacheLinesAtContextSwitch:bdw

v2: Add LRI commands to set/reset the bit that invalidates coherent lines, update the WA to include programming restrictions and exclude CHV as it is not required (Ville).

v3: Avoid an unnecessary read when it can be done by reading the register once (Chris).

Cc: Chris Wilson ch...@chris-wilson.co.uk
Cc: Dave Gordon david.s.gor...@intel.com
Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
 drivers/gpu/drm/i915/i915_reg.h  |  2 ++
 drivers/gpu/drm/i915/intel_lrc.c | 23 +++
 2 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 84af255..d14ad20 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -426,6 +426,7 @@
 #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE	(1<<9)
 #define   PIPE_CONTROL_NOTIFY			(1<<8)
 #define   PIPE_CONTROL_FLUSH_ENABLE		(1<<7) /* gen7+ */
+#define   PIPE_CONTROL_DC_FLUSH_ENABLE		(1<<5)
 #define   PIPE_CONTROL_VF_CACHE_INVALIDATE	(1<<4)
 #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE	(1<<3)
 #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE	(1<<2)
@@ -5788,6 +5789,7 @@ enum skl_disp_power_wells {
 #define GEN8_L3SQCREG4				0xb118
 #define  GEN8_LQSC_RO_PERF_DIS			(1<<27)
+#define  GEN8_LQSC_FLUSH_COHERENT_LINES		(1<<21)
 
 /* GEN8 chicken */
 #define HDC_CHICKEN0				0x7300
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c4b3493..3291ef4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1113,6 +1113,29 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
 	/* WaDisableCtxRestoreArbitration:bdw,chv */
 	wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_DISABLE);
 
+	/* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw */
+	if (IS_BROADWELL(ring->dev)) {
+		struct drm_i915_private *dev_priv = to_i915(ring->dev);
+		uint32_t l3sqc4_flush = (I915_READ(GEN8_L3SQCREG4) |
+					 GEN8_LQSC_FLUSH_COHERENT_LINES);
+
+		wa_ctx_emit(batch, MI_LOAD_REGISTER_IMM(1));
+		wa_ctx_emit(batch, GEN8_L3SQCREG4);
+		wa_ctx_emit(batch, l3sqc4_flush);
+
+		wa_ctx_emit(batch, GFX_OP_PIPE_CONTROL(6));
+		wa_ctx_emit(batch, (PIPE_CONTROL_CS_STALL |
+				    PIPE_CONTROL_DC_FLUSH_ENABLE));
+		wa_ctx_emit(batch, 0);
+		wa_ctx_emit(batch, 0);
+		wa_ctx_emit(batch, 0);
+		wa_ctx_emit(batch, 0);
+
+		wa_ctx_emit(batch, MI_LOAD_REGISTER_IMM(1));
+		wa_ctx_emit(batch, GEN8_L3SQCREG4);
+		wa_ctx_emit(batch, l3sqc4_flush & ~GEN8_LQSC_FLUSH_COHERENT_LINES);
+	}
+
 	/* Pad to end of cacheline */
 	while (index % CACHELINE_DWORDS)
 		wa_ctx_emit(batch, MI_NOOP);
-- 
2.3.0
[Intel-gfx] [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
Some of the WA are to be applied during context save but before restore, and some at the end of context save/restore but before executing the instructions in the ring. WA batch buffers are created for this purpose; these WA cannot be applied using normal means. Each context has two registers to load the offsets of these batch buffers. If they are non-zero, the HW understands that it needs to execute these batches.

v1: In this version two separate ring_buffer objects were used to load WA instructions for the indirect and per-context batch buffers, and they were part of every context.

v2: Chris suggested including an additional page in the context and using it to load these WA instead of creating separate objects. This simplifies a lot of things as we need not explicitly pin/unpin them. Thomas Daniel further pointed out that the GuC is planning to use a similar setup to share data between the GuC and the driver, and the WA batch buffers could probably share that page. However, after discussions with Dave, who is implementing the GuC changes, he suggested using an independent page for these reasons: the GuC area might grow, and these WA are initialized only once and are not changed afterwards, so we can share them across all contexts. The page is updated with the WA during render ring init. This has the advantage of not adding more special cases to default_context.

We don't know up front the number of WA we will be applying using these batch buffers. For this reason the size was fixed earlier, but that is not a good idea. To fix this, the functions that load instructions are modified to report the number of commands inserted, and the size is now calculated after the batch is updated. A macro is introduced to add commands to these batch buffers which also checks for overflow and returns an error. We have a full page dedicated to these WA, so that should be sufficient for a good number of WA; anything more means we have major issues.
The list for Gen8 is small, same for Gen9 also; maybe a few more get added going forward, but not close to filling the entire page. Chris suggested a two-pass approach but we agreed to go with the single page setup as it is a one-off routine and simpler code wins.

Moved around functions to simplify it further, added comments and fixed the alignment check. One additional option is the offset field, which is helpful if we would like to have multiple batches at different offsets within the page and select them based on some criteria. This is not a requirement at this point but could help in future (Dave).

(Many thanks to Chris, Dave and Thomas for their reviews and inputs)

Cc: Chris Wilson ch...@chris-wilson.co.uk
Cc: Dave Gordon david.s.gor...@intel.com
Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
 drivers/gpu/drm/i915/intel_lrc.c        | 199 +++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  18 +++
 2 files changed, 213 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0413b8f..8cc851dd 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -211,6 +211,7 @@ enum {
 	FAULT_AND_CONTINUE /* Unsupported */
 };
 #define GEN8_CTX_ID_SHIFT 32
+#define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT 0x17
 
 static int intel_lr_context_pin(struct intel_engine_cs *ring,
 		struct intel_context *ctx);
@@ -1077,6 +1078,168 @@ static int intel_logical_ring_workarounds_emit(struct intel_engine_cs *ring,
 	return 0;
 }
 
+#define wa_ctx_emit(batch, cmd) {	\
+		if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) {	\
+			return -ENOSPC;	\
+		}	\
+		batch[index++] = (cmd);	\
+	}
+
+/**
+ * gen8_init_indirectctx_bb() - initialize indirect ctx batch with WA
+ *
+ * @ring: only applicable for RCS
+ * @wa_ctx_batch: page in which WA are loaded
+ * @offset: This is for future use in case we would like to have multiple
+ * batches at different offsets and select them based on a criteria.
+ * @num_dwords: The number of WA applied are known at the beginning; it returns
+ * the number of DWORDS written. This batch does not contain MI_BATCH_BUFFER_END
+ * so it adds padding to make it cacheline aligned. MI_BATCH_BUFFER_END will be
+ * added to the perctx batch and both of them together make a complete batch
+ * buffer.
+ *
+ * Return: non-zero if we exceed the PAGE_SIZE limit.
+ */
+static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
+				    uint32_t **wa_ctx_batch,
+				    uint32_t offset,
+				    uint32_t *num_dwords)
+{
+	uint32_t index;
+	uint32_t *batch = *wa_ctx_batch;
+
+	index = offset;
+
+	/* FIXME: Replace me with WA */
+	wa_ctx_emit(batch, MI_NOOP);
[Intel-gfx] [PATCH v5 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
In Per context w/a batch buffer, WaRsRestoreWithPerCtxtBb

v2: This patch modifies the definitions of MI_LOAD_REGISTER_MEM and MI_LOAD_REGISTER_REG; add GEN8-specific defines for these instructions so as to not break any future users of the existing definitions (Michel).

v3: The length defined in the current definitions of the LRM and LRR instructions was specified as 0. It seems to be a common convention for instructions whose length varies between platforms. This has not been an issue so far because they are not used anywhere except the command parser; now that we use them in this patch, update them with the correct length and also move them out of the command parser placeholder to an appropriate place. Remove unnecessary padding and follow the WA programming sequence exactly as mentioned in the spec, which is essential for this WA (Dave).

Cc: Chris Wilson ch...@chris-wilson.co.uk
Cc: Dave Gordon david.s.gor...@intel.com
Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
 drivers/gpu/drm/i915/i915_reg.h  | 29 +++--
 drivers/gpu/drm/i915/intel_lrc.c | 54 ++++
 2 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 7637e64..208620d 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -347,6 +347,31 @@
 #define   MI_INVALIDATE_BSD		(1<<7)
 #define   MI_FLUSH_DW_USE_GTT		(1<<2)
 #define   MI_FLUSH_DW_USE_PPGTT		(0<<2)
+#define MI_LOAD_REGISTER_MEM		MI_INSTR(0x29, 1)
+#define MI_LOAD_REGISTER_MEM_GEN8	MI_INSTR(0x29, 2)
+#define   MI_LRM_USE_GLOBAL_GTT		(1<<22)
+#define   MI_LRM_ASYNC_MODE_ENABLE	(1<<21)
+#define MI_LOAD_REGISTER_REG		MI_INSTR(0x2A, 1)
+#define MI_ATOMIC(len)			MI_INSTR(0x2F, (len-2))
+#define   MI_ATOMIC_MEMORY_TYPE_GGTT	(1<<22)
+#define   MI_ATOMIC_INLINE_DATA		(1<<18)
+#define   MI_ATOMIC_CS_STALL		(1<<17)
+#define   MI_ATOMIC_RETURN_DATA_CTL	(1<<16)
+#define   MI_ATOMIC_OP_MASK(op)		((op) << 8)
+#define   MI_ATOMIC_AND			MI_ATOMIC_OP_MASK(0x01)
+#define   MI_ATOMIC_OR			MI_ATOMIC_OP_MASK(0x02)
+#define   MI_ATOMIC_XOR			MI_ATOMIC_OP_MASK(0x03)
+#define   MI_ATOMIC_MOVE		MI_ATOMIC_OP_MASK(0x04)
+#define   MI_ATOMIC_INC			MI_ATOMIC_OP_MASK(0x05)
+#define   MI_ATOMIC_DEC			MI_ATOMIC_OP_MASK(0x06)
+#define   MI_ATOMIC_ADD			MI_ATOMIC_OP_MASK(0x07)
+#define   MI_ATOMIC_SUB			MI_ATOMIC_OP_MASK(0x08)
+#define   MI_ATOMIC_RSUB		MI_ATOMIC_OP_MASK(0x09)
+#define   MI_ATOMIC_IMAX		MI_ATOMIC_OP_MASK(0x0A)
+#define   MI_ATOMIC_IMIN		MI_ATOMIC_OP_MASK(0x0B)
+#define   MI_ATOMIC_UMAX		MI_ATOMIC_OP_MASK(0x0C)
+#define   MI_ATOMIC_UMIN		MI_ATOMIC_OP_MASK(0x0D)
+
 #define MI_BATCH_BUFFER		MI_INSTR(0x30, 1)
 #define   MI_BATCH_NON_SECURE		(1)
 /* for snb/ivb/vlv this also means "batch in ppgtt" when ppgtt is enabled. */
@@ -451,8 +476,6 @@
 #define MI_CLFLUSH		MI_INSTR(0x27, 0)
 #define MI_REPORT_PERF_COUNT	MI_INSTR(0x28, 0)
 #define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
-#define MI_LOAD_REGISTER_MEM	MI_INSTR(0x29, 0)
-#define MI_LOAD_REGISTER_REG	MI_INSTR(0x2A, 0)
 #define MI_RS_STORE_DATA_IMM	MI_INSTR(0x2B, 0)
 #define MI_LOAD_URB_MEM		MI_INSTR(0x2C, 0)
 #define MI_STORE_URB_MEM	MI_INSTR(0x2D, 0)
@@ -1799,6 +1822,8 @@ enum skl_disp_power_wells {
 #define   GEN8_RC_SEMA_IDLE_MSG_DISABLE	(1 << 12)
 #define   GEN8_FF_DOP_CLOCK_GATE_DISABLE	(1<<10)
 
+#define GEN8_RS_PREEMPT_STATUS		0x215C
+
 /* Fuse readout registers for GT */
 #define CHV_FUSE_GT			(VLV_DISPLAY_BASE + 0x2168)
 #define   CHV_FGT_DISABLE_SS0		(1 << 10)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b631390..281aec6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1179,13 +1179,67 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
 			       uint32_t *num_dwords)
 {
 	uint32_t index;
+	uint32_t scratch_addr;
 	uint32_t *batch = *wa_ctx_batch;
 
 	index = offset;
 
+	/* Actual scratch location is at 128 bytes offset */
+	scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
 	/* WaDisableCtxRestoreArbitration:bdw,chv */
 	wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE);
 
+	/*
+	 * As per Bspec, to workaround a known HW issue, SW must perform the
+	 * below programming sequence prior to programming MI_BATCH_BUFFER_END.
+	 *
+	 * This is only applicable for Gen8.
+	 */
+
+	/* WaRsRestoreWithPerCtxtBb:bdw,chv */
+	wa_ctx_emit(batch, MI_LOAD_REGISTER_IMM(1));
+	wa_ctx_emit(batch, INSTPM);
+	wa_ctx_emit(batch, _MASKED_BIT_DISABLE(INSTPM_FORCE_ORDERING));
+
+	wa_ctx_emit(batch, (MI_ATOMIC(5) |
+			    MI_ATOMIC_MEMORY_TYPE_GGTT |
+			    MI_ATOMIC_INLINE_DATA |
[Intel-gfx] [PATCH v5 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround
In Indirect and Per context w/a batch buffer, +WaDisableCtxRestoreArbitration

Cc: Chris Wilson ch...@chris-wilson.co.uk
Cc: Dave Gordon david.s.gor...@intel.com
Signed-off-by: Rafael Barbalho rafael.barba...@intel.com
Signed-off-by: Arun Siluvery arun.siluv...@linux.intel.com
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 62486cd..c4b3493 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1110,8 +1110,8 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
 
 	index = offset;
 
-	/* FIXME: Replace me with WA */
-	wa_ctx_emit(batch, MI_NOOP);
+	/* WaDisableCtxRestoreArbitration:bdw,chv */
+	wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_DISABLE);
 
 	/* Pad to end of cacheline */
 	while (index % CACHELINE_DWORDS)
@@ -1144,6 +1144,9 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
 
 	index = offset;
 
+	/* WaDisableCtxRestoreArbitration:bdw,chv */
+	wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE);
+
 	wa_ctx_emit(batch, MI_BATCH_BUFFER_END);
 
 	*num_dwords = index - offset;
-- 
2.3.0
Re: [Intel-gfx] [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
On 18/06/15 13:10, Chris Wilson wrote: On Thu, Jun 18, 2015 at 12:49:55PM +0100, Dave Gordon wrote: On 17/06/15 13:02, Daniel Vetter wrote:

Domain handling is required for all gem objects, and the resulting bugs if you don't for one-off objects are absolutely no fun to track down.

Is it not the case that the new object returned by i915_gem_alloc_object() is (a) of a type that can be mapped into the GTT, and (b) initially in the CPU domain for both reading and writing? So AFAICS the allocate-and-fill function I'm describing (to appear in the next patch series respin) doesn't need any further domain handling.

A i915_gem_object_create_from_data() is a reasonable addition, and I suspect it will make the code a bit more succinct.

I shall adopt this name for it :)

Whilst your statement is true today, calling set_domain is then a no-op, and helps document how you use the object and so reduces the likelihood of us introducing bugs in the future. -Chris

So here's the new function ... where should the set-to-cpu-domain go? After the pin_pages and before the sg_copy_from_buffer?

/* Allocate a new GEM object and fill it with the supplied data */
struct drm_i915_gem_object *
i915_gem_object_create_from_data(struct drm_device *dev,
				 const void *data, size_t size)
{
	struct drm_i915_gem_object *obj;
	struct sg_table *sg;
	size_t bytes;
	int ret;

	obj = i915_gem_alloc_object(dev, round_up(size, PAGE_SIZE));
	if (!obj)
		return NULL;

	ret = i915_gem_object_get_pages(obj);
	if (ret)
		goto fail;

	i915_gem_object_pin_pages(obj);
	sg = obj->pages;
	bytes = sg_copy_from_buffer(sg->sgl, sg->nents, (void *)data, size);
	i915_gem_object_unpin_pages(obj);

	if (WARN_ON(bytes != size)) {
		DRM_ERROR("Incomplete copy, wrote %zu of %zu", bytes, size);
		goto fail;
	}

	return obj;

fail:
	drm_gem_object_unreference(&obj->base);
	return NULL;
}

.Dave.
Re: [Intel-gfx] [PATCH 6/8] drivers/pwm: Add Crystalcove (CRC) PWM driver
Hi Shobhit,

On Thu, 2015-06-18 at 23:24 +0530, Shobhit Kumar wrote: On Fri, May 1, 2015 at 2:42 AM, Paul Bolle pebo...@tiscali.nl wrote: On Wed, 2015-04-29 at 19:30 +0530, Shobhit Kumar wrote:

--- a/drivers/pwm/Kconfig
+++ b/drivers/pwm/Kconfig
+config PWM_CRC
+	bool "Intel Crystalcove (CRC) PWM support"
+	depends on X86 && INTEL_SOC_PMIC
+	help
+	  Generic PWM framework driver for Crystalcove (CRC) PMIC based PWM
+	  control.
--- a/drivers/pwm/Makefile
+++ b/drivers/pwm/Makefile
+obj-$(CONFIG_PWM_CRC)	+= pwm-crc.o

PWM_CRC is a bool symbol, so pwm-crc.o can never be part of a module.

I actually started this as a module but later decided to make it a bool, because INTEL_SOC_PMIC, on which this depends, is itself a bool as well.

As does GPIO_CRYSTAL_COVE, and that's a tristate. So?

Still it is good to keep the module-based initialization. Firstly because it causes no harm

If I got a dime for every time people used an argument like that I ... I could treat myself to an ice cream. A really big ice cream. Hmm, that doesn't sound too impressive. But still, "causes no harm" is below the bar for kernel code. Kernel code needs to add value.

and even though some of the macros are pre-processed out, gives info about the driver.

None of which can't be gotten elsewhere (i.e., the commit message, or the file these macros reside in).

Secondly there were discussions on why INTEL_SOC_PMIC is bool (note this driver also has module-based initialization even though it is bool).

Yes, there's copy and paste going on even in kernel development.

I am guessing because of some tricky module load order dependencies. If ever that becomes a module, this can mostly be unchanged to be loaded as a module.

You put in a macro, or any other bit of code, when it's needed, not beforehand, just in case. That's silly.

Thanks,

Paul Bolle
Re: [Intel-gfx] [PATCH 01/15] drm/i915: Add i915_gem_object_write() to i915_gem.c
On 18/06/15 15:31, Daniel Vetter wrote: On Thu, Jun 18, 2015 at 12:49:55PM +0100, Dave Gordon wrote: On 17/06/15 13:02, Daniel Vetter wrote: On Wed, Jun 17, 2015 at 08:23:40AM +0100, Dave Gordon wrote: On 15/06/15 21:09, Chris Wilson wrote: On Mon, Jun 15, 2015 at 07:36:19PM +0100, Dave Gordon wrote: From: Alex Dai yu@intel.com i915_gem_object_write() is a generic function to copy data from a plain linear buffer to a paged gem object. We will need this for the microcontroller firmware loading support code. Issue: VIZ-4884 Signed-off-by: Alex Dai yu@intel.com Signed-off-by: Dave Gordon david.s.gor...@intel.com --- drivers/gpu/drm/i915/i915_drv.h |2 ++ drivers/gpu/drm/i915/i915_gem.c | 28 2 files changed, 30 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 611fbd8..9094c06 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2713,6 +2713,8 @@ void *i915_gem_object_alloc(struct drm_device *dev); void i915_gem_object_free(struct drm_i915_gem_object *obj); void i915_gem_object_init(struct drm_i915_gem_object *obj, const struct drm_i915_gem_object_ops *ops); +int i915_gem_object_write(struct drm_i915_gem_object *obj, + const void *data, size_t size); struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev, size_t size); void i915_init_vm(struct drm_i915_private *dev_priv, diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index be35f04..75d63c2 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -5392,3 +5392,31 @@ bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj) return false; } +/* Fill the @obj with the @size amount of @data */ +int i915_gem_object_write(struct drm_i915_gem_object *obj, +const void *data, size_t size) +{ +struct sg_table *sg; +size_t bytes; +int ret; + +ret = i915_gem_object_get_pages(obj); +if (ret) +return ret; + +i915_gem_object_pin_pages(obj); You don't set the object into the 
CPU domain, or instead manually handle the domain flushing. You don't handle objects that cannot be written directly by the CPU, nor do you handle objects whose representation in memory is not linear. -Chris No, we don't handle just any random gem object, but we do return an error code for any types not supported. However, as we don't really need the full generality of writing into a gem object of any type, I will replace this function with one that combines the allocation of a new object (which will therefore definitely be of the correct type, in the correct domain, etc.) and filling it with the data to be preserved. The usage pattern for the particular case is going to be: Once-only: Allocate, Fill. Then each time the GuC is (re-)initialised: Map to GTT, DMA-read from buffer into GuC private memory, Unmap. Only on unload: Dispose. So our object is write-once by the CPU (and that's always the first operation), thereafter read-occasionally by the GuC's DMA engine. Yup. The problem is more that on Atom platforms the objects aren't coherent by default and generally you need to do something. Hence we either have - an explicit set_caching call to document that this is a gpu object which is always coherent (so also on chv/bxt), even when that's a no-op on big core - or wrap everything in set_domain calls, even when those are no-ops too. If either of those is lacking, reviews tend to freak out preemptively and the reptile brain takes over ;-) Cheers, Daniel We don't need coherency as such. The buffer is filled (once only) by the CPU (so I should put a set-to-cpu-domain between the allocate and fill stages?). Once it's filled, the CPU need not read or write it ever again. Then before the DMA engine accesses it, we call i915_gem_obj_ggtt_pin, which I'm assuming will take care of any coherency issues (making sure the data written by the CPU is now visible to the DMA engine) when it puts the buffer into the GTT-readable domain. Is that not sufficient? .Dave.
Re: [Intel-gfx] [PATCH 05/15] drm/i915: GuC-specific firmware loader
On Thu, Jun 18, 2015 at 10:53:10AM -0700, Yu Dai wrote: On 06/15/2015 01:30 PM, Chris Wilson wrote: On Mon, Jun 15, 2015 at 07:36:23PM +0100, Dave Gordon wrote: + /* Set the source address for the new blob */ + offset = i915_gem_obj_ggtt_offset(fw_obj); Why would it even have a GGTT vma? There's no precondition here to assert that it should. It is pinned into the GGTT inside gem_allocate_guc_obj. The basic rules when reviewing pinning are: - is there a reason for this pin? - is the lifetime of the pin bound to the hardware access? - are the pad-to-size/alignment correct? - is the vma in the wrong location? Pinning early (and then not even stating in the function preamble that you expect the object to be pinned) makes it hard to review both the reason and the lifetime. An easy way to avoid the assumption of having a pinned object is to pass around the vma instead. Because you pin too early, it is not clear what the pin is for, nor that you only hold it for the lifetime of the hardware access, and you have to scour the code to ensure that the pin isn't randomly dropped or reused for another access. -Chris -- Chris Wilson, Intel Open Source Technology Centre
[Intel-gfx] [RFC 06/14] drm/i915: Disable vblank interrupt for disabling MIPI cmd mode
The vblank interrupt should be disabled before starting the disable sequence for MIPI command mode. Otherwise, when the pipe is disabled, the TE interrupt will still be handled and one memory write command will be sent with the pipe disabled. This makes the pipe hw get stuck, and it doesn't recover in the next enable sequence, causing display blank-out. v2: Use drm_vblank_off instead of platform-specific disable vblank functions (Daniel) Signed-off-by: Yogesh Mohan Marimuthu yogesh.mohan.marimu...@intel.com Signed-off-by: Gaurav K Singh gaurav.k.si...@intel.com --- drivers/gpu/drm/i915/intel_dsi.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c index d378246..7021591 100644 --- a/drivers/gpu/drm/i915/intel_dsi.c +++ b/drivers/gpu/drm/i915/intel_dsi.c @@ -513,11 +513,25 @@ static void intel_dsi_enable_nop(struct intel_encoder *encoder) static void intel_dsi_pre_disable(struct intel_encoder *encoder) { + struct drm_device *dev = encoder->base.dev; struct intel_dsi *intel_dsi = enc_to_intel_dsi(&encoder->base); + struct intel_crtc *intel_crtc = to_intel_crtc(encoder->base.crtc); + int pipe = intel_crtc->pipe; enum port port; DRM_DEBUG_KMS("\n"); + if (is_cmd_mode(intel_dsi)) { + drm_vblank_off(dev, pipe); + + /* + * Make sure that the last frame is sent otherwise pipe can get + * stuck. Currently providing delay time for ~2 vblanks + * assuming 60fps. + */ + mdelay(40); + } + if (is_vid_mode(intel_dsi)) { /* Send Shutdown command to the panel in LP mode */ for_each_dsi_port(port, intel_dsi->ports) -- 1.7.9.5
[Intel-gfx] [RFC 07/14] drm/i915: Disable MIPI display self refresh mode
During the disable sequence for a MIPI encoder in command mode, disable the MIPI display self-refresh mode bit in the Pipe Ctrl reg. v2: Use crtc state flag instead of loop over encoders (Daniel) Signed-off-by: Gaurav K Singh gaurav.k.si...@intel.com Signed-off-by: Yogesh Mohan Marimuthu yogesh.mohan.marimu...@intel.com Signed-off-by: Shobhit Kumar shobhit.ku...@intel.com --- drivers/gpu/drm/i915/intel_display.c | 3 +++ drivers/gpu/drm/i915/intel_drv.h | 3 +++ drivers/gpu/drm/i915/intel_dsi.c | 3 +++ 3 files changed, 9 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 067b1de..dd518d6 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -2193,6 +2193,9 @@ static void intel_disable_pipe(struct intel_crtc *crtc) if ((val & PIPECONF_ENABLE) == 0) return; + if (crtc->config->dsi_self_refresh) + val = val & ~PIPECONF_MIPI_DSR_ENABLE; + /* * Double wide has implications for planes * so best keep it disabled when not needed. diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h index 14562c6..4298a00 100644 --- a/drivers/gpu/drm/i915/intel_drv.h +++ b/drivers/gpu/drm/i915/intel_drv.h @@ -444,6 +444,9 @@ struct intel_crtc_state { bool double_wide; bool dp_encoder_is_mst; + + bool dsi_self_refresh; + int pbn; struct intel_crtc_scaler_state scaler_state; diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c index 7021591..36d8ad6 100644 --- a/drivers/gpu/drm/i915/intel_dsi.c +++ b/drivers/gpu/drm/i915/intel_dsi.c @@ -308,6 +308,9 @@ static bool intel_dsi_compute_config(struct intel_encoder *encoder, DRM_DEBUG_KMS("\n"); + if (is_cmd_mode(intel_dsi)) + config->dsi_self_refresh = true; + if (fixed_mode) intel_fixed_panel_mode(fixed_mode, adjusted_mode); -- 1.7.9.5
[Intel-gfx] [RFC 11/14] drm/i915: Enable MIPI display self refresh mode
During the enable sequence for a MIPI encoder in command mode, enable the MIPI display self-refresh mode bit in the Pipe Ctrl reg. v2: Use crtc state flag instead of loop over encoders (Daniel) Signed-off-by: Gaurav K Singh gaurav.k.si...@intel.com Signed-off-by: Yogesh Mohan Marimuthu yogesh.mohan.marimu...@intel.com Signed-off-by: Shobhit Kumar shobhit.ku...@intel.com --- drivers/gpu/drm/i915/intel_display.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index dd518d6..c53f66d 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -2158,6 +2158,11 @@ static void intel_enable_pipe(struct intel_crtc *crtc) return; } + if (crtc->config->dsi_self_refresh) { + val = val | PIPECONF_MIPI_DSR_ENABLE; + I915_WRITE(reg, val); + } + I915_WRITE(reg, val | PIPECONF_ENABLE); POSTING_READ(reg); } -- 1.7.9.5
[Intel-gfx] [RFC 01/14] drm/i915: allocate gem memory for mipi dbi cmd buffer
Allocate gem memory for the MIPI DBI command buffer. This memory will be used when sending commands via the DBI interface. v2: lock mutex before gem object unreference and later set gem obj ptr to NULL (Gaurav) Signed-off-by: Yogesh Mohan Marimuthu yogesh.mohan.marimu...@intel.com Signed-off-by: Gaurav K Singh gaurav.k.si...@intel.com Signed-off-by: Shobhit Kumar shobhit.ku...@intel.com --- drivers/gpu/drm/i915/intel_dsi.c | 40 ++ drivers/gpu/drm/i915/intel_dsi.h | 4 2 files changed, 44 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c index 98998e9..011fef2 100644 --- a/drivers/gpu/drm/i915/intel_dsi.c +++ b/drivers/gpu/drm/i915/intel_dsi.c @@ -407,9 +407,35 @@ static void intel_dsi_pre_enable(struct intel_encoder *encoder) enum pipe pipe = intel_crtc->pipe; enum port port; u32 tmp; + int ret; DRM_DEBUG_KMS("\n"); + if (!intel_dsi->gem_obj && is_cmd_mode(intel_dsi)) { + intel_dsi->gem_obj = i915_gem_alloc_object(dev, 4096); + if (!intel_dsi->gem_obj) { + DRM_ERROR("Failed to allocate seqno page\n"); + return; + } + + ret = i915_gem_object_set_cache_level(intel_dsi->gem_obj, I915_CACHE_LLC); + if (ret) + goto err_unref; + + ret = i915_gem_obj_ggtt_pin(intel_dsi->gem_obj, 4096, 0); + if (ret) { +err_unref: + drm_gem_object_unreference(&intel_dsi->gem_obj->base); + return; + } + + intel_dsi->cmd_buff = kmap(sg_page(intel_dsi->gem_obj->pages->sgl)); + intel_dsi->cmd_buff_phy_addr = page_to_phys(sg_page(intel_dsi->gem_obj->pages->sgl)); + } + /* Disable DPOunit clock gating, can stall pipe * and we need DPLL REFA always enabled */ tmp = I915_READ(DPLL(pipe)); @@ -555,6 +581,7 @@ static void intel_dsi_post_disable(struct intel_encoder *encoder) { struct drm_i915_private *dev_priv = encoder->base.dev->dev_private; struct intel_dsi *intel_dsi = enc_to_intel_dsi(&encoder->base); + struct drm_device *dev = encoder->base.dev; u32 val; DRM_DEBUG_KMS("\n"); @@ -571,6 +598,15 @@ static void intel_dsi_post_disable(struct intel_encoder *encoder) msleep(intel_dsi->panel_off_delay); msleep(intel_dsi->panel_pwr_cycle_delay); + + if (intel_dsi->gem_obj) { + kunmap(intel_dsi->cmd_buff); + i915_gem_object_ggtt_unpin(intel_dsi->gem_obj); + mutex_lock(&dev->struct_mutex); + drm_gem_object_unreference(&intel_dsi->gem_obj->base); + mutex_unlock(&dev->struct_mutex); + } + intel_dsi->gem_obj = NULL; } static bool intel_dsi_get_hw_state(struct intel_encoder *encoder, @@ -1042,6 +1078,10 @@ void intel_dsi_init(struct drm_device *dev) intel_dsi->ports = (1 << PORT_C); } + intel_dsi->cmd_buff = NULL; + intel_dsi->cmd_buff_phy_addr = 0; + intel_dsi->gem_obj = NULL; + /* Create a DSI host (and a device) for each port. */ for_each_dsi_port(port, intel_dsi->ports) { struct intel_dsi_host *host; diff --git a/drivers/gpu/drm/i915/intel_dsi.h b/drivers/gpu/drm/i915/intel_dsi.h index 2784ac4..36ca3cc 100644 --- a/drivers/gpu/drm/i915/intel_dsi.h +++ b/drivers/gpu/drm/i915/intel_dsi.h @@ -44,6 +44,10 @@ struct intel_dsi { struct intel_connector *attached_connector; + struct drm_i915_gem_object *gem_obj; + void *cmd_buff; + dma_addr_t cmd_buff_phy_addr; + /* bit mask of ports being driven */ u16 ports; -- 1.7.9.5
Re: [Intel-gfx] [RFC 01/14] drm/i915: allocate gem memory for mipi dbi cmd buffer
On 6/19/2015 3:32 AM, Gaurav K Singh wrote: Allocate gem memory for the MIPI DBI command buffer. This memory will be used when sending commands via the DBI interface. v2: lock mutex before gem object unreference and later set gem obj ptr to NULL (Gaurav) Signed-off-by: Yogesh Mohan Marimuthu yogesh.mohan.marimu...@intel.com Signed-off-by: Gaurav K Singh gaurav.k.si...@intel.com Signed-off-by: Shobhit Kumar shobhit.ku...@intel.com --- drivers/gpu/drm/i915/intel_dsi.c | 40 ++ drivers/gpu/drm/i915/intel_dsi.h | 4 2 files changed, 44 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c index 98998e9..011fef2 100644 --- a/drivers/gpu/drm/i915/intel_dsi.c +++ b/drivers/gpu/drm/i915/intel_dsi.c @@ -407,9 +407,35 @@ static void intel_dsi_pre_enable(struct intel_encoder *encoder) enum pipe pipe = intel_crtc->pipe; enum port port; u32 tmp; + int ret; DRM_DEBUG_KMS("\n"); + if (!intel_dsi->gem_obj && is_cmd_mode(intel_dsi)) { + intel_dsi->gem_obj = i915_gem_alloc_object(dev, 4096); + if (!intel_dsi->gem_obj) { + DRM_ERROR("Failed to allocate seqno page\n"); + return; + } + + ret = i915_gem_object_set_cache_level(intel_dsi->gem_obj, I915_CACHE_LLC); + if (ret) + goto err_unref; + + ret = i915_gem_obj_ggtt_pin(intel_dsi->gem_obj, 4096, 0); + if (ret) { +err_unref: + drm_gem_object_unreference(&intel_dsi->gem_obj->base); + return; + } + + intel_dsi->cmd_buff = kmap(sg_page(intel_dsi->gem_obj->pages->sgl)); + intel_dsi->cmd_buff_phy_addr = page_to_phys(sg_page(intel_dsi->gem_obj->pages->sgl)); + } + /* Disable DPOunit clock gating, can stall pipe * and we need DPLL REFA always enabled */ tmp = I915_READ(DPLL(pipe)); @@ -555,6 +581,7 @@ static void intel_dsi_post_disable(struct intel_encoder *encoder) { struct drm_i915_private *dev_priv = encoder->base.dev->dev_private; struct intel_dsi *intel_dsi = enc_to_intel_dsi(&encoder->base); + struct drm_device *dev = encoder->base.dev; u32 val; DRM_DEBUG_KMS("\n"); @@ -571,6 +598,15 @@ static void intel_dsi_post_disable(struct intel_encoder *encoder) msleep(intel_dsi->panel_off_delay); msleep(intel_dsi->panel_pwr_cycle_delay); + + if (intel_dsi->gem_obj) { + kunmap(intel_dsi->cmd_buff); + i915_gem_object_ggtt_unpin(intel_dsi->gem_obj); + mutex_lock(&dev->struct_mutex); + drm_gem_object_unreference(&intel_dsi->gem_obj->base); + mutex_unlock(&dev->struct_mutex); + } + intel_dsi->gem_obj = NULL; } static bool intel_dsi_get_hw_state(struct intel_encoder *encoder, @@ -1042,6 +1078,10 @@ void intel_dsi_init(struct drm_device *dev) intel_dsi->ports = (1 << PORT_C); } + intel_dsi->cmd_buff = NULL; + intel_dsi->cmd_buff_phy_addr = 0; + intel_dsi->gem_obj = NULL; + /* Create a DSI host (and a device) for each port. */ for_each_dsi_port(port, intel_dsi->ports) { struct intel_dsi_host *host; diff --git a/drivers/gpu/drm/i915/intel_dsi.h b/drivers/gpu/drm/i915/intel_dsi.h index 2784ac4..36ca3cc 100644 --- a/drivers/gpu/drm/i915/intel_dsi.h +++ b/drivers/gpu/drm/i915/intel_dsi.h @@ -44,6 +44,10 @@ struct intel_dsi { struct intel_connector *attached_connector; + struct drm_i915_gem_object *gem_obj; + void *cmd_buff; + dma_addr_t cmd_buff_phy_addr; + /* bit mask of ports being driven */ u16 ports; Corrected the initial patch. Working on the dma_alloc_coherent patch, will update soon. With regards, Gaurav
[Intel-gfx] [PULL] drm-intel-next-fixes
Hi Dave, i915 fixes for drm-next/v4.2. BR, Jani. The following changes since commit bf546f8158e2df2656494a475e6235634121c87c: drm/i915/skl: Fix DMC API version in firmware file name (2015-06-05 12:08:01 +0300) are available in the git repository at: git://anongit.freedesktop.org/drm-intel tags/drm-intel-next-fixes-2015-06-18 for you to fetch changes up to 4ed9fb371ccdfe465bd3bbb69e4cad5243e6c4e2: drm/i915: Don't set enabled value of all CRTCs when restoring the mode (2015-06-17 14:21:01 +0300) Ander Conselvan de Oliveira (3): drm/i915: Don't check modeset state in the hw state force restore path drm/i915: Don't update staged config during force restore modesets drm/i915: Don't set enabled value of all CRTCs when restoring the mode Francisco Jerez (3): drm/i915: Fix command parser to validate multiple register access with the same command. drm/i915: Extend the parser to check register writes against a mask/value pair. drm/i915: Add SCRATCH1 and ROW_CHICKEN3 to the register whitelist. Ville Syrjälä (1): drm/i915: Don't skip request retirement if the active list is empty drivers/gpu/drm/i915/i915_cmd_parser.c | 197 +--- drivers/gpu/drm/i915/i915_drv.h | 5 + drivers/gpu/drm/i915/i915_gem.c | 3 - drivers/gpu/drm/i915/intel_display.c| 54 - drivers/gpu/drm/i915/intel_ringbuffer.h | 5 +- 5 files changed, 164 insertions(+), 100 deletions(-) -- Jani Nikula, Intel Open Source Technology Center ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v3 06/19] drm/i915: Split skl_update_scaler, v3.
On Thu, Jun 18, 2015 at 07:42:10AM +0200, Maarten Lankhorst wrote: Op 18-06-15 om 03:48 schreef Matt Roper: On Mon, Jun 15, 2015 at 12:33:43PM +0200, Maarten Lankhorst wrote: It's easier to read separate functions for crtc and plane scaler state. Changes since v1: - Update documentation. Changes since v2: - Get rid of parameters to skl_update_scaler only used for traces. This avoids needing to document the other parameters. Signed-off-by: Maarten Lankhorst maarten.lankho...@linux.intel.com --- drivers/gpu/drm/i915/intel_display.c | 211 +++ drivers/gpu/drm/i915/intel_dp.c | 2 +- drivers/gpu/drm/i915/intel_drv.h | 12 +- drivers/gpu/drm/i915/intel_sprite.c | 3 +- 4 files changed, 121 insertions(+), 107 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 0f7652a31c95..26d610acb61f 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -4303,62 +4303,16 @@ static void cpt_verify_modeset(struct drm_device *dev, int pipe) } } -/** - * skl_update_scaler_users - Stages update to crtc's scaler state - * @intel_crtc: crtc - * @crtc_state: crtc_state - * @plane: plane (NULL indicates crtc is requesting update) - * @plane_state: plane's state - * @force_detach: request unconditional detachment of scaler - * - * This function updates scaler state for requested plane or crtc. - * To request scaler usage update for a plane, caller shall pass plane pointer. - * To request scaler usage update for crtc, caller shall pass plane pointer - * as NULL. 
- * - * Return - * 0 - scaler_usage updated successfully - *error - requested scaling cannot be supported or other error condition - */ -int -skl_update_scaler_users( - struct intel_crtc *intel_crtc, struct intel_crtc_state *crtc_state, - struct intel_plane *intel_plane, struct intel_plane_state *plane_state, - int force_detach) +static int +skl_update_scaler(struct intel_crtc_state *crtc_state, bool force_detach, +unsigned scaler_idx, int *scaler_id, unsigned int rotation, ^^ This parameter isn't actually the scaler index is it (that's what scaler_id winds up being once assigned here)? I think this one is the plane index that we're assigning a scaler for (or the special value of SKL_CRTC_INDEX if we're assigning for the CRTC instead of a plane). Maybe 'scaler_target' or 'scaler_user' would be better? Could we call it 'i'? Not for a function argument really ;-) -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v4] drm/i915 : Added Programming of the MOCS
On Wed, Jun 17, 2015 at 04:19:22PM +0100, Peter Antoine wrote: This change adds the programming of the MOCS registers to the gen 9+ platforms. This change set programs the MOCS register values to a set of values that are defined to be optimal. It creates a fixed register set that is programmed across the different engines so that all engines have the same table. This is done as the main RCS context only holds the registers for itself and the shared L3 values. By trying to keep the registers consistent across the different engines it should make the programming for the registers consistent. v2: -'static const' for private data structures and style changes.(Matt Turner) v3: - Make the tables slightly more readable. (Damien Lespiau) - Updated tables fix performance regression. v4: - Code formatting. (Chris Wilson) - re-privatised mocs code. (Daniel Vetter) Signed-off-by: Peter Antoine peter.anto...@intel.com --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/i915_reg.h | 9 + drivers/gpu/drm/i915/intel_lrc.c | 10 +- drivers/gpu/drm/i915/intel_lrc.h | 4 + drivers/gpu/drm/i915/intel_mocs.c | 373 ++ drivers/gpu/drm/i915/intel_mocs.h | 64 +++ 6 files changed, 460 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/i915/intel_mocs.c create mode 100644 drivers/gpu/drm/i915/intel_mocs.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index b7ddf48..c781e19 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -35,6 +35,7 @@ i915-y += i915_cmd_parser.o \ i915_irq.o \ i915_trace_points.o \ intel_lrc.o \ + intel_mocs.o \ intel_ringbuffer.o \ intel_uncore.o diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 7213224..3a435b5 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -7829,4 +7829,13 @@ enum skl_disp_power_wells { #define _PALETTE_A (dev_priv-info.display_mmio_offset + 0xa000) #define _PALETTE_B (dev_priv-info.display_mmio_offset 
+ 0xa800) +/* MOCS (Memory Object Control State) registers */ +#define GEN9_LNCFCMOCS0 (0xB020)/* L3 Cache Control base */ + +#define GEN9_GFX_MOCS_0 (0xc800)/* Graphics MOCS base register*/ +#define GEN9_MFX0_MOCS_0 (0xc900)/* Media 0 MOCS base register*/ +#define GEN9_MFX1_MOCS_0 (0xcA00)/* Media 1 MOCS base register*/ +#define GEN9_VEBOX_MOCS_0(0xcB00)/* Video MOCS base register*/ +#define GEN9_BLT_MOCS_0 (0xcc00)/* Blitter MOCS base register*/ + #endif /* _I915_REG_H_ */ diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 9f5485d..73b919d 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -135,6 +135,7 @@ #include drm/drmP.h #include drm/i915_drm.h #include i915_drv.h +#include intel_mocs.h #define GEN9_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE) #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE) @@ -796,7 +797,7 @@ static int logical_ring_prepare(struct intel_ringbuffer *ringbuf, * * Return: non-zero if the ringbuffer is not ready to be written to. */ -static int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, +int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, struct intel_context *ctx, int num_dwords) { struct intel_engine_cs *ring = ringbuf-ring; @@ -1379,6 +1380,13 @@ static int gen8_init_rcs_context(struct intel_engine_cs *ring, if (ret) return ret; + /* + * Failing to program the MOCS is non-fatal.The system will not + * run at peak performance. So generate a warning and carry on. + */ Is this really true? Userspace must make sure that they don't inappropriately overwrite the caching settings using MOCS for frontbuffers and in doing so causing coherency issues with the display block. If we fail to program MOCS correctly then things won't look pretty. Sounds like even more reaons imo why we really need the userspace side of this ... Also the general approach for render side setup failures is to return -EIO, which will result in a wedged gpu. 
No reason imo here to eat this failure. -Daniel + if (gen9_program_mocs(ring, ctx) != 0) + DRM_ERROR("MOCS failed to program: expect performance issues."); + return intel_lr_context_render_state_init(ring, ctx); } diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h index 04d3a6d..dbbd6af 100644 --- a/drivers/gpu/drm/i915/intel_lrc.h +++ b/drivers/gpu/drm/i915/intel_lrc.h @@ -44,6 +44,10 @@ int intel_logical_rings_init(struct drm_device *dev); int logical_ring_flush_all_caches(struct intel_ringbuffer *ringbuf,
Re: [Intel-gfx] [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset Wa32bitInstructionBaseOffset
On Thu, Jun 18, 2015 at 08:45:50AM +0200, Daniel Vetter wrote: On Wed, Jun 17, 2015 at 06:37:03PM +0100, Chris Wilson wrote: On Wed, Jun 17, 2015 at 05:03:19PM +0200, Daniel Vetter wrote: On Wed, Jun 17, 2015 at 01:53:17PM +0100, Chris Wilson wrote: On Wed, Jun 17, 2015 at 02:49:47PM +0200, Daniel Vetter wrote: On Wed, Jun 10, 2015 at 07:09:03PM +0100, Chris Wilson wrote: On Wed, Jun 10, 2015 at 05:46:54PM +0100, Michel Thierry wrote: There are some allocations that must be only referenced by 32bit offsets. To limit the chances of having the first 4GB already full, objects not requiring this workaround use DRM_MM_SEARCH_BELOW/ DRM_MM_CREATE_TOP flags User must pass I915_EXEC_SUPPORTS_48BADDRESS flag to indicate it can be allocated above the 32b address range. This should be a per-object flag not per-execbuffer. We need both. This one to opt into the large address space, the per-object one to apply the w/a. Also libdrm/mesa patches for this are still missing. Do we need the opt in on the context? The 48bit vm is lazily constructed, if no object asks to use the high range, it will never be populated. Or is there a cost with preparing a 48bit vm? If we restrict to 4G we'll evict objects if we run out, and will stay correct even when processing fairly large workloads. With just lazily eating into 48b that won't be the case. A bit far-fetched, but if we go to the trouble of implementing this might as well do it right. i915_evict_something runs between the range requested for pinning. If we run out of 4G space and the desired pin does not opt into 48bit, we will evict from the lower 4G. I obviously missed your concern. Care to elaborate? Current situation: You always get an address below 4G for all objects, even if you use more than 4G of textures - the evict code will make space. 
New situation with 48b address space enabled but existing userspace and a total BO set bigger than 4G: The kernel will eventually hand out ppgtt addresses > 4G, which means we could get such an address even for an object where this w/a needs to apply. This would be a regression. But if we make 48b strictly opt-in, the kernel will restrict _all_ objects to below 4G, creating no regression. How? The pin code requires PIN_48BIT to be set to hand out higher addresses. That is only set by execbuffer if execobject->flags is also set. Ofc new userspace on 48b would set both the execbuf opt-in (or context flag, we have those now) plus the per-obj "I need this below 4G" flag for the objects that need this wa. I don't see why we need another flag beyond the per-object flag. If you are thinking validation, we have to validate per-object flags anyway. -Chris -- Chris Wilson, Intel Open Source Technology Centre