Re: [Intel-gfx] [PATCH v2 2/2] drm/i915/scheduler: add gvt notification for guc
On Thursday, April 6, 2017 10:19 PM, Chris Wilson wrote:
> On Thu, Apr 06, 2017 at 02:05:15PM +0000, Dong, Chuanxiao wrote:
> > On Thursday, April 6, 2017 9:32 PM, Chris Wilson wrote:
> > > On Tue, Mar 28, 2017 at 05:38:41PM +0800, Chuanxiao Dong wrote:
> > > > GVT request needs a manual mmio load/restore. Before GuC submits a
> > > > request, send a notification to gvt for mmio loading. And after GuC
> > > > has finished this GVT request, notify gvt again for mmio restore.
> > > > This follows the usage when using execlists submission.
> > > >
> > > > Cc: xiao.zh...@intel.com
> > > > Cc: kevin.t...@intel.com
> > > > Cc: joonas.lahti...@linux.intel.com
> > > > Cc: ch...@chris-wilson.co.uk
> > > > Signed-off-by: Chuanxiao Dong
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_guc_submission.c |  4
> > > >  drivers/gpu/drm/i915/intel_gvt.h           | 12
> > > >  drivers/gpu/drm/i915/intel_lrc.c           | 21 +++--
> > > >  3 files changed, 19 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> > > > index 58087630..d8a5942 100644
> > > > --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> > > > +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> > > > @@ -606,6 +606,8 @@ static void __i915_guc_submit(struct drm_i915_gem_request *rq)
> > > >  	unsigned long flags;
> > > >  	int b_ret;
> > > >
> > > > +	intel_gvt_notify_context_status(rq, INTEL_CONTEXT_SCHEDULE_IN);
> > > > +
> > > >  	/* WA to flush out the pending GMADR writes to ring buffer. */
> > > >  	if (i915_vma_is_map_and_fenceable(rq->ring->vma))
> > > >  		POSTING_READ_FW(GUC_STATUS);
> > > > @@ -725,6 +727,8 @@ static void i915_guc_irq_handler(unsigned long data)
> > > >  	rq = port[0].request;
> > > >  	while (rq && i915_gem_request_completed(rq)) {
> > > >  		trace_i915_gem_request_out(rq);
> > > > +		intel_gvt_notify_context_status(rq, INTEL_CONTEXT_SCHEDULE_OUT);
> > >
> > > This is incorrect though. This is no better than just waiting for the
> > > request, which is not enough since the idea is that you need to wait
> > > for the context image to be completely written to memory before you
> > > read it.
> > > -Chris
> >
> > The wait for the context image to be completely written will be done in
> > the notification handler in GVT, by checking the CSB. Putting the wait
> > here would make every i915 request wait, which seems unnecessary.
>
> Urm, no. I hope you mean the wait will be on some other thread than
> inside this interrupt handler.

The SCHEDULE_OUT means to stop GuC from submitting another request before
the current one is completed
Re: [Intel-gfx] [PATCH] ACPI / bus: Introduce a list of ids for "always present" devices
Hi,

On 27-02-17 23:53, Takashi Iwai wrote:
> On Mon, 27 Feb 2017 22:27:58 +0100, Rafael J. Wysocki wrote:
> > On Mon, Feb 27, 2017 at 3:40 PM, Takashi Iwai wrote:
> > > Oh, this is interesting, please let me join the party. We've hit a
> > > similar problem, but for another device: namely INT0002, which is
> > > found on a few CHT devices. It is never bound properly for a similar
> > > reason, where _STA always returns zero on Linux:
> > >
> > >     Method (_STA, 0, NotSerialized)  // _STA: Status
> > >     {
> > >         If (SOCS <= 0x04)
> > >         {
> > >             Return (0x0F)
> > >         }
> > >         Else
> > >         {
> > >             Return (Zero)
> > >         }
> > >     }
> > >
> > > The device is supposed to be a "virtual GPIO" device, and the driver
> > > hasn't been upstreamed from Intel. Due to the lack of this driver
> > > (and its binding), the machine gets a spurious IRQ#9 after PM resume,
> > > and it locks up the second time it suspends.
> >
> > Well, the solution here seems to be to acquire the missing driver
> > instead of adding quirks to the ACPI core.
>
> The driver is available (not upstreamed, though), but it's not bound due
> to the _STA above. That's why I'm interested in this thread.

Takashi, thanks for pointing me to the INT0002 device / driver to fix the
spurious IRQ#9 after PM resume. I was seeing this on the GPD-win too (when
suspending with the power button + waking with the spacebar). I've added a
patch to add the INT0002 device to the always-present device list and
cleaned up the INT0002 driver. I plan to submit this upstream soon. If you
want to test this on your device you need these patches:

https://github.com/jwrdegoede/linux-sunxi/commit/a1a6e92f2665376ed72f575553238a93e88bb037
https://github.com/jwrdegoede/linux-sunxi/commit/4946f68f8eaa300f42289bf767722d78cf7fa9e2
https://github.com/jwrdegoede/linux-sunxi/commit/32640c816dd60d17f003ae8306863da01c215afb
https://github.com/jwrdegoede/linux-sunxi/commit/abb6a9d69690bb2a1a00b184b06cdae43d6ad001

Regards,

Hans

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/5] i915: flush gem obj freeing workqueues to add accuracy to the i915 shrinker
On Fri, Apr 07, 2017 at 04:30:11PM +0100, Chris Wilson wrote:
> Not getting hangs is a good sign, but lockdep doesn't like it:
>
> [  460.684901] WARNING: CPU: 1 PID: 172 at kernel/workqueue.c:2418 check_flush_dependency+0x92/0x130
> [  460.684924] workqueue: PF_MEMALLOC task 172(kworker/1:1H) is flushing !WQ_MEM_RECLAIM events:__i915_gem_free_work [i915]
>
> If I allocated the workqueue with WQ_MEM_RECLAIM, it complains bitterly
> as well.

So in PF_MEMALLOC context we can't flush a workqueue with !WQ_MEM_RECLAIM.

	system_wq = alloc_workqueue("events", 0, 0);

My point is that synchronize_rcu_expedited() will still push its work into
the same system_wq workqueue:

	/* Marshall arguments & schedule the expedited grace period. */
	rew.rew_func = func;
	rew.rew_rsp = rsp;
	rew.rew_s = s;
	INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
	schedule_work(&rew.rew_work);

It's also using schedule_work(), so either the above is a false positive,
or we still have a problem with synchronize_rcu_expedited(); it's just not
a reproducible issue anymore after we stopped running it under the
struct_mutex.

Even synchronize_sched() will wait on the system_wq if
synchronize_rcu_expedited() has been issued in parallel by some other
code; there's just no check for it, because it's not invoking flush_work()
to wait.

The deadlock happens if we flush_work() while holding any lock that may be
taken by any of the work items that could be queued on that workqueue.
i915 makes sure to flush_work() only if the struct_mutex was released (not
my initial version), but we're not checking whether any of the other
system_wq work items could possibly be taking a lock that may have been
held during the allocation that invoked reclaim. I suppose that is the
problem left, but I don't see how flush_work() is special about it;
synchronize_rcu_expedited() would still have the same issue, no? (Despite
no lockdep warning.)

I suspect this means synchronize_rcu_expedited() is not usable in reclaim
context, and lockdep should warn if PF_MEMALLOC is set when
synchronize_rcu_expedited/synchronize_sched/synchronize_rcu are called.

Probably to fix this we should create a private workqueue for both RCU and
i915 and stop sharing the system_wq with the rest of the system (and of
course set WQ_MEM_RECLAIM in both workqueues). This makes sure that when
we call synchronize_rcu_expedited(); flush_work(); from the shrinker, we
won't risk waiting on other random work that may be taking locks that are
held by the code that invoked reclaim during an allocation.

The macro bug of waiting on system_wq 100% of the time while always
holding the struct_mutex is gone, but we need to perfect this further and
stop using the system_wq for RCU and the i915 shrinker work.

> We do need it for shrinker_count as well. If we just report 0 objects,
> the shrinker_scan callback will be skipped, iirc.

Yes, the shrinker_count too.

> All we do need it for direct calls to i915_gem_shrink() which themselves
> may or may not be underneath the struct_mutex at the time.

Yes.

> I don't follow,
>
> static inline struct task_struct *__mutex_owner(struct mutex *lock)
> {
> 	return (struct task_struct *)(atomic_long_read(&lock->owner) & ~0x07);
> }
>
> The atomic read is equivalent to READ_ONCE(). What's broken here? (I
> guess strict aliasing and pointer cast?)

That was an oversight; atomic64_read does READ_ONCE internally (as it
should, of course, being an atomic read). I didn't recall it uses
atomic_long_read.

Thanks,
Andrea
Re: [Intel-gfx] [PATCH 1/3] dma-fence: Reserve 0 as a special NO_CONTEXT token
Am 09.04.2017 um 18:02 schrieb Chris Wilson:
> On Sun, Apr 09, 2017 at 09:08:53AM +0200, Christian König wrote:
> > Am 08.04.2017 um 20:00 schrieb Chris Wilson:
> > > On Sat, Apr 08, 2017 at 07:49:37PM +0200, Christian König wrote:
> > > > Am 08.04.2017 um 18:26 schrieb Chris Wilson:
> > > > > Reserve 0 for general use as a token meaning that the fence
> > > > > doesn't belong to an ordered timeline (fence context).
> > > >
> > > > NAK, we kept context allocation cheap to avoid exactly that.
> > >
> > > However, they result in very sparse mappings.
> >
> > Which is perfectly fine, at least for how we use this in Radeon and
> > Amdgpu. The fence context is used as a key for a hashtable; even when
> > a context is only used once we want an even distribution over all of
> > them.
>
> The ht is a more expensive solution.

Not when the fences without contexts are rare.

> > > > Please elaborate further why it should be necessary now.
> > >
> > > Because I want to efficiently exclude them from comparisons, as
> > > demonstrated by this small series, as there may be several hundred
> > > such fences as dependencies for this job.
> >
> > That would horribly break Amdgpu, Radeon and the GPU scheduler
> > because all of them assume that context numbers are unique.
>
> Why would it break if it respected the trivial notion of an unordered
> timeline?

You seem to miss the point. We already had that discussion and decided
against using no-context fences. Now we have a whole bunch of cases over
different drivers/components where we assume uniqueness of fence contexts.
If you change this without fixing all those cases they will just subtly
break, probably without anybody noticing the problem immediately.

So if you don't want to scan all drivers for such cases (and I know at
least three off hand) and fix them before that patch, then that is a clear
no-go for this patch.

I suggest just using two separate fence contexts for your clflush problem
in i915. We have a similar situation in the GPU scheduler and solve it the
same way.

Regards,
Christian.
Re: [Intel-gfx] [PATCH 1/3] dma-fence: Reserve 0 as a special NO_CONTEXT token
On Sun, Apr 09, 2017 at 09:08:53AM +0200, Christian König wrote:
> Am 08.04.2017 um 20:00 schrieb Chris Wilson:
> > On Sat, Apr 08, 2017 at 07:49:37PM +0200, Christian König wrote:
> > > Am 08.04.2017 um 18:26 schrieb Chris Wilson:
> > > > Reserve 0 for general use as a token meaning that the fence
> > > > doesn't belong to an ordered timeline (fence context).
> > >
> > > NAK, we kept context allocation cheap to avoid exactly that.
> >
> > However, they result in very sparse mappings.
>
> Which is perfectly fine, at least for how we use this in Radeon and
> Amdgpu. The fence context is used as a key for a hashtable; even when
> a context is only used once we want an even distribution over all of
> them.

The ht is a more expensive solution.

> > > Please elaborate further why it should be necessary now.
> >
> > Because I want to efficiently exclude them from comparisons, as
> > demonstrated by this small series, as there may be several hundred
> > such fences as dependencies for this job.
>
> That would horribly break Amdgpu, Radeon and the GPU scheduler because
> all of them assume that context numbers are unique.

Why would it break if it respected the trivial notion of an unordered
timeline?
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
Re: [Intel-gfx] [regression] Re: 4.11-rc0, thinkpad x220: GPU hang
On Mon 2017-03-06 12:23:41, Chris Wilson wrote:
> On Mon, Mar 06, 2017 at 01:10:48PM +0100, Pavel Machek wrote:
> > On Mon 2017-03-06 11:15:28, Chris Wilson wrote:
> > > On Mon, Mar 06, 2017 at 12:01:51AM +0100, Pavel Machek wrote:
> > > > Hi!
> > > >
> > > > > > mplayer stopped working after a while. Dmesg says:
> > > > > >
> > > > > > [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at
> > > >
> > > > Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what
> > > > to try? Bisect will be slow and nasty :-(.
> > >
> > > I came to the conclusion that #99671 is the ring HEAD overtaking the
> > > TAIL, and under the presumption that your bug matches (as the
> > > symptoms do):
> > >
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > index 4ffa35faff49..62e31a7438ac 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -782,10 +782,10 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
> > >  {
> > >  	struct drm_i915_private *dev_priv = request->i915;
> > >
> > > -	i915_gem_request_submit(request);
> > > -
> > >  	GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
> > >  	I915_WRITE_TAIL(request->engine, request->tail);
> > > +
> > > +	i915_gem_request_submit(request);
> > >  }
> > >
> > >  static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req, u32 *cs)
> >
> > I applied it as:
> >
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 91bc4ab..9c49c7a 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1338,9 +1338,9 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
> >  {
> >  	struct drm_i915_private *dev_priv = request->i915;
> >
> > -	i915_gem_request_submit(request);
> > -
> >  	I915_WRITE_TAIL(request->engine, request->tail);
> > +
> > +	i915_gem_request_submit(request);
> >  }
> >
> >  static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req,
> >
> > Hmm. But your next mail suggests that it may not be smart to try to
> > boot it? :-)
>
> Don't bother, it'll promptly hang.

Any news here? 4.11-rc5 is actually usable on the hardware (unlike -rc1),
not sure what changed.
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [Intel-gfx] [PATCH v3] ACPI / bus: Introduce a list of ids for "always present" devices
On Sun, Apr 09, 2017 at 11:46:41AM +0200, Hans de Goede wrote:
> On 09-04-17 09:26, Lukas Wunner wrote:
> > On Sat, Apr 08, 2017 at 05:05:11PM +0200, Hans de Goede wrote:
> > > +	pr_debug("Device [%s] is in always present list setting status [%08x]\n",
> > > +		 adev->pnp.bus_id, ACPI_STA_DEFAULT);
> >
> > In a lot of places in drivers/acpi/, dev_warn(&adev->dev, ...) is used.
>
> Rafael asked me to use pr_debug here. But I agree that maybe always
> logging something when this triggers might be a good idea, although
> warning is too high a loglevel IMHO. So I would go with dev_info.

Sorry for not being clear, I just meant that existing code seems to prefer
the dev_*() macros to report device-specific stuff; I wasn't referring to
the severity level.

Thanks,

Lukas
Re: [Intel-gfx] [PATCH v3] ACPI / bus: Introduce a list of ids for "always present" devices
Hi,

On 09-04-17 09:26, Lukas Wunner wrote:
> On Sat, Apr 08, 2017 at 05:05:11PM +0200, Hans de Goede wrote:
> > Several cherrytrail devices (all of which ship with windows 10) hide
> > the lpss pwm controller in ACPI, typically the _STA method looks like
> > this:
> >
> >     Method (_STA, 0, NotSerialized)  // _STA: Status
> >     {
> >         If (OSID == One)
> >         {
> >             Return (Zero)
> >         }
> >
> >         Return (0x0F)
> >     }
> >
> > Where OSID is some dark magic seen in all cherrytrail ACPI tables,
> > making the machine behave differently depending on which OS it
> > *thinks* it is booting. This gets set in a number of ways which we
> > cannot control; on some newer machines it is simply hardcoded to
> > "One", aka win10.
>
> Would it be a viable alternative to respond differently to _OSI queries
> for these devices by amending acpi_osi_dmi_table[] in
> drivers/acpi/osi.c?

Nope, _OSI influences OSYS, not OSID. OSID is some quite bad Cherry Trail
hack where the BIOS behaves differently based on whether it thinks the OS
it is booting is going to be Android or Windows, and there is nothing one
can do to influence OSID (other than hardcoding special handling of that
variable name in ACPICA, I guess).

But even if we could influence OSID we really don't want to, for all the
same reasons our ACPI implementation always claims to be the latest
Windows.

> > +	pr_debug("Device [%s] is in always present list setting status [%08x]\n",
> > +		 adev->pnp.bus_id, ACPI_STA_DEFAULT);
>
> In a lot of places in drivers/acpi/, dev_warn(&adev->dev, ...) is used.

Rafael asked me to use pr_debug here. But I agree that maybe always
logging something when this triggers might be a good idea, although
warning is too high a loglevel IMHO. So I would go with dev_info.

Rafael, what is your take on this?

Regards,

Hans
Re: [Intel-gfx] [PATCH v3] ACPI / bus: Introduce a list of ids for "always present" devices
On Sat, Apr 08, 2017 at 05:05:11PM +0200, Hans de Goede wrote:
> Several cherrytrail devices (all of which ship with windows 10) hide the
> lpss pwm controller in ACPI, typically the _STA method looks like this:
>
>     Method (_STA, 0, NotSerialized)  // _STA: Status
>     {
>         If (OSID == One)
>         {
>             Return (Zero)
>         }
>
>         Return (0x0F)
>     }
>
> Where OSID is some dark magic seen in all cherrytrail ACPI tables,
> making the machine behave differently depending on which OS it *thinks*
> it is booting. This gets set in a number of ways which we cannot
> control; on some newer machines it is simply hardcoded to "One", aka
> win10.

Would it be a viable alternative to respond differently to _OSI queries
for these devices by amending acpi_osi_dmi_table[] in drivers/acpi/osi.c?

> +	pr_debug("Device [%s] is in always present list setting status [%08x]\n",
> +		 adev->pnp.bus_id, ACPI_STA_DEFAULT);

In a lot of places in drivers/acpi/, dev_warn(&adev->dev, ...) is used.

Otherwise LGTM.

Thanks,

Lukas
Re: [Intel-gfx] [PATCH 1/3] dma-fence: Reserve 0 as a special NO_CONTEXT token
Am 08.04.2017 um 20:00 schrieb Chris Wilson:
> On Sat, Apr 08, 2017 at 07:49:37PM +0200, Christian König wrote:
> > Am 08.04.2017 um 18:26 schrieb Chris Wilson:
> > > Reserve 0 for general use as a token meaning that the fence doesn't
> > > belong to an ordered timeline (fence context).
> >
> > NAK, we kept context allocation cheap to avoid exactly that.
>
> However, they result in very sparse mappings.

Which is perfectly fine, at least for how we use this in Radeon and
Amdgpu. The fence context is used as a key for a hashtable; even when a
context is only used once we want an even distribution over all of them.

> > Please elaborate further why it should be necessary now.
>
> Because I want to efficiently exclude them from comparisons, as
> demonstrated by this small series, as there may be several hundred such
> fences as dependencies for this job.

That would horribly break Amdgpu, Radeon and the GPU scheduler because all
of them assume that context numbers are unique.

So that is a clear NAK from my side,
Christian.