Re: [Intel-gfx] [PATCH] drm/i915/execlists: Fix annotation for decoupling virtual request

2019-10-07 Thread Tvrtko Ursulin


On 04/10/2019 20:47, Chris Wilson wrote:

As we may signal a request and take the engine->active.lock within the
signaler, the engine submission paths have to use a nested annotation on
their requests -- but we guarantee that we can never submit on the same
engine as the signaling fence.

<4>[  723.763281] WARNING: possible circular locking dependency detected
<4>[  723.763285] 5.3.0-g80fa0e042cdb-drmtip_379+ #1 Tainted: G U
<4>[  723.763288] --
<4>[  723.763291] gem_exec_await/1388 is trying to acquire lock:
<4>[  723.763294] 93a7b53221d8 (&engine->active.lock){..-.}, at: execlists_submit_request+0x2b/0x1e0 [i915]
<4>[  723.763378]
   but task is already holding lock:
<4>[  723.763381] 93a7c25f6d20 (&i915_request_get(rq)->submit/1){-.-.}, at: __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[  723.763420]
   which lock already depends on the new lock.

<4>[  723.763423]
   the existing dependency chain (in reverse order) is:
<4>[  723.763427]
   -> #2 (&i915_request_get(rq)->submit/1){-.-.}:
<4>[  723.763434]        _raw_spin_lock_irqsave_nested+0x39/0x50
<4>[  723.763478]        __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[  723.763513]        intel_engine_breadcrumbs_irq+0x3aa/0x5e0 [i915]
<4>[  723.763600]        cs_irq_handler+0x49/0x50 [i915]
<4>[  723.763659]        gen11_gt_irq_handler+0x17b/0x280 [i915]
<4>[  723.763690]        gen11_irq_handler+0x54/0xf0 [i915]
<4>[  723.763695]        __handle_irq_event_percpu+0x41/0x2d0
<4>[  723.763699]        handle_irq_event_percpu+0x2b/0x70
<4>[  723.763702]        handle_irq_event+0x2f/0x50
<4>[  723.763706]        handle_edge_irq+0xee/0x1a0
<4>[  723.763709]        do_IRQ+0x7e/0x160
<4>[  723.763712]        ret_from_intr+0x0/0x1d
<4>[  723.763717]        __slab_alloc.isra.28.constprop.33+0x4f/0x70
<4>[  723.763720]        kmem_cache_alloc+0x28d/0x2f0
<4>[  723.763724]        vm_area_dup+0x15/0x40
<4>[  723.763727]        dup_mm+0x2dd/0x550
<4>[  723.763730]        copy_process+0xf21/0x1ef0
<4>[  723.763734]        _do_fork+0x71/0x670
<4>[  723.763737]        __se_sys_clone+0x6e/0xa0
<4>[  723.763741]        do_syscall_64+0x4f/0x210
<4>[  723.763744]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  723.763747]
   -> #1 (&(&rq->lock)->rlock#2){-.-.}:
<4>[  723.763752]        _raw_spin_lock+0x2a/0x40
<4>[  723.763789]        __unwind_incomplete_requests+0x3eb/0x450 [i915]
<4>[  723.763825]        __execlists_submission_tasklet+0x9ec/0x1d60 [i915]
<4>[  723.763864]        execlists_submission_tasklet+0x34/0x50 [i915]
<4>[  723.763874]        tasklet_action_common.isra.5+0x47/0xb0
<4>[  723.763878]        __do_softirq+0xd8/0x4ae
<4>[  723.763881]        irq_exit+0xa9/0xc0
<4>[  723.763883]        smp_apic_timer_interrupt+0xb7/0x280
<4>[  723.763887]        apic_timer_interrupt+0xf/0x20
<4>[  723.763892]        cpuidle_enter_state+0xae/0x450
<4>[  723.763895]        cpuidle_enter+0x24/0x40
<4>[  723.763899]        do_idle+0x1e7/0x250
<4>[  723.763902]        cpu_startup_entry+0x14/0x20
<4>[  723.763905]        start_secondary+0x15f/0x1b0
<4>[  723.763908]        secondary_startup_64+0xa4/0xb0
<4>[  723.763911]
   -> #0 (&engine->active.lock){..-.}:
<4>[  723.763916]        __lock_acquire+0x15d8/0x1ea0
<4>[  723.763919]        lock_acquire+0xa6/0x1c0
<4>[  723.763922]        _raw_spin_lock_irqsave+0x33/0x50
<4>[  723.763956]        execlists_submit_request+0x2b/0x1e0 [i915]
<4>[  723.764002]        submit_notify+0xa8/0x13c [i915]
<4>[  723.764035]        __i915_sw_fence_complete+0x81/0x250 [i915]
<4>[  723.764054]        i915_sw_fence_wake+0x51/0x64 [i915]
<4>[  723.764054]        __i915_sw_fence_complete+0x1ee/0x250 [i915]
<4>[  723.764054]        dma_i915_sw_fence_wake_timer+0x14/0x20 [i915]
<4>[  723.764054]        dma_fence_signal_locked+0x9e/0x1c0
<4>[  723.764054]        dma_fence_signal+0x1f/0x40
<4>[  723.764054]        vgem_fence_signal_ioctl+0x67/0xc0 [vgem]
<4>[  723.764054]        drm_ioctl_kernel+0x83/0xf0
<4>[  723.764054]        drm_ioctl+0x2f3/0x3b0
<4>[  723.764054]        do_vfs_ioctl+0xa0/0x6f0
<4>[  723.764054]        ksys_ioctl+0x35/0x60
<4>[  723.764054]        __x64_sys_ioctl+0x11/0x20
<4>[  723.764054]        do_syscall_64+0x4f/0x210
<4>[  723.764054]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  723.764054]
   other info that might help us debug this:

<4>[  723.764054] Chain exists of:
  &engine->active.lock --> &(&rq->lock)->rlock#2 --> &i915_request_get(rq)->submit/1

<4>[  723.764054]  Possible unsafe locking scenario:

<4>[  723.764054]        CPU0                    CPU1
<4>[  723.764054]        ----                    ----
<4>[  723.764054]   lock(&i915_request_get(rq)->submit/1);
<4>[  723.764054]                                lock(&(&rq->lock)->rlock#2);
<4>[  723.764054]                                lock(&i915_request_get(rq)->submit/1);
<4>[  723.764054]   

[Intel-gfx] [PATCH] drm/i915/execlists: Fix annotation for decoupling virtual request

2019-10-04 Thread Chris Wilson
As we may signal a request and take the engine->active.lock within the
signaler, the engine submission paths have to use a nested annotation on
their requests -- but we guarantee that we can never submit on the same
engine as the signaling fence.

<4>[  723.763281] WARNING: possible circular locking dependency detected
<4>[  723.763285] 5.3.0-g80fa0e042cdb-drmtip_379+ #1 Tainted: G U
<4>[  723.763288] --
<4>[  723.763291] gem_exec_await/1388 is trying to acquire lock:
<4>[  723.763294] 93a7b53221d8 (&engine->active.lock){..-.}, at: execlists_submit_request+0x2b/0x1e0 [i915]
<4>[  723.763378]
  but task is already holding lock:
<4>[  723.763381] 93a7c25f6d20 (&i915_request_get(rq)->submit/1){-.-.}, at: __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[  723.763420]
  which lock already depends on the new lock.

<4>[  723.763423]
  the existing dependency chain (in reverse order) is:
<4>[  723.763427]
  -> #2 (&i915_request_get(rq)->submit/1){-.-.}:
<4>[  723.763434]        _raw_spin_lock_irqsave_nested+0x39/0x50
<4>[  723.763478]        __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[  723.763513]        intel_engine_breadcrumbs_irq+0x3aa/0x5e0 [i915]
<4>[  723.763600]        cs_irq_handler+0x49/0x50 [i915]
<4>[  723.763659]        gen11_gt_irq_handler+0x17b/0x280 [i915]
<4>[  723.763690]        gen11_irq_handler+0x54/0xf0 [i915]
<4>[  723.763695]        __handle_irq_event_percpu+0x41/0x2d0
<4>[  723.763699]        handle_irq_event_percpu+0x2b/0x70
<4>[  723.763702]        handle_irq_event+0x2f/0x50
<4>[  723.763706]        handle_edge_irq+0xee/0x1a0
<4>[  723.763709]        do_IRQ+0x7e/0x160
<4>[  723.763712]        ret_from_intr+0x0/0x1d
<4>[  723.763717]        __slab_alloc.isra.28.constprop.33+0x4f/0x70
<4>[  723.763720]        kmem_cache_alloc+0x28d/0x2f0
<4>[  723.763724]        vm_area_dup+0x15/0x40
<4>[  723.763727]        dup_mm+0x2dd/0x550
<4>[  723.763730]        copy_process+0xf21/0x1ef0
<4>[  723.763734]        _do_fork+0x71/0x670
<4>[  723.763737]        __se_sys_clone+0x6e/0xa0
<4>[  723.763741]        do_syscall_64+0x4f/0x210
<4>[  723.763744]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  723.763747]
  -> #1 (&(&rq->lock)->rlock#2){-.-.}:
<4>[  723.763752]        _raw_spin_lock+0x2a/0x40
<4>[  723.763789]        __unwind_incomplete_requests+0x3eb/0x450 [i915]
<4>[  723.763825]        __execlists_submission_tasklet+0x9ec/0x1d60 [i915]
<4>[  723.763864]        execlists_submission_tasklet+0x34/0x50 [i915]
<4>[  723.763874]        tasklet_action_common.isra.5+0x47/0xb0
<4>[  723.763878]        __do_softirq+0xd8/0x4ae
<4>[  723.763881]        irq_exit+0xa9/0xc0
<4>[  723.763883]        smp_apic_timer_interrupt+0xb7/0x280
<4>[  723.763887]        apic_timer_interrupt+0xf/0x20
<4>[  723.763892]        cpuidle_enter_state+0xae/0x450
<4>[  723.763895]        cpuidle_enter+0x24/0x40
<4>[  723.763899]        do_idle+0x1e7/0x250
<4>[  723.763902]        cpu_startup_entry+0x14/0x20
<4>[  723.763905]        start_secondary+0x15f/0x1b0
<4>[  723.763908]        secondary_startup_64+0xa4/0xb0
<4>[  723.763911]
  -> #0 (&engine->active.lock){..-.}:
<4>[  723.763916]        __lock_acquire+0x15d8/0x1ea0
<4>[  723.763919]        lock_acquire+0xa6/0x1c0
<4>[  723.763922]        _raw_spin_lock_irqsave+0x33/0x50
<4>[  723.763956]        execlists_submit_request+0x2b/0x1e0 [i915]
<4>[  723.764002]        submit_notify+0xa8/0x13c [i915]
<4>[  723.764035]        __i915_sw_fence_complete+0x81/0x250 [i915]
<4>[  723.764054]        i915_sw_fence_wake+0x51/0x64 [i915]
<4>[  723.764054]        __i915_sw_fence_complete+0x1ee/0x250 [i915]
<4>[  723.764054]        dma_i915_sw_fence_wake_timer+0x14/0x20 [i915]
<4>[  723.764054]        dma_fence_signal_locked+0x9e/0x1c0
<4>[  723.764054]        dma_fence_signal+0x1f/0x40
<4>[  723.764054]        vgem_fence_signal_ioctl+0x67/0xc0 [vgem]
<4>[  723.764054]        drm_ioctl_kernel+0x83/0xf0
<4>[  723.764054]        drm_ioctl+0x2f3/0x3b0
<4>[  723.764054]        do_vfs_ioctl+0xa0/0x6f0
<4>[  723.764054]        ksys_ioctl+0x35/0x60
<4>[  723.764054]        __x64_sys_ioctl+0x11/0x20
<4>[  723.764054]        do_syscall_64+0x4f/0x210
<4>[  723.764054]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  723.764054]
  other info that might help us debug this:

<4>[  723.764054] Chain exists of:
  &engine->active.lock --> &(&rq->lock)->rlock#2 --> &i915_request_get(rq)->submit/1

<4>[  723.764054]  Possible unsafe locking scenario:

<4>[  723.764054]        CPU0                    CPU1
<4>[  723.764054]        ----                    ----
<4>[  723.764054]   lock(&i915_request_get(rq)->submit/1);
<4>[  723.764054]                                lock(&(&rq->lock)->rlock#2);
<4>[  723.764054]                                lock(&i915_request_get(rq)->submit/1);
<4>[  723.764054]   lock(&engine->active.lock);
<4>[  723.764054]
   ***
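
For readers less familiar with how lockdep arrives at a report like the one above, here is a minimal user-space sketch of the idea. It is not the kernel's implementation: the `LockGraph` class, its `acquire` method and the `rq->lock/1` subclass name are hypothetical, chosen only for illustration. It records "taken while held" edges between lock classes and refuses an edge that would close a cycle. The second graph assumes the nested annotation puts the submission path's request lock in a separate subclass, which is one way to read the commit message, not something the log itself confirms.

```python
from collections import defaultdict


class LockGraph:
    """Toy model of a lockdep-style 'held -> acquired' lock class graph."""

    def __init__(self):
        # edges[a] holds every class that was acquired while a was held.
        self.edges = defaultdict(set)

    def _reaches(self, src, dst):
        # Iterative depth-first search over the recorded dependencies.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, held, new):
        """Record 'new taken while held'; return False if it closes a cycle."""
        if self._reaches(new, held):
            return False
        self.edges[held].add(new)
        return True


ACTIVE = "engine->active.lock"
RQ = "rq->lock"
RQ_NESTED = "rq->lock/1"  # hypothetical subclass name, for illustration only
SUBMIT = "submit/1"

# Same lock class everywhere: the third acquisition closes the cycle.
plain = LockGraph()
plain.acquire(ACTIVE, RQ)              # chain #1 in the report
plain.acquire(RQ, SUBMIT)              # chain #2 in the report
print(plain.acquire(SUBMIT, ACTIVE))   # chain #0 -> False

# Assumed effect of a nested annotation: the submission path uses its own class.
nested = LockGraph()
nested.acquire(ACTIVE, RQ_NESTED)
nested.acquire(RQ, SUBMIT)
print(nested.acquire(SUBMIT, ACTIVE))  # -> True
```

The point of the sketch is only that the checker reasons about lock classes, not individual lock instances, so two acquisitions that can never involve the same engine still look circular unless they are placed in distinct classes.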