Re: [Intel-gfx] [PATCH] drm/i915: Preallocate mmu notifier to unbreak cpu hotplug deadlock

2017-10-05 Thread Thomas Gleixner
On Thu, 5 Oct 2017, Daniel Vetter wrote:
> On Thu, Oct 05, 2017 at 06:19:30PM +0200, Thomas Gleixner wrote:
> > On Thu, 5 Oct 2017, Daniel Vetter wrote:
> > > On Thu, Oct 05, 2017 at 05:23:20PM +0200, Thomas Gleixner wrote:
> > > > Aside of that, is it really required to use stop_machine() for this
> > > > synchronization? We certainly have less intrusive mechanisms than that.
> > > 
> > > Yeah, the stop_machine needs to go, I'm working on something that uses
> > > rcu_read_lock+synchronize_rcu for this case. Probably shouldn't have
> > > merged even.
> > > 
> > > Now this one isn't the one I wanted to fix with this patch since there's
> > > clearly something dubious going on on the i915 side too.
> > 
> > I already wondered :)
> > 
> > > The proper trace, with the same part on the cpu hotplug side, highlights
> > > that you can't allocate a workqueue while holding mmap_sem. That one
> > > matches the patch description a bit better :-)
> > 
> > > Sorry for misleading you, I should have checked that I attached the right one. No
> > > stop_machine()/i915_gem_set_wedged() in the below one.
> > 
> > Well the problem is more or less the same and what I said about solving it
> > in a different place is still valid. I'll think about it some more, but don't
> > expect wonders :)
> 
> Yeah, I just want to make you aware that there are new implications in the
> locking maze, so that overall we decide to break the loop in the right
> place. Also adding Tejun, since this is about workqueues and I forgot him.
> 
> tldr for Tejun: The new cross-release stuff in lockdep seems to indicate
> that we cannot allocate a new workqueue while holding mmap_sem. Full
> details in the thread.

The issue is not restricted to work queues and mmap_sem. There is the
general problem of:

 cpuhotplug -> cpu_up/down -> callback -> device_create/destroy()

which creates a dependency between

  cpuhotplug_rwsem and devfs locking

So now any chain which either holds a devfs lock or has a separate
dependency chain on the devfs locks and then calls some function which tries
to take cpuhotplug_rwsem will trigger a splat. Rightfully so.

So in your case mmap_sem is involved in that, but that's not a
prerequisite. There are a gazillion other ways to achieve that.

The pattern which causes that is device creation in a hotplug callback and
then some other device access (read/write/ioctl) which ends up acquiring
cpuhotplug_rwsem plus a connection of both chains through arbitrary locks.
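In kernel-style C, the two halves of that pattern look roughly like this (sketch with a hypothetical `foo` driver; all `foo_*` names are illustrative only, while msr/cpuid/mce are the real instances):

```c
/* Hotplug side: a cpuhp callback creates a device node. The hotplug
 * core holds cpuhotplug_rwsem around the callback, and device_create()
 * waits on devtmpfs, so cpuhotplug_rwsem now depends on devfs locking.
 */
static int foo_cpu_online(unsigned int cpu)
{
	foo_dev[cpu] = device_create(foo_class, NULL,
				     MKDEV(FOO_MAJOR, cpu), NULL,
				     "foo%u", cpu);
	return PTR_ERR_OR_ZERO(foo_dev[cpu]);
}

/* Device side: any read/write/ioctl path on such a device that ends up
 * taking cpuhotplug_rwsem (directly, or e.g. via alloc_workqueue() or
 * stop_machine()) closes the circle; whatever locks are held in
 * between (mmap_sem, inode mutexes, ...) connect the two chains.
 */
static long foo_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
	cpus_read_lock();	/* cpuhotplug_rwsem */
	/* ... */
	cpus_read_unlock();
	return 0;
}
```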

I'm trying to move that device_create/remove stuff out of the regular
hotplug states, but I haven't yet found a solution which is neither butt
ugly nor creates other hard-to-solve problems.

Maybe a glass of wine or some sleep or both will help to get over that :)

Surely anyone is welcome to beat me to it.

Thanks,

tglx



___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Preallocate mmu notifier to unbreak cpu hotplug deadlock

2017-10-05 Thread Daniel Vetter
On Thu, Oct 05, 2017 at 06:19:30PM +0200, Thomas Gleixner wrote:
> On Thu, 5 Oct 2017, Daniel Vetter wrote:
> > On Thu, Oct 05, 2017 at 05:23:20PM +0200, Thomas Gleixner wrote:
> > > Aside of that, is it really required to use stop_machine() for this
> > > synchronization? We certainly have less intrusive mechanisms than that.
> > 
> > Yeah, the stop_machine needs to go, I'm working on something that uses
> > rcu_read_lock+synchronize_rcu for this case. Probably shouldn't have
> > merged even.
> > 
> > Now this one isn't the one I wanted to fix with this patch since there's
> > clearly something dubious going on on the i915 side too.
> 
> I already wondered :)
> 
> > The proper trace, with the same part on the cpu hotplug side, highlights
> > that you can't allocate a workqueue while holding mmap_sem. That one
> > matches the patch description a bit better :-)
> 
> > Sorry for misleading you, I should have checked that I attached the right one. No
> > stop_machine()/i915_gem_set_wedged() in the below one.
> 
> Well the problem is more or less the same and what I said about solving it
> in a different place is still valid. I'll think about it some more, but don't
> expect wonders :)

Yeah, I just want to make you aware that there are new implications in the
locking maze, so that overall we decide to break the loop in the right
place. Also adding Tejun, since this is about workqueues and I forgot him.

tldr for Tejun: The new cross-release stuff in lockdep seems to indicate
that we cannot allocate a new workqueue while holding mmap_sem. Full
details in the thread.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
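For Tejun's benefit, the problematic call shape from the splat boils down to this (illustrative fragment; per the trace, the workqueue allocation reaches cpus_read_lock() via apply_workqueue_attrs()):

```c
struct workqueue_struct *wq;

down_write(&mm->mmap_sem);           /* caller already holds mmap_sem */
wq = alloc_workqueue("foo", 0, 0);   /* -> apply_workqueue_attrs()
                                      *    -> cpus_read_lock(), which
                                      * closes the reported cycle */
up_write(&mm->mmap_sem);
```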


Re: [Intel-gfx] [PATCH] drm/i915: Preallocate mmu notifier to unbreak cpu hotplug deadlock

2017-10-05 Thread Thomas Gleixner
On Thu, 5 Oct 2017, Daniel Vetter wrote:
> On Thu, Oct 05, 2017 at 05:23:20PM +0200, Thomas Gleixner wrote:
> > Aside of that, is it really required to use stop_machine() for this
> > synchronization? We certainly have less intrusive mechanisms than that.
> 
> Yeah, the stop_machine needs to go, I'm working on something that uses
> rcu_read_lock+synchronize_rcu for this case. Probably shouldn't have
> merged even.
> 
> Now this one isn't the one I wanted to fix with this patch since there's
> clearly something dubious going on on the i915 side too.

I already wondered :)

> The proper trace, with the same part on the cpu hotplug side, highlights
> that you can't allocate a workqueue while holding mmap_sem. That one
> matches the patch description a bit better :-)

> Sorry for misleading you, I should have checked that I attached the right one. No
> stop_machine()/i915_gem_set_wedged() in the below one.

Well the problem is more or less the same and what I said about solving it
in a different place is still valid. I'll think about it some more, but don't
expect wonders :)

Thanks,

tglx




Re: [Intel-gfx] [PATCH] drm/i915: Preallocate mmu notifier to unbreak cpu hotplug deadlock

2017-10-05 Thread Thomas Gleixner
On Thu, 5 Oct 2017, Daniel Vetter wrote:

> 4.14-rc1 gained the fancy new cross-release support in lockdep, which
> seems to have uncovered a few more rules about what is allowed and
> isn't.
> 
> This one here seems to indicate that allocating a work-queue while
> holding mmap_sem is a no-go, so let's try to preallocate it.
> 
> Of course another way to break this chain would be somewhere in the
> cpu hotplug code, since this isn't the only trace we're finding now
> which goes through msr_create_device.
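The preallocation the quoted description proposes would be shaped roughly like this (sketch only, with hypothetical names; the actual patch differs in detail):

```c
/* Allocate the workqueue up front, before mmap_sem (and mm_lock) are
 * taken, so the cpu_hotplug_lock dependency is created outside the
 * locked region.
 */
wq = alloc_ordered_workqueue("i915-userptr-release", 0);
if (!wq)
	return -ENOMEM;

down_write(&mm->mmap_sem);
/* ... register the mmu notifier and publish wq ... */
up_write(&mm->mmap_sem);
```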

That's an interesting multi chain circular dependency which is related to
devfs.

Now the MSR device is not the only one which is creating that
dependency. We have CPUID and MCE as well. That's what a quick search in
x86 revealed. No idea whether there are more of those interesting bits and
pieces.

To fix it on the hotplug side we'd have to introduce extra state space
which is handled outside the cpuhotplug_rwsem region, but inside of the
cpu_maps_update_begin()/end() region, which has a nasty pile of
implications vs. the state registration/deregistration as this stuff can be
built as modules. So we'd need a complete set of new interfaces and
handling routines with some explicit restrictions on those state callbacks.

I'd rather not go there unless it's unavoidable, which brings me to
the obvious question about the stop_machine() usage in the graphics code.

void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
{
        stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
}

The function name is telling. The machine is wedged and stop_machine()
might make it even more wedged when looking at this splat :)

The called function name is interesting as well. Is that _BKL postfix a
leftover of the BKL removal a couple of years ago?

Aside of that, is it really required to use stop_machine() for this
synchronization? We certainly have less intrusive mechanisms than that.

Thanks,

tglx









Re: [Intel-gfx] [PATCH] drm/i915: Preallocate mmu notifier to unbreak cpu hotplug deadlock

2017-10-05 Thread Daniel Vetter
On Thu, Oct 05, 2017 at 05:23:20PM +0200, Thomas Gleixner wrote:
> On Thu, 5 Oct 2017, Daniel Vetter wrote:
> 
> > 4.14-rc1 gained the fancy new cross-release support in lockdep, which
> > seems to have uncovered a few more rules about what is allowed and
> > isn't.
> > 
> > This one here seems to indicate that allocating a work-queue while
> > holding mmap_sem is a no-go, so let's try to preallocate it.
> > 
> > Of course another way to break this chain would be somewhere in the
> > cpu hotplug code, since this isn't the only trace we're finding now
> > which goes through msr_create_device.
> 
> That's an interesting multi chain circular dependency which is related to
> devfs.
> 
> Now the MSR device is not the only one which is creating that
> dependency. We have CPUID and MCE as well. That's what a quick search in
> x86 revealed. No idea whether there are more of those interesting bits and
> pieces.
> 
> To fix it on the hotplug side we'd have to introduce extra state space
> which is handled outside the cpuhotplug_rwsem region, but inside of the
> cpu_maps_update_begin()/end() region, which has a nasty pile of
> implications vs. the state registration/deregistration as this stuff can be
> built as modules. So we'd need a complete set of new interfaces and
> handling routines with some explicit restrictions on those state callbacks.
> 
> I'd rather not go there unless it's unavoidable, which brings me to
> the obvious question about the stop_machine() usage in the graphics code.
> 
> void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
> {
>         stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
> }
> 
> The function name is telling. The machine is wedged and stop_machine()
> might make it even more wedged when looking at this splat :)
> 
> The called function name is interesting as well. Is that _BKL postfix a
> leftover of the BKL removal a couple of years ago?
> 
> Aside of that, is it really required to use stop_machine() for this
> synchronization? We certainly have less intrusive mechanisms than that.

Yeah, the stop_machine needs to go, I'm working on something that uses
rcu_read_lock+synchronize_rcu for this case. Probably shouldn't have
merged even.

Now this one isn't the one I wanted to fix with this patch since there's
clearly something dubious going on on the i915 side too. The proper trace,
with the same part on the cpu hotplug side, highlights that you can't
allocate a workqueue while holding mmap_sem. That one matches the patch
description a bit better :-)

Sorry for misleading you, I should have checked that I attached the right one. No
stop_machine()/i915_gem_set_wedged() in the below one.
-Daniel
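A rough shape of the rcu_read_lock()+synchronize_rcu() replacement mentioned above (illustrative only, not the merged fix):

```c
/* Writer: flip the wedged state and wait for all readers, instead of
 * freezing every CPU with stop_machine() (which takes
 * cpu_hotplug_lock).
 */
void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
{
	set_bit(I915_WEDGED, &dev_priv->gpu_error.flags);
	synchronize_rcu();	/* no cpu_hotplug_lock involved */
}

/* Readers: sections that must see a consistent wedged state run under
 * rcu_read_lock() instead of relying on stop_machine() exclusion.
 */
rcu_read_lock();
if (!test_bit(I915_WEDGED, &dev_priv->gpu_error.flags)) {
	/* ... submit ... */
}
rcu_read_unlock();
```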

==
WARNING: possible circular locking dependency detected
4.14.0-rc3-CI-CI_DRM_3172+ #1 Tainted: G U 
--
prime_mmap/1588 is trying to acquire lock:
 (cpu_hotplug_lock.rw_sem){}, at: [] 
apply_workqueue_attrs+0x17/0x50

but task is already holding lock:
 (&dev_priv->mm_lock){+.+.}, at: [] 
i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #6 (&dev_priv->mm_lock){+.+.}:
   __lock_acquire+0x1420/0x15e0
   lock_acquire+0xb0/0x200
   __mutex_lock+0x86/0x9b0
   mutex_lock_nested+0x1b/0x20
   i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
   i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
   drm_ioctl_kernel+0x69/0xb0
   drm_ioctl+0x2f9/0x3d0
   do_vfs_ioctl+0x94/0x670
   SyS_ioctl+0x41/0x70
   entry_SYSCALL_64_fastpath+0x1c/0xb1

-> #5 (&mm->mmap_sem){}:
   __lock_acquire+0x1420/0x15e0
   lock_acquire+0xb0/0x200
   __might_fault+0x68/0x90
   _copy_to_user+0x23/0x70
   filldir+0xa5/0x120
   dcache_readdir+0xf9/0x170
   iterate_dir+0x69/0x1a0
   SyS_getdents+0xa5/0x140
   entry_SYSCALL_64_fastpath+0x1c/0xb1

-> #4 (&sb->s_type->i_mutex_key#5){}:
   down_write+0x3b/0x70
   handle_create+0xcb/0x1e0
   devtmpfsd+0x139/0x180
   kthread+0x152/0x190
   ret_from_fork+0x27/0x40

-> #3 ((complete)){+.+.}:
   __lock_acquire+0x1420/0x15e0
   lock_acquire+0xb0/0x200
   wait_for_common+0x58/0x210
   wait_for_completion+0x1d/0x20
   devtmpfs_create_node+0x13d/0x160
   device_add+0x5eb/0x620
   device_create_groups_vargs+0xe0/0xf0
   device_create+0x3a/0x40
   msr_device_create+0x2b/0x40
   cpuhp_invoke_callback+0xc9/0xbf0
   cpuhp_thread_fun+0x17b/0x240
   smpboot_thread_fn+0x18a/0x280
   kthread+0x152/0x190
   ret_from_fork+0x27/0x40

-> #2 (cpuhp_state-up){+.+.}:
   __lock_acquire+0x1420/0x15e0
   lock_acquire+0xb0/0x200
   cpuhp_issue_call+0x133/0x1c0
   __cpuhp_setup_state_cpuslocked+0x139/0x2a0
   __cpuhp_setup_state+0x46/0x60
   page_writeback_init+0x43/0x67
   pagecache_init+0x3d/0x42
   

Re: [Intel-gfx] [PATCH] drm/i915: Preallocate mmu notifier to unbreak cpu hotplug deadlock

2017-10-05 Thread Chris Wilson
Quoting Daniel Vetter (2017-10-05 14:22:06)
> 4.14-rc1 gained the fancy new cross-release support in lockdep, which
> seems to have uncovered a few more rules about what is allowed and
> isn't.
> 
> This one here seems to indicate that allocating a work-queue while
> holding mmap_sem is a no-go, so let's try to preallocate it.
> 
> Of course another way to break this chain would be somewhere in the
> cpu hotplug code, since this isn't the only trace we're finding now
> which goes through msr_create_device.
> 
> Full lockdep splat:
> 
> ==
> WARNING: possible circular locking dependency detected
> 4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G U
> --
> kworker/3:4/562 is trying to acquire lock:
>  (cpu_hotplug_lock.rw_sem){}, at: [] 
> stop_machine+0x1c/0x40
> 
> but task is already holding lock:
>  (&dev->struct_mutex){+.+.}, at: [] 
> i915_reset_device+0x1e8/0x260 [i915]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #6 (&dev->struct_mutex){+.+.}:
>__lock_acquire+0x1420/0x15e0
>lock_acquire+0xb0/0x200
>__mutex_lock+0x86/0x9b0
>mutex_lock_interruptible_nested+0x1b/0x20
>i915_mutex_lock_interruptible+0x51/0x130 [i915]
>i915_gem_fault+0x209/0x650 [i915]
>__do_fault+0x1e/0x80
>__handle_mm_fault+0xa08/0xed0
>handle_mm_fault+0x156/0x300
>__do_page_fault+0x2c5/0x570
>do_page_fault+0x28/0x250
>page_fault+0x22/0x30
> 
> -> #5 (&mm->mmap_sem){}:
>__lock_acquire+0x1420/0x15e0
>lock_acquire+0xb0/0x200
>__might_fault+0x68/0x90
>_copy_to_user+0x23/0x70
>filldir+0xa5/0x120
>dcache_readdir+0xf9/0x170
>iterate_dir+0x69/0x1a0
>SyS_getdents+0xa5/0x140
>entry_SYSCALL_64_fastpath+0x1c/0xb1
> 
> -> #4 (&sb->s_type->i_mutex_key#5){}:
>down_write+0x3b/0x70
>handle_create+0xcb/0x1e0
>devtmpfsd+0x139/0x180
>kthread+0x152/0x190
>ret_from_fork+0x27/0x40
> 
> -> #3 ((complete)){+.+.}:
>__lock_acquire+0x1420/0x15e0
>lock_acquire+0xb0/0x200
>wait_for_common+0x58/0x210
>wait_for_completion+0x1d/0x20
>devtmpfs_create_node+0x13d/0x160
>device_add+0x5eb/0x620
>device_create_groups_vargs+0xe0/0xf0
>device_create+0x3a/0x40
>msr_device_create+0x2b/0x40
>cpuhp_invoke_callback+0xc9/0xbf0
>cpuhp_thread_fun+0x17b/0x240
>smpboot_thread_fn+0x18a/0x280
>kthread+0x152/0x190
>ret_from_fork+0x27/0x40
> 
> -> #2 (cpuhp_state-up){+.+.}:
>__lock_acquire+0x1420/0x15e0
>lock_acquire+0xb0/0x200
>cpuhp_issue_call+0x133/0x1c0
>__cpuhp_setup_state_cpuslocked+0x139/0x2a0
>__cpuhp_setup_state+0x46/0x60
>page_writeback_init+0x43/0x67
>pagecache_init+0x3d/0x42
>start_kernel+0x3a8/0x3fc
>x86_64_start_reservations+0x2a/0x2c
>x86_64_start_kernel+0x6d/0x70
>verify_cpu+0x0/0xfb
> 
> -> #1 (cpuhp_state_mutex){+.+.}:
>__lock_acquire+0x1420/0x15e0
>lock_acquire+0xb0/0x200
>__mutex_lock+0x86/0x9b0
>mutex_lock_nested+0x1b/0x20
>__cpuhp_setup_state_cpuslocked+0x53/0x2a0
>__cpuhp_setup_state+0x46/0x60
>page_alloc_init+0x28/0x30
>start_kernel+0x145/0x3fc
>x86_64_start_reservations+0x2a/0x2c
>x86_64_start_kernel+0x6d/0x70
>verify_cpu+0x0/0xfb
> 
> -> #0 (cpu_hotplug_lock.rw_sem){}:
>check_prev_add+0x430/0x840
>__lock_acquire+0x1420/0x15e0
>lock_acquire+0xb0/0x200
>cpus_read_lock+0x3d/0xb0
>stop_machine+0x1c/0x40
>i915_gem_set_wedged+0x1a/0x20 [i915]
>i915_reset+0xb9/0x230 [i915]
>i915_reset_device+0x1f6/0x260 [i915]
>i915_handle_error+0x2d8/0x430 [i915]
>hangcheck_declare_hang+0xd3/0xf0 [i915]
>i915_hangcheck_elapsed+0x262/0x2d0 [i915]
>process_one_work+0x233/0x660
>worker_thread+0x4e/0x3b0
>kthread+0x152/0x190
>ret_from_fork+0x27/0x40
> 
> other info that might help us debug this:
> 
> Chain exists of:
>   cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(&dev->struct_mutex);
>                                lock(&mm->mmap_sem);
>                                lock(&dev->struct_mutex);
>   lock(cpu_hotplug_lock.rw_sem);
> 
>  *** DEADLOCK ***
> 
> 3 locks held by kworker/3:4/562:
>  #0:  ("events_long"){+.+.}, at: [] 
> process_one_work+0x1aa/0x660
>  #1:  ((&(&dev_priv->gpu_error.hangcheck_work)->work)){+.+.}, at: 
> [] process_one_work+0x1aa/0x660
>  #2:  (&dev->struct_mutex){+.+.}, at: [] 
> i915_reset_device+0x1e8/0x260 [i915]
> 
> stack backtrace:
> CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G U 
