Re: [BUG] Deadlock in blk_mq_register_disk error path

2016-08-16 Thread Jinpu Wang
On Mon, Aug 15, 2016 at 6:22 PM, Bart Van Assche
 wrote:
> On 08/15/2016 09:01 AM, Jinpu Wang wrote:
>>
>> It's more likely you hit another bug; my colleague Roman fixed that:
>>
>> http://www.spinics.net/lists/linux-block/msg04552.html
>
>
> Hello Jinpu,
>
> Interesting. However, I see that wrote the following: "Firstly this wrong
> sequence raises two kernel warnings: 1st. WARNING at
> lib/percpu-refcount.c:309 percpu_ref_kill_and_confirm called more than once
> 2nd. WARNING at lib/percpu-refcount.c:331". I haven't seen any of these
> kernel warnings ...
>
> Thanks,
>
> Bart.
>

The warnings showed up only from time to time, but your hung tasks look
similar to ours.
We injected some delays to make the problem easier to reproduce.


-- 
Mit freundlichen Grüßen,
Best Regards,

Jack Wang

Linux Kernel Developer Storage
ProfitBricks GmbH  The IaaS-Company.

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin
Tel: +49 30 5770083-42
Fax: +49 30 5770085-98
Email: jinpu.w...@profitbricks.com
URL: http://www.profitbricks.de

Sitz der Gesellschaft: Berlin.
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B.
Geschäftsführer: Andreas Gauger, Achim Weiss.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Deadlock in blk_mq_register_disk error path

2016-08-15 Thread Bart Van Assche

On 08/15/2016 10:15 AM, Jens Axboe wrote:
> Can you reproduce at will? Would be nice to know if it hit the error case,
> which is where it would hang.


Hello Jens,

Unfortunately, my tests trigger this hang only sporadically. Over the
past four weeks or so I have triggered several thousand
scsi_remove_host() calls with my https://github.com/bvanassche/srp-test
software, and this morning was the first time I ran into a
blk-mq-related hang.


Thanks,

Bart.




Re: [BUG] Deadlock in blk_mq_register_disk error path

2016-08-15 Thread Jens Axboe

On 08/15/2016 09:53 AM, Bart Van Assche wrote:
> On 08/02/2016 10:21 AM, Jens Axboe wrote:
>> On 08/02/2016 06:58 AM, Jinpu Wang wrote:
>>> Hi Jens,
>>>
>>> I found that blk_mq_register_disk() calls blk_mq_disable_hotplug(),
>>> which in turn does mutex_lock(&all_q_mutex), and later runs:
>>>
>>>     queue_for_each_hw_ctx(q, hctx, i) {
>>>         ret = blk_mq_register_hctx(hctx);
>>>         if (ret)
>>>             break;  /* on error, we call unregister below */
>>>     }
>>>
>>>     if (ret)
>>>         blk_mq_unregister_disk(disk);
>>>
>>> blk_mq_unregister_disk() calls blk_mq_disable_hotplug() again, which
>>> leads to a deadlock.
>>>
>>> Did I miss anything?


>> Nope, your analysis looks correct. This should fix it:
>>
>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus=6316338a94b2319abe9d3790eb9cdc56ef81ac1a


> Hi Jens,
>
> Will that patch be included in stable kernels? I just encountered a
> deadlock with kernel v4.7 that looks similar.


Sure, we can push it to stable; it's a pretty straightforward patch. Can
you reproduce at will? Would be nice to know if it hit the error case,
which is where it would hang.

--
Jens Axboe


Re: [BUG] Deadlock in blk_mq_register_disk error path

2016-08-15 Thread Bart Van Assche

On 08/15/2016 09:01 AM, Jinpu Wang wrote:
> It's more likely you hit another bug; my colleague Roman fixed that:
>
> http://www.spinics.net/lists/linux-block/msg04552.html


Hello Jinpu,

Interesting. However, I see that wrote the following: "Firstly this 
wrong sequence raises two kernel warnings: 1st. WARNING at 
lib/percpu-refcount.c:309 percpu_ref_kill_and_confirm called more than
once 2nd. WARNING at lib/percpu-refcount.c:331". I haven't seen any of 
these kernel warnings ...


Thanks,

Bart.



Re: Re: [BUG] Deadlock in blk_mq_register_disk error path

2016-08-15 Thread Jinpu Wang
Hi Bart,

>>
>> Nope, your analysis looks correct. This should fix it:
>>
>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus=6316338a94b2319abe9d3790eb9cdc56ef81ac1a
>
> Hi Jens,
>
> Will that patch be included in stable kernels? I just encountered a
> deadlock with kernel v4.7 that looks similar.
>
> Thank you,
>
> Bart.
>

It's more likely you hit another bug; my colleague Roman fixed that:

http://www.spinics.net/lists/linux-block/msg04552.html

It would be great if you could test it and see whether it works for you!



Re: Re: [BUG] Deadlock in blk_mq_register_disk error path

2016-08-15 Thread Bart Van Assche
On 08/02/2016 10:21 AM, Jens Axboe wrote:
> On 08/02/2016 06:58 AM, Jinpu Wang wrote:
>> Hi Jens,
>>
>> I found that blk_mq_register_disk() calls blk_mq_disable_hotplug(),
>> which in turn does mutex_lock(&all_q_mutex), and later runs:
>>
>>     queue_for_each_hw_ctx(q, hctx, i) {
>>         ret = blk_mq_register_hctx(hctx);
>>         if (ret)
>>             break;  /* on error, we call unregister below */
>>     }
>>
>>     if (ret)
>>         blk_mq_unregister_disk(disk);
>>
>> blk_mq_unregister_disk() calls blk_mq_disable_hotplug() again, which
>> leads to a deadlock.
>>
>> Did I miss anything?
> 
> Nope, your analysis looks correct. This should fix it:
> 
> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus=6316338a94b2319abe9d3790eb9cdc56ef81ac1a

Hi Jens,

Will that patch be included in stable kernels? I just encountered a
deadlock with kernel v4.7 that looks similar.

Thank you,

Bart.

INFO: task kworker/u64:6:136 blocked for more than 480 seconds.
  Tainted: GW   4.7.0-dbg+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u64:6   D 88016f677bb0 0   136  2 0x
Workqueue: events_unbound async_run_entry_fn
Call Trace:
 [] schedule+0x37/0x90
 [] schedule_preempt_disabled+0x10/0x20
 [] mutex_lock_nested+0x144/0x350
 [] blk_mq_disable_hotplug+0x12/0x20
 [] blk_mq_register_disk+0x29/0x120
 [] blk_register_queue+0xb6/0x160
 [] add_disk+0x219/0x4a0
 [] sd_probe_async+0x100/0x1b0
 [] async_run_entry_fn+0x45/0x140
 [] process_one_work+0x1f9/0x6a0
 [] worker_thread+0x49/0x490
 [] kthread+0xea/0x100
 [] ret_from_fork+0x1f/0x40
3 locks held by kworker/u64:6/136:
 #0:  ("events_unbound"){.+.+.+}, at: [] 
process_one_work+0x17a/0x6a0
 #1:  ((>work)){+.+.+.}, at: [] 
process_one_work+0x17a/0x6a0
 #2:  (all_q_mutex){+.+.+.}, at: [] 
blk_mq_disable_hotplug+0x12/0x20
INFO: task 02:8101 blocked for more than 480 seconds.
  Tainted: GW   4.7.0-dbg+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
02  D 88039b747968 0  8101  1 0x0004
Call Trace:
 [] schedule+0x37/0x90
 [] blk_mq_freeze_queue_wait+0x51/0xb0
 [] blk_mq_update_tag_set_depth+0x3a/0xb0
 [] blk_mq_init_allocated_queue+0x432/0x450
 [] blk_mq_init_queue+0x35/0x60
 [] scsi_mq_alloc_queue+0x17/0x50
 [] scsi_alloc_sdev+0x2b9/0x350
 [] scsi_probe_and_add_lun+0x98b/0xe50
 [] __scsi_scan_target+0x5ca/0x6b0
 [] scsi_scan_target+0xe1/0xf0
 [] srp_create_target+0xf06/0x13d4 [ib_srp]
 [] dev_attr_store+0x13/0x20
 [] sysfs_kf_write+0x40/0x50
 [] kernfs_fop_write+0x137/0x1c0
 [] __vfs_write+0x23/0x140
 [] vfs_write+0xb0/0x190
 [] SyS_write+0x44/0xa0
 [] entry_SYSCALL_64_fastpath+0x18/0xa8
8 locks held by 02/8101:
 #0:  (sb_writers#4){.+.+.+}, at: [] 
__sb_start_write+0xb2/0xf0
 #1:  (>mutex){+.+.+.}, at: [] 
kernfs_fop_write+0x101/0x1c0
 #2:  (s_active#363){.+.+.+}, at: [] 
kernfs_fop_write+0x10a/0x1c0
 #3:  (>add_target_mutex){+.+.+.}, at: [] 
srp_create_target+0x134/0x13d4 [ib_srp]
 #4:  (>scan_mutex){+.+.+.}, at: [] 
scsi_scan_target+0x8d/0xf0
 #5:  (cpu_hotplug.lock){++}, at: [] 
get_online_cpus+0x2d/0x80
 #6:  (all_q_mutex){+.+.+.}, at: [] 
blk_mq_init_allocated_queue+0x34a/0x450
 #7:  (>tag_list_lock){+.+...}, at: [] 
blk_mq_init_allocated_queue+0x37a/0x450
