BUG: scsi/qla2xxx: scsi_done from qla2x00_sp_compl race with scsi_queue_insert from abort handler

2017-11-07 Thread jianchao.wang
Hi 

[1.] One line summary of the problem:
scsi_done from qla2x00_sp_compl race with scsi_queue_insert from 
scmd_eh_abort_handler()
then cause the BUG_ON(blk_queued_rq(req)) trigger.

[2.] Full description of the problem/report:
The detailed scene is as following:
cpu A cpu B
kworker   interrupt
 
scmd_eh_abort_handler()   
  -> scsi_try_to_abort_cmd()   qla2x00_status_entry()   
-> qla2xxx_eh_abort()-> res = DID_RESET << 16 
//case CS_ABORTED   
  -> qla2x00_eh_wait_on_command()-> qla2x00_sp_compl()  

   +-> cmd->result = res; // 
--->P0_B
  while (CMD_SP(cmd) && wait_iter--) { +-> qla2x00_sp_free_dma()
 
msleep(ABORT_POLLING_PERIOD);  |-> CMD_SP(cmd) = NULL   
  
  }| 
   |
  -> scsi_noretry_cmd() //--->P2_A |
  -> scsi_queue_insert()   +-> scsi_done()
-> __scsi_queue_insert() -> blk_complete_request()
  -> cmd->result = 0; // --->P0_A  |
-> blk_requeue_request()   |
  -> blk_clear_rq_complete() // --->P1_A   |
-> elv_requeue_request()   |
  -> __elv_add_request()   +-> 
blk_mark_rq_complete() //  > P1_B
// req will be queued here +-> 
__blk_complete_request()
BLK_SOFTIRQ
scsi_softirq_done()
  -> 
scsi_decide_disposition() // > P2_B
  // scsi_cmnd->result is 
set to 0 in P0_A
  // so it will return 
SUCCESS
-> scsi_finish_command()
  -> 
scsi_io_completion()
-> 
scsi_end_request()
  -> 
blk_finish_request()  // BUG_ON(blk_queued_rq(req)) !!!

The scsi_cmd->result was set to DID_RESET << 16 at P0_B.
At P2_A, the scsi_noretry_cmd() would return 0 and scsi_queue_insert() would be 
invoked.
When the blk_clear_rq_complete() at P1_A was invoked before the 
blk_mark_rq_complete()
at P1_B, __blk_complete_request() would be invoked in cpu B.
At P2_B, the scsi_decide_disposition() will return SUCCESS as the 
scsi_cmd->result had been set to 0 at P0_A.
Finally, the BUG_ON(blk_queued_rq(req)) in blk_finish_request() was triggered.

Whether the scsi_done should be invoked in qla2xxx_sp_compl() when scsi_cmd is 
aborted ?
When the P1_B is before P1_A, it will do nothing, otherwise, the panic comes up.

Thanks
Jianchao



Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Ming Lei
On Tue, Nov 07, 2017 at 08:17:59PM -0700, Jens Axboe wrote:
> On 11/07/2017 08:12 PM, Ming Lei wrote:
> > On Tue, Nov 07, 2017 at 08:01:48PM -0700, Jens Axboe wrote:
> >> On 11/07/2017 06:03 PM, Ming Lei wrote:
> >>> On Tue, Nov 07, 2017 at 03:06:31PM -0700, Jens Axboe wrote:
>  On 11/07/2017 10:36 AM, Jens Axboe wrote:
> > On 11/07/2017 10:10 AM, Jens Axboe wrote:
> >> On 11/07/2017 09:29 AM, Jens Axboe wrote:
> >>> On 11/07/2017 09:20 AM, Bart Van Assche wrote:
>  On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
> > If you can reproduce, please provide me at least the following log
> > first:
> >
> > find /sys/kernel/debug/block -name tags | xargs cat | grep busy
> >
> > If any pending requests arn't completed, please provide the related
> > info in dbgfs about where is the request.
> 
>  Every time I ran the above or a similar command its output was 
>  empty. I
>  assume that's because the hang usually occurs in a phase where these 
>  debugfs
>  attributes either have not yet been created or have already 
>  disappeared.
> >>>
> >>> Bart, do you still see a hang with the patch that fixes the tag leak 
> >>> when
> >>> we fail to get a dispatch budget?
> >>>
> >>> https://marc.info/?l=linux-block=151004881411480=2
> >>>
> >>> I've run a lot of stability testing here, and I haven't run into any
> >>> issues. This is with shared tags as well. So if you still see the 
> >>> failure
> >>> case with the current tree AND the above patch, then I'll try and get
> >>> a test case setup that hits it too so we can get to the bottom of 
> >>> this.
> >>
> >> Ming, I managed to reproduce the hang using null_blk. Note this is
> >> WITHOUT the patch mentioned above, running with that now.
> >>
> >> # modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 
> >> submit_queues=1 hw_queue_depth=1
> >>
> >> and using this fio file:
> >>
> >> [global]
> >> bs=4k
> >> rw=randread
> >> norandommap
> >> direct=1
> >> ioengine=libaio
> >> iodepth=4
> >>
> >> [nullb0]
> >> filename=/dev/nullb0
> >> [nullb1]
> >> filename=/dev/nullb1
> >> [nullb2]
> >> filename=/dev/nullb2
> >> [nullb3]
> >> filename=/dev/nullb3
> >>
> >> it seemed to keep running, but it hung when exiting. The troublesome
> >> device was nullb1:
> >>
> >> [  492.513374] INFO: task fio:3263 blocked for more than 120 seconds.
> >> [  492.520782]   Not tainted 4.14.0-rc7+ #499
> >> [  492.526247] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> >> disables this message.
> >> [  492.535904] fio D13208  3263   3211 0x
> >> [  492.542535] Call Trace:
> >> [  492.545764]  __schedule+0x279/0x720
> >> [  492.550151]  schedule+0x33/0x90
> >> [  492.554145]  io_schedule+0x16/0x40
> >> [  492.558435]  blk_mq_get_tag+0x148/0x250
> >> [  492.563211]  ? finish_wait+0x90/0x90
> >> [  492.567693]  blk_mq_get_request+0xf0/0x3e0
> >> [  492.572760]  blk_mq_make_request+0xe2/0x690
> >> [  492.577913]  generic_make_request+0xfc/0x2f0
> >> [  492.583177]  submit_bio+0x64/0x120
> >> [  492.587475]  ? set_page_dirty_lock+0x4b/0x60
> >> [  492.592736]  ? submit_bio+0x64/0x120
> >> [  492.597309]  ? bio_set_pages_dirty+0x55/0x60
> >> [  492.602570]  blkdev_direct_IO+0x388/0x3c0
> >> [  492.607546]  ? free_ioctx_users+0xe0/0xe0
> >> [  492.612511]  ? blk_mq_dispatch_rq_list+0x238/0x3a0
> >> [  492.618353]  ? _raw_spin_unlock+0xe/0x20
> >> [  492.623227]  generic_file_read_iter+0xb3/0xa00
> >> [  492.628682]  ? generic_file_read_iter+0xb3/0xa00
> >> [  492.634334]  ? security_file_permission+0x9b/0xc0
> >> [  492.640114]  blkdev_read_iter+0x35/0x40
> >> [  492.644877]  aio_read+0xc5/0x120
> >> [  492.648973]  ? aio_read_events+0x24c/0x340
> >> [  492.654124]  ? __might_sleep+0x4a/0x80
> >> [  492.658800]  do_io_submit+0x47c/0x5e0
> >> [  492.663373]  ? do_io_submit+0x47c/0x5e0
> >> [  492.668234]  SyS_io_submit+0x10/0x20
> >> [  492.672715]  ? SyS_io_submit+0x10/0x20
> >> [  492.677394]  entry_SYSCALL_64_fastpath+0x13/0x94
> >> [  492.683039] RIP: 0033:0x7f83d1871717
> >> [  492.687521] RSP: 002b:7ffd38fe5a88 EFLAGS: 0202 ORIG_RAX: 
> >> 00d1
> >> [  492.696969] RAX: ffda RBX: 7f83b6972b50 RCX: 
> >> 7f83d1871717
> >> [  492.705423] RDX: 01f41418 RSI: 0001 RDI: 
> >> 7f83e4d36000
> >> [  492.713889] RBP: 0001 R08: 0001 R09: 
> >> 01f3f2e0
> >> [  492.722352] R10: 1000 R11: 0202 R12: 
> >> 7ffd38fe5be0
> >> [  492.730827] R13: 7f83b6972b01 R14: 

[PATCH 1/6] target: Fix QUEUE_FULL + SCSI task attribute handling

2017-11-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch fixes a bug during QUEUE_FULL where transport_complete_qf()
calls transport_complete_task_attr() after it's already been invoked
by target_complete_ok_work() or transport_generic_request_failure()
during initial completion, preceeding QUEUE_FULL.

This will result in se_device->simple_cmds, se_device->dev_cur_ordered_id
and/or se_device->dev_ordered_sync being updated multiple times for
a single se_cmd.

To address this bug, clear SCF_TASK_ATTR_SET after the first call
to transport_complete_task_attr(), and avoid updating SCSI task
attribute related counters for any subsequent calls.

Also, when a se_cmd is deferred due to ordered tags and executed
via target_restart_delayed_cmds(), set CMD_T_SENT before execution
matching what target_execute_cmd() does.

Cc: Michael Cyr 
Cc: Bryant G. Ly 
Cc: Mike Christie 
Cc: Hannes Reinecke 
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/target_core_transport.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 473d652..c33d1e9 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2011,6 +2011,8 @@ static void target_restart_delayed_cmds(struct se_device 
*dev)
list_del(>se_delayed_node);
spin_unlock(>delayed_cmd_lock);
 
+   cmd->transport_state |= CMD_T_SENT;
+
__target_execute_cmd(cmd, true);
 
if (cmd->sam_task_attr == TCM_ORDERED_TAG)
@@ -2046,6 +2048,8 @@ static void transport_complete_task_attr(struct se_cmd 
*cmd)
pr_debug("Incremented dev_cur_ordered_id: %u for ORDERED\n",
 dev->dev_cur_ordered_id);
}
+   cmd->se_cmd_flags &= ~SCF_TASK_ATTR_SET;
+
 restart:
target_restart_delayed_cmds(dev);
 }
-- 
1.9.1



[PATCH 0/6] target fixes for v4.15-rc1

2017-11-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

Hi all,

Here are the outstanding target bugfixes in queue for v4.15-rc1
code.

Patch #1 addresses a long standing bug wrt to QUEUE_FULL and
SCSI task attribute handling, that results in SCSI task related
counters getting updated multiple times during QUEUE_FULL.

It primarly effects hosts using ORDERED tasks, which depend upon
these counters to know when to delay incoming tasks.

Patch #2 is for a recent v4.11+ regression, which during ABORT
of COMPARE_AND_WRITE can result in se_device->cam_sem getting
leaked due to se_cmd->transport_complete_callback() being
skipped.

Patch #3 addresses a possible end-less loop during QUEUE_FULL +
TFO->write_pending() failure, allowing se_cmd quiese to properly
complete the outstanding descriptor when requested.

Patch #4 addresses a use-after-tree that was hit in the field,
involving pre-backend execution se_cmd exceptions + subsequent
ABORT_TASK for a matching tag.

Patch #5 + #6 address a iscsi-target TMR related memory and
se_cmd->cmd_kref reference leaks respectively.

We've been testing #4, #5, and #6 internally on v4.1.y stable
code, and have not run into additional regressions.

The rest are straight-forward.

Please review.

--nab

Nicholas Bellinger (6):
  target: Fix QUEUE_FULL + SCSI task attribute handling
  target: Fix caw_sem leak in transport_generic_request_failure
  target: Fix quiese during transport_write_pending_qf endless loop
  target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
  iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
  iscsi-target: Fix non-immediate TMR reference leak

 drivers/target/iscsi/iscsi_target.c| 30 ++
 drivers/target/target_core_tmr.c   |  9 +
 drivers/target/target_core_transport.c | 26 +++---
 include/target/target_core_base.h  |  1 +
 4 files changed, 47 insertions(+), 19 deletions(-)

-- 
1.9.1



[PATCH 4/6] target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK

2017-11-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch fixes bug where early se_cmd exceptions that occur
before backend execution can result in use-after-free if/when
a subsequent ABORT_TASK occurs for the same tag.

Since an early se_cmd exception will have had se_cmd added to
se_session->sess_cmd_list via target_get_sess_cmd(), it will
not have CMD_T_COMPLETE set by the usual target_complete_cmd()
backend completion path.

This causes a subsequent ABORT_TASK + __target_check_io_state()
to signal ABORT_TASK should proceed.  As core_tmr_abort_task()
executes, it will bring the outstanding se_cmd->cmd_kref count
down to zero releasing se_cmd, after se_cmd has already been
queued with error status into fabric driver response path code.

To address this bug, introduce a CMD_T_PRE_EXECUTE bit that is
set at target_get_sess_cmd() time, and cleared immediately before
backend driver dispatch in target_execute_cmd() once CMD_T_ACTIVE
is set.

Then, check CMD_T_PRE_EXECUTE within __target_check_io_state() to
determine when an early exception has occured, and avoid aborting
this se_cmd since it will have already been queued into fabric
driver response path code.

Reported-by: Donald White 
Cc: Donald White 
Cc: Mike Christie 
Cc: Hannes Reinecke 
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/target_core_tmr.c   | 9 +
 drivers/target/target_core_transport.c | 2 ++
 include/target/target_core_base.h  | 1 +
 3 files changed, 12 insertions(+)

diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 61909b2..9c7bc1c 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -133,6 +133,15 @@ static bool __target_check_io_state(struct se_cmd *se_cmd,
spin_unlock(_cmd->t_state_lock);
return false;
}
+   if (se_cmd->transport_state & CMD_T_PRE_EXECUTE) {
+   if (se_cmd->scsi_status) {
+   pr_debug("Attempted to abort io tag: %llu early failure"
+" status: 0x%02x\n", se_cmd->tag,
+se_cmd->scsi_status);
+   spin_unlock(_cmd->t_state_lock);
+   return false;
+   }
+   }
if (sess->sess_tearing_down || se_cmd->cmd_wait_set) {
pr_debug("Attempted to abort io tag: %llu already shutdown,"
" skipping\n", se_cmd->tag);
diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 0e89db8..58caacd 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1975,6 +1975,7 @@ void target_execute_cmd(struct se_cmd *cmd)
}
 
cmd->t_state = TRANSPORT_PROCESSING;
+   cmd->transport_state &= ~CMD_T_PRE_EXECUTE;
cmd->transport_state |= CMD_T_ACTIVE | CMD_T_SENT;
spin_unlock_irq(>t_state_lock);
 
@@ -2667,6 +2668,7 @@ int target_get_sess_cmd(struct se_cmd *se_cmd, bool 
ack_kref)
ret = -ESHUTDOWN;
goto out;
}
+   se_cmd->transport_state |= CMD_T_PRE_EXECUTE;
list_add_tail(_cmd->se_cmd_list, _sess->sess_cmd_list);
 out:
spin_unlock_irqrestore(_sess->sess_cmd_lock, flags);
diff --git a/include/target/target_core_base.h 
b/include/target/target_core_base.h
index d3139a9..ccf501b 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -490,6 +490,7 @@ struct se_cmd {
 #define CMD_T_STOP (1 << 5)
 #define CMD_T_TAS  (1 << 10)
 #define CMD_T_FABRIC_STOP  (1 << 11)
+#define CMD_T_PRE_EXECUTE  (1 << 12)
spinlock_t  t_state_lock;
struct kref cmd_kref;
struct completion   t_transport_stop_comp;
-- 
1.9.1



[PATCH 3/6] target: Fix quiese during transport_write_pending_qf endless loop

2017-11-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch fixes a potential end-less loop during QUEUE_FULL,
where cmd->se_tfo->write_pending() callback fails repeatedly
but __transport_wait_for_tasks() has already been invoked to
quiese the outstanding se_cmd descriptor.

To address this bug, this patch adds a CMD_T_STOP|CMD_T_ABORTED
check within transport_write_pending_qf() and invokes the
existing se_cmd->t_transport_stop_comp to signal quiese
completion back to __transport_wait_for_tasks().

Cc: Mike Christie 
Cc: Hannes Reinecke 
Cc: Bryant G. Ly 
Cc: Michael Cyr 
Cc: Potnuri Bharat Teja 
Cc: Sagi Grimberg 
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/target_core_transport.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index d02218c..0e89db8 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2560,7 +2560,20 @@ void transport_kunmap_data_sg(struct se_cmd *cmd)
 
 static void transport_write_pending_qf(struct se_cmd *cmd)
 {
+   unsigned long flags;
int ret;
+   bool stop;
+
+   spin_lock_irqsave(>t_state_lock, flags);
+   stop = (cmd->transport_state & (CMD_T_STOP | CMD_T_ABORTED));
+   spin_unlock_irqrestore(>t_state_lock, flags);
+
+   if (stop) {
+   pr_debug("%s:%d CMD_T_STOP|CMD_T_ABORTED for ITT: 0x%08llx\n",
+   __func__, __LINE__, cmd->tag);
+   complete_all(>t_transport_stop_comp);
+   return;
+   }
 
ret = cmd->se_tfo->write_pending(cmd);
if (ret) {
-- 
1.9.1



[PATCH 2/6] target: Fix caw_sem leak in transport_generic_request_failure

2017-11-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

With the recent addition of transport_check_aborted_status() within
transport_generic_request_failure() to avoid sending a SCSI status
exception after CMD_T_ABORTED w/ TAS=1 has occured, it introduced
a COMPARE_AND_WRITE early failure regression.

Namely when COMPARE_AND_WRITE fails and se_device->caw_sem has
been taken by sbc_compare_and_write(), if the new check for
transport_check_aborted_status() returns true and exits,
cmd->transport_complete_callback() -> compare_and_write_post()
is skipped never releasing se_device->caw_sem.

This regression was originally introduced by:

  commit e3b88ee95b4e4bf3e9729a4695d695b9c7c296c8
  Author: Bart Van Assche 
  Date:   Tue Feb 14 16:25:45 2017 -0800

  target: Fix handling of aborted failed commands

To address this bug, move the transport_check_aborted_status()
call after transport_complete_task_attr() and
cmd->transport_complete_callback().

Cc: Mike Christie 
Cc: Hannes Reinecke 
Cc: Bart Van Assche 
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/target_core_transport.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index c33d1e9..d02218c 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1729,9 +1729,6 @@ void transport_generic_request_failure(struct se_cmd *cmd,
 {
int ret = 0, post_ret = 0;
 
-   if (transport_check_aborted_status(cmd, 1))
-   return;
-
pr_debug("-[ Storage Engine Exception; sense_reason %d\n",
 sense_reason);
target_show_cmd("-[ ", cmd);
@@ -1740,6 +1737,7 @@ void transport_generic_request_failure(struct se_cmd *cmd,
 * For SAM Task Attribute emulation for failed struct se_cmd
 */
transport_complete_task_attr(cmd);
+
/*
 * Handle special case for COMPARE_AND_WRITE failure, where the
 * callback is expected to drop the per device ->caw_sem.
@@ -1748,6 +1746,9 @@ void transport_generic_request_failure(struct se_cmd *cmd,
 cmd->transport_complete_callback)
cmd->transport_complete_callback(cmd, false, _ret);
 
+   if (transport_check_aborted_status(cmd, 1))
+   return;
+
switch (sense_reason) {
case TCM_NON_EXISTENT_LUN:
case TCM_UNSUPPORTED_SCSI_OPCODE:
-- 
1.9.1



[PATCH 5/6] iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref

2017-11-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

Since commit 59b6986dbf fixed a potential NULL pointer dereference
by allocating a se_tmr_req for ISCSI_TM_FUNC_TASK_REASSIGN, the
se_tmr_req is currently leaked by iscsit_free_cmd() because no
iscsi_cmd->se_cmd.se_tfo was associated.

To address this, treat ISCSI_TM_FUNC_TASK_REASSIGN like any other
TMR and call transport_init_se_cmd() + target_get_sess_cmd() to
setup iscsi_cmd->se_cmd.se_tfo with se_cmd->cmd_kref of 2.

This will ensure normal release operation once se_cmd->cmd_kref
reaches zero and target_release_cmd_kref() is invoked, se_tmr_req
will be released via existing target_free_cmd_mem() and
core_tmr_release_req() code.

Reported-by: Donald White 
Cc: Donald White 
Cc: Mike Christie 
Cc: Hannes Reinecke 
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/iscsi/iscsi_target.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c 
b/drivers/target/iscsi/iscsi_target.c
index 541f66a..048d422 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -1955,7 +1955,6 @@ static enum tcm_tmreq_table iscsit_convert_tmf(u8 
iscsi_tmf)
struct iscsi_tmr_req *tmr_req;
struct iscsi_tm *hdr;
int out_of_order_cmdsn = 0, ret;
-   bool sess_ref = false;
u8 function, tcm_function = TMR_UNKNOWN;
 
hdr = (struct iscsi_tm *) buf;
@@ -1988,22 +1987,23 @@ static enum tcm_tmreq_table iscsit_convert_tmf(u8 
iscsi_tmf)
 
cmd->data_direction = DMA_NONE;
cmd->tmr_req = kzalloc(sizeof(*cmd->tmr_req), GFP_KERNEL);
-   if (!cmd->tmr_req)
+   if (!cmd->tmr_req) {
return iscsit_add_reject_cmd(cmd,
 ISCSI_REASON_BOOKMARK_NO_RESOURCES,
 buf);
+   }
+
+   transport_init_se_cmd(>se_cmd, _ops,
+ conn->sess->se_sess, 0, DMA_NONE,
+ TCM_SIMPLE_TAG, cmd->sense_buffer + 2);
+
+   target_get_sess_cmd(>se_cmd, true);
 
/*
 * TASK_REASSIGN for ERL=2 / connection stays inside of
 * LIO-Target $FABRIC_MOD
 */
if (function != ISCSI_TM_FUNC_TASK_REASSIGN) {
-   transport_init_se_cmd(>se_cmd, _ops,
- conn->sess->se_sess, 0, DMA_NONE,
- TCM_SIMPLE_TAG, cmd->sense_buffer + 2);
-
-   target_get_sess_cmd(>se_cmd, true);
-   sess_ref = true;
tcm_function = iscsit_convert_tmf(function);
if (tcm_function == TMR_UNKNOWN) {
pr_err("Unknown iSCSI TMR Function:"
@@ -2119,12 +2119,8 @@ static enum tcm_tmreq_table iscsit_convert_tmf(u8 
iscsi_tmf)
 * For connection recovery, this is also the default action for
 * TMR TASK_REASSIGN.
 */
-   if (sess_ref) {
-   pr_debug("Handle TMR, using sess_ref=true check\n");
-   target_put_sess_cmd(>se_cmd);
-   }
-
iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state);
+   target_put_sess_cmd(>se_cmd);
return 0;
 }
 EXPORT_SYMBOL(iscsit_handle_task_mgt_cmd);
-- 
1.9.1



[PATCH 6/6] iscsi-target: Fix non-immediate TMR reference leak

2017-11-07 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch fixes a se_cmd->cmd_kref reference leak that can
occur when a non immediate TMR is proceeded our of command
sequence number order, and CMDSN_LOWER_THAN_EXP is returned
by iscsit_sequence_cmd().

To address this bug, call target_put_sess_cmd() during this
special case following what iscsit_process_scsi_cmd() does
upon CMDSN_LOWER_THAN_EXP.

Cc: Mike Christie 
Cc: Hannes Reinecke 
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/iscsi/iscsi_target.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c 
b/drivers/target/iscsi/iscsi_target.c
index 048d422..3b7bb58 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -2094,12 +2094,14 @@ static enum tcm_tmreq_table iscsit_convert_tmf(u8 
iscsi_tmf)
 
if (!(hdr->opcode & ISCSI_OP_IMMEDIATE)) {
int cmdsn_ret = iscsit_sequence_cmd(conn, cmd, buf, hdr->cmdsn);
-   if (cmdsn_ret == CMDSN_HIGHER_THAN_EXP)
+   if (cmdsn_ret == CMDSN_HIGHER_THAN_EXP) {
out_of_order_cmdsn = 1;
-   else if (cmdsn_ret == CMDSN_LOWER_THAN_EXP)
+   } else if (cmdsn_ret == CMDSN_LOWER_THAN_EXP) {
+   target_put_sess_cmd(>se_cmd);
return 0;
-   else if (cmdsn_ret == CMDSN_ERROR_CANNOT_RECOVER)
+   } else if (cmdsn_ret == CMDSN_ERROR_CANNOT_RECOVER) {
return -1;
+   }
}
iscsit_ack_from_expstatsn(conn, be32_to_cpu(hdr->exp_statsn));
 
-- 
1.9.1



Re: [PATCH V2] scsi_debugfs: fix crash in scsi_show_rq()

2017-11-07 Thread Ming Lei
Hi James,

On Tue, Nov 07, 2017 at 07:04:23PM -0800, James Bottomley wrote:
> On Wed, 2017-11-08 at 09:15 +0800, Ming Lei wrote:
> > On Wed, Nov 08, 2017 at 01:06:44AM +, Bart Van Assche wrote:
> > > 
> > > On Wed, 2017-11-08 at 08:59 +0800, Ming Lei wrote:
> > > > 
> > > > On Tue, Nov 07, 2017 at 04:13:48PM +, Bart Van Assche wrote:
> > > > > 
> > > > > On Tue, 2017-11-07 at 23:21 +0800, Ming Lei wrote:
> > > > > > 
> > > > > > diff --git a/drivers/scsi/scsi_debugfs.c
> > > > > > b/drivers/scsi/scsi_debugfs.c
> > > > > > index 5e9755008aed..7a50878446b4 100644
> > > > > > --- a/drivers/scsi/scsi_debugfs.c
> > > > > > +++ b/drivers/scsi/scsi_debugfs.c
> > > > > > @@ -9,7 +9,10 @@ void scsi_show_rq(struct seq_file *m, struct
> > > > > > request *rq)
> > > > > >     int msecs = jiffies_to_msecs(jiffies - cmd-
> > > > > > >jiffies_at_alloc);
> > > > > >     char buf[80];
> > > > > >  
> > > > > > -   __scsi_format_command(buf, sizeof(buf), cmd->cmnd,
> > > > > > cmd->cmd_len);
> > > > > > +   if (cmd->cmnd == scsi_req(rq)->cmd)
> > > > > > +   __scsi_format_command(buf, sizeof(buf), cmd-
> > > > > > >cmnd, cmd->cmd_len);
> > > > > > +   else
> > > > > > +   strcpy(buf, "unknown");
> > > > > >     seq_printf(m, ", .cmd=%s, .retries=%d, allocated
> > > > > > %d.%03d s ago", buf,
> > > > > >        cmd->retries, msecs / 1000, msecs %
> > > > > > 1000);
> > > > > >  }
> > > > > 
> > > > > This change introduces a new bug, namely that "unknown" will
> > > > > appear in the debugfs output if (cmd->cmnd != scsi_req(rq)-
> > > > > >cmd). Have you considered to use
> > > > 
> > > > Because there isn't reliable way to get the command safely, and I
> > > > don't think it is a new bug.
> > > > 
> > > > > 
> > > > > the test (cmd->cmnd != NULL) instead?
> > > > 
> > > > No, that is worse, because you may cause use-after-free if you
> > > > read a freed buffer.
> > > 
> > > It seems like your operating mode is still to contradict all
> > > feedback you get instead of trying to address the feedback you get?
> > > 
> > > Anyway, this is a debugging facility so I'm not convinced that
> > > avoiding a (very sporadic) use-after-free in this code is better
> > > than omitting very useful output.
> > 
> > OK, if no one objects the use-after-free, because this way may
> > trigger warning from some utility, such as KASAN.
> 
> Good grief, this list is supposed to be for technical discussions not
> kindergarten playground squabbles; could you both please act your age?

In my reply, I mentioned I don't object to check NULL any more, and only
provide one extra input about possible KASAN's complaint, which may annoy
people.

> 
> The patch as proposed would lose us all information about PI commands,
> hence the objection.  For the concern about use after free on the NULL
> check, what about modifying sd_uninit_command() to NULL SCpnt->cmnd
> before it calls mempool_free()?  That will likely eliminate the race
> window for printing the command.

That may decrease the window, but can't eliminate it because of write/read
can be reordered.

But looks better, will cook a patch soon, together with using READ_ONCE() and
WRITE_ONCE().

Thanks,
Ming


Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Jens Axboe
On 11/07/2017 08:12 PM, Ming Lei wrote:
> On Tue, Nov 07, 2017 at 08:01:48PM -0700, Jens Axboe wrote:
>> On 11/07/2017 06:03 PM, Ming Lei wrote:
>>> On Tue, Nov 07, 2017 at 03:06:31PM -0700, Jens Axboe wrote:
 On 11/07/2017 10:36 AM, Jens Axboe wrote:
> On 11/07/2017 10:10 AM, Jens Axboe wrote:
>> On 11/07/2017 09:29 AM, Jens Axboe wrote:
>>> On 11/07/2017 09:20 AM, Bart Van Assche wrote:
 On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
> If you can reproduce, please provide me at least the following log
> first:
>
>   find /sys/kernel/debug/block -name tags | xargs cat | grep busy
>
> If any pending requests arn't completed, please provide the related
> info in dbgfs about where is the request.

 Every time I ran the above or a similar command its output was empty. I
 assume that's because the hang usually occurs in a phase where these 
 debugfs
 attributes either have not yet been created or have already 
 disappeared.
>>>
>>> Bart, do you still see a hang with the patch that fixes the tag leak 
>>> when
>>> we fail to get a dispatch budget?
>>>
>>> https://marc.info/?l=linux-block=151004881411480=2
>>>
>>> I've run a lot of stability testing here, and I haven't run into any
>>> issues. This is with shared tags as well. So if you still see the 
>>> failure
>>> case with the current tree AND the above patch, then I'll try and get
>>> a test case setup that hits it too so we can get to the bottom of this.
>>
>> Ming, I managed to reproduce the hang using null_blk. Note this is
>> WITHOUT the patch mentioned above, running with that now.
>>
>> # modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 
>> submit_queues=1 hw_queue_depth=1
>>
>> and using this fio file:
>>
>> [global]
>> bs=4k
>> rw=randread
>> norandommap
>> direct=1
>> ioengine=libaio
>> iodepth=4
>>
>> [nullb0]
>> filename=/dev/nullb0
>> [nullb1]
>> filename=/dev/nullb1
>> [nullb2]
>> filename=/dev/nullb2
>> [nullb3]
>> filename=/dev/nullb3
>>
>> it seemed to keep running, but it hung when exiting. The troublesome
>> device was nullb1:
>>
>> [  492.513374] INFO: task fio:3263 blocked for more than 120 seconds.
>> [  492.520782]   Not tainted 4.14.0-rc7+ #499
>> [  492.526247] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
>> disables this message.
>> [  492.535904] fio D13208  3263   3211 0x
>> [  492.542535] Call Trace:
>> [  492.545764]  __schedule+0x279/0x720
>> [  492.550151]  schedule+0x33/0x90
>> [  492.554145]  io_schedule+0x16/0x40
>> [  492.558435]  blk_mq_get_tag+0x148/0x250
>> [  492.563211]  ? finish_wait+0x90/0x90
>> [  492.567693]  blk_mq_get_request+0xf0/0x3e0
>> [  492.572760]  blk_mq_make_request+0xe2/0x690
>> [  492.577913]  generic_make_request+0xfc/0x2f0
>> [  492.583177]  submit_bio+0x64/0x120
>> [  492.587475]  ? set_page_dirty_lock+0x4b/0x60
>> [  492.592736]  ? submit_bio+0x64/0x120
>> [  492.597309]  ? bio_set_pages_dirty+0x55/0x60
>> [  492.602570]  blkdev_direct_IO+0x388/0x3c0
>> [  492.607546]  ? free_ioctx_users+0xe0/0xe0
>> [  492.612511]  ? blk_mq_dispatch_rq_list+0x238/0x3a0
>> [  492.618353]  ? _raw_spin_unlock+0xe/0x20
>> [  492.623227]  generic_file_read_iter+0xb3/0xa00
>> [  492.628682]  ? generic_file_read_iter+0xb3/0xa00
>> [  492.634334]  ? security_file_permission+0x9b/0xc0
>> [  492.640114]  blkdev_read_iter+0x35/0x40
>> [  492.644877]  aio_read+0xc5/0x120
>> [  492.648973]  ? aio_read_events+0x24c/0x340
>> [  492.654124]  ? __might_sleep+0x4a/0x80
>> [  492.658800]  do_io_submit+0x47c/0x5e0
>> [  492.663373]  ? do_io_submit+0x47c/0x5e0
>> [  492.668234]  SyS_io_submit+0x10/0x20
>> [  492.672715]  ? SyS_io_submit+0x10/0x20
>> [  492.677394]  entry_SYSCALL_64_fastpath+0x13/0x94
>> [  492.683039] RIP: 0033:0x7f83d1871717
>> [  492.687521] RSP: 002b:7ffd38fe5a88 EFLAGS: 0202 ORIG_RAX: 
>> 00d1
>> [  492.696969] RAX: ffda RBX: 7f83b6972b50 RCX: 
>> 7f83d1871717
>> [  492.705423] RDX: 01f41418 RSI: 0001 RDI: 
>> 7f83e4d36000
>> [  492.713889] RBP: 0001 R08: 0001 R09: 
>> 01f3f2e0
>> [  492.722352] R10: 1000 R11: 0202 R12: 
>> 7ffd38fe5be0
>> [  492.730827] R13: 7f83b6972b01 R14: 7f83b69824b8 R15: 
>> 7f83b6982368
>>
>> and if we look at the debug entries, it's waiting on a scheduler tag:
>>
>> sched_tags:nr_tags=2
>> sched_tags:nr_reserved_tags=0
>> sched_tags:active_queues=0
>> sched_tags:bitmap_tags:
>> sched_tags:depth=2
>> 

Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Jens Axboe
On 11/07/2017 09:17 AM, Bart Van Assche wrote:
> On Tue, 2017-11-07 at 18:15 +0800, Ming Lei wrote:
>> Last time, you didn't mention the target patch for setting its
>> can_queue as 1, so I think you can't reproduce the issue on upstream
>> kernel without out-of-tree patch. Then looks it is another issue,
>> and we are making progress actually.
> 
> If I don't trust a patch I introduce additional tests. The fact that I
> modified the SRP initiator before this hang occurred does not mean that the
> approach of your patch is fine. What this means is that all your patch does
> is to reduce the race window and that there is still a race window.

I agree, reducing the depth to test that specific case is perfectly
valid, it doesn't make the test invalid. It's perfectly possible that
other use cases out there have exactly that configuration. My null_blk
test case is basically the same. But it would be nice if all of this was
out in the open, so everybody is clear on exactly what is being
run/tested.

However, where my trace is hung off getting a scheduler tag, yours seems
to be waiting on IO completion. So not the same fingerprint, which is
worrisome.

-- 
Jens Axboe



Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Ming Lei
On Tue, Nov 07, 2017 at 08:01:48PM -0700, Jens Axboe wrote:
> On 11/07/2017 06:03 PM, Ming Lei wrote:
> > On Tue, Nov 07, 2017 at 03:06:31PM -0700, Jens Axboe wrote:
> >> On 11/07/2017 10:36 AM, Jens Axboe wrote:
> >>> On 11/07/2017 10:10 AM, Jens Axboe wrote:
>  On 11/07/2017 09:29 AM, Jens Axboe wrote:
> > On 11/07/2017 09:20 AM, Bart Van Assche wrote:
> >> On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
> >>> If you can reproduce, please provide me at least the following log
> >>> first:
> >>>
> >>>   find /sys/kernel/debug/block -name tags | xargs cat | grep busy
> >>>
> >>> If any pending requests arn't completed, please provide the related
> >>> info in dbgfs about where is the request.
> >>
> >> Every time I ran the above or a similar command its output was empty. I
> >> assume that's because the hang usually occurs in a phase where these 
> >> debugfs
> >> attributes either have not yet been created or have already 
> >> disappeared.
> >
> > Bart, do you still see a hang with the patch that fixes the tag leak 
> > when
> > we fail to get a dispatch budget?
> >
> > https://marc.info/?l=linux-block=151004881411480=2
> >
> > I've run a lot of stability testing here, and I haven't run into any
> > issues. This is with shared tags as well. So if you still see the 
> > failure
> > case with the current tree AND the above patch, then I'll try and get
> > a test case setup that hits it too so we can get to the bottom of this.
> 
>  Ming, I managed to reproduce the hang using null_blk. Note this is
>  WITHOUT the patch mentioned above, running with that now.
> 
>  # modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 
>  submit_queues=1 hw_queue_depth=1
> 
>  and using this fio file:
> 
>  [global]
>  bs=4k
>  rw=randread
>  norandommap
>  direct=1
>  ioengine=libaio
>  iodepth=4
> 
>  [nullb0]
>  filename=/dev/nullb0
>  [nullb1]
>  filename=/dev/nullb1
>  [nullb2]
>  filename=/dev/nullb2
>  [nullb3]
>  filename=/dev/nullb3
> 
>  it seemed to keep running, but it hung when exiting. The troublesome
>  device was nullb1:
> 
>  [  492.513374] INFO: task fio:3263 blocked for more than 120 seconds.
>  [  492.520782]   Not tainted 4.14.0-rc7+ #499
>  [  492.526247] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
>  disables this message.
>  [  492.535904] fio D13208  3263   3211 0x
>  [  492.542535] Call Trace:
>  [  492.545764]  __schedule+0x279/0x720
>  [  492.550151]  schedule+0x33/0x90
>  [  492.554145]  io_schedule+0x16/0x40
>  [  492.558435]  blk_mq_get_tag+0x148/0x250
>  [  492.563211]  ? finish_wait+0x90/0x90
>  [  492.567693]  blk_mq_get_request+0xf0/0x3e0
>  [  492.572760]  blk_mq_make_request+0xe2/0x690
>  [  492.577913]  generic_make_request+0xfc/0x2f0
>  [  492.583177]  submit_bio+0x64/0x120
>  [  492.587475]  ? set_page_dirty_lock+0x4b/0x60
>  [  492.592736]  ? submit_bio+0x64/0x120
>  [  492.597309]  ? bio_set_pages_dirty+0x55/0x60
>  [  492.602570]  blkdev_direct_IO+0x388/0x3c0
>  [  492.607546]  ? free_ioctx_users+0xe0/0xe0
>  [  492.612511]  ? blk_mq_dispatch_rq_list+0x238/0x3a0
>  [  492.618353]  ? _raw_spin_unlock+0xe/0x20
>  [  492.623227]  generic_file_read_iter+0xb3/0xa00
>  [  492.628682]  ? generic_file_read_iter+0xb3/0xa00
>  [  492.634334]  ? security_file_permission+0x9b/0xc0
>  [  492.640114]  blkdev_read_iter+0x35/0x40
>  [  492.644877]  aio_read+0xc5/0x120
>  [  492.648973]  ? aio_read_events+0x24c/0x340
>  [  492.654124]  ? __might_sleep+0x4a/0x80
>  [  492.658800]  do_io_submit+0x47c/0x5e0
>  [  492.663373]  ? do_io_submit+0x47c/0x5e0
>  [  492.668234]  SyS_io_submit+0x10/0x20
>  [  492.672715]  ? SyS_io_submit+0x10/0x20
>  [  492.677394]  entry_SYSCALL_64_fastpath+0x13/0x94
>  [  492.683039] RIP: 0033:0x7f83d1871717
>  [  492.687521] RSP: 002b:7ffd38fe5a88 EFLAGS: 0202 ORIG_RAX: 
>  00d1
>  [  492.696969] RAX: ffda RBX: 7f83b6972b50 RCX: 
>  7f83d1871717
>  [  492.705423] RDX: 01f41418 RSI: 0001 RDI: 
>  7f83e4d36000
>  [  492.713889] RBP: 0001 R08: 0001 R09: 
>  01f3f2e0
>  [  492.722352] R10: 1000 R11: 0202 R12: 
>  7ffd38fe5be0
>  [  492.730827] R13: 7f83b6972b01 R14: 7f83b69824b8 R15: 
>  7f83b6982368
> 
>  and if we look at the debug entries, it's waiting on a scheduler tag:
> 
>  sched_tags:nr_tags=2
>  sched_tags:nr_reserved_tags=0
>  sched_tags:active_queues=0
>  sched_tags:bitmap_tags:
>  sched_tags:depth=2
>  sched_tags:busy=2
>  

Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Jens Axboe
On 11/07/2017 07:58 PM, Ming Lei wrote:
> On Tue, Nov 07, 2017 at 07:55:32PM -0700, Jens Axboe wrote:
>> On 11/07/2017 05:39 PM, Ming Lei wrote:
>>> On Tue, Nov 07, 2017 at 04:20:08PM +, Bart Van Assche wrote:
 On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
> If you can reproduce, please provide me at least the following log
> first:
>
>   find /sys/kernel/debug/block -name tags | xargs cat | grep busy
>
> If any pending requests arn't completed, please provide the related
> info in dbgfs about where is the request.

 Every time I ran the above or a similar command its output was empty. I
 assume that's because the hang usually occurs in a phase where these 
 debugfs
 attributes either have not yet been created or have already disappeared.
>>>
>>> Could you dump all tags? Then you can see if this attribute is disappeared.
>>>
>>> If that output is empty, it often means there isn't pending request not
>>> completed. So if that is true, your hang is _not_ related with RESTART.
>>
>> You need to check sched_tags as well. It could still be a restart race
>> or problem, if tags is empty but sched_tags has busy bits.
> 
> Yeah, I didn't mention because SRP is MQ hardware, and the default
> scheduler is none, but if Bart changes that, the sched_tags need to
> checked first.

At this point, I have no idea what Bart's setup looks like. Bart, it
would be REALLY helpful if you could tell us how you are reproducing
your hang. I don't know why this has to be dragged out.

Ming/Bart - there seems to be an increasing amount of tension between
you two, for reasons that are unknown to me. I suggest you put that
aside in the pursuit of fixing the current issue, and then we can
discuss how to best resolve these going forward. But right now the top
priority is getting to the bottom of this. There's a chance that the
issue I can reproduce is the same that Bart is seeing, in which case we
might be fixing both in one fell swoop. But if that isn't the case, then
we have some work to do this week.

-- 
Jens Axboe



Re: [PATCH V2] scsi_debugfs: fix crash in scsi_show_rq()

2017-11-07 Thread James Bottomley
On Wed, 2017-11-08 at 09:15 +0800, Ming Lei wrote:
> On Wed, Nov 08, 2017 at 01:06:44AM +, Bart Van Assche wrote:
> > 
> > On Wed, 2017-11-08 at 08:59 +0800, Ming Lei wrote:
> > > 
> > > On Tue, Nov 07, 2017 at 04:13:48PM +, Bart Van Assche wrote:
> > > > 
> > > > On Tue, 2017-11-07 at 23:21 +0800, Ming Lei wrote:
> > > > > 
> > > > > diff --git a/drivers/scsi/scsi_debugfs.c
> > > > > b/drivers/scsi/scsi_debugfs.c
> > > > > index 5e9755008aed..7a50878446b4 100644
> > > > > --- a/drivers/scsi/scsi_debugfs.c
> > > > > +++ b/drivers/scsi/scsi_debugfs.c
> > > > > @@ -9,7 +9,10 @@ void scsi_show_rq(struct seq_file *m, struct
> > > > > request *rq)
> > > > >   int msecs = jiffies_to_msecs(jiffies - cmd-
> > > > > >jiffies_at_alloc);
> > > > >   char buf[80];
> > > > >  
> > > > > - __scsi_format_command(buf, sizeof(buf), cmd->cmnd,
> > > > > cmd->cmd_len);
> > > > > + if (cmd->cmnd == scsi_req(rq)->cmd)
> > > > > + __scsi_format_command(buf, sizeof(buf), cmd-
> > > > > >cmnd, cmd->cmd_len);
> > > > > + else
> > > > > + strcpy(buf, "unknown");
> > > > >   seq_printf(m, ", .cmd=%s, .retries=%d, allocated
> > > > > %d.%03d s ago", buf,
> > > > >      cmd->retries, msecs / 1000, msecs %
> > > > > 1000);
> > > > >  }
> > > > 
> > > > This change introduces a new bug, namely that "unknown" will
> > > > appear in the debugfs output if (cmd->cmnd != scsi_req(rq)-
> > > > >cmd). Have you considered to use
> > > 
> > > Because there isn't reliable way to get the command safely, and I
> > > don't think it is a new bug.
> > > 
> > > > 
> > > > the test (cmd->cmnd != NULL) instead?
> > > 
> > > No, that is worse, because you may cause use-after-free if you
> > > read a freed buffer.
> > 
> > It seems like your operating mode is still to contradict all
> > feedback you get instead of trying to address the feedback you get?
> > 
> > Anyway, this is a debugging facility so I'm not convinced that
> > avoiding a (very sporadic) use-after-free in this code is better
> > than omitting very useful output.
> 
> OK, if no one objects the use-after-free, because this way may
> trigger warning from some utility, such as KASAN.

Good grief, this list is supposed to be for technical discussions not
kindergarten playground squabbles; could you both please act your age?

The patch as proposed would lose us all information about PI commands,
hence the objection.  For the concern about use after free on the NULL
check, what about modifying sd_uninit_command() to NULL SCpnt->cmnd
before it calls mempool_free()?  That will likely eliminate the race
window for printing the command.

James



Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Jens Axboe
On 11/07/2017 06:03 PM, Ming Lei wrote:
> On Tue, Nov 07, 2017 at 03:06:31PM -0700, Jens Axboe wrote:
>> On 11/07/2017 10:36 AM, Jens Axboe wrote:
>>> On 11/07/2017 10:10 AM, Jens Axboe wrote:
 On 11/07/2017 09:29 AM, Jens Axboe wrote:
> On 11/07/2017 09:20 AM, Bart Van Assche wrote:
>> On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
>>> If you can reproduce, please provide me at least the following log
>>> first:
>>>
>>> find /sys/kernel/debug/block -name tags | xargs cat | grep busy
>>>
>>> If any pending requests arn't completed, please provide the related
>>> info in dbgfs about where is the request.
>>
>> Every time I ran the above or a similar command its output was empty. I
>> assume that's because the hang usually occurs in a phase where these 
>> debugfs
>> attributes either have not yet been created or have already disappeared.
>
> Bart, do you still see a hang with the patch that fixes the tag leak when
> we fail to get a dispatch budget?
>
> https://marc.info/?l=linux-block=151004881411480=2
>
> I've run a lot of stability testing here, and I haven't run into any
> issues. This is with shared tags as well. So if you still see the failure
> case with the current tree AND the above patch, then I'll try and get
> a test case setup that hits it too so we can get to the bottom of this.

 Ming, I managed to reproduce the hang using null_blk. Note this is
 WITHOUT the patch mentioned above, running with that now.

 # modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 
 submit_queues=1 hw_queue_depth=1

 and using this fio file:

 [global]
 bs=4k
 rw=randread
 norandommap
 direct=1
 ioengine=libaio
 iodepth=4

 [nullb0]
 filename=/dev/nullb0
 [nullb1]
 filename=/dev/nullb1
 [nullb2]
 filename=/dev/nullb2
 [nullb3]
 filename=/dev/nullb3

 it seemed to keep running, but it hung when exiting. The troublesome
 device was nullb1:

 [  492.513374] INFO: task fio:3263 blocked for more than 120 seconds.
 [  492.520782]   Not tainted 4.14.0-rc7+ #499
 [  492.526247] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
 this message.
 [  492.535904] fio D13208  3263   3211 0x
 [  492.542535] Call Trace:
 [  492.545764]  __schedule+0x279/0x720
 [  492.550151]  schedule+0x33/0x90
 [  492.554145]  io_schedule+0x16/0x40
 [  492.558435]  blk_mq_get_tag+0x148/0x250
 [  492.563211]  ? finish_wait+0x90/0x90
 [  492.567693]  blk_mq_get_request+0xf0/0x3e0
 [  492.572760]  blk_mq_make_request+0xe2/0x690
 [  492.577913]  generic_make_request+0xfc/0x2f0
 [  492.583177]  submit_bio+0x64/0x120
 [  492.587475]  ? set_page_dirty_lock+0x4b/0x60
 [  492.592736]  ? submit_bio+0x64/0x120
 [  492.597309]  ? bio_set_pages_dirty+0x55/0x60
 [  492.602570]  blkdev_direct_IO+0x388/0x3c0
 [  492.607546]  ? free_ioctx_users+0xe0/0xe0
 [  492.612511]  ? blk_mq_dispatch_rq_list+0x238/0x3a0
 [  492.618353]  ? _raw_spin_unlock+0xe/0x20
 [  492.623227]  generic_file_read_iter+0xb3/0xa00
 [  492.628682]  ? generic_file_read_iter+0xb3/0xa00
 [  492.634334]  ? security_file_permission+0x9b/0xc0
 [  492.640114]  blkdev_read_iter+0x35/0x40
 [  492.644877]  aio_read+0xc5/0x120
 [  492.648973]  ? aio_read_events+0x24c/0x340
 [  492.654124]  ? __might_sleep+0x4a/0x80
 [  492.658800]  do_io_submit+0x47c/0x5e0
 [  492.663373]  ? do_io_submit+0x47c/0x5e0
 [  492.668234]  SyS_io_submit+0x10/0x20
 [  492.672715]  ? SyS_io_submit+0x10/0x20
 [  492.677394]  entry_SYSCALL_64_fastpath+0x13/0x94
 [  492.683039] RIP: 0033:0x7f83d1871717
 [  492.687521] RSP: 002b:7ffd38fe5a88 EFLAGS: 0202 ORIG_RAX: 
 00d1
 [  492.696969] RAX: ffda RBX: 7f83b6972b50 RCX: 
 7f83d1871717
 [  492.705423] RDX: 01f41418 RSI: 0001 RDI: 
 7f83e4d36000
 [  492.713889] RBP: 0001 R08: 0001 R09: 
 01f3f2e0
 [  492.722352] R10: 1000 R11: 0202 R12: 
 7ffd38fe5be0
 [  492.730827] R13: 7f83b6972b01 R14: 7f83b69824b8 R15: 
 7f83b6982368

 and if we look at the debug entries, it's waiting on a scheduler tag:

 sched_tags:nr_tags=2
 sched_tags:nr_reserved_tags=0
 sched_tags:active_queues=0
 sched_tags:bitmap_tags:
 sched_tags:depth=2
 sched_tags:busy=2
 sched_tags:bits_per_word=64
 sched_tags:map_nr=1
 sched_tags:alloc_hint={0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 
 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 
 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0}
 sched_tags:wake_batch=1
 

Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Ming Lei
On Tue, Nov 07, 2017 at 07:55:32PM -0700, Jens Axboe wrote:
> On 11/07/2017 05:39 PM, Ming Lei wrote:
> > On Tue, Nov 07, 2017 at 04:20:08PM +, Bart Van Assche wrote:
> >> On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
> >>> If you can reproduce, please provide me at least the following log
> >>> first:
> >>>
> >>>   find /sys/kernel/debug/block -name tags | xargs cat | grep busy
> >>>
> >>> If any pending requests arn't completed, please provide the related
> >>> info in dbgfs about where is the request.
> >>
> >> Every time I ran the above or a similar command its output was empty. I
> >> assume that's because the hang usually occurs in a phase where these 
> >> debugfs
> >> attributes either have not yet been created or have already disappeared.
> > 
> > Could you dump all tags? Then you can see if this attribute is disappeared.
> > 
> > If that output is empty, it often means there isn't pending request not
> > completed. So if that is true, your hang is _not_ related with RESTART.
> 
> You need to check sched_tags as well. It could still be a restart race
> or problem, if tags is empty but sched_tags has busy bits.

Yeah, I didn't mention because SRP is MQ hardware, and the default
scheduler is none, but if Bart changes that, the sched_tags need to
checked first.

-- 
Ming


Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Jens Axboe
On 11/07/2017 05:39 PM, Ming Lei wrote:
> On Tue, Nov 07, 2017 at 04:20:08PM +, Bart Van Assche wrote:
>> On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
>>> If you can reproduce, please provide me at least the following log
>>> first:
>>>
>>> find /sys/kernel/debug/block -name tags | xargs cat | grep busy
>>>
>>> If any pending requests arn't completed, please provide the related
>>> info in dbgfs about where is the request.
>>
>> Every time I ran the above or a similar command its output was empty. I
>> assume that's because the hang usually occurs in a phase where these debugfs
>> attributes either have not yet been created or have already disappeared.
> 
> Could you dump all tags? Then you can see if this attribute is disappeared.
> 
> If that output is empty, it often means there isn't pending request not
> completed. So if that is true, your hang is _not_ related with RESTART.

You need to check sched_tags as well. It could still be a restart race
or problem, if tags is empty but sched_tags has busy bits.

-- 
Jens Axboe



Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Ming Lei
On Wed, Nov 08, 2017 at 08:53:38AM +0800, Ming Lei wrote:
> On Tue, Nov 07, 2017 at 05:34:38PM +, Bart Van Assche wrote:
> > On Tue, 2017-11-07 at 09:29 -0700, Jens Axboe wrote:
> > > On 11/07/2017 09:20 AM, Bart Van Assche wrote:
> > > > On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
> > > > > If you can reproduce, please provide me at least the following log
> > > > > first:
> > > > > 
> > > > >   find /sys/kernel/debug/block -name tags | xargs cat | grep busy
> > > > > 
> > > > > If any pending requests arn't completed, please provide the related
> > > > > info in dbgfs about where is the request.
> > > > 
> > > > Every time I ran the above or a similar command its output was empty. I
> > > > assume that's because the hang usually occurs in a phase where these 
> > > > debugfs
> > > > attributes either have not yet been created or have already disappeared.
> > > 
> > > Bart, do you still see a hang with the patch that fixes the tag leak when
> > > we fail to get a dispatch budget?
> > > 
> > > https://marc.info/?l=linux-block=151004881411480=2
> > > 
> > > I've run a lot of stability testing here, and I haven't run into any
> > > issues. This is with shared tags as well. So if you still see the failure
> > > case with the current tree AND the above patch, then I'll try and get
> > > a test case setup that hits it too so we can get to the bottom of this.
> > 
> > It took a little longer than expected but I just ran into the following
> > lockup with your for-next branch of this morning (commit e8fa44bb8af9) and
> > Ming's patch "blk-mq: put driver tag if dispatch budget can't be got"
> > applied on top of it:
> > 
> > [ 2575.324678] sysrq: SysRq : Show Blocked State
> > [ 2575.332336]   taskPC stack   pid father
> > [ 2575.345239] systemd-udevd   D0 47577518 0x0106
> > [ 2575.353821] Call Trace:
> > [ 2575.358805]  __schedule+0x28b/0x890
> > [ 2575.364906]  schedule+0x36/0x80
> > [ 2575.370436]  io_schedule+0x16/0x40
> > [ 2575.376287]  __lock_page+0xfc/0x140
> > [ 2575.382061]  ? page_cache_tree_insert+0xc0/0xc0
> > [ 2575.388943]  truncate_inode_pages_range+0x5e8/0x830
> > [ 2575.396083]  truncate_inode_pages+0x15/0x20
> > [ 2575.402398]  kill_bdev+0x2f/0x40
> > [ 2575.407538]  __blkdev_put+0x74/0x1f0
> > [ 2575.413010]  ? kmem_cache_free+0x197/0x1c0
> > [ 2575.418986]  blkdev_put+0x4c/0xd0
> > [ 2575.424040]  blkdev_close+0x34/0x70
> > [ 2575.429216]  __fput+0xe7/0x220
> > [ 2575.433863]  fput+0xe/0x10
> > [ 2575.438490]  task_work_run+0x76/0x90
> > [ 2575.443619]  do_exit+0x2e0/0xaf0
> > [ 2575.448311]  do_group_exit+0x43/0xb0
> > [ 2575.453386]  get_signal+0x299/0x5e0
> > [ 2575.458303]  do_signal+0x37/0x740
> > [ 2575.462976]  ? blkdev_read_iter+0x35/0x40
> > [ 2575.468425]  ? new_sync_read+0xde/0x130
> > [ 2575.473620]  ? vfs_read+0x115/0x130
> > [ 2575.478388]  exit_to_usermode_loop+0x80/0xd0
> > [ 2575.484002]  do_syscall_64+0xb3/0xc0
> > [ 2575.488813]  entry_SYSCALL64_slow_path+0x25/0x25
> > [ 2575.494759] RIP: 0033:0x7efd829cbd11
> > [ 2575.499506] RSP: 002b:7984f978 EFLAGS: 0246 ORIG_RAX: 
> > 
> > [ 2575.508741] RAX: 00022000 RBX: 55f19f902ca0 RCX: 
> > 7efd829cbd11
> > [ 2575.517455] RDX: 0004 RSI: 55f19f902cc8 RDI: 
> > 0007
> > [ 2575.526163] RBP: 55f19f7fb9d0 R08:  R09: 
> > 55f19f902ca0
> > [ 2575.534860] R10: 55f19f902cb8 R11: 0246 R12: 
> > 
> > [ 2575.544250] R13: 0004 R14: 55f19f7fba20 R15: 
> > 0004
> 
> Again please show us the output of 'tags' to see if there is pending
> requests not completed.
> 
> Please run this test on linus tree(V4.14-rc7) to see if the same issue
> can be reproduced.
> 
> Also if possible, please provide us the way for reproducing.

BTW, please apply the following patch before your further test:

https://marc.info/?l=linux-block=150988386406050=2

Since you don't see busy tag in 'tags' and queue may have been frozen.
And the in-progress dispatch after queue DEAD might corrupt memory.

-- 
Ming


Re: [PATCH V2] scsi_debugfs: fix crash in scsi_show_rq()

2017-11-07 Thread Ming Lei
On Wed, Nov 08, 2017 at 01:06:44AM +, Bart Van Assche wrote:
> On Wed, 2017-11-08 at 08:59 +0800, Ming Lei wrote:
> > On Tue, Nov 07, 2017 at 04:13:48PM +, Bart Van Assche wrote:
> > > On Tue, 2017-11-07 at 23:21 +0800, Ming Lei wrote:
> > > > diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
> > > > index 5e9755008aed..7a50878446b4 100644
> > > > --- a/drivers/scsi/scsi_debugfs.c
> > > > +++ b/drivers/scsi/scsi_debugfs.c
> > > > @@ -9,7 +9,10 @@ void scsi_show_rq(struct seq_file *m, struct request 
> > > > *rq)
> > > > int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
> > > > char buf[80];
> > > >  
> > > > -   __scsi_format_command(buf, sizeof(buf), cmd->cmnd, 
> > > > cmd->cmd_len);
> > > > +   if (cmd->cmnd == scsi_req(rq)->cmd)
> > > > +   __scsi_format_command(buf, sizeof(buf), cmd->cmnd, 
> > > > cmd->cmd_len);
> > > > +   else
> > > > +   strcpy(buf, "unknown");
> > > > seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s 
> > > > ago", buf,
> > > >cmd->retries, msecs / 1000, msecs % 1000);
> > > >  }
> > > 
> > > This change introduces a new bug, namely that "unknown" will appear in the
> > > debugfs output if (cmd->cmnd != scsi_req(rq)->cmd). Have you considered 
> > > to use
> > 
> > Because there isn't reliable way to get the command safely, and I don't
> > think it is a new bug.
> > 
> > > the test (cmd->cmnd != NULL) instead?
> > 
> > No, that is worse, because you may cause use-after-free if you read a
> > freed buffer.
> 
> It seems like your operating mode is still to contradict all feedback you get
> instead of trying to address the feedback you get?
> 
> Anyway, this is a debugging facility so I'm not convinced that avoiding a
> (very sporadic) use-after-free in this code is better than omitting very
> useful output.

OK, if no one objects the use-after-free, because this way may trigger
warning from some utility, such as KASAN.

-- 
Ming


Re: [PATCH V2] scsi_debugfs: fix crash in scsi_show_rq()

2017-11-07 Thread Bart Van Assche
On Wed, 2017-11-08 at 08:59 +0800, Ming Lei wrote:
> On Tue, Nov 07, 2017 at 04:13:48PM +, Bart Van Assche wrote:
> > On Tue, 2017-11-07 at 23:21 +0800, Ming Lei wrote:
> > > diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
> > > index 5e9755008aed..7a50878446b4 100644
> > > --- a/drivers/scsi/scsi_debugfs.c
> > > +++ b/drivers/scsi/scsi_debugfs.c
> > > @@ -9,7 +9,10 @@ void scsi_show_rq(struct seq_file *m, struct request *rq)
> > >   int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
> > >   char buf[80];
> > >  
> > > - __scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
> > > + if (cmd->cmnd == scsi_req(rq)->cmd)
> > > + __scsi_format_command(buf, sizeof(buf), cmd->cmnd, 
> > > cmd->cmd_len);
> > > + else
> > > + strcpy(buf, "unknown");
> > >   seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
> > >  cmd->retries, msecs / 1000, msecs % 1000);
> > >  }
> > 
> > This change introduces a new bug, namely that "unknown" will appear in the
> > debugfs output if (cmd->cmnd != scsi_req(rq)->cmd). Have you considered to 
> > use
> 
> Because there isn't reliable way to get the command safely, and I don't
> think it is a new bug.
> 
> > the test (cmd->cmnd != NULL) instead?
> 
> No, that is worse, because you may cause use-after-free if you read a
> freed buffer.

It seems like your operating mode is still to contradict all feedback you get
instead of trying to address the feedback you get?

Anyway, this is a debugging facility so I'm not convinced that avoiding a
(very sporadic) use-after-free in this code is better than omitting very
useful output.

Bart.

Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Ming Lei
On Tue, Nov 07, 2017 at 03:06:31PM -0700, Jens Axboe wrote:
> On 11/07/2017 10:36 AM, Jens Axboe wrote:
> > On 11/07/2017 10:10 AM, Jens Axboe wrote:
> >> On 11/07/2017 09:29 AM, Jens Axboe wrote:
> >>> On 11/07/2017 09:20 AM, Bart Van Assche wrote:
>  On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
> > If you can reproduce, please provide me at least the following log
> > first:
> >
> > find /sys/kernel/debug/block -name tags | xargs cat | grep busy
> >
> > If any pending requests arn't completed, please provide the related
> > info in dbgfs about where is the request.
> 
>  Every time I ran the above or a similar command its output was empty. I
>  assume that's because the hang usually occurs in a phase where these 
>  debugfs
>  attributes either have not yet been created or have already disappeared.
> >>>
> >>> Bart, do you still see a hang with the patch that fixes the tag leak when
> >>> we fail to get a dispatch budget?
> >>>
> >>> https://marc.info/?l=linux-block=151004881411480=2
> >>>
> >>> I've run a lot of stability testing here, and I haven't run into any
> >>> issues. This is with shared tags as well. So if you still see the failure
> >>> case with the current tree AND the above patch, then I'll try and get
> >>> a test case setup that hits it too so we can get to the bottom of this.
> >>
> >> Ming, I managed to reproduce the hang using null_blk. Note this is
> >> WITHOUT the patch mentioned above, running with that now.
> >>
> >> # modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 
> >> submit_queues=1 hw_queue_depth=1
> >>
> >> and using this fio file:
> >>
> >> [global]
> >> bs=4k
> >> rw=randread
> >> norandommap
> >> direct=1
> >> ioengine=libaio
> >> iodepth=4
> >>
> >> [nullb0]
> >> filename=/dev/nullb0
> >> [nullb1]
> >> filename=/dev/nullb1
> >> [nullb2]
> >> filename=/dev/nullb2
> >> [nullb3]
> >> filename=/dev/nullb3
> >>
> >> it seemed to keep running, but it hung when exiting. The troublesome
> >> device was nullb1:
> >>
> >> [  492.513374] INFO: task fio:3263 blocked for more than 120 seconds.
> >> [  492.520782]   Not tainted 4.14.0-rc7+ #499
> >> [  492.526247] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> >> this message.
> >> [  492.535904] fio D13208  3263   3211 0x
> >> [  492.542535] Call Trace:
> >> [  492.545764]  __schedule+0x279/0x720
> >> [  492.550151]  schedule+0x33/0x90
> >> [  492.554145]  io_schedule+0x16/0x40
> >> [  492.558435]  blk_mq_get_tag+0x148/0x250
> >> [  492.563211]  ? finish_wait+0x90/0x90
> >> [  492.567693]  blk_mq_get_request+0xf0/0x3e0
> >> [  492.572760]  blk_mq_make_request+0xe2/0x690
> >> [  492.577913]  generic_make_request+0xfc/0x2f0
> >> [  492.583177]  submit_bio+0x64/0x120
> >> [  492.587475]  ? set_page_dirty_lock+0x4b/0x60
> >> [  492.592736]  ? submit_bio+0x64/0x120
> >> [  492.597309]  ? bio_set_pages_dirty+0x55/0x60
> >> [  492.602570]  blkdev_direct_IO+0x388/0x3c0
> >> [  492.607546]  ? free_ioctx_users+0xe0/0xe0
> >> [  492.612511]  ? blk_mq_dispatch_rq_list+0x238/0x3a0
> >> [  492.618353]  ? _raw_spin_unlock+0xe/0x20
> >> [  492.623227]  generic_file_read_iter+0xb3/0xa00
> >> [  492.628682]  ? generic_file_read_iter+0xb3/0xa00
> >> [  492.634334]  ? security_file_permission+0x9b/0xc0
> >> [  492.640114]  blkdev_read_iter+0x35/0x40
> >> [  492.644877]  aio_read+0xc5/0x120
> >> [  492.648973]  ? aio_read_events+0x24c/0x340
> >> [  492.654124]  ? __might_sleep+0x4a/0x80
> >> [  492.658800]  do_io_submit+0x47c/0x5e0
> >> [  492.663373]  ? do_io_submit+0x47c/0x5e0
> >> [  492.668234]  SyS_io_submit+0x10/0x20
> >> [  492.672715]  ? SyS_io_submit+0x10/0x20
> >> [  492.677394]  entry_SYSCALL_64_fastpath+0x13/0x94
> >> [  492.683039] RIP: 0033:0x7f83d1871717
> >> [  492.687521] RSP: 002b:7ffd38fe5a88 EFLAGS: 0202 ORIG_RAX: 
> >> 00d1
> >> [  492.696969] RAX: ffda RBX: 7f83b6972b50 RCX: 
> >> 7f83d1871717
> >> [  492.705423] RDX: 01f41418 RSI: 0001 RDI: 
> >> 7f83e4d36000
> >> [  492.713889] RBP: 0001 R08: 0001 R09: 
> >> 01f3f2e0
> >> [  492.722352] R10: 1000 R11: 0202 R12: 
> >> 7ffd38fe5be0
> >> [  492.730827] R13: 7f83b6972b01 R14: 7f83b69824b8 R15: 
> >> 7f83b6982368
> >>
> >> and if we look at the debug entries, it's waiting on a scheduler tag:
> >>
> >> sched_tags:nr_tags=2
> >> sched_tags:nr_reserved_tags=0
> >> sched_tags:active_queues=0
> >> sched_tags:bitmap_tags:
> >> sched_tags:depth=2
> >> sched_tags:busy=2
> >> sched_tags:bits_per_word=64
> >> sched_tags:map_nr=1
> >> sched_tags:alloc_hint={0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 
> >> 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 
> >> 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0}
> >> sched_tags:wake_batch=1
> >> sched_tags:wake_index=4
> >> sched_tags:ws={
> >> 

Re: [PATCH V2] scsi_debugfs: fix crash in scsi_show_rq()

2017-11-07 Thread Ming Lei
On Tue, Nov 07, 2017 at 04:13:48PM +, Bart Van Assche wrote:
> On Tue, 2017-11-07 at 23:21 +0800, Ming Lei wrote:
> > cmd->cmnd can be allocated/freed dynamically in case of 
> > T10_PI_TYPE2_PROTECTION,
> > so we can't access it in scsi_show_rq() if 'SCpnt->cmnd != 
> > scsi_req(rq)->cmd',
> > because this request can be freed any time.
> 
> That description is inaccurate. It is not allowed to free SCpnt->cmnd while
> a command is being executed.

But can you prevent this request being freed when reading this debugfs
file?

> 
> > diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
> > index 5e9755008aed..7a50878446b4 100644
> > --- a/drivers/scsi/scsi_debugfs.c
> > +++ b/drivers/scsi/scsi_debugfs.c
> > @@ -9,7 +9,10 @@ void scsi_show_rq(struct seq_file *m, struct request *rq)
> > int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
> > char buf[80];
> >  
> > -   __scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
> > +   if (cmd->cmnd == scsi_req(rq)->cmd)
> > +   __scsi_format_command(buf, sizeof(buf), cmd->cmnd, 
> > cmd->cmd_len);
> > +   else
> > +   strcpy(buf, "unknown");
> > seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
> >cmd->retries, msecs / 1000, msecs % 1000);
> >  }
> 
> This change introduces a new bug, namely that "unknown" will appear in the
> debugfs output if (cmd->cmnd != scsi_req(rq)->cmd). Have you considered to use

Because there isn't reliable way to get the command safely, and I don't
think it is a new bug.

> the test (cmd->cmnd != NULL) instead?

No, that is worse, because you may cause use-after-free if you read a
freed buffer.


-- 
Ming


Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Ming Lei
On Tue, Nov 07, 2017 at 05:34:38PM +, Bart Van Assche wrote:
> On Tue, 2017-11-07 at 09:29 -0700, Jens Axboe wrote:
> > On 11/07/2017 09:20 AM, Bart Van Assche wrote:
> > > On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
> > > > If you can reproduce, please provide me at least the following log
> > > > first:
> > > > 
> > > > find /sys/kernel/debug/block -name tags | xargs cat | grep busy
> > > > 
> > > > If any pending requests arn't completed, please provide the related
> > > > info in dbgfs about where is the request.
> > > 
> > > Every time I ran the above or a similar command its output was empty. I
> > > assume that's because the hang usually occurs in a phase where these 
> > > debugfs
> > > attributes either have not yet been created or have already disappeared.
> > 
> > Bart, do you still see a hang with the patch that fixes the tag leak when
> > we fail to get a dispatch budget?
> > 
> > https://marc.info/?l=linux-block=151004881411480=2
> > 
> > I've run a lot of stability testing here, and I haven't run into any
> > issues. This is with shared tags as well. So if you still see the failure
> > case with the current tree AND the above patch, then I'll try and get
> > a test case setup that hits it too so we can get to the bottom of this.
> 
> It took a little longer than expected but I just ran into the following
> lockup with your for-next branch of this morning (commit e8fa44bb8af9) and
> Ming's patch "blk-mq: put driver tag if dispatch budget can't be got"
> applied on top of it:
> 
> [ 2575.324678] sysrq: SysRq : Show Blocked State
> [ 2575.332336]   taskPC stack   pid father
> [ 2575.345239] systemd-udevd   D0 47577518 0x0106
> [ 2575.353821] Call Trace:
> [ 2575.358805]  __schedule+0x28b/0x890
> [ 2575.364906]  schedule+0x36/0x80
> [ 2575.370436]  io_schedule+0x16/0x40
> [ 2575.376287]  __lock_page+0xfc/0x140
> [ 2575.382061]  ? page_cache_tree_insert+0xc0/0xc0
> [ 2575.388943]  truncate_inode_pages_range+0x5e8/0x830
> [ 2575.396083]  truncate_inode_pages+0x15/0x20
> [ 2575.402398]  kill_bdev+0x2f/0x40
> [ 2575.407538]  __blkdev_put+0x74/0x1f0
> [ 2575.413010]  ? kmem_cache_free+0x197/0x1c0
> [ 2575.418986]  blkdev_put+0x4c/0xd0
> [ 2575.424040]  blkdev_close+0x34/0x70
> [ 2575.429216]  __fput+0xe7/0x220
> [ 2575.433863]  fput+0xe/0x10
> [ 2575.438490]  task_work_run+0x76/0x90
> [ 2575.443619]  do_exit+0x2e0/0xaf0
> [ 2575.448311]  do_group_exit+0x43/0xb0
> [ 2575.453386]  get_signal+0x299/0x5e0
> [ 2575.458303]  do_signal+0x37/0x740
> [ 2575.462976]  ? blkdev_read_iter+0x35/0x40
> [ 2575.468425]  ? new_sync_read+0xde/0x130
> [ 2575.473620]  ? vfs_read+0x115/0x130
> [ 2575.478388]  exit_to_usermode_loop+0x80/0xd0
> [ 2575.484002]  do_syscall_64+0xb3/0xc0
> [ 2575.488813]  entry_SYSCALL64_slow_path+0x25/0x25
> [ 2575.494759] RIP: 0033:0x7efd829cbd11
> [ 2575.499506] RSP: 002b:7984f978 EFLAGS: 0246 ORIG_RAX: 
> 
> [ 2575.508741] RAX: 00022000 RBX: 55f19f902ca0 RCX: 
> 7efd829cbd11
> [ 2575.517455] RDX: 0004 RSI: 55f19f902cc8 RDI: 
> 0007
> [ 2575.526163] RBP: 55f19f7fb9d0 R08:  R09: 
> 55f19f902ca0
> [ 2575.534860] R10: 55f19f902cb8 R11: 0246 R12: 
> 
> [ 2575.544250] R13: 0004 R14: 55f19f7fba20 R15: 
> 0004

Again please show us the output of 'tags' to see if there is pending
requests not completed.

Please run this test on linus tree(V4.14-rc7) to see if the same issue
can be reproduced.

Also if possible, please provide us the way for reproducing.

-- 
Ming


Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Ming Lei
On Tue, Nov 07, 2017 at 10:34:35PM +, Bart Van Assche wrote:
> On Tue, 2017-11-07 at 15:06 -0700, Jens Axboe wrote:
> > Just to keep everyone in the loop, this bug is not new to
> > for-4.15/block, nor is it new to the current 4.41-rc or 4.13. So it's
> > probably different to what Bart is hitting, but it's a bug none the
> > less...
> 
> Hello Jens,
> 
> There are several reasons why I think that patch "blk-mq: don't handle
> TAG_SHARED in restart" really should be reverted:
> * That patch is based on the assumption that only the SCSI driver uses shared
>   tags. That assumption is not correct. null_blk and nvme also use shared 
> tags.

No, both null_blk and nvme should be handled by BLK_MQ_S_TAG_WAITING, not need
to waste CPU to check all shared tags.

> * As my tests have shown, the algorithm for restarting queues based on the

Your test doesn't show it is related with RESTART since there isn't
pending request in output of 'tags'.

>   SCSI starved list is flawed. So using that mechanism instead of the blk-mq
>   shared queue restarting algorithm is wrong.

The algorithm based on starved list has been used for dozens of years
for SCSI, I don't think it is flawed enough.

> * We are close to the merge window. It is too late for trying to fix the
>   "blk-mq: don't handle TAG_SHARED in restart" patch.

If you can provide us the reproduction approach, the time is enough to fix it
before V4.15 release.

> 
> My proposal is to make sure that what will be sent to Linus during the v4.15
> merge window works reliably. That means using the v4.13/v4.14 algorithm for
> queue restarting which is an algorithm that is trusted by the community. If
> Roman Penyaev's patch could get applied that would be even better.

Frankly speaking, the algorithm for blk-mq's restarting won't be used by SCSI at
all because scsi_end_request() restarts the queue before the restart for 
TAG_SHARED.

For NVMe and null_blk, it is basically same since we cover that via 
BLK_MQ_S_TAG_WAITING.

So Nak your proposal.

-- 
Ming


Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Ming Lei
On Tue, Nov 07, 2017 at 04:20:08PM +, Bart Van Assche wrote:
> On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
> > If you can reproduce, please provide me at least the following log
> > first:
> > 
> > find /sys/kernel/debug/block -name tags | xargs cat | grep busy
> > 
> > If any pending requests arn't completed, please provide the related
> > info in dbgfs about where is the request.
> 
> Every time I ran the above or a similar command its output was empty. I
> assume that's because the hang usually occurs in a phase where these debugfs
> attributes either have not yet been created or have already disappeared.

Could you dump all tags? Then you can see if this attribute is disappeared.

If that output is empty, it often means there isn't pending request not
completed. So if that is true, your hang is _not_ related with RESTART.

Thakns,
Ming


Re: [PATCH] scsi: Use vzalloc instead of vmalloc/memset

2017-11-07 Thread Julia Lawall


On Wed, 8 Nov 2017, Himanshu Jha wrote:

> On Tue, Nov 07, 2017 at 08:51:36PM +0100, Luis R. Rodriguez wrote:
> > On Sun, Nov 05, 2017 at 03:26:26AM +0530, Himanshu Jha wrote:
> > > Use vzalloc instead of vmalloc/memset to allocate memory filled with 0
> > > value.
> > >
> > > Done using Coccinelle.
> > > Semantic patch used :
> > >
> > > @@
> > > expression x,a;
> > > statement S;
> > > @@
> > >
> > > - x = vmalloc(a);
> > > + x = vzalloc(a);
> > >   if (x == NULL || ...) S
> > > - memset(x, 0, a);
> >
> > How many false positives do you get? Have you identified any?
> > If not you should consider adding this SmPL rule to:
> >
> > scripts/coccinelle/api/
> >
> > Some maintainers may ask you for the SmPL rule to be upstream first,
> > not all though. So its good practice for you to strive for this.
> > Another reason for it to go upstream is then other maintainers
> > can / should be running coccicheck against their subsystem to avoid
> > stupid regressions.
> >
> > You may want to explain for patches like these that they have been
> > tested by 0-day without any issues found.
> >
> > Also add the tag:
> >
> > Generated-by: Coccinelle SmPL
> >
> > > Signed-off-by: Himanshu Jha 
> > > ---
> > >  drivers/scsi/bfa/bfad.c| 3 +--
> > >  drivers/scsi/bfa/bfad_debugfs.c| 8 ++--
> > >  drivers/scsi/qla2xxx/qla_bsg.c | 3 +--
> > >  drivers/scsi/qla2xxx/tcm_qla2xxx.c | 5 +
> > >  drivers/scsi/scsi_debug.c  | 6 ++
> > >  drivers/scsi/snic/snic_trc.c   | 3 +--
> > >  6 files changed, 8 insertions(+), 20 deletions(-)
> >
> > Split this up per driver, and resend by using ./scripts/get_maintainer.pl
> > foo.patch and ensuring the right folks get the email.  Right now you
> > just spammed tons of people and the changes may be preferred to go
> > upstream atomically per driver, always assume this first.

Depending on the subsystem, you may get similar pushback if you send one
patch per file - "why send so many patches for such a small change when
they are all going through my tree".  So consider grouping the patches by
set of maintainers.

julia

> >
> > Other than this, feel free to add to each of the patches you created:
> >
>
> Thanks for the feeedback! I will resend the patch with the necessary
> changes.
>
> > Acked-by: Luis R. Rodriguez 
>
>
> Thanks
> Himanshu Jha
>


Re: [PATCH 0/4] qla2xxx: Add FC-NVMe Target support

2017-11-07 Thread Madhani, Himanshu
Hi Christoph, 

> On Nov 7, 2017, at 7:07 AM, Christoph Hellwig  wrote:
> 
> Please send this to the linux-nvme list, and the nvme FC maintainer.
> 
> Also I'd really like to see Cavium involved with the community a bit
> before we take on another driver.  Right now it's basically just
> James with a bit of help from Johannes.

Sure. I’ll talk to my management about how we can help in the community. 

Thanks,
- Himanshu



Re: [PATCH] sd: Fix a disk probing hang

2017-11-07 Thread James Bottomley
On Tue, 2017-11-07 at 22:42 +, Bart Van Assche wrote:
> On Tue, 2017-11-07 at 10:09 -0800, James Bottomley wrote:
> > 
> > but can you investigate the root cause rather than trying this
> > bandaid?
> 
> Hello James,
> 
> Thanks for your reply. I think that the root cause is that SCSI
> scanning activity can continue to submit I/O even after
> scsi_remove_host() has unlocked scan_mutex but that
> scsi_remove_host() removes some of the infrastructure that is
> essential to process SCSI requests.

That's not really a useful answer: how does it submit I/O after the
device goes into DEL?  In theory every I/O submitted after this is
returned with an immediate error.  I could buy the fact that we have
pending I/O submitted before we go into DEL, which would argue for some
sort of quiesce wait, but I don't see how I/O submitted after DEL
causes a hang.

>  Are you OK with
> e.g. moving a significant part of scsi_remove_host() into
> scsi_host_dev_release()?

Well not really without seeing the root cause.  Before scsi_forget_host
()it's all about state and after it's just removing some user visible
host attributes, so I can't see how either matters much.
 scsi_forget_host() must be executed from scsi_remove_host() because
that's how the devices go into the DEL state and how we error the
requests without troubling the device driver, so that can't be moved to
release

James



Re: [PATCH] sd: Fix a disk probing hang

2017-11-07 Thread Bart Van Assche
On Tue, 2017-11-07 at 10:09 -0800, James Bottomley wrote:
> but can you investigate the root cause rather than trying this bandaid?

Hello James,

Thanks for your reply. I think that the root cause is that SCSI scanning
activity can continue to submit I/O even after scsi_remove_host() has
unlocked scan_mutex but that scsi_remove_host() removes some of the
infrastructure that is essential to process SCSI requests. Are you OK with
e.g. moving a significant part of scsi_remove_host() into
scsi_host_dev_release()?

Thanks,

Bart.

Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Jens Axboe
On 11/07/2017 03:34 PM, Bart Van Assche wrote:
> On Tue, 2017-11-07 at 15:06 -0700, Jens Axboe wrote:
>> Just to keep everyone in the loop, this bug is not new to
>> for-4.15/block, nor is it new to the current 4.14-rc or 4.13. So it's
>> probably different to what Bart is hitting, but it's a bug none the
>> less...
> 
> Hello Jens,
> 
> There are several reasons why I think that patch "blk-mq: don't handle
> TAG_SHARED in restart" really should be reverted:
> * That patch is based on the assumption that only the SCSI driver uses shared
>   tags. That assumption is not correct. null_blk and nvme also use shared 
> tags.
> * As my tests have shown, the algorithm for restarting queues based on the
>   SCSI starved list is flawed. So using that mechanism instead of the blk-mq
>   shared queue restarting algorithm is wrong.
> * We are close to the merge window. It is too late for trying to fix the
>   "blk-mq: don't handle TAG_SHARED in restart" patch.
> 
> My proposal is to make sure that what will be sent to Linus during the v4.15
> merge window works reliably. That means using the v4.13/v4.14 algorithm for
> queue restarting which is an algorithm that is trusted by the community. If
> Roman Penyaev's patch could get applied that would be even better.

I'm fine with reverting a single patch, that's still a far cry from the
giant list. I'd rather get a fix in though, if at all possible. The code
it removed wasn't exactly the fastest or prettiest solution. But the
most important part is that we have something that works. If you have a
test case that is runnable AND reproduces the problem, I'd love to have
it. I've seen comments to that effect spread over several messages,
would be nice to have it summarized and readily available for all that
want to work on it.

The issue above is NOT a new bug, I ran into it by accident trying to
reproduce your case. Debugging that one right now, hopefully we'll have
some closure on that tomorrow and we can move forward on the shared tag
restart/loop. It smells like a TAG_WAITING race, or a restart race.

-- 
Jens Axboe



Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Bart Van Assche
On Tue, 2017-11-07 at 15:06 -0700, Jens Axboe wrote:
> Just to keep everyone in the loop, this bug is not new to
> for-4.15/block, nor is it new to the current 4.41-rc or 4.13. So it's
> probably different to what Bart is hitting, but it's a bug none the
> less...

Hello Jens,

There are several reasons why I think that patch "blk-mq: don't handle
TAG_SHARED in restart" really should be reverted:
* That patch is based on the assumption that only the SCSI driver uses shared
  tags. That assumption is not correct. null_blk and nvme also use shared tags.
* As my tests have shown, the algorithm for restarting queues based on the
  SCSI starved list is flawed. So using that mechanism instead of the blk-mq
  shared queue restarting algorithm is wrong.
* We are close to the merge window. It is too late for trying to fix the
  "blk-mq: don't handle TAG_SHARED in restart" patch.

My proposal is to make sure that what will be sent to Linus during the v4.15
merge window works reliably. That means using the v4.13/v4.14 algorithm for
queue restarting which is an algorithm that is trusted by the community. If
Roman Penyaev's patch could get applied that would be even better.

Bart.

Re: network namespace, netlink and sysfs changes for iSCSI (Re: [PATCH 0/9] use network namespace for iSCSI control interfaces)

2017-11-07 Thread Chris Leech
On Tue, Nov 07, 2017 at 12:45:26PM -0800, James Bottomley wrote:
> On Tue, 2017-11-07 at 10:01 -0800, Chris Leech wrote:
> > Hello,
> > 
> > I have this set of changes to the iSCSI control interfaces pending
> > review, but seeing as it's sysfs and netlink changes there's not a
> > lot of feedback from linux-scsi.
> 
> Well, it's a bit unlikely that they understand network namespaces.

I get that, just looking for suggestions of the right place to go before
going to crazy with the cross posting.

> > I was hoping I could get a brief review on the adding of network
> > namespace support here.
> 
> Most network namespace work is done on net...@vger.kernel.org, but they
> probably won't understand all the SCSI bits, but perhaps they don't
> need to; it's basically the netlink and the filters, right?
> 
> The other list that would take a more generic view is the containers
> one contain...@lists.linux-foundation.org because that's where most
> namespace stuff ends up.

Thanks James, the container list isn't one I would have come up with.

Chris

> 
> James
> 
> 
> 
> > Thank you,
> > Chris
> > 
> > On Tue, Oct 31, 2017 at 03:40:55PM -0700, Chris Leech wrote:
> > > 
> > > This series of changes makes the iSCSI netlink and sysfs control
> > > interfaces filtered by network namespace.  This is required to run
> > > iscsid in any network namespace other than the initial default one.
> > > 
> > > Currently the netlink communication will fail if iscsid is started
> > > in a non-default network namespace, as there is no kernel side
> > > socket.  After fixing that, the rest of these changes are to filter
> > > visibility of the iSCSI transport objects by netns.  This allows
> > > for multiple iscsid instances to be run, one per netns, each
> > > controlling it's own set of iSCSI sessions.
> > > 
> > > The iSCSI transport objects are filtered, but not the SCSI or block
> > > layer devices.  So while iSCSI hosts and sessions become limited to
> > > a network namespace, any attached devices remain visible system
> > > wide.
> > > 
> > > This currently only supports iscsi_tcp running in a new namespace,
> > > as it creates a virtual host per session.  Support could be added
> > > later to allow assignment of iSCSI HBAs to network namespace, much
> > > as is done for network interfaces.
> > > 
> > > Chris Leech (9):
> > >   iscsi: create per-net iscsi netlink kernel sockets
> > >   iscsi: associate endpoints with a host
> > >   iscsi: sysfs filtering by network namespace
> > >   iscsi: make all iSCSI netlink multicast namespace aware
> > >   iscsi: set netns for iscsi_tcp hosts
> > >   iscsi: check net namespace for all iscsi lookups
> > >   iscsi: convert flashnode devices from bus to class
> > >   iscsi: rename iscsi_bus_flash_* to iscsi_flash_*
> > >   iscsi: filter flashnode sysfs by net namespace
> > > 
> > >  drivers/infiniband/ulp/iser/iscsi_iser.c |   7 +-
> > >  drivers/scsi/be2iscsi/be_iscsi.c |   6 +-
> > >  drivers/scsi/bnx2i/bnx2i_iscsi.c |   6 +-
> > >  drivers/scsi/cxgbi/libcxgbi.c|   6 +-
> > >  drivers/scsi/iscsi_tcp.c |   7 +
> > >  drivers/scsi/qedi/qedi_iscsi.c   |   6 +-
> > >  drivers/scsi/qla4xxx/ql4_os.c|  62 +--
> > >  drivers/scsi/scsi_transport_iscsi.c  | 625
> > > ++-
> > >  include/scsi/scsi_transport_iscsi.h  |  63 ++--
> > >  9 files changed, 538 insertions(+), 250 deletions(-)
> > > 
> > > -- 
> > > 2.9.5
> > > 
> > 
> 


Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Jens Axboe
On 11/07/2017 10:36 AM, Jens Axboe wrote:
> On 11/07/2017 10:10 AM, Jens Axboe wrote:
>> On 11/07/2017 09:29 AM, Jens Axboe wrote:
>>> On 11/07/2017 09:20 AM, Bart Van Assche wrote:
 On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
> If you can reproduce, please provide me at least the following log
> first:
>
>   find /sys/kernel/debug/block -name tags | xargs cat | grep busy
>
> If any pending requests arn't completed, please provide the related
> info in dbgfs about where is the request.

 Every time I ran the above or a similar command its output was empty. I
 assume that's because the hang usually occurs in a phase where these 
 debugfs
 attributes either have not yet been created or have already disappeared.
>>>
>>> Bart, do you still see a hang with the patch that fixes the tag leak when
>>> we fail to get a dispatch budget?
>>>
>>> https://marc.info/?l=linux-block=151004881411480=2
>>>
>>> I've run a lot of stability testing here, and I haven't run into any
>>> issues. This is with shared tags as well. So if you still see the failure
>>> case with the current tree AND the above patch, then I'll try and get
>>> a test case setup that hits it too so we can get to the bottom of this.
>>
>> Ming, I managed to reproduce the hang using null_blk. Note this is
>> WITHOUT the patch mentioned above, running with that now.
>>
>> # modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 submit_queues=1 
>> hw_queue_depth=1
>>
>> and using this fio file:
>>
>> [global]
>> bs=4k
>> rw=randread
>> norandommap
>> direct=1
>> ioengine=libaio
>> iodepth=4
>>
>> [nullb0]
>> filename=/dev/nullb0
>> [nullb1]
>> filename=/dev/nullb1
>> [nullb2]
>> filename=/dev/nullb2
>> [nullb3]
>> filename=/dev/nullb3
>>
>> it seemed to keep running, but it hung when exiting. The troublesome
>> device was nullb1:
>>
>> [  492.513374] INFO: task fio:3263 blocked for more than 120 seconds.
>> [  492.520782]   Not tainted 4.14.0-rc7+ #499
>> [  492.526247] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
>> this message.
>> [  492.535904] fio D13208  3263   3211 0x
>> [  492.542535] Call Trace:
>> [  492.545764]  __schedule+0x279/0x720
>> [  492.550151]  schedule+0x33/0x90
>> [  492.554145]  io_schedule+0x16/0x40
>> [  492.558435]  blk_mq_get_tag+0x148/0x250
>> [  492.563211]  ? finish_wait+0x90/0x90
>> [  492.567693]  blk_mq_get_request+0xf0/0x3e0
>> [  492.572760]  blk_mq_make_request+0xe2/0x690
>> [  492.577913]  generic_make_request+0xfc/0x2f0
>> [  492.583177]  submit_bio+0x64/0x120
>> [  492.587475]  ? set_page_dirty_lock+0x4b/0x60
>> [  492.592736]  ? submit_bio+0x64/0x120
>> [  492.597309]  ? bio_set_pages_dirty+0x55/0x60
>> [  492.602570]  blkdev_direct_IO+0x388/0x3c0
>> [  492.607546]  ? free_ioctx_users+0xe0/0xe0
>> [  492.612511]  ? blk_mq_dispatch_rq_list+0x238/0x3a0
>> [  492.618353]  ? _raw_spin_unlock+0xe/0x20
>> [  492.623227]  generic_file_read_iter+0xb3/0xa00
>> [  492.628682]  ? generic_file_read_iter+0xb3/0xa00
>> [  492.634334]  ? security_file_permission+0x9b/0xc0
>> [  492.640114]  blkdev_read_iter+0x35/0x40
>> [  492.644877]  aio_read+0xc5/0x120
>> [  492.648973]  ? aio_read_events+0x24c/0x340
>> [  492.654124]  ? __might_sleep+0x4a/0x80
>> [  492.658800]  do_io_submit+0x47c/0x5e0
>> [  492.663373]  ? do_io_submit+0x47c/0x5e0
>> [  492.668234]  SyS_io_submit+0x10/0x20
>> [  492.672715]  ? SyS_io_submit+0x10/0x20
>> [  492.677394]  entry_SYSCALL_64_fastpath+0x13/0x94
>> [  492.683039] RIP: 0033:0x7f83d1871717
>> [  492.687521] RSP: 002b:7ffd38fe5a88 EFLAGS: 0202 ORIG_RAX: 
>> 00d1
>> [  492.696969] RAX: ffda RBX: 7f83b6972b50 RCX: 
>> 7f83d1871717
>> [  492.705423] RDX: 01f41418 RSI: 0001 RDI: 
>> 7f83e4d36000
>> [  492.713889] RBP: 0001 R08: 0001 R09: 
>> 01f3f2e0
>> [  492.722352] R10: 1000 R11: 0202 R12: 
>> 7ffd38fe5be0
>> [  492.730827] R13: 7f83b6972b01 R14: 7f83b69824b8 R15: 
>> 7f83b6982368
>>
>> and if we look at the debug entries, it's waiting on a scheduler tag:
>>
>> sched_tags:nr_tags=2
>> sched_tags:nr_reserved_tags=0
>> sched_tags:active_queues=0
>> sched_tags:bitmap_tags:
>> sched_tags:depth=2
>> sched_tags:busy=2
>> sched_tags:bits_per_word=64
>> sched_tags:map_nr=1
>> sched_tags:alloc_hint={0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 
>> 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 
>> 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0}
>> sched_tags:wake_batch=1
>> sched_tags:wake_index=4
>> sched_tags:ws={
>> sched_tags:  {.wait_cnt=-102, .wait=inactive},
>> sched_tags:  {.wait_cnt=-126, .wait=inactive},
>> sched_tags:  {.wait_cnt=-72, .wait=inactive},
>> sched_tags:  {.wait_cnt=-96, .wait=inactive},
>> sched_tags:  {.wait_cnt=-134, .wait=inactive},
>> sched_tags:  {.wait_cnt=-116, .wait=inactive},
>> sched_tags:  

Re: [PATCH] scsi: Use vzalloc instead of vmalloc/memset

2017-11-07 Thread Himanshu Jha
On Tue, Nov 07, 2017 at 08:51:36PM +0100, Luis R. Rodriguez wrote:
> On Sun, Nov 05, 2017 at 03:26:26AM +0530, Himanshu Jha wrote:
> > Use vzalloc instead of vmalloc/memset to allocate memory filled with 0
> > value.
> > 
> > Done using Coccinelle.
> > Semantic patch used :
> > 
> > @@
> > expression x,a;
> > statement S;
> > @@
> > 
> > - x = vmalloc(a);
> > + x = vzalloc(a);
> >   if (x == NULL || ...) S
> > - memset(x, 0, a);
> 
> How many false positives do you get? Have you identified any?
> If not you should consider adding this SmPL rule to:
> 
> scripts/coccinelle/api/
> 
> Some maintainers may ask you for the SmPL rule to be upstream first,
> not all though. So its good practice for you to strive for this.
> Another reason for it to go upstream is then other maintainers
> can / should be running coccicheck against their subsystem to avoid
> stupid regressions.
> 
> You may want to explain for patches like these that they have been
> tested by 0-day without any issues found.
> 
> Also add the tag:
> 
> Generated-by: Coccinelle SmPL
> 
> > Signed-off-by: Himanshu Jha 
> > ---
> >  drivers/scsi/bfa/bfad.c| 3 +--
> >  drivers/scsi/bfa/bfad_debugfs.c| 8 ++--
> >  drivers/scsi/qla2xxx/qla_bsg.c | 3 +--
> >  drivers/scsi/qla2xxx/tcm_qla2xxx.c | 5 +
> >  drivers/scsi/scsi_debug.c  | 6 ++
> >  drivers/scsi/snic/snic_trc.c   | 3 +--
> >  6 files changed, 8 insertions(+), 20 deletions(-)
> 
> Split this up per driver, and resend by using ./scripts/get_maintainer.pl
> foo.patch and ensuring the right folks get the email.  Right now you 
> just spammed tons of people and the changes may be preferred to go
> upstream atomically per driver, always assume this first.
> 
> Other than this, feel free to add to each of the patches you created:
> 

Thanks for the feeedback! I will resend the patch with the necessary
changes.

> Acked-by: Luis R. Rodriguez 


Thanks
Himanshu Jha


[PATCH] lpfc: Fix hard lock up NMI in els timeout handling.

2017-11-07 Thread James Smart
From: Dick Kennedy 

System crashed due to a hard lockup at lpfc_els_timeout_handler+0x128.

The els ring's txcmplq list is corrupted: the last element in the list
does not point back the the head causing a loop. Issue is the
els processing path for sli4 hbas are using the hbalock instead of
the ring_lock for removing elements from the txcmplq list.

Use the adapter SLI_REV to determine which lock should be used for
removing iocbqs from the els rings txcmplq.

note: the future refactoring will address this so that we don't have
this ugly type-based lock code.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_sli.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 1229f58bdd09..c1c7df607604 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -2732,7 +2732,8 @@ lpfc_sli_process_unsol_iocb(struct lpfc_hba *phba, struct 
lpfc_sli_ring *pring,
  *
  * This function looks up the iocb_lookup table to get the command iocb
  * corresponding to the given response iocb using the iotag of the
- * response iocb. This function is called with the hbalock held.
+ * response iocb. This function is called with the hbalock held
+ * for sli3 devices or the ring_lock for sli4 devices.
  * This function returns the command iocb object if it finds the command
  * iocb else returns NULL.
  **/
@@ -2828,9 +2829,15 @@ lpfc_sli_process_sol_iocb(struct lpfc_hba *phba, struct 
lpfc_sli_ring *pring,
unsigned long iflag;
 
/* Based on the iotag field, get the cmd IOCB from the txcmplq */
-   spin_lock_irqsave(>hbalock, iflag);
+   if (phba->sli_rev == LPFC_SLI_REV4)
+   spin_lock_irqsave(>ring_lock, iflag);
+   else
+   spin_lock_irqsave(>hbalock, iflag);
cmdiocbp = lpfc_sli_iocbq_lookup(phba, pring, saveq);
-   spin_unlock_irqrestore(>hbalock, iflag);
+   if (phba->sli_rev == LPFC_SLI_REV4)
+   spin_unlock_irqrestore(>ring_lock, iflag);
+   else
+   spin_unlock_irqrestore(>hbalock, iflag);
 
if (cmdiocbp) {
if (cmdiocbp->iocb_cmpl) {
-- 
2.13.1



Re: network namespace, netlink and sysfs changes for iSCSI (Re: [PATCH 0/9] use network namespace for iSCSI control interfaces)

2017-11-07 Thread James Bottomley
On Tue, 2017-11-07 at 10:01 -0800, Chris Leech wrote:
> Hello,
> 
> I have this set of changes to the iSCSI control interfaces pending
> review, but seeing as it's sysfs and netlink changes there's not a
> lot of feedback from linux-scsi.

Well, it's a bit unlikely that they understand network namespaces.

> I was hoping I could get a brief review on the adding of network
> namespace support here.

Most network namespace work is done on net...@vger.kernel.org, but they
probably won't understand all the SCSI bits, but perhaps they don't
need to; it's basically the netlink and the filters, right?

The other list that would take a more generic view is the containers
one contain...@lists.linux-foundation.org because that's where most
namespace stuff ends up.

James



> Thank you,
> Chris
> 
> On Tue, Oct 31, 2017 at 03:40:55PM -0700, Chris Leech wrote:
> > 
> > This series of changes makes the iSCSI netlink and sysfs control
> > interfaces filtered by network namespace.  This is required to run
> > iscsid in any network namespace other than the initial default one.
> > 
> > Currently the netlink communication will fail if iscsid is started
> > in a non-default network namespace, as there is no kernel side
> > socket.  After fixing that, the rest of these changes are to filter
> > visibility of the iSCSI transport objects by netns.  This allows
> > for multiple iscsid instances to be run, one per netns, each
> > controlling it's own set of iSCSI sessions.
> > 
> > The iSCSI transport objects are filtered, but not the SCSI or block
> > layer devices.  So while iSCSI hosts and sessions become limited to
> > a network namespace, any attached devices remain visible system
> > wide.
> > 
> > This currently only supports iscsi_tcp running in a new namespace,
> > as it creates a virtual host per session.  Support could be added
> > later to allow assignment of iSCSI HBAs to network namespace, much
> > as is done for network interfaces.
> > 
> > Chris Leech (9):
> >   iscsi: create per-net iscsi netlink kernel sockets
> >   iscsi: associate endpoints with a host
> >   iscsi: sysfs filtering by network namespace
> >   iscsi: make all iSCSI netlink multicast namespace aware
> >   iscsi: set netns for iscsi_tcp hosts
> >   iscsi: check net namespace for all iscsi lookups
> >   iscsi: convert flashnode devices from bus to class
> >   iscsi: rename iscsi_bus_flash_* to iscsi_flash_*
> >   iscsi: filter flashnode sysfs by net namespace
> > 
> >  drivers/infiniband/ulp/iser/iscsi_iser.c |   7 +-
> >  drivers/scsi/be2iscsi/be_iscsi.c |   6 +-
> >  drivers/scsi/bnx2i/bnx2i_iscsi.c |   6 +-
> >  drivers/scsi/cxgbi/libcxgbi.c|   6 +-
> >  drivers/scsi/iscsi_tcp.c |   7 +
> >  drivers/scsi/qedi/qedi_iscsi.c   |   6 +-
> >  drivers/scsi/qla4xxx/ql4_os.c|  62 +--
> >  drivers/scsi/scsi_transport_iscsi.c  | 625
> > ++-
> >  include/scsi/scsi_transport_iscsi.h  |  63 ++--
> >  9 files changed, 538 insertions(+), 250 deletions(-)
> > 
> > -- 
> > 2.9.5
> > 
> 



Re: network namespace, netlink and sysfs changes for iSCSI (Re: [PATCH 0/9] use network namespace for iSCSI control interfaces)

2017-11-07 Thread Greg Kroah-Hartman
On Tue, Nov 07, 2017 at 10:01:56AM -0800, Chris Leech wrote:
> Hello,
> 
> I have this set of changes to the iSCSI control interfaces pending
> review, but seeing as it's sysfs and netlink changes there's not a lot
> of feedback from linux-scsi.
> 
> I was hoping I could get a brief review on the adding of network
> namespace support here.

Where is "here"?  I don't see any patches "here" to review :(

And no, I'm not on the scsi mailing lists :)

thanks,

greg k-h


Re: [PATCH] scsi: Use vzalloc instead of vmalloc/memset

2017-11-07 Thread Luis R. Rodriguez
On Sun, Nov 05, 2017 at 03:26:26AM +0530, Himanshu Jha wrote:
> Use vzalloc instead of vmalloc/memset to allocate memory filled with 0
> value.
> 
> Done using Coccinelle.
> Semantic patch used :
> 
> @@
> expression x,a;
> statement S;
> @@
> 
> - x = vmalloc(a);
> + x = vzalloc(a);
>   if (x == NULL || ...) S
> - memset(x, 0, a);

How many false positives do you get? Have you identified any?
If not you should consider adding this SmPL rule to:

scripts/coccinelle/api/

Some maintainers may ask you for the SmPL rule to be upstream first,
not all though. So its good practice for you to strive for this.
Another reason for it to go upstream is then other maintainers
can / should be running coccicheck against their subsystem to avoid
stupid regressions.

You may want to explain for patches like these that they have been
tested by 0-day without any issues found.

Also add the tag:

Generated-by: Coccinelle SmPL

> Signed-off-by: Himanshu Jha 
> ---
>  drivers/scsi/bfa/bfad.c| 3 +--
>  drivers/scsi/bfa/bfad_debugfs.c| 8 ++--
>  drivers/scsi/qla2xxx/qla_bsg.c | 3 +--
>  drivers/scsi/qla2xxx/tcm_qla2xxx.c | 5 +
>  drivers/scsi/scsi_debug.c  | 6 ++
>  drivers/scsi/snic/snic_trc.c   | 3 +--
>  6 files changed, 8 insertions(+), 20 deletions(-)

Split this up per driver, and resend by using ./scripts/get_maintainer.pl
foo.patch and ensuring the right folks get the email.  Right now you 
just spammed tons of people and the changes may be preferred to go
upstream atomically per driver, always assume this first.

Other than this, feel free to add to each of the patches you created:

Acked-by: Luis R. Rodriguez 


Re: [PATCH v3] lpfc: Tie in to new dev_loss_tmo interface in nvme transport

2017-11-07 Thread Christoph Hellwig
Thanks, applied to nvme-4.15.


Re: [PATCH] sd: Fix a disk probing hang

2017-11-07 Thread James Bottomley
On Tue, 2017-11-07 at 09:38 -0800, Bart Van Assche wrote:
> Avoid that disk probing hangs as follows if a SCSI host is removed
> after disk scanning started and before it completed:
> 
> Call Trace:
>  __schedule+0x2fa/0xbb0
>  schedule+0x36/0x90
>  schedule_timeout+0x22c/0x570
>  io_schedule_timeout+0x1e/0x50
>  wait_for_completion_io_timeout+0x11f/0x180
>  blk_execute_rq+0x86/0xc0
>  scsi_execute+0xdb/0x1f0
>  sd_revalidate_disk+0xed/0x1c70 [sd_mod]
>  sd_probe_async+0xc3/0x1d0 [sd_mod]
>  async_run_entry_fn+0x38/0x160
>  process_one_work+0x20a/0x660
>  worker_thread+0x3d/0x3b0
>  kthread+0x13a/0x150
>  ret_from_fork+0x27/0x40
> 
> Signed-off-by: Bart Van Assche 
> Cc: Christoph Hellwig 
> Cc: Hannes Reinecke 
> Cc: Johannes Thumshirn 
> ---
>  drivers/scsi/sd.c | 23 ++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 0313486d85c8..d5e2b73c02ea 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3225,11 +3225,13 @@ static void sd_probe_async(void *data,
> async_cookie_t cookie)
>  {
>   struct scsi_disk *sdkp = data;
>   struct scsi_device *sdp;
> + struct Scsi_Host *host;
>   struct gendisk *gd;
>   u32 index;
>   struct device *dev;
>  
>   sdp = sdkp->device;
> + host = sdp->host;
>   gd = sdkp->disk;
>   index = sdkp->index;
>   dev = >sdev_gendev;
> @@ -3253,6 +3255,13 @@ static void sd_probe_async(void *data,
> async_cookie_t cookie)
>   sdkp->first_scan = 1;
>   sdkp->max_medium_access_timeouts = SD_MAX_MEDIUM_TIMEOUTS;
>  
> + mutex_lock(>scan_mutex);

I really don't like this: by taking the scan mutex here, you
synchronize this with everything else and make this routine single
threaded with every other host scan operation.  That would make the
name sd_probe_async() a complete lie.

Additionally, any reference to the disk should *automatically* hold the
host, because the last reference to the host is in the disk release
routine, so this explicit taking of a reference should be completely
unnecessary (and if it isn't, we need to fix the bug at source, not
hide it like this).

The whole point about our async routines is that they're supposed to
rely on refcounting.  So, the host cannot be freed until the last
device reference is gone.  However, the host and its devices can go
into DEL state, which means the mid-layer replies error for them and
the async scan is supposed to take that error and pass it up.  The hang
you're getting may be the result of a missing scsi_device_online()
check, or it could be some premature failure of the underlying device
driver (going into SHOST_DEL with outstanding commands causes them to
get frozen) but can you investigate the root cause rather than trying
this bandaid?

Thanks,

James



network namespace, netlink and sysfs changes for iSCSI (Re: [PATCH 0/9] use network namespace for iSCSI control interfaces)

2017-11-07 Thread Chris Leech
Hello,

I have this set of changes to the iSCSI control interfaces pending
review, but seeing as it's sysfs and netlink changes there's not a lot
of feedback from linux-scsi.

I was hoping I could get a brief review on the adding of network
namespace support here.

Thank you,
Chris

On Tue, Oct 31, 2017 at 03:40:55PM -0700, Chris Leech wrote:
> This series of changes makes the iSCSI netlink and sysfs control
> interfaces filtered by network namespace.  This is required to run
> iscsid in any network namespace other than the initial default one.
> 
> Currently the netlink communication will fail if iscsid is started in a
> non-default network namespace, as there is no kernel side socket.  After
> fixing that, the rest of these changes are to filter visibility of the
> iSCSI transport objects by netns.  This allows for multiple iscsid
> instances to be run, one per netns, each controlling it's own set of
> iSCSI sessions.
> 
> The iSCSI transport objects are filtered, but not the SCSI or block
> layer devices.  So while iSCSI hosts and sessions become limited to a
> network namespace, any attached devices remain visible system wide.
> 
> This currently only supports iscsi_tcp running in a new namespace, as it
> creates a virtual host per session.  Support could be added later to
> allow assignment of iSCSI HBAs to network namespace, much as is done for
> network interfaces.
> 
> Chris Leech (9):
>   iscsi: create per-net iscsi netlink kernel sockets
>   iscsi: associate endpoints with a host
>   iscsi: sysfs filtering by network namespace
>   iscsi: make all iSCSI netlink multicast namespace aware
>   iscsi: set netns for iscsi_tcp hosts
>   iscsi: check net namespace for all iscsi lookups
>   iscsi: convert flashnode devices from bus to class
>   iscsi: rename iscsi_bus_flash_* to iscsi_flash_*
>   iscsi: filter flashnode sysfs by net namespace
> 
>  drivers/infiniband/ulp/iser/iscsi_iser.c |   7 +-
>  drivers/scsi/be2iscsi/be_iscsi.c |   6 +-
>  drivers/scsi/bnx2i/bnx2i_iscsi.c |   6 +-
>  drivers/scsi/cxgbi/libcxgbi.c|   6 +-
>  drivers/scsi/iscsi_tcp.c |   7 +
>  drivers/scsi/qedi/qedi_iscsi.c   |   6 +-
>  drivers/scsi/qla4xxx/ql4_os.c|  62 +--
>  drivers/scsi/scsi_transport_iscsi.c  | 625 
> ++-
>  include/scsi/scsi_transport_iscsi.h  |  63 ++--
>  9 files changed, 538 insertions(+), 250 deletions(-)
> 
> -- 
> 2.9.5
> 


Re: [PATCH 0/9] use network namespace for iSCSI control interfaces

2017-11-07 Thread Chris Leech
Anyone have comments or criticisms?  Lee?

Suggestions on anyone else I should be pinging for feedback?
These changes are in the iSCSI code, but are all netlink and sysfs
related namespace issues so I get that it's not a typical SCSI review.

- Chris

On Tue, Oct 31, 2017 at 03:40:55PM -0700, Chris Leech wrote:
> This series of changes makes the iSCSI netlink and sysfs control
> interfaces filtered by network namespace.  This is required to run
> iscsid in any network namespace other than the initial default one.
> 
> Currently the netlink communication will fail if iscsid is started in a
> non-default network namespace, as there is no kernel side socket.  After
> fixing that, the rest of these changes are to filter visibility of the
> iSCSI transport objects by netns.  This allows for multiple iscsid
> instances to be run, one per netns, each controlling it's own set of
> iSCSI sessions.
> 
> The iSCSI transport objects are filtered, but not the SCSI or block
> layer devices.  So while iSCSI hosts and sessions become limited to a
> network namespace, any attached devices remain visible system wide.
> 
> This currently only supports iscsi_tcp running in a new namespace, as it
> creates a virtual host per session.  Support could be added later to
> allow assignment of iSCSI HBAs to network namespace, much as is done for
> network interfaces.
> 
> Chris Leech (9):
>   iscsi: create per-net iscsi netlink kernel sockets
>   iscsi: associate endpoints with a host
>   iscsi: sysfs filtering by network namespace
>   iscsi: make all iSCSI netlink multicast namespace aware
>   iscsi: set netns for iscsi_tcp hosts
>   iscsi: check net namespace for all iscsi lookups
>   iscsi: convert flashnode devices from bus to class
>   iscsi: rename iscsi_bus_flash_* to iscsi_flash_*
>   iscsi: filter flashnode sysfs by net namespace
> 
>  drivers/infiniband/ulp/iser/iscsi_iser.c |   7 +-
>  drivers/scsi/be2iscsi/be_iscsi.c |   6 +-
>  drivers/scsi/bnx2i/bnx2i_iscsi.c |   6 +-
>  drivers/scsi/cxgbi/libcxgbi.c|   6 +-
>  drivers/scsi/iscsi_tcp.c |   7 +
>  drivers/scsi/qedi/qedi_iscsi.c   |   6 +-
>  drivers/scsi/qla4xxx/ql4_os.c|  62 +--
>  drivers/scsi/scsi_transport_iscsi.c  | 625 
> ++-
>  include/scsi/scsi_transport_iscsi.h  |  63 ++--
>  9 files changed, 538 insertions(+), 250 deletions(-)
> 
> -- 
> 2.9.5
> 


RE: [PATCH] aacraid: use timespec64 instead of timeval

2017-11-07 Thread Dave Carroll
> 
> aacraid passes the current time to the firmware in one of two ways,
> either as year/month/day/... or as 32-bit unsigned seconds.
> 
> The first one is broken on 32-bit architectures as it cannot
> go past year 2038. Using timespec64 here makes it behave properly
> on both 32-bit and 64-bit architectures, and avoids relying
> on signed integer overflow to pass times into the second interface.
> 
> The interface used in aac_send_hosttime() however is still
> problematic in year 2106 when 32-bit seconds overflow. Hopefully
> we don't have to worry about aacraid by that time.
> 
> Signed-off-by: Arnd Bergmann 
> ---

Reviewed-by: Dave Carroll 


[PATCH] sd: Fix a disk probing hang

2017-11-07 Thread Bart Van Assche
Avoid that disk probing hangs as follows if a SCSI host is removed
after disk scanning started and before it completed:

Call Trace:
 __schedule+0x2fa/0xbb0
 schedule+0x36/0x90
 schedule_timeout+0x22c/0x570
 io_schedule_timeout+0x1e/0x50
 wait_for_completion_io_timeout+0x11f/0x180
 blk_execute_rq+0x86/0xc0
 scsi_execute+0xdb/0x1f0
 sd_revalidate_disk+0xed/0x1c70 [sd_mod]
 sd_probe_async+0xc3/0x1d0 [sd_mod]
 async_run_entry_fn+0x38/0x160
 process_one_work+0x20a/0x660
 worker_thread+0x3d/0x3b0
 kthread+0x13a/0x150
 ret_from_fork+0x27/0x40

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
---
 drivers/scsi/sd.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 0313486d85c8..d5e2b73c02ea 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3225,11 +3225,13 @@ static void sd_probe_async(void *data, async_cookie_t 
cookie)
 {
struct scsi_disk *sdkp = data;
struct scsi_device *sdp;
+   struct Scsi_Host *host;
struct gendisk *gd;
u32 index;
struct device *dev;
 
sdp = sdkp->device;
+   host = sdp->host;
gd = sdkp->disk;
index = sdkp->index;
dev = >sdev_gendev;
@@ -3253,6 +3255,13 @@ static void sd_probe_async(void *data, async_cookie_t 
cookie)
sdkp->first_scan = 1;
sdkp->max_medium_access_timeouts = SD_MAX_MEDIUM_TIMEOUTS;
 
+   mutex_lock(>scan_mutex);
+   if (!scsi_host_scan_allowed(host)) {
+   sd_printk(KERN_NOTICE, sdkp, "%s: host being removed\n",
+ __func__);
+   goto unlock;
+   }
+
sd_revalidate_disk(gd);
 
gd->flags = GENHD_FL_EXT_DEVT;
@@ -3276,8 +3285,12 @@ static void sd_probe_async(void *data, async_cookie_t 
cookie)
 
sd_printk(KERN_NOTICE, sdkp, "Attached SCSI %sdisk\n",
  sdp->removable ? "removable " : "");
+unlock:
+   mutex_unlock(>scan_mutex);
+   scsi_host_put(host);
scsi_autopm_put_device(sdp);
put_device(>dev);
+   return;
 }
 
 /**
@@ -3377,7 +3390,15 @@ static int sd_probe(struct device *dev)
get_device(dev);
dev_set_drvdata(dev, sdkp);
 
-   get_device(>dev); /* prevent release before async_schedule */
+   /* prevent release before async_schedule */
+   error = -ENODEV;
+   if (scsi_host_get(sdp->host) == NULL) {
+   sd_printk(KERN_NOTICE, sdkp, "%s: host being removed\n",
+ __func__);
+   put_device(>dev);
+   goto out;
+   }
+   get_device(>dev);
async_schedule_domain(sd_probe_async, sdkp, _sd_probe_domain);
 
return 0;
-- 
2.14.3



Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Jens Axboe
On 11/07/2017 10:10 AM, Jens Axboe wrote:
> On 11/07/2017 09:29 AM, Jens Axboe wrote:
>> On 11/07/2017 09:20 AM, Bart Van Assche wrote:
>>> On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
 If you can reproduce, please provide me at least the following log
 first:

find /sys/kernel/debug/block -name tags | xargs cat | grep busy

 If any pending requests arn't completed, please provide the related
 info in dbgfs about where is the request.
>>>
>>> Every time I ran the above or a similar command its output was empty. I
>>> assume that's because the hang usually occurs in a phase where these debugfs
>>> attributes either have not yet been created or have already disappeared.
>>
>> Bart, do you still see a hang with the patch that fixes the tag leak when
>> we fail to get a dispatch budget?
>>
>> https://marc.info/?l=linux-block=151004881411480=2
>>
>> I've run a lot of stability testing here, and I haven't run into any
>> issues. This is with shared tags as well. So if you still see the failure
>> case with the current tree AND the above patch, then I'll try and get
>> a test case setup that hits it too so we can get to the bottom of this.
> 
> Ming, I managed to reproduce the hang using null_blk. Note this is
> WITHOUT the patch mentioned above, running with that now.
> 
> # modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 submit_queues=1 
> hw_queue_depth=1
> 
> and using this fio file:
> 
> [global]
> bs=4k
> rw=randread
> norandommap
> direct=1
> ioengine=libaio
> iodepth=4
> 
> [nullb0]
> filename=/dev/nullb0
> [nullb1]
> filename=/dev/nullb1
> [nullb2]
> filename=/dev/nullb2
> [nullb3]
> filename=/dev/nullb3
> 
> it seemed to keep running, but it hung when exiting. The troublesome
> device was nullb1:
> 
> [  492.513374] INFO: task fio:3263 blocked for more than 120 seconds.
> [  492.520782]   Not tainted 4.14.0-rc7+ #499
> [  492.526247] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [  492.535904] fio D13208  3263   3211 0x
> [  492.542535] Call Trace:
> [  492.545764]  __schedule+0x279/0x720
> [  492.550151]  schedule+0x33/0x90
> [  492.554145]  io_schedule+0x16/0x40
> [  492.558435]  blk_mq_get_tag+0x148/0x250
> [  492.563211]  ? finish_wait+0x90/0x90
> [  492.567693]  blk_mq_get_request+0xf0/0x3e0
> [  492.572760]  blk_mq_make_request+0xe2/0x690
> [  492.577913]  generic_make_request+0xfc/0x2f0
> [  492.583177]  submit_bio+0x64/0x120
> [  492.587475]  ? set_page_dirty_lock+0x4b/0x60
> [  492.592736]  ? submit_bio+0x64/0x120
> [  492.597309]  ? bio_set_pages_dirty+0x55/0x60
> [  492.602570]  blkdev_direct_IO+0x388/0x3c0
> [  492.607546]  ? free_ioctx_users+0xe0/0xe0
> [  492.612511]  ? blk_mq_dispatch_rq_list+0x238/0x3a0
> [  492.618353]  ? _raw_spin_unlock+0xe/0x20
> [  492.623227]  generic_file_read_iter+0xb3/0xa00
> [  492.628682]  ? generic_file_read_iter+0xb3/0xa00
> [  492.634334]  ? security_file_permission+0x9b/0xc0
> [  492.640114]  blkdev_read_iter+0x35/0x40
> [  492.644877]  aio_read+0xc5/0x120
> [  492.648973]  ? aio_read_events+0x24c/0x340
> [  492.654124]  ? __might_sleep+0x4a/0x80
> [  492.658800]  do_io_submit+0x47c/0x5e0
> [  492.663373]  ? do_io_submit+0x47c/0x5e0
> [  492.668234]  SyS_io_submit+0x10/0x20
> [  492.672715]  ? SyS_io_submit+0x10/0x20
> [  492.677394]  entry_SYSCALL_64_fastpath+0x13/0x94
> [  492.683039] RIP: 0033:0x7f83d1871717
> [  492.687521] RSP: 002b:7ffd38fe5a88 EFLAGS: 0202 ORIG_RAX: 
> 00d1
> [  492.696969] RAX: ffda RBX: 7f83b6972b50 RCX: 
> 7f83d1871717
> [  492.705423] RDX: 01f41418 RSI: 0001 RDI: 
> 7f83e4d36000
> [  492.713889] RBP: 0001 R08: 0001 R09: 
> 01f3f2e0
> [  492.722352] R10: 1000 R11: 0202 R12: 
> 7ffd38fe5be0
> [  492.730827] R13: 7f83b6972b01 R14: 7f83b69824b8 R15: 
> 7f83b6982368
> 
> and if we look at the debug entries, it's waiting on a scheduler tag:
> 
> sched_tags:nr_tags=2
> sched_tags:nr_reserved_tags=0
> sched_tags:active_queues=0
> sched_tags:bitmap_tags:
> sched_tags:depth=2
> sched_tags:busy=2
> sched_tags:bits_per_word=64
> sched_tags:map_nr=1
> sched_tags:alloc_hint={0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 
> 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 
> 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0}
> sched_tags:wake_batch=1
> sched_tags:wake_index=4
> sched_tags:ws={
> sched_tags:   {.wait_cnt=-102, .wait=inactive},
> sched_tags:   {.wait_cnt=-126, .wait=inactive},
> sched_tags:   {.wait_cnt=-72, .wait=inactive},
> sched_tags:   {.wait_cnt=-96, .wait=inactive},
> sched_tags:   {.wait_cnt=-134, .wait=inactive},
> sched_tags:   {.wait_cnt=-116, .wait=inactive},
> sched_tags:   {.wait_cnt=-90, .wait=inactive},
> sched_tags:   {.wait_cnt=-115, .wait=active},
> sched_tags:}
> sched_tags:round_robin=0
> sched_tags_bitmap:: 03
> 
> 

Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Bart Van Assche
On Tue, 2017-11-07 at 09:29 -0700, Jens Axboe wrote:
> On 11/07/2017 09:20 AM, Bart Van Assche wrote:
> > On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
> > > If you can reproduce, please provide me at least the following log
> > > first:
> > > 
> > >   find /sys/kernel/debug/block -name tags | xargs cat | grep busy
> > > 
> > > If any pending requests arn't completed, please provide the related
> > > info in dbgfs about where is the request.
> > 
> > Every time I ran the above or a similar command its output was empty. I
> > assume that's because the hang usually occurs in a phase where these debugfs
> > attributes either have not yet been created or have already disappeared.
> 
> Bart, do you still see a hang with the patch that fixes the tag leak when
> we fail to get a dispatch budget?
> 
> https://marc.info/?l=linux-block=151004881411480=2
> 
> I've run a lot of stability testing here, and I haven't run into any
> issues. This is with shared tags as well. So if you still see the failure
> case with the current tree AND the above patch, then I'll try and get
> a test case setup that hits it too so we can get to the bottom of this.

It took a little longer than expected but I just ran into the following
lockup with your for-next branch of this morning (commit e8fa44bb8af9) and
Ming's patch "blk-mq: put driver tag if dispatch budget can't be got"
applied on top of it:

[ 2575.324678] sysrq: SysRq : Show Blocked State
[ 2575.332336]   taskPC stack   pid father
[ 2575.345239] systemd-udevd   D0 47577518 0x0106
[ 2575.353821] Call Trace:
[ 2575.358805]  __schedule+0x28b/0x890
[ 2575.364906]  schedule+0x36/0x80
[ 2575.370436]  io_schedule+0x16/0x40
[ 2575.376287]  __lock_page+0xfc/0x140
[ 2575.382061]  ? page_cache_tree_insert+0xc0/0xc0
[ 2575.388943]  truncate_inode_pages_range+0x5e8/0x830
[ 2575.396083]  truncate_inode_pages+0x15/0x20
[ 2575.402398]  kill_bdev+0x2f/0x40
[ 2575.407538]  __blkdev_put+0x74/0x1f0
[ 2575.413010]  ? kmem_cache_free+0x197/0x1c0
[ 2575.418986]  blkdev_put+0x4c/0xd0
[ 2575.424040]  blkdev_close+0x34/0x70
[ 2575.429216]  __fput+0xe7/0x220
[ 2575.433863]  fput+0xe/0x10
[ 2575.438490]  task_work_run+0x76/0x90
[ 2575.443619]  do_exit+0x2e0/0xaf0
[ 2575.448311]  do_group_exit+0x43/0xb0
[ 2575.453386]  get_signal+0x299/0x5e0
[ 2575.458303]  do_signal+0x37/0x740
[ 2575.462976]  ? blkdev_read_iter+0x35/0x40
[ 2575.468425]  ? new_sync_read+0xde/0x130
[ 2575.473620]  ? vfs_read+0x115/0x130
[ 2575.478388]  exit_to_usermode_loop+0x80/0xd0
[ 2575.484002]  do_syscall_64+0xb3/0xc0
[ 2575.488813]  entry_SYSCALL64_slow_path+0x25/0x25
[ 2575.494759] RIP: 0033:0x7efd829cbd11
[ 2575.499506] RSP: 002b:7984f978 EFLAGS: 0246 ORIG_RAX: 

[ 2575.508741] RAX: 00022000 RBX: 55f19f902ca0 RCX: 7efd829cbd11
[ 2575.517455] RDX: 0004 RSI: 55f19f902cc8 RDI: 0007
[ 2575.526163] RBP: 55f19f7fb9d0 R08:  R09: 55f19f902ca0
[ 2575.534860] R10: 55f19f902cb8 R11: 0246 R12: 
[ 2575.544250] R13: 0004 R14: 55f19f7fba20 R15: 0004

Bart.

RE: [bug report] scsi: mpt3sas: Added support for nvme encapsulated request message.

2017-11-07 Thread Sathya Prakash Veerichetty
Dan,
The MPI structures are of variable length and can go up to a maximum of
128 bytes (a MPI frame size) and as MPI standard the variable length MPI
structures are left out with the last element as a single dword array.
Can we ignore the warning?  If not we need to modify the MPI structure to
have the NVMe_Command array to the maximum size of the frame (which is
typically 128 but can change across hardware generations)

Thanks
Sathya

-Original Message-
From: mpt-fusionlinux@broadcom.com
[mailto:mpt-fusionlinux@broadcom.com] On Behalf Of Dan Carpenter
Sent: Tuesday, November 7, 2017 4:34 AM
To: suganath-prabu.subram...@broadcom.com
Cc: mpt-fusionlinux@broadcom.com; linux-scsi@vger.kernel.org
Subject: [bug report] scsi: mpt3sas: Added support for nvme encapsulated
request message.

Hello Suganath Prabu Subramani,

The patch aff39e61218f: "scsi: mpt3sas: Added support for nvme
encapsulated request message." from Oct 31, 2017, leads to the following
static checker warning:

drivers/scsi/mpt3sas/mpt3sas_base.c:1459 _base_build_nvme_prp()
error: buffer overflow 'nvme_encap_request->NVMe_Command' 4 <= 24

drivers/scsi/mpt3sas/mpt3sas_base.c
  1453  /*
  1454   * Set pointers to PRP1 and PRP2, which are in the NVMe
command.
  1455   * PRP1 is located at a 24 byte offset from the start of
the NVMe
^^^ The ->NVMe_Command is
declared as a 4 byte array so this makes static checkers puzzled how there
are more than 24 bytes in it.

  1456   * command.  Then set the current PRP entry pointer to
PRP1.
  1457   */
  1458  prp1_entry = (__le64 *)(nvme_encap_request->NVMe_Command +
  1459  NVME_CMD_PRP1_OFFSET);
  1460  prp2_entry = (__le64 *)(nvme_encap_request->NVMe_Command +
  1461  NVME_CMD_PRP2_OFFSET);
  1462  prp_entry = prp1_entry;
  1463  /*
  1464   * For the PRP entries, use the specially allocated buffer
of
  1465   * contiguous memory.
  1466   */
  1467  prp_page = (__le64 *)mpt3sas_base_get_pcie_sgl(ioc, smid);
  1468  prp_page_phys = (__le64
*)mpt3sas_base_get_pcie_sgl_dma(ioc, smid);
  1469

regards,
dan carpenter


Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Jens Axboe
On 11/07/2017 09:29 AM, Jens Axboe wrote:
> On 11/07/2017 09:20 AM, Bart Van Assche wrote:
>> On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
>>> If you can reproduce, please provide me at least the following log
>>> first:
>>>
>>> find /sys/kernel/debug/block -name tags | xargs cat | grep busy
>>>
>>> If any pending requests arn't completed, please provide the related
>>> info in dbgfs about where is the request.
>>
>> Every time I ran the above or a similar command its output was empty. I
>> assume that's because the hang usually occurs in a phase where these debugfs
>> attributes either have not yet been created or have already disappeared.
> 
> Bart, do you still see a hang with the patch that fixes the tag leak when
> we fail to get a dispatch budget?
> 
> https://marc.info/?l=linux-block=151004881411480=2
> 
> I've run a lot of stability testing here, and I haven't run into any
> issues. This is with shared tags as well. So if you still see the failure
> case with the current tree AND the above patch, then I'll try and get
> a test case setup that hits it too so we can get to the bottom of this.

Ming, I managed to reproduce the hang using null_blk. Note this is
WITHOUT the patch mentioned above, running with that now.

# modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 submit_queues=1 
hw_queue_depth=1

and using this fio file:

[global]
bs=4k
rw=randread
norandommap
direct=1
ioengine=libaio
iodepth=4

[nullb0]
filename=/dev/nullb0
[nullb1]
filename=/dev/nullb1
[nullb2]
filename=/dev/nullb2
[nullb3]
filename=/dev/nullb3

it seemed to keep running, but it hung when exiting. The troublesome
device was nullb1:

[  492.513374] INFO: task fio:3263 blocked for more than 120 seconds.
[  492.520782]   Not tainted 4.14.0-rc7+ #499
[  492.526247] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  492.535904] fio D13208  3263   3211 0x
[  492.542535] Call Trace:
[  492.545764]  __schedule+0x279/0x720
[  492.550151]  schedule+0x33/0x90
[  492.554145]  io_schedule+0x16/0x40
[  492.558435]  blk_mq_get_tag+0x148/0x250
[  492.563211]  ? finish_wait+0x90/0x90
[  492.567693]  blk_mq_get_request+0xf0/0x3e0
[  492.572760]  blk_mq_make_request+0xe2/0x690
[  492.577913]  generic_make_request+0xfc/0x2f0
[  492.583177]  submit_bio+0x64/0x120
[  492.587475]  ? set_page_dirty_lock+0x4b/0x60
[  492.592736]  ? submit_bio+0x64/0x120
[  492.597309]  ? bio_set_pages_dirty+0x55/0x60
[  492.602570]  blkdev_direct_IO+0x388/0x3c0
[  492.607546]  ? free_ioctx_users+0xe0/0xe0
[  492.612511]  ? blk_mq_dispatch_rq_list+0x238/0x3a0
[  492.618353]  ? _raw_spin_unlock+0xe/0x20
[  492.623227]  generic_file_read_iter+0xb3/0xa00
[  492.628682]  ? generic_file_read_iter+0xb3/0xa00
[  492.634334]  ? security_file_permission+0x9b/0xc0
[  492.640114]  blkdev_read_iter+0x35/0x40
[  492.644877]  aio_read+0xc5/0x120
[  492.648973]  ? aio_read_events+0x24c/0x340
[  492.654124]  ? __might_sleep+0x4a/0x80
[  492.658800]  do_io_submit+0x47c/0x5e0
[  492.663373]  ? do_io_submit+0x47c/0x5e0
[  492.668234]  SyS_io_submit+0x10/0x20
[  492.672715]  ? SyS_io_submit+0x10/0x20
[  492.677394]  entry_SYSCALL_64_fastpath+0x13/0x94
[  492.683039] RIP: 0033:0x7f83d1871717
[  492.687521] RSP: 002b:7ffd38fe5a88 EFLAGS: 0202 ORIG_RAX: 
00d1
[  492.696969] RAX: ffda RBX: 7f83b6972b50 RCX: 7f83d1871717
[  492.705423] RDX: 01f41418 RSI: 0001 RDI: 7f83e4d36000
[  492.713889] RBP: 0001 R08: 0001 R09: 01f3f2e0
[  492.722352] R10: 1000 R11: 0202 R12: 7ffd38fe5be0
[  492.730827] R13: 7f83b6972b01 R14: 7f83b69824b8 R15: 7f83b6982368

and if we look at the debug entries, it's waiting on a scheduler tag:

sched_tags:nr_tags=2
sched_tags:nr_reserved_tags=0
sched_tags:active_queues=0
sched_tags:bitmap_tags:
sched_tags:depth=2
sched_tags:busy=2
sched_tags:bits_per_word=64
sched_tags:map_nr=1
sched_tags:alloc_hint={0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 
1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 
0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0}
sched_tags:wake_batch=1
sched_tags:wake_index=4
sched_tags:ws={
sched_tags: {.wait_cnt=-102, .wait=inactive},
sched_tags: {.wait_cnt=-126, .wait=inactive},
sched_tags: {.wait_cnt=-72, .wait=inactive},
sched_tags: {.wait_cnt=-96, .wait=inactive},
sched_tags: {.wait_cnt=-134, .wait=inactive},
sched_tags: {.wait_cnt=-116, .wait=inactive},
sched_tags: {.wait_cnt=-90, .wait=inactive},
sched_tags: {.wait_cnt=-115, .wait=active},
sched_tags:}
sched_tags:round_robin=0
sched_tags_bitmap:: 03

with SCHED_RESTART being set:

state:SCHED_RESTART

and with the driver tags being idle:

tags:nr_tags=1
tags:nr_reserved_tags=0
tags:active_queues=0
tags:bitmap_tags:
tags:depth=1
tags:busy=0
tags:bits_per_word=64
tags:map_nr=1
tags:alloc_hint={0, 0, 0, 0, 0, 0, 

Re: [PATCH 1/4] qla2xxx_nvmet: Add files for FC-NVMe Target support

2017-11-07 Thread James Smart

On 11/6/2017 11:55 AM, Himanshu Madhani wrote:

From: Anil Gurumurthy 

+static struct nvmet_fc_target_template qla_nvmet_fc_transport = {
+   .targetport_delete  = qla_nvmet_targetport_delete,
+   .xmt_ls_rsp = qla_nvmet_ls_rsp,
+   .fcp_op = qla_nvmet_fcp_op,
+   .fcp_abort  = qla_nvmet_fcp_abort,
+   .fcp_req_release= qla_nvmet_fcp_req_release,
+   .max_hw_queues  = 8,
+   .max_sgl_segments   = 128,
+   .max_dif_sgl_segments   = 64,
+   .dma_boundary   = 0x,
+   .target_features= NVMET_FCTGTFEAT_READDATA_RSP |
+   NVMET_FCTGTFEAT_CMD_IN_ISR |
+   NVMET_FCTGTFEAT_OPDONE_IN_ISR,
+   .target_priv_sz = sizeof(struct nvme_private),
+};
+#endif



Do you really need the xxx_IN_ISR features ?  e.g. are you calling the 
nvmet_fc cmd receive and op done calls in ISR context or something that 
can't/shouldn't continue into the nvmet layers ?


I was looking to remove those flags and the work_q items behind it as I 
believed the qla2xxx driver called everything in a deferred callback.


-- james




Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Jens Axboe
On 11/07/2017 09:20 AM, Bart Van Assche wrote:
> On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
>> If you can reproduce, please provide me at least the following log
>> first:
>>
>>  find /sys/kernel/debug/block -name tags | xargs cat | grep busy
>>
>> If any pending requests arn't completed, please provide the related
>> info in dbgfs about where is the request.
> 
> Every time I ran the above or a similar command its output was empty. I
> assume that's because the hang usually occurs in a phase where these debugfs
> attributes either have not yet been created or have already disappeared.

Bart, do you still see a hang with the patch that fixes the tag leak when
we fail to get a dispatch budget?

https://marc.info/?l=linux-block=151004881411480=2

I've run a lot of stability testing here, and I haven't run into any
issues. This is with shared tags as well. So if you still see the failure
case with the current tree AND the above patch, then I'll try and get
a test case setup that hits it too so we can get to the bottom of this.

-- 
Jens Axboe



Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Bart Van Assche
On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
> If you can reproduce, please provide me at least the following log
> first:
> 
>   find /sys/kernel/debug/block -name tags | xargs cat | grep busy
> 
> If any pending requests arn't completed, please provide the related
> info in dbgfs about where is the request.

Every time I ran the above or a similar command its output was empty. I
assume that's because the hang usually occurs in a phase where these debugfs
attributes either have not yet been created or have already disappeared.

Bart.

Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Bart Van Assche
On Tue, 2017-11-07 at 18:15 +0800, Ming Lei wrote:
> Last time, you didn't mention the target patch for setting its
> can_queue as 1, so I think you can't reproduce the issue on upstream
> kernel without out-of-tree patch. Then looks it is another issue,
> and we are making progress actually.

If I don't trust a patch I introduce additional tests. The fact that I
modified the SRP initiator before this hang occurred does not mean that the
approach of your patch is fine. What this means is that all your patch does
is to reduce the race window and that there is still a race window.

Bart.

Re: [PATCH V2] scsi_debugfs: fix crash in scsi_show_rq()

2017-11-07 Thread Bart Van Assche
On Tue, 2017-11-07 at 23:21 +0800, Ming Lei wrote:
> cmd->cmnd can be allocated/freed dynamically in case of 
> T10_PI_TYPE2_PROTECTION,
> so we can't access it in scsi_show_rq() if 'SCpnt->cmnd != scsi_req(rq)->cmd',
> because this request can be freed any time.

That description is inaccurate. It is not allowed to free SCpnt->cmnd while
a command is being executed.

> diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
> index 5e9755008aed..7a50878446b4 100644
> --- a/drivers/scsi/scsi_debugfs.c
> +++ b/drivers/scsi/scsi_debugfs.c
> @@ -9,7 +9,10 @@ void scsi_show_rq(struct seq_file *m, struct request *rq)
>   int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
>   char buf[80];
>  
> - __scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
> + if (cmd->cmnd == scsi_req(rq)->cmd)
> + __scsi_format_command(buf, sizeof(buf), cmd->cmnd, 
> cmd->cmd_len);
> + else
> + strcpy(buf, "unknown");
>   seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
>  cmd->retries, msecs / 1000, msecs % 1000);
>  }

This change introduces a new bug, namely that "unknown" will appear in the
debugfs output if (cmd->cmnd != scsi_req(rq)->cmd). Have you considered to use
the test (cmd->cmnd != NULL) instead?

Additionally, if you really want to avoid that this code crashes if NULL is
assigned to cmd->cmnd you will have to use READ_ONCE(cmd->cmnd).

Bart.

[PATCH V2] scsi_debugfs: fix crash in scsi_show_rq()

2017-11-07 Thread Ming Lei
cmd->cmnd can be allocated/freed dynamically in case of T10_PI_TYPE2_PROTECTION,
so we can't access it in scsi_show_rq() if 'SCpnt->cmnd != scsi_req(rq)->cmd',
because this request can be freed any time.

This patch trys to fix the following kernel crash when dumping request via
block's debugfs interface:

[  252.962045] BUG: unable to handle kernel NULL pointer dereference at 
  (null)
[  252.963007] IP: scsi_format_opcode_name+0x1a/0x1c0
[  252.963007] PGD 25e75a067 P4D 25e75a067 PUD 25e75b067 PMD 0
[  252.963007] Oops:  [#1] PREEMPT SMP
[  252.963007] Dumping ftrace buffer:
[  252.963007](ftrace buffer empty)
[  252.963007] Modules linked in: scsi_debug ebtable_filter ebtables 
ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE 
nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
nf_nat nf_conntrack libcrc32c bridge stp llc iptable_filter fuse ip_tables 
sd_mod sg mptsas mptscsih mptbase crc32c_intel ahci libahci nvme serio_raw 
scsi_transport_sas libata lpc_ich nvme_core virtio_scsi binfmt_misc dm_mod 
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi null_blk configs
[  252.963007] CPU: 1 PID: 1881 Comm: cat Not tainted 
4.14.0-rc2.blk_mq_io_hang+ #516
[  252.963007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.9.3-1.fc25 04/01/2014
[  252.963007] task: 88025e6f6000 task.stack: c90001bd
[  252.963007] RIP: 0010:scsi_format_opcode_name+0x1a/0x1c0
[  252.963007] RSP: 0018:c90001bd3c50 EFLAGS: 00010286
[  252.963007] RAX: 4843 RBX: 0050 RCX: 
[  252.963007] RDX:  RSI: 0050 RDI: c90001bd3cd8
[  252.963007] RBP: c90001bd3c88 R08: 1000 R09: 
[  252.963007] R10: 880275134000 R11: 88027513406c R12: 0050
[  252.963007] R13: c90001bd3cd8 R14:  R15: 
[  252.963007] FS:  7f4d11762700() GS:88027fc4() 
knlGS:
[  252.963007] CS:  0010 DS:  ES:  CR0: 80050033
[  252.963007] CR2:  CR3: 00025e789003 CR4: 003606e0
[  252.963007] DR0:  DR1:  DR2: 
[  252.963007] DR3:  DR6: fffe0ff0 DR7: 0400
[  252.963007] Call Trace:
[  252.963007]  __scsi_format_command+0x27/0xc0
[  252.963007]  scsi_show_rq+0x5c/0xc0
[  252.963007]  ? seq_printf+0x4e/0x70
[  252.963007]  ? blk_flags_show+0x5b/0xf0
[  252.963007]  __blk_mq_debugfs_rq_show+0x116/0x130
[  252.963007]  blk_mq_debugfs_rq_show+0xe/0x10
[  252.963007]  seq_read+0xfe/0x3b0
[  252.963007]  ? __handle_mm_fault+0x631/0x1150
[  252.963007]  full_proxy_read+0x54/0x90
[  252.963007]  __vfs_read+0x37/0x160
[  252.963007]  ? security_file_permission+0x9b/0xc0
[  252.963007]  vfs_read+0x96/0x130
[  252.963007]  SyS_read+0x55/0xc0
[  252.963007]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[  252.963007] RIP: 0033:0x7f4d1127e9b0
[  252.963007] RSP: 002b:7ffd27082568 EFLAGS: 0246 ORIG_RAX: 

[  252.963007] RAX: ffda RBX: 7f4d1154bb20 RCX: 7f4d1127e9b0
[  252.963007] RDX: 0002 RSI: 7f4d115a7000 RDI: 0003
[  252.963007] RBP: 00021010 R08:  R09: 
[  252.963007] R10: 037b R11: 0246 R12: 00022000
[  252.963007] R13: 7f4d1154bb78 R14: 1000 R15: 0002
[  252.963007] Code: c6 e8 1b ca 24 00 eb 8c e8 74 2c ae ff 0f 1f 40 00 0f 1f 
44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 fd 49 89 f4 48 83 ec 18 <44> 0f 
b6 32 48 c7 45 c8 00 00 00 00 65 48 8b 04 25 28 00 00 00
[  252.963007] RIP: scsi_format_opcode_name+0x1a/0x1c0 RSP: c90001bd3c50
[  252.963007] CR2: 
[  252.963007] ---[ end trace 83c5bddfbaa6573c ]---
[  252.963007] Kernel panic - not syncing: Fatal exception
[  252.963007] Dumping ftrace buffer:
[  252.963007](ftrace buffer empty)
[  252.963007] Kernel Offset: disabled
[  252.963007] ---[ end Kernel panic - not syncing: Fatal exception

Fixes: 0eebd005dd07(scsi: Implement blk_mq_ops.show_rq())
Cc: Bart Van Assche 
Cc: Omar Sandoval 
Cc: Martin K. Petersen 
Cc: James Bottomley 
Cc: Hannes Reinecke 
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei 
---
V2:
fix a typo found by John Garry.

 drivers/scsi/scsi_debugfs.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
index 5e9755008aed..7a50878446b4 100644
--- a/drivers/scsi/scsi_debugfs.c
+++ b/drivers/scsi/scsi_debugfs.c
@@ -9,7 +9,10 @@ void scsi_show_rq(struct seq_file *m, struct request *rq)
int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
char buf[80];
 
-   

Re: [PATCH] scsi_debugfs: fix crash in scsi_show_rq()

2017-11-07 Thread Ming Lei
On Tue, Nov 07, 2017 at 03:04:09PM +, John Garry wrote:
> Hi Ming,
> 
> On 07/11/2017 14:55, Ming Lei wrote:
> > +   strcpy(buf, "unknow");
> 
> typo?

Yeah, will fix it in V2, thanks!

-- 
Ming


Re: [PATCH 0/4] qla2xxx: Add FC-NVMe Target support

2017-11-07 Thread Christoph Hellwig
Please send this to the linux-nvme list, and the nvme FC maintainer.

Also I'd really like to see Cavium involved with the community a bit
before we take on another driver.  Right now it's basically just
James with a bit of help from Johannes.


Re: [PATCH] scsi_debugfs: fix crash in scsi_show_rq()

2017-11-07 Thread John Garry

Hi Ming,

On 07/11/2017 14:55, Ming Lei wrote:
> +  strcpy(buf, "unknow");

typo?

Thanks,
John



[PATCH] scsi_debugfs: fix crash in scsi_show_rq()

2017-11-07 Thread Ming Lei
cmd->cmnd can be allocated/freed dynamically in case of T10_PI_TYPE2_PROTECTION,
so we can't access it in scsi_show_rq() if 'SCpnt->cmnd != scsi_req(rq)->cmd',
because this request can be freed any time.

This patch trys to fix the following kernel crash when dumping request via
block's debugfs interface:

[  252.962045] BUG: unable to handle kernel NULL pointer dereference at 
  (null)
[  252.963007] IP: scsi_format_opcode_name+0x1a/0x1c0
[  252.963007] PGD 25e75a067 P4D 25e75a067 PUD 25e75b067 PMD 0
[  252.963007] Oops:  [#1] PREEMPT SMP
[  252.963007] Dumping ftrace buffer:
[  252.963007](ftrace buffer empty)
[  252.963007] Modules linked in: scsi_debug ebtable_filter ebtables 
ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE 
nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
nf_nat nf_conntrack libcrc32c bridge stp llc iptable_filter fuse ip_tables 
sd_mod sg mptsas mptscsih mptbase crc32c_intel ahci libahci nvme serio_raw 
scsi_transport_sas libata lpc_ich nvme_core virtio_scsi binfmt_misc dm_mod 
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi null_blk configs
[  252.963007] CPU: 1 PID: 1881 Comm: cat Not tainted 
4.14.0-rc2.blk_mq_io_hang+ #516
[  252.963007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.9.3-1.fc25 04/01/2014
[  252.963007] task: 88025e6f6000 task.stack: c90001bd
[  252.963007] RIP: 0010:scsi_format_opcode_name+0x1a/0x1c0
[  252.963007] RSP: 0018:c90001bd3c50 EFLAGS: 00010286
[  252.963007] RAX: 4843 RBX: 0050 RCX: 
[  252.963007] RDX:  RSI: 0050 RDI: c90001bd3cd8
[  252.963007] RBP: c90001bd3c88 R08: 1000 R09: 
[  252.963007] R10: 880275134000 R11: 88027513406c R12: 0050
[  252.963007] R13: c90001bd3cd8 R14:  R15: 
[  252.963007] FS:  7f4d11762700() GS:88027fc4() 
knlGS:
[  252.963007] CS:  0010 DS:  ES:  CR0: 80050033
[  252.963007] CR2:  CR3: 00025e789003 CR4: 003606e0
[  252.963007] DR0:  DR1:  DR2: 
[  252.963007] DR3:  DR6: fffe0ff0 DR7: 0400
[  252.963007] Call Trace:
[  252.963007]  __scsi_format_command+0x27/0xc0
[  252.963007]  scsi_show_rq+0x5c/0xc0
[  252.963007]  ? seq_printf+0x4e/0x70
[  252.963007]  ? blk_flags_show+0x5b/0xf0
[  252.963007]  __blk_mq_debugfs_rq_show+0x116/0x130
[  252.963007]  blk_mq_debugfs_rq_show+0xe/0x10
[  252.963007]  seq_read+0xfe/0x3b0
[  252.963007]  ? __handle_mm_fault+0x631/0x1150
[  252.963007]  full_proxy_read+0x54/0x90
[  252.963007]  __vfs_read+0x37/0x160
[  252.963007]  ? security_file_permission+0x9b/0xc0
[  252.963007]  vfs_read+0x96/0x130
[  252.963007]  SyS_read+0x55/0xc0
[  252.963007]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[  252.963007] RIP: 0033:0x7f4d1127e9b0
[  252.963007] RSP: 002b:7ffd27082568 EFLAGS: 0246 ORIG_RAX: 

[  252.963007] RAX: ffda RBX: 7f4d1154bb20 RCX: 7f4d1127e9b0
[  252.963007] RDX: 0002 RSI: 7f4d115a7000 RDI: 0003
[  252.963007] RBP: 00021010 R08:  R09: 
[  252.963007] R10: 037b R11: 0246 R12: 00022000
[  252.963007] R13: 7f4d1154bb78 R14: 1000 R15: 0002
[  252.963007] Code: c6 e8 1b ca 24 00 eb 8c e8 74 2c ae ff 0f 1f 40 00 0f 1f 
44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 fd 49 89 f4 48 83 ec 18 <44> 0f 
b6 32 48 c7 45 c8 00 00 00 00 65 48 8b 04 25 28 00 00 00
[  252.963007] RIP: scsi_format_opcode_name+0x1a/0x1c0 RSP: c90001bd3c50
[  252.963007] CR2: 
[  252.963007] ---[ end trace 83c5bddfbaa6573c ]---
[  252.963007] Kernel panic - not syncing: Fatal exception
[  252.963007] Dumping ftrace buffer:
[  252.963007](ftrace buffer empty)
[  252.963007] Kernel Offset: disabled
[  252.963007] ---[ end Kernel panic - not syncing: Fatal exception

Fixes: 0eebd005dd07(scsi: Implement blk_mq_ops.show_rq())
Cc: Bart Van Assche 
Cc: Omar Sandoval 
Cc: Martin K. Petersen 
Cc: James Bottomley 
Cc: Hannes Reinecke 
Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe 
Signed-off-by: Ming Lei 
---
 drivers/scsi/scsi_debugfs.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
index 5e9755008aed..fe3efefc00d3 100644
--- a/drivers/scsi/scsi_debugfs.c
+++ b/drivers/scsi/scsi_debugfs.c
@@ -9,7 +9,10 @@ void scsi_show_rq(struct seq_file *m, struct request *rq)
int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
char buf[80];
 
-   

Re: [PATCH V8 0/8] block/scsi: safe SCSI quiescing

2017-11-07 Thread Ming Lei
On Tue, Nov 07, 2017 at 01:32:44PM +0100, Oleksandr Natalenko wrote:
> Hi Ming, Jens.
> 
> What is the fate of this patchset please?

Hi Oleksandr,

Recently Bart is working on this patchset, and you may try to see
if his V11 works for you:

https://marc.info/?t=15094035109=1=2

Thanks,
Ming



[PATCH] scsi: hpsa: remove an unecessary NULL check

2017-11-07 Thread Dan Carpenter
device->scsi3addr[] is an array, not a pointer, so it can't be NULL.
I've removed the check.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 287e5eb0723f..b0aa5dc1d54c 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -8223,8 +8223,6 @@ static void hpsa_set_ioaccel_status(struct ctlr_info *h)
 
if (!device)
continue;
-   if (!device->scsi3addr)
-   continue;
if (!hpsa_vpd_page_supported(h, device->scsi3addr,
HPSA_VPD_LV_IOACCEL_STATUS))
continue;


Re: [PATCH V8 0/8] block/scsi: safe SCSI quiescing

2017-11-07 Thread Oleksandr Natalenko

Hi Ming, Jens.

What is the fate of this patchset please?

03.10.2017 16:03, Ming Lei wrote:

Hi Jens,

Please consider this patchset for V4.15, and it fixes one
kind of long-term I/O hang issue in either block legacy path
or blk-mq.

The current SCSI quiesce isn't safe and easy to trigger I/O deadlock.

Once SCSI device is put into QUIESCE, no new request except for
RQF_PREEMPT can be dispatched to SCSI successfully, and
scsi_device_quiesce() just simply waits for completion of I/Os
dispatched to SCSI stack. It isn't enough at all.

Because new request still can be comming, but all the allocated
requests can't be dispatched successfully, so request pool can be
consumed up easily.

Then request with RQF_PREEMPT can't be allocated and wait forever,
then system hangs forever, such as during system suspend or
sending SCSI domain alidation in case of transport_spi.

Both IO hang inside system suspend[1] or SCSI domain validation
were reported before.

This patch introduces preempt only mode, and solves the issue
by allowing RQF_PREEMP only during SCSI quiesce.

Both SCSI and SCSI_MQ have this IO deadlock issue, this patch fixes
them all.

V8:
- fix one race as pointed out by Bart
- pass 'op' to blk_queue_enter() as suggested by Christoph

V7:
- add Reviewed-by & Tested-by
- one line change in patch 5 for checking preempt request

V6:
- borrow Bart's idea of preempt only, with clean
  implementation(patch 5/patch 6)
- needn't any external driver's dependency, such as MD's
change

V5:
- fix one tiny race by introducing blk_queue_enter_preempt_freeze()
given this change is small enough compared with V4, I added
tested-by directly

V4:
- reorganize patch order to make it more reasonable
- support nested preempt freeze, as required by SCSI transport spi
- check preempt freezing in slow path of of blk_queue_enter()
- add "SCSI: transport_spi: resume a quiesced device"
- wake up freeze queue in setting dying for both blk-mq and legacy
- rename blk_mq_[freeze|unfreeze]_queue() in one patch
- rename .mq_freeze_wq and .mq_freeze_depth
- improve comment

V3:
- introduce q->preempt_unfreezing to fix one bug of preempt freeze
- call blk_queue_enter_live() only when queue is preempt frozen
- cleanup a bit on the implementation of preempt freeze
- only patch 6 and 7 are changed

V2:
- drop the 1st patch in V1 because percpu_ref_is_dying() is
enough as pointed by Tejun
- introduce preempt version of blk_[freeze|unfreeze]_queue
- sync between preempt freeze and normal freeze
- fix warning from percpu-refcount as reported by Oleksandr


[1] https://marc.info/?t=150340250100013=3=2


Thanks,
Ming

Bart Van Assche (1):
  block: Convert RQF_PREEMPT into REQ_PREEMPT

Ming Lei (7):
  blk-mq: only run hw queues for blk-mq
  block: tracking request allocation with q_usage_counter
  block: pass 'op' to blk_queue_enter()
  percpu-refcount: introduce __percpu_ref_tryget_live
  blk-mq: return if queue is frozen via current blk_freeze_queue_start
  block: support PREEMPT_ONLY
  SCSI: set block queue at preempt only when SCSI device is put into
quiesce

 block/blk-core.c| 66 
+

 block/blk-mq-debugfs.c  |  2 +-
 block/blk-mq.c  | 26 
 block/blk-mq.h  |  1 -
 block/blk-timeout.c |  2 +-
 block/blk.h |  2 +-
 drivers/ide/ide-atapi.c |  3 +-
 drivers/ide/ide-io.c|  2 +-
 drivers/ide/ide-pm.c|  4 +--
 drivers/scsi/scsi_lib.c | 31 +++
 fs/block_dev.c  |  4 +--
 include/linux/blk-mq.h  |  4 +--
 include/linux/blk_types.h   |  6 
 include/linux/blkdev.h  | 10 ---
 include/linux/percpu-refcount.h | 27 ++---
 15 files changed, 137 insertions(+), 53 deletions(-)


[bug report] scsi: mpt3sas: Added support for nvme encapsulated request message.

2017-11-07 Thread Dan Carpenter
Hello Suganath Prabu Subramani,

The patch aff39e61218f: "scsi: mpt3sas: Added support for nvme
encapsulated request message." from Oct 31, 2017, leads to the
following static checker warning:

drivers/scsi/mpt3sas/mpt3sas_base.c:1459 _base_build_nvme_prp()
error: buffer overflow 'nvme_encap_request->NVMe_Command' 4 <= 24

drivers/scsi/mpt3sas/mpt3sas_base.c
  1453  /*
  1454   * Set pointers to PRP1 and PRP2, which are in the NVMe command.
  1455   * PRP1 is located at a 24 byte offset from the start of the 
NVMe
^^^
The ->NVMe_Command is declared as a 4 byte array so this makes static
checkers puzzled how there are more than 24 bytes in it.

  1456   * command.  Then set the current PRP entry pointer to PRP1.
  1457   */
  1458  prp1_entry = (__le64 *)(nvme_encap_request->NVMe_Command +
  1459  NVME_CMD_PRP1_OFFSET);
  1460  prp2_entry = (__le64 *)(nvme_encap_request->NVMe_Command +
  1461  NVME_CMD_PRP2_OFFSET);
  1462  prp_entry = prp1_entry;
  1463  /*
  1464   * For the PRP entries, use the specially allocated buffer of
  1465   * contiguous memory.
  1466   */
  1467  prp_page = (__le64 *)mpt3sas_base_get_pcie_sgl(ioc, smid);
  1468  prp_page_phys = (__le64 *)mpt3sas_base_get_pcie_sgl_dma(ioc, 
smid);
  1469  

regards,
dan carpenter


[PATCH] aacraid: use timespec64 instead of timeval

2017-11-07 Thread Arnd Bergmann
aacraid passes the current time to the firmware in one of two ways,
either as year/month/day/... or as 32-bit unsigned seconds.

The first one is broken on 32-bit architectures as it cannot
go past year 2038. Using timespec64 here makes it behave properly
on both 32-bit and 64-bit architectures, and avoids relying
on signed integer overflow to pass times into the second interface.

The interface used in aac_send_hosttime() however is still
problematic in year 2106 when 32-bit seconds overflow. Hopefully
we don't have to worry about aacraid by that time.

Signed-off-by: Arnd Bergmann 
---
 drivers/scsi/aacraid/commsup.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index dfe8e70f8d99..525a652dab48 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -2383,19 +2383,19 @@ static int aac_send_wellness_command(struct aac_dev 
*dev, char *wellness_str,
goto out;
 }
 
-int aac_send_safw_hostttime(struct aac_dev *dev, struct timeval *now)
+int aac_send_safw_hostttime(struct aac_dev *dev, struct timespec64 *now)
 {
struct tm cur_tm;
char wellness_str[] = "TD\010\0\0\0\0\0\0\0\0\0DW\0\0ZZ";
u32 datasize = sizeof(wellness_str);
-   unsigned long local_time;
+   time64_t local_time;
int ret = -ENODEV;
 
if (!dev->sa_firmware)
goto out;
 
-   local_time = (u32)(now->tv_sec - (sys_tz.tz_minuteswest * 60));
-   time_to_tm(local_time, 0, _tm);
+   local_time = (now->tv_sec - (sys_tz.tz_minuteswest * 60));
+   time64_to_tm(local_time, 0, _tm);
cur_tm.tm_mon += 1;
cur_tm.tm_year += 1900;
wellness_str[8] = bin2bcd(cur_tm.tm_hour);
@@ -2412,7 +2412,7 @@ int aac_send_safw_hostttime(struct aac_dev *dev, struct 
timeval *now)
return ret;
 }
 
-int aac_send_hosttime(struct aac_dev *dev, struct timeval *now)
+int aac_send_hosttime(struct aac_dev *dev, struct timespec64 *now)
 {
int ret = -ENOMEM;
struct fib *fibptr;
@@ -2424,7 +2424,7 @@ int aac_send_hosttime(struct aac_dev *dev, struct timeval 
*now)
 
aac_fib_init(fibptr);
info = (__le32 *)fib_data(fibptr);
-   *info = cpu_to_le32(now->tv_sec);
+   *info = cpu_to_le32(now->tv_sec); /* overflow in y2106 */
ret = aac_fib_send(SendHostTime, fibptr, sizeof(*info), FsaNormal,
1, 1, NULL, NULL);
 
@@ -2496,7 +2496,7 @@ int aac_command_thread(void *data)
}
if (!time_before(next_check_jiffies,next_jiffies)
 && ((difference = next_jiffies - jiffies) <= 0)) {
-   struct timeval now;
+   struct timespec64 now;
int ret;
 
/* Don't even try to talk to adapter if its sick */
@@ -2506,15 +2506,15 @@ int aac_command_thread(void *data)
next_check_jiffies = jiffies
   + ((long)(unsigned)check_interval)
   * HZ;
-   do_gettimeofday();
+   ktime_get_real_ts64();
 
/* Synchronize our watches */
-   if (((100 - (100 / HZ)) > now.tv_usec)
-&& (now.tv_usec > (100 / HZ)))
-   difference = (((100 - now.tv_usec) * HZ)
- + 50) / 100;
+   if (((NSEC_PER_SEC - (NSEC_PER_SEC / HZ)) > now.tv_nsec)
+&& (now.tv_nsec > (NSEC_PER_SEC / HZ)))
+   difference = (((NSEC_PER_SEC - now.tv_nsec) * 
HZ)
+ + NSEC_PER_SEC / 2) / NSEC_PER_SEC;
else {
-   if (now.tv_usec > 50)
+   if (now.tv_nsec > NSEC_PER_SEC / 2)
++now.tv_sec;
 
if (dev->sa_firmware)
-- 
2.9.0



[PATCH] fnic: use 64-bit timestamps

2017-11-07 Thread Arnd Bergmann
struct timespec is deprecated since it overflows in 2038 on 32-bit
architectures, so we should use timespec64 consistently.

I'm slightly adapting the format strings here, to make sure we
print the nanoseconds with the correct number of leading zeroes.

Signed-off-by: Arnd Bergmann 
---
 drivers/scsi/fnic/fnic_debugfs.c |  2 +-
 drivers/scsi/fnic/fnic_stats.h   |  4 +--
 drivers/scsi/fnic/fnic_trace.c   | 58 
 3 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/drivers/scsi/fnic/fnic_debugfs.c b/drivers/scsi/fnic/fnic_debugfs.c
index 5e3d909cfc53..db44e1bf1ecd 100644
--- a/drivers/scsi/fnic/fnic_debugfs.c
+++ b/drivers/scsi/fnic/fnic_debugfs.c
@@ -632,7 +632,7 @@ static ssize_t fnic_reset_stats_write(struct file *file,
sizeof(struct io_path_stats) - sizeof(u64));
memset(fw_stats_p+1, 0,
sizeof(struct fw_stats) - sizeof(u64));
-   getnstimeofday(>stats_timestamps.last_reset_time);
+   ktime_get_real_ts64(>stats_timestamps.last_reset_time);
}
 
(*ppos)++;
diff --git a/drivers/scsi/fnic/fnic_stats.h b/drivers/scsi/fnic/fnic_stats.h
index e007feedbf72..9daa6ada6fa0 100644
--- a/drivers/scsi/fnic/fnic_stats.h
+++ b/drivers/scsi/fnic/fnic_stats.h
@@ -18,8 +18,8 @@
 #define _FNIC_STATS_H_
 
 struct stats_timestamps {
-   struct timespec last_reset_time;
-   struct timespec last_read_time;
+   struct timespec64 last_reset_time;
+   struct timespec64 last_read_time;
 };
 
 struct io_path_stats {
diff --git a/drivers/scsi/fnic/fnic_trace.c b/drivers/scsi/fnic/fnic_trace.c
index 4826f596cb31..abddde11982b 100644
--- a/drivers/scsi/fnic/fnic_trace.c
+++ b/drivers/scsi/fnic/fnic_trace.c
@@ -111,7 +111,7 @@ int fnic_get_trace_data(fnic_dbgfs_t *fnic_dbgfs_prt)
int len = 0;
unsigned long flags;
char str[KSYM_SYMBOL_LEN];
-   struct timespec val;
+   struct timespec64 val;
fnic_trace_data_t *tbp;
 
spin_lock_irqsave(_trace_lock, flags);
@@ -129,10 +129,10 @@ int fnic_get_trace_data(fnic_dbgfs_t *fnic_dbgfs_prt)
/* Convert function pointer to function name */
if (sizeof(unsigned long) < 8) {
sprint_symbol(str, tbp->fnaddr.low);
-   jiffies_to_timespec(tbp->timestamp.low, );
+   jiffies_to_timespec64(tbp->timestamp.low, );
} else {
sprint_symbol(str, tbp->fnaddr.val);
-   jiffies_to_timespec(tbp->timestamp.val, );
+   jiffies_to_timespec64(tbp->timestamp.val, );
}
/*
 * Dump trace buffer entry to memory file
@@ -140,8 +140,8 @@ int fnic_get_trace_data(fnic_dbgfs_t *fnic_dbgfs_prt)
 */
len += snprintf(fnic_dbgfs_prt->buffer + len,
  (trace_max_pages * PAGE_SIZE * 3) - len,
- "%16lu.%16lu %-50s %8x %8x %16llx %16llx "
- "%16llx %16llx %16llx\n", val.tv_sec,
+ "%16llu.%09lu %-50s %8x %8x %16llx %16llx "
+ "%16llx %16llx %16llx\n", (u64)val.tv_sec,
  val.tv_nsec, str, tbp->host_no, tbp->tag,
  tbp->data[0], tbp->data[1], tbp->data[2],
  tbp->data[3], tbp->data[4]);
@@ -171,10 +171,10 @@ int fnic_get_trace_data(fnic_dbgfs_t *fnic_dbgfs_prt)
/* Convert function pointer to function name */
if (sizeof(unsigned long) < 8) {
sprint_symbol(str, tbp->fnaddr.low);
-   jiffies_to_timespec(tbp->timestamp.low, );
+   jiffies_to_timespec64(tbp->timestamp.low, );
} else {
sprint_symbol(str, tbp->fnaddr.val);
-   jiffies_to_timespec(tbp->timestamp.val, );
+   jiffies_to_timespec64(tbp->timestamp.val, );
}
/*
 * Dump trace buffer entry to memory file
@@ -182,8 +182,8 @@ int fnic_get_trace_data(fnic_dbgfs_t *fnic_dbgfs_prt)
 */
len += snprintf(fnic_dbgfs_prt->buffer + len,
  (trace_max_pages * PAGE_SIZE * 3) - len,
- "%16lu.%16lu %-50s %8x %8x %16llx %16llx "
- "%16llx %16llx %16llx\n", val.tv_sec,
+ "%16llu.%09lu %-50s %8x %8x %16llx %16llx "
+ "%16llx %16llx %16llx\n", 

Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()

2017-11-07 Thread Ming Lei
On Mon, Nov 06, 2017 at 07:45:23PM +, Bart Van Assche wrote:
> On Sat, 2017-11-04 at 08:19 -0600, Jens Axboe wrote:
> > On 11/03/2017 07:55 PM, Ming Lei wrote:
> > > It is very expensive to atomic_inc/atomic_dec the host wide counter of
> > > host->busy_count, and it should have been avoided via blk-mq's mechanism
> > > of getting driver tag, which uses the more efficient way of sbitmap queue.
> > > 
> > > Also we don't check atomic_read(>device_busy) in 
> > > scsi_mq_get_budget()
> > > and don't run queue if the counter becomes zero, so IO hang may be caused
> > > if all requests are completed just before the current SCSI device
> > > is added to shost->starved_list.
> > 
> > This looks like an improvement. I have added it for 4.15.
> > 
> > Bart, does this fix your hang?
> 
> No, it doesn't. After I had reduced starget->can_queue in the SRP initiator I
> ran into the following hang while running the srp-test software:
> 
> sysrq: SysRq : Show Blocked State
>   taskPC stack   pid father
> systemd-udevd   D0 19882467 0x8106
> Call Trace:
>  __schedule+0x2fa/0xbb0
>  schedule+0x36/0x90
>  io_schedule+0x16/0x40
>  __lock_page+0x10a/0x140
>  truncate_inode_pages_range+0x4ff/0x800
>  truncate_inode_pages+0x15/0x20
>  kill_bdev+0x35/0x40
>  __blkdev_put+0x6d/0x1f0
>  blkdev_put+0x4e/0x130
>  blkdev_close+0x25/0x30
>  __fput+0xed/0x1f0
>  fput+0xe/0x10
>  task_work_run+0x8b/0xc0
>  do_exit+0x38d/0xc70
>  do_group_exit+0x50/0xd0
>  get_signal+0x2ad/0x8c0
>  do_signal+0x28/0x680
>  exit_to_usermode_loop+0x5a/0xa0
>  do_syscall_64+0x12e/0x170
>  entry_SYSCALL64_slow_path+0x25/0x25
> 
> The SRP initiator driver was modified as follows for this test:
> 
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
> b/drivers/infiniband/ulp/srp/ib_srp.c
> index a6664467651e..9d24a871cc2e 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> 
> @@ -2835,6 +2839,13 @@ static int srp_reset_host(struct scsi_cmnd *scmnd)
>   return srp_reconnect_rport(target->rport) == 0 ? SUCCESS : FAILED;
>  }
>  
> +static int srp_target_alloc(struct scsi_target *starget)
> +{
> + starget->can_queue = 1;
> + return 0;
> +}
> +
>  static int srp_slave_alloc(struct scsi_device *sdev)
>  {
>   struct Scsi_Host *shost = sdev->host;
> @@ -3039,6 +3050,7 @@ static struct scsi_host_template srp_template = {
>   .module = THIS_MODULE,
>   .name   = "InfiniBand SRP initiator",
>   .proc_name  = DRV_NAME,
> + .target_alloc   = srp_target_alloc,
>   .slave_alloc= srp_slave_alloc,
>   .slave_configure= srp_slave_configure,
>   .info   = srp_target_info,

Last time, you didn't mention the target patch for setting its
can_queue as 1, so I think you can't reproduce the issue on upstream
kernel without out-of-tree patch. Then looks it is another issue,
and we are making progress actually.

I just posted a one-line patch, which should address the small queue
depth issue, please let us know if it fixes your issue:

https://marc.info/?l=linux-block=151004881411480=2

-- 
Ming


Re: [PATCH 2/4] qla2xxx_nvmet: Added Makefile and Kconfig changes

2017-11-07 Thread kbuild test robot
Hi Anil,

I love your patch! Yet something to improve:

[auto build test ERROR on scsi/for-next]
[also build test ERROR on v4.14-rc8 next-20171106]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Himanshu-Madhani/qla2xxx-Add-FC-NVMe-Target-support/20171107-153645
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
config: i386-randconfig-x001-201745 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

Note: the 
linux-review/Himanshu-Madhani/qla2xxx-Add-FC-NVMe-Target-support/20171107-153645
 HEAD 9c5e24e821aa40552221b3103bc914bc4cd42293 builds fine.
  It only hurts bisectibility.

All error/warnings (new ones prefixed by >>):

   In file included from drivers/scsi/qla2xxx/qla_nvmet.c:14:0:
   drivers/scsi/qla2xxx/qla_nvmet.h:31:25: error: field 'nvme_cmd_iu' has 
incomplete type
 struct atio7_nvme_cmnd nvme_cmd_iu;
^~~
   drivers/scsi/qla2xxx/qla_nvmet.c: In function 'qlt_nvmet_ls_done':
>> drivers/scsi/qla2xxx/qla_nvmet.c:48:46: error: 'struct ' has no 
>> member named 'cmd'
 struct qla_nvmet_cmd *tgt_cmd = nvme->u.nvme.cmd;
 ^
   drivers/scsi/qla2xxx/qla_nvmet.c:55:47: error: 'struct ' has no 
member named 'cmd'
  sp, sp->vha, nvme->u.nvme.desc, nvme->u.nvme.cmd);
  ^
   drivers/scsi/qla2xxx/qla_nvmet.c: In function 'qla_nvmet_ls_rsp':
>> drivers/scsi/qla2xxx/qla_nvmet.c:92:13: error: 'SRB_NVMET_LS' undeclared 
>> (first use in this function)
 sp->type = SRB_NVMET_LS;
^~~~
   drivers/scsi/qla2xxx/qla_nvmet.c:92:13: note: each undeclared identifier is 
reported only once for each function it appears in
>> drivers/scsi/qla2xxx/qla_nvmet.c:100:14: error: 'struct ' has no 
>> member named 'exchange_address'
 nvme->u.nvme.exchange_address = tgt_cmd->atio.u.pt_ls4.exchange_address;
 ^
>> drivers/scsi/qla2xxx/qla_nvmet.c:100:49: error: 'union ' has no 
>> member named 'pt_ls4'
 nvme->u.nvme.exchange_address = tgt_cmd->atio.u.pt_ls4.exchange_address;
^
>> drivers/scsi/qla2xxx/qla_nvmet.c:101:14: error: 'struct ' has no 
>> member named 'nport_handle'
 nvme->u.nvme.nport_handle = tgt_cmd->atio.u.pt_ls4.nport_handle;
 ^
   drivers/scsi/qla2xxx/qla_nvmet.c:101:45: error: 'union ' has no 
member named 'pt_ls4'
 nvme->u.nvme.nport_handle = tgt_cmd->atio.u.pt_ls4.nport_handle;
^
>> drivers/scsi/qla2xxx/qla_nvmet.c:102:14: error: 'struct ' has no 
>> member named 'vp_index'
 nvme->u.nvme.vp_index = tgt_cmd->atio.u.pt_ls4.vp_index;
 ^
   drivers/scsi/qla2xxx/qla_nvmet.c:102:41: error: 'union ' has no 
member named 'pt_ls4'
 nvme->u.nvme.vp_index = tgt_cmd->atio.u.pt_ls4.vp_index;
^
   drivers/scsi/qla2xxx/qla_nvmet.c:104:14: error: 'struct ' has no 
member named 'cmd'
 nvme->u.nvme.cmd = tgt_cmd; /* To be freed */
 ^
   drivers/scsi/qla2xxx/qla_nvmet.c: In function 'qla_nvmet_fcp_abort':
>> drivers/scsi/qla2xxx/qla_nvmet.c:168:13: error: 'SRB_NVMET_SEND_ABTS' 
>> undeclared (first use in this function)
 sp->type = SRB_NVMET_SEND_ABTS;
^~~
   drivers/scsi/qla2xxx/qla_nvmet.c: In function 'qla_nvmet_create_targetport':
>> drivers/scsi/qla2xxx/qla_nvmet.c:226:9: error: 'ql_dbg_nvme' undeclared 
>> (first use in this function)
 ql_dbg(ql_dbg_nvme, vha, 0xe081,
^~~
>> drivers/scsi/qla2xxx/qla_nvmet.c:236:10: error: 'struct scsi_qla_host' has 
>> no member named 'targetport'
 >targetport);
 ^~
   drivers/scsi/qla2xxx/qla_nvmet.c:243:41: error: 'struct scsi_qla_host' has 
no member named 'targetport'
 tport = (struct qla_nvmet_tgtport *)vha->targetport->private;
^~
   drivers/scsi/qla2xxx/qla_nvmet.c: In function 'qla_nvmet_delete':
>> drivers/scsi/qla2xxx/qla_nvmet.c:261:17: error: 'volatile struct 
>> ' has no member named 'nvmet_enabled'; did you mean 
>> 'nvme_enabled'?
 if (!vha->flags.nvmet_enabled)
^
   drivers/scsi/qla2xxx/qla_nvmet.c:263:9: error: 'struct scsi_qla_host' has no 
member named 'targetport'
 if (vha->targetport) {
^~
   drivers/scsi/qla2xxx/qla_nvmet.c:264:42: error: 'struct scsi_qla_host' has 
no member named 'targetport'
  tport = (struct qla_nvmet_tgtport *)vha->targe

Re: [PATCH 4/4] qla2xxx_nvmet: Add FC-NVMe Target handling

2017-11-07 Thread kbuild test robot
Hi Anil,

I love your patch! Perhaps something to improve:

[auto build test WARNING on scsi/for-next]
[also build test WARNING on next-20171106]
[cannot apply to v4.14-rc8]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Himanshu-Madhani/qla2xxx-Add-FC-NVMe-Target-support/20171107-153645
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
config: xtensa-allmodconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa 

All warnings (new ones prefixed by >>):

   drivers/scsi/qla2xxx/qla_target.c: In function 'qlt_send_els_resp':
>> drivers/scsi/qla2xxx/qla_target.c:400:6: warning: format '%llx' expects 
>> argument of type 'long long unsigned int', but argument 7 has type 
>> 'dma_addr_t' [-Wformat=]
 sp, purex, udma, loop_id);
 ^
   drivers/scsi/qla2xxx/qla_target.c: In function 'qlt_nvme_els_done':
   drivers/scsi/qla2xxx/qla_target.c:446:6: warning: format '%llx' expects 
argument of type 'long long unsigned int', but argument 8 has type 'dma_addr_t' 
[-Wformat=]
 sp->gen1);
 ^
   drivers/scsi/qla2xxx/qla_target.c: In function 'qlt_send_plogi_resp':
   drivers/scsi/qla2xxx/qla_target.c:475:6: warning: format '%llx' expects 
argument of type 'long long unsigned int', but argument 8 has type 'dma_addr_t' 
[-Wformat=]
 sp, vha, plogi_ack_buf, plogi_ack_udma);
 ^
   drivers/scsi/qla2xxx/qla_target.c:488:40: warning: cast from pointer to 
integer of different size [-Wpointer-to-int-cast]
 ret = qla2x00_get_plogi_template(vha, (uint64_t)tmp, (116/4 - 1));
   ^
   drivers/scsi/qla2xxx/qla_target.c: In function 'qlt_process_logo':
   drivers/scsi/qla2xxx/qla_target.c:689:6: warning: format '%llx' expects 
argument of type 'long long unsigned int', but argument 8 has type 'dma_addr_t' 
[-Wformat=]
 sp, vha, logo_ack_buf, logo_ack_udma);
 ^
   drivers/scsi/qla2xxx/qla_target.c: In function 'qlt_process_prli':
   drivers/scsi/qla2xxx/qla_target.c:754:6: warning: format '%llx' expects 
argument of type 'long long unsigned int', but argument 8 has type 'dma_addr_t' 
[-Wformat=]
 sp, vha, prli_ack_buf, prli_ack_udma);
 ^

vim +400 drivers/scsi/qla2xxx/qla_target.c

   375  
   376  /* Send an ELS response */
   377  int qlt_send_els_resp(srb_t *sp, struct __els_pt *els_pkt)
   378  {
   379  struct purex_entry_24xx *purex = (struct purex_entry_24xx *)
   380  sp->u.snvme_els.ptr;
   381  dma_addr_t udma = sp->u.snvme_els.dma_addr;
   382  struct fc_port *fcport;
   383  port_id_t port_id;
   384  uint16_t loop_id;
   385  
   386  port_id.b.domain = purex->s_id[2];
   387  port_id.b.area   = purex->s_id[1];
   388  port_id.b.al_pa  = purex->s_id[0];
   389  port_id.b.rsvd_1 = 0;
   390  
   391  fcport = qla2x00_find_fcport_by_nportid(sp->vha, _id, 1);
   392  if (fcport)
   393  /* There is no session with the swt */
   394  loop_id = fcport->loop_id;
   395  else
   396  loop_id = 0x;
   397  
   398  ql_log(ql_log_info, sp->vha, 0xfff9,
   399  "sp: %p, purex: %p, udam: %#llx, loop_id: 0x%x\n",
 > 400  sp, purex, udma, loop_id);
   401  
   402  els_pkt->entry_type = ELS_IOCB_TYPE;
   403  els_pkt->entry_count = 1;
   404  
   405  els_pkt->handle = sp->handle;
   406  els_pkt->nphdl = cpu_to_le16(loop_id);
   407  els_pkt->tx_dsd_cnt = cpu_to_le16(1);
   408  els_pkt->vp_index = purex->vp_idx;
   409  els_pkt->sof = EST_SOFI3;
   410  els_pkt->rcv_exchg_id = cpu_to_le32(purex->rx_xchg_addr);
   411  els_pkt->op_code = sp->cmd_type;
   412  els_pkt->did_lo = cpu_to_le16(purex->s_id[0] | (purex->s_id[1] 
<< 8));
   413  els_pkt->did_hi = purex->s_id[2];
   414  els_pkt->sid_hi = purex->d_id[2];
   415  els_pkt->sid_lo = cpu_to_le16(purex->d_id[0] | (purex->d_id[1] 
<< 8));
   416  
   417  if (sp->gen2 == ELS_ACC)
   418  els_pkt->cntl_flags = cpu_to_le16(EPD_ELS_ACC);
   419  else
   420  els_pkt->cntl_flags = cpu_to_le16(EPD_ELS_RJT);
   421  
   422  els_pkt->tx_bc = cpu_to_le32(sp->gen1);
   423  els_pkt->tx_dsd[0] = cpu_to_le32(LSD(udma));
   424  els_pkt->tx_dsd

Re: [PATCH 2/4] qla2xxx_nvmet: Added Makefile and Kconfig changes

2017-11-07 Thread kbuild test robot
Hi Anil,

I love your patch! Yet something to improve:

[auto build test ERROR on scsi/for-next]
[also build test ERROR on v4.14-rc8 next-20171106]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Himanshu-Madhani/qla2xxx-Add-FC-NVMe-Target-support/20171107-153645
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
config: xtensa-allmodconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa 

Note: the 
linux-review/Himanshu-Madhani/qla2xxx-Add-FC-NVMe-Target-support/20171107-153645
 HEAD 9c5e24e821aa40552221b3103bc914bc4cd42293 builds fine.
  It only hurts bisectibility.

All errors (new ones prefixed by >>):

   In file included from drivers/scsi//qla2xxx/qla_nvmet.c:14:0:
   drivers/scsi//qla2xxx/qla_nvmet.h:31:25: error: field 'nvme_cmd_iu' has 
incomplete type
 struct atio7_nvme_cmnd nvme_cmd_iu;
^
   drivers/scsi//qla2xxx/qla_nvmet.c: In function 'qlt_nvmet_ls_done':
   drivers/scsi//qla2xxx/qla_nvmet.c:48:46: error: 'struct ' has no 
member named 'cmd'
 struct qla_nvmet_cmd *tgt_cmd = nvme->u.nvme.cmd;
 ^
   drivers/scsi//qla2xxx/qla_nvmet.c:55:47: error: 'struct ' has no 
member named 'cmd'
  sp, sp->vha, nvme->u.nvme.desc, nvme->u.nvme.cmd);
  ^
   drivers/scsi//qla2xxx/qla_nvmet.c: In function 'qla_nvmet_ls_rsp':
   drivers/scsi//qla2xxx/qla_nvmet.c:92:13: error: 'SRB_NVMET_LS' undeclared 
(first use in this function)
 sp->type = SRB_NVMET_LS;
^
   drivers/scsi//qla2xxx/qla_nvmet.c:92:13: note: each undeclared identifier is 
reported only once for each function it appears in
   drivers/scsi//qla2xxx/qla_nvmet.c:100:14: error: 'struct ' has no 
member named 'exchange_address'
 nvme->u.nvme.exchange_address = tgt_cmd->atio.u.pt_ls4.exchange_address;
 ^
   drivers/scsi//qla2xxx/qla_nvmet.c:100:49: error: 'union ' has no 
member named 'pt_ls4'
 nvme->u.nvme.exchange_address = tgt_cmd->atio.u.pt_ls4.exchange_address;
^
   drivers/scsi//qla2xxx/qla_nvmet.c:101:14: error: 'struct ' has no 
member named 'nport_handle'
 nvme->u.nvme.nport_handle = tgt_cmd->atio.u.pt_ls4.nport_handle;
 ^
   drivers/scsi//qla2xxx/qla_nvmet.c:101:45: error: 'union ' has no 
member named 'pt_ls4'
 nvme->u.nvme.nport_handle = tgt_cmd->atio.u.pt_ls4.nport_handle;
^
   drivers/scsi//qla2xxx/qla_nvmet.c:102:14: error: 'struct ' has no 
member named 'vp_index'
 nvme->u.nvme.vp_index = tgt_cmd->atio.u.pt_ls4.vp_index;
 ^
   drivers/scsi//qla2xxx/qla_nvmet.c:102:41: error: 'union ' has no 
member named 'pt_ls4'
 nvme->u.nvme.vp_index = tgt_cmd->atio.u.pt_ls4.vp_index;
^
   drivers/scsi//qla2xxx/qla_nvmet.c:104:14: error: 'struct ' has no 
member named 'cmd'
 nvme->u.nvme.cmd = tgt_cmd; /* To be freed */
 ^
   drivers/scsi//qla2xxx/qla_nvmet.c: In function 'qla_nvmet_fcp_abort':
   drivers/scsi//qla2xxx/qla_nvmet.c:168:13: error: 'SRB_NVMET_SEND_ABTS' 
undeclared (first use in this function)
 sp->type = SRB_NVMET_SEND_ABTS;
^
   drivers/scsi//qla2xxx/qla_nvmet.c: In function 'qla_nvmet_create_targetport':
   drivers/scsi//qla2xxx/qla_nvmet.c:226:9: error: 'ql_dbg_nvme' undeclared 
(first use in this function)
 ql_dbg(ql_dbg_nvme, vha, 0xe081,
^
   drivers/scsi//qla2xxx/qla_nvmet.c:236:10: error: 'struct scsi_qla_host' has 
no member named 'targetport'
 >targetport);
 ^
   drivers/scsi//qla2xxx/qla_nvmet.c:243:41: error: 'struct scsi_qla_host' has 
no member named 'targetport'
 tport = (struct qla_nvmet_tgtport *)vha->targetport->private;
^
   drivers/scsi//qla2xxx/qla_nvmet.c: In function 'qla_nvmet_delete':
>> drivers/scsi//qla2xxx/qla_nvmet.c:261:17: error: 'volatile struct 
>> ' has no member named 'nvmet_enabled'
 if (!vha->flags.nvmet_enabled)
^
   drivers/scsi//qla2xxx/qla_nvmet.c:263:9: error: 'struct scsi_qla_host' has 
no member named 'targetport'
 if (vha->targetport) {
^
   drivers/scsi//qla2xxx/qla_nvmet.c:264:42: error: 'struct scsi_qla_host' has 
no member named 'targetport'
  tport = (struct qla_nvmet_tgtport *)vha->targetport->private;
 ^
   drivers/scsi//qla2xx

Re: [PATCH 4/4] qla2xxx_nvmet: Add FC-NVMe Target handling

2017-11-07 Thread kbuild test robot
Hi Anil,

I love your patch! Perhaps something to improve:

[auto build test WARNING on scsi/for-next]
[also build test WARNING on next-20171106]
[cannot apply to v4.14-rc8]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Himanshu-Madhani/qla2xxx-Add-FC-NVMe-Target-support/20171107-153645
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
config: i386-randconfig-x001-201745 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/scsi//qla2xxx/qla_target.c: In function 'qlt_send_els_resp':
>> drivers/scsi//qla2xxx/qla_target.c:399:36: warning: format '%llx' expects 
>> argument of type 'long long unsigned int', but argument 7 has type 
>> 'dma_addr_t {aka unsigned int}' [-Wformat=]
 "sp: %p, purex: %p, udam: %#llx, loop_id: 0x%x\n",
   ^
   drivers/scsi//qla2xxx/qla_target.c: In function 'qlt_nvme_els_done':
   drivers/scsi//qla2xxx/qla_target.c:444:50: warning: format '%llx' expects 
argument of type 'long long unsigned int', but argument 8 has type 'dma_addr_t 
{aka unsigned int}' [-Wformat=]
 "sp: %p vha: %p, dma_ptr: %p, dma_addr: %#llx, len: %#x\n",
 ^
   drivers/scsi//qla2xxx/qla_target.c: In function 'qlt_send_plogi_resp':
   drivers/scsi//qla2xxx/qla_target.c:474:63: warning: format '%llx' expects 
argument of type 'long long unsigned int', but argument 8 has type 'dma_addr_t 
{aka unsigned int}' [-Wformat=]
 "sp: %p, vha: %p, plogi_ack_buf: %p, plogi_ack_udma: %#llx\n",
  ^
>> drivers/scsi//qla2xxx/qla_target.c:488:40: warning: cast from pointer to 
>> integer of different size [-Wpointer-to-int-cast]
 ret = qla2x00_get_plogi_template(vha, (uint64_t)tmp, (116/4 - 1));
   ^
   drivers/scsi//qla2xxx/qla_target.c: In function 'qlt_process_logo':
   drivers/scsi//qla2xxx/qla_target.c:688:60: warning: format '%llx' expects 
argument of type 'long long unsigned int', but argument 8 has type 'dma_addr_t 
{aka unsigned int}' [-Wformat=]
 "sp: %p, vha: %p, logo_ack_buf: %p, logo_ack_buf: %#llx\n",
   ^
   drivers/scsi//qla2xxx/qla_target.c: In function 'qlt_process_prli':
   drivers/scsi//qla2xxx/qla_target.c:753:61: warning: format '%llx' expects 
argument of type 'long long unsigned int', but argument 8 has type 'dma_addr_t 
{aka unsigned int}' [-Wformat=]
 "sp: %p, vha: %p, prli_ack_buf: %p, prli_ack_udma: %#llx\n",
^

vim +399 drivers/scsi//qla2xxx/qla_target.c

   375  
   376  /* Send an ELS response */
   377  int qlt_send_els_resp(srb_t *sp, struct __els_pt *els_pkt)
   378  {
   379  struct purex_entry_24xx *purex = (struct purex_entry_24xx *)
   380  sp->u.snvme_els.ptr;
   381  dma_addr_t udma = sp->u.snvme_els.dma_addr;
   382  struct fc_port *fcport;
   383  port_id_t port_id;
   384  uint16_t loop_id;
   385  
   386  port_id.b.domain = purex->s_id[2];
   387  port_id.b.area   = purex->s_id[1];
   388  port_id.b.al_pa  = purex->s_id[0];
   389  port_id.b.rsvd_1 = 0;
   390  
   391  fcport = qla2x00_find_fcport_by_nportid(sp->vha, _id, 1);
   392  if (fcport)
   393  /* There is no session with the swt */
   394  loop_id = fcport->loop_id;
   395  else
   396  loop_id = 0x;
   397  
   398  ql_log(ql_log_info, sp->vha, 0xfff9,
 > 399  "sp: %p, purex: %p, udam: %#llx, loop_id: 0x%x\n",
   400  sp, purex, udma, loop_id);
   401  
   402  els_pkt->entry_type = ELS_IOCB_TYPE;
   403  els_pkt->entry_count = 1;
   404  
   405  els_pkt->handle = sp->handle;
   406  els_pkt->nphdl = cpu_to_le16(loop_id);
   407  els_pkt->tx_dsd_cnt = cpu_to_le16(1);
   408  els_pkt->vp_index = purex->vp_idx;
   409  els_pkt->sof = EST_SOFI3;
   410  els_pkt->rcv_exchg_id = cpu_to_le32(purex->rx_xchg_addr);
   411  els_pkt->op_code = sp->cmd_type;
   412  els_pkt->did_lo = cpu_to_le16(purex->s_id[0] | (purex->s_id[1] 
<< 8));
   413  els_pkt->did_hi = purex->s_id[2];
   414  els_pkt->sid_hi = purex->d_id[2];
   415  els_pkt->sid_lo = cpu_to_le16(purex->d_id[0] | (pur