Re: [PATCH] target: Fix missing complete during ABORT_TASK + CMD_T_FABRIC_STOP

2016-05-25 Thread Christoph Hellwig
I think the function could be further cut down, given that
 
 a) there is no need to take a lock to read a single field
 b) CMD_T_FABRIC_STOP is never cleared and we can check it later

See the incremental patch below:

diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 69abcef..dad4ab8 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2550,26 +2550,18 @@ static void target_release_cmd_kref(struct kref *kref)
struct se_cmd *se_cmd = container_of(kref, struct se_cmd, cmd_kref);
struct se_session *se_sess = se_cmd->se_sess;
unsigned long flags;
-   bool fabric_stop;
 
spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
-
-   spin_lock(&se_cmd->t_state_lock);
-   fabric_stop = (se_cmd->transport_state & CMD_T_FABRIC_STOP);
-   spin_unlock(&se_cmd->t_state_lock);
-
-   if (se_cmd->cmd_wait_set || fabric_stop) {
-   list_del_init(&se_cmd->se_cmd_list);
-   spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
-   target_free_cmd_mem(se_cmd);
-   complete(&se_cmd->cmd_wait_comp);
-   return;
-   }
list_del_init(&se_cmd->se_cmd_list);
spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
 
target_free_cmd_mem(se_cmd);
-   se_cmd->se_tfo->release_cmd(se_cmd);
+
+   if (se_cmd->cmd_wait_set ||
+   (se_cmd->transport_state & CMD_T_FABRIC_STOP))
+   complete(&se_cmd->cmd_wait_comp);
+   else
+   se_cmd->se_tfo->release_cmd(se_cmd);
 }
 
 /* target_put_sess_cmd - Check for active I/O shutdown via kref_put
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko

2016-05-25 Thread Nicholas A. Bellinger
On Wed, 2016-05-25 at 23:41 +0300, Or Gerlitz wrote:
> On Wed, May 25, 2016 at 8:40 PM, Steve Wise  
> wrote:
> >> From: Or Gerlitz [mailto:gerlitz...@gmail.com]
> >>



> .
> >
> > Hey Or and Nicholas,
> >
> > I've been staying out of this directly due to my workload.  But I will now
> > take a look at where and how we can refactor to reduce the code duplication.
> > Here are some thoughts with a quick glance:
> >
> > DDP is not used by iWARP.  DDP is for direct placement attempt on a TCP
> > streaming mode connection.  RDMA has its own way of doing this that doesn’t
> > utilize the same HW resources.  So I would say that the DDP usage could be
> > refactored such that there is a set of helper functions to do the common
> > logic, like managing page pod adapter resources.  An example would be
> > cxgb4i.c:ddp_ppod_write_idata(), and cxgbit_ppod_write_idata() from patch
> > 13.
> 
> sounds good
> 
> > Connection Management (CM) can definitely be refactored to share helper
> > functions for setting up/tearing down TCP connections.  But the iWARP logic
> > is very different from iscsi once the TCP connection is setup and the
> > connection moved into RDMA mode.  So the sharing of logic would be in
> > sending various CPL messages to the adapter to connect/accept/close/abort
> > connections.
> 
> Ditto
> 
> > Having said that, the initiator and target wouldn't share a
> > lot by virtue of the initiator only doing active side connection setup, and
> > the target only doing passive side connection setup.
> 
> Well, I am not sure. It's like saying that if you break the IB/RoCE CM
> to active side and passive side they will not share lots of code. If
> you look on the existing CM (drivers/infiniband/core/cm.c) and think
> how to break it to two, you'll see that you end up with two code bases
> to maintain with likelihood of same bugs and similar state machines to
> implement/maintain and debug. So in that sense, it's much (much)
> better to have one CM code which does both iscsi sides.
> 
> > So refactor would be
> > common services that iw_cxgb4, cxgbi4, and cxgbit all use.An example
> > would be: iw_cxgb4/cm.c:send_connect(), and cxgb4i.c/send_act_open_req().
> 
> good and one code base which treats both sides.
> 
> > I didn't look at LRO at this point.
> >
> > Anyway, none of these are particularly difficult, but will require
> > significant effort and time.  The current cxgbit series has had a lot of
> > internal review and testing by the Chelsio iscsi team, and it would be great
> > if this refactoring can be deferred with the promise that we will get it
> > done for the 4.8 merge window.  Thoughts?
> 
> We've been there, e.g Intel recently sent iWARP driver and throughout
> the review we realized that lots of the iwarp netlink code is shared
> between existing drivers and the new driver, so the driver didn't get
> in kernel X but rather X+1 after doing that fix, it's only off by
> one...
> 
> I don't think we should be letting duplication of that extent to get
> in, for Chelsio ppl that know the materials well better vs anyone else
> it should have been clear that they create that dup without any real
> justification and till  that moment they didn't come here to even
> comment on that. Lets have them fix that for the next merge window,
> that's my thinking, Nic?
> 

Varun + Co have made common improvements between existing software
iscsi-target RX + TX PDU handling and their new driver, and no further
comments for these prerequisites have been made.

The additional improvements discussed here so far are cxgb* specific,
and will not effect other drivers, and will not change configfs user ABI
layout compatibility within /sys/kernel/config/target/iscsi/.

That being the case, I think it's a reasonable starting point for
mainline users to consume target ISO on cxgb hardware, and for Chelsio
to make further common code improvements across their existing host
drivers.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] tcm_qla2xxx Add SCSI command jammer/discard capability to the tcm_qla2xxx module

2016-05-25 Thread Nicholas A. Bellinger
On Wed, 2016-05-25 at 16:12 -0400, Laurence Oberman wrote:
> 



> Hi Nicholas,
> 
> Can we pull this one in for submission now that its been acked'd by Himanshu.
> Thanks
> Laurence

This patch has been queued in target-pending/for-next for the last
weeks, and will be included in the v4.7-rc1 PULL request.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


bio-based DM multipath is back from the dead [was: Re: Notes from the four separate IO track sessions at LSF/MM]

2016-05-25 Thread Mike Snitzer
On Thu, Apr 28 2016 at 11:40am -0400,
James Bottomley  wrote:

> On Thu, 2016-04-28 at 08:11 -0400, Mike Snitzer wrote:
> > Full disclosure: I'll be looking at reinstating bio-based DM multipath to
> > regain efficiencies that now really matter when issuing IO to extremely
> > fast devices (e.g. NVMe).  bio cloning is now very cheap (due to
> > immutable biovecs), coupled with the emerging multipage biovec work that
> > will help construct larger bios, so I think it is worth pursuing to at
> > least keep our options open.

Please see the 4 topmost commits I've published here:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8

All request-based DM multipath support/advances have been completly
preserved.  I've just made it so that we can now have bio-based DM
multipath too.

All of the various modes have been tested using mptest:
https://github.com/snitm/mptest

> OK, but remember the reason we moved from bio to request was partly to
> be nearer to the device but also because at that time requests were
> accumulations of bios which had to be broken out, go back up the stack
> individually and be re-elevated, which adds to the inefficiency.  In
> theory the bio splitting work will mean that we only have one or two
> split bios per request (because they were constructed from a split up
> huge bio), but when we send them back to the top to be reconstructed as
> requests there's no guarantee that the split will be correct a second
> time around and we might end up resplitting the already split bios.  If
> you do reassembly into the huge bio again before resend down the next
> queue, that's starting to look like quite a lot of work as well.

I've not even delved into the level you're laser-focused on here.
But I'm struggling to grasp why multipath is any different than any
other bio-based device...

FYI, the paper I reference in my "dm mpath: reinstate bio-based support"
commit gets into what I've always thought the real justification was for
the transition from bio-based to request-based.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko

2016-05-25 Thread Or Gerlitz
On Wed, May 25, 2016 at 8:40 PM, Steve Wise  wrote:
>> From: Or Gerlitz [mailto:gerlitz...@gmail.com]
>>
>> On Tue, May 24, 2016 at 9:40 AM, Nicholas A. Bellinger
>>  wrote:
>> > Hi Or & Co,
>> > On Wed, 2016-05-18 at 14:45 +0300, Or Gerlitz wrote:
>> >> On Sat, Apr 30, 2016 at 6:54 PM, Or Gerlitz  wrote:
>> >> > On Tue, Apr 19, 2016 at 9:30 PM, Varun Prakash  
>> >> > wrote:
>> >> >> cxgbit.h - This file contains data structure
>> >> >> definitions for cxgbit.ko.
>> >> >>
>> >> >> cxgbit_lro.h - This file contains data structure
>> >> >> definitions for LRO support.
>> >> >>
>> >> >> cxgbit_main.c - This file contains code for
>> >> >> registering with iscsi target transport and
>> >> >> cxgb4 driver.
>> >> >>
>> >> >> cxgbit_cm.c - This file contains code for
>> >> >> connection management.
>> >> >>
>> >> >> cxgbit_target.c - This file contains code
>> >> >> for processing iSCSI PDU.
>> >> >>
>> >> >> cxgbit_ddp.c - This file contains code for
>> >> >> Direct Data Placement.
>> >> >
>> >> > Wait,
>> >> >
>> >> > You are adding many K's LOCs to handle things like CM (connection
>> >> > management), DDP and LRO. But your upstream solution must be using CM
>> >> > and DDP (and LRO as well) for the HW offloaded initiator side as well,
>> >> > not to mention the iWARP side of things.
>> >> >
>> >> > There must be some way to refactor things instead of repeating the
>> >> > same bits over and over, thoughts?
>>
>> >> The author haven't responded... where that this stands from your point of 
>> >> view?
>>
>> > For an initial merge, I don't have an objection to this series wrt
>> > drivers/target/iscsi/* improvements + prerequisites, and new standalone
>> > cxgbit iscsit_transport driver.
>> >
>> > That said, there are areas between cxgbi + cxgbit code that can be made
>> > common as you've pointed out.  The Cheliso folks have mentioned off-list
>> > that cxgbi as-is in mainline does not support LRO, and that the majority
>> > of DDP logic is shared between initiator + target.
>> >
>> > Are there specific pieces of logic in DDP or iWARP for cxgb* that you'd
>> > like to see Varun + Co pursue as common code in v4.8+..?
>>
>> Hi Nic,
>>
>> As I wrote above, I have good reasons to believe that there are few K
>> LOCs of duplication
>> between this series to the chelsio hw iscsi initiator or the chelsio
>> iwarp driver or both (triple).
>>
>> Ys, I'd like to see a public response from Varun and Co on this valid
>> reviewer comment
>> before you proceed with this series, makes sense? There's no point to
>> duplicate the same
>> code in the kernel again and again. **Even** if there's one
>> duplication now, we don't want
>> another one.
>>
>> Or.
>
> Hey Or and Nicholas,
>
> I've been staying out of this directly due to my workload.  But I will now
> take a look at where and how we can refactor to reduce the code duplication.
> Here are some thoughts with a quick glance:
>
> DDP is not used by iWARP.  DDP is for direct placement attempt on a TCP
> streaming mode connection.  RDMA has its own way of doing this that doesn’t
> utilize the same HW resources.  So I would say that the DDP usage could be
> refactored such that there is a set of helper functions to do the common
> logic, like managing page pod adapter resources.  An example would be
> cxgb4i.c:ddp_ppod_write_idata(), and cxgbit_ppod_write_idata() from patch
> 13.

sounds good

> Connection Management (CM) can definitely be refactored to share helper
> functions for setting up/tearing down TCP connections.  But the iWARP logic
> is very different from iscsi once the TCP connection is setup and the
> connection moved into RDMA mode.  So the sharing of logic would be in
> sending various CPL messages to the adapter to connect/accept/close/abort
> connections.

Ditto

> Having said that, the initiator and target wouldn't share a
> lot by virtue of the initiator only doing active side connection setup, and
> the target only doing passive side connection setup.

Well, I am not sure. It's like saying that if you break the IB/RoCE CM
to active side and passive side they will not share lots of code. If
you look on the existing CM (drivers/infiniband/core/cm.c) and think
how to break it to two, you'll see that you end up with two code bases
to maintain with likelihood of same bugs and similar state machines to
implement/maintain and debug. So in that sense, it's much (much)
better to have one CM code which does both iscsi sides.

> So refactor would be
> common services that iw_cxgb4, cxgbi4, and cxgbit all use.An example
> would be: iw_cxgb4/cm.c:send_connect(), and cxgb4i.c/send_act_open_req().

good and one code base which treats both sides.

> I didn't look at LRO at this point.
>
> Anyway, none of these are particularly difficult, but will require
> significant effort and time.  The current cxgbit series has had a lot of
> internal review and testing by the Chelsio iscsi team, and it would be great
> if this refactoring can be deferred with the promise that w

Re: [PATCH] tcm_qla2xxx Add SCSI command jammer/discard capability to the tcm_qla2xxx module

2016-05-25 Thread Laurence Oberman


- Original Message -
> From: "Himanshu Madhani" 
> To: "Laurence Oberman" , "Nicholas A. Bellinger" 
> 
> Cc: "Bart Van Assche" , "linux-scsi" 
> , "target-devel"
> , "Quinn Tran" 
> Sent: Monday, May 9, 2016 1:08:36 PM
> Subject: Re: [PATCH]  tcm_qla2xxx Add SCSI command jammer/discard capability 
> to the tcm_qla2xxx module
> 
> On 5/9/16, 7:56 AM, "Laurence Oberman"  wrote:
> 
> 
> 
> >
> >
> >- Original Message -
> >> From: "Laurence Oberman" 
> >> To: "Nicholas A. Bellinger" 
> >> Cc: "Himanshu Madhani" , "Bart Van Assche"
> >> , "linux-scsi"
> >> , "target-devel"
> >> , "Quinn Tran" 
> >> Sent: Monday, April 4, 2016 6:50:03 PM
> >> Subject: Re: [PATCH]  tcm_qla2xxx Add SCSI command jammer/discard
> >> capability to the tcm_qla2xxx module
> >> 
> >> Hello Nicholas
> >> 
> >> Its fixed now.
> >> Many Thanks.
> >> 
> >> $ scripts/checkpatch.pl
> >> 0001-tcm_qla2xxx-Add-SCSI-command-jammer-discard-capabili.patch
> >> WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
> >> #12:
> >> new file mode 100644
> >> 
> >> total: 0 errors, 1 warnings, 91 lines checked
> >> 
> >> 0001-tcm_qla2xxx-Add-SCSI-command-jammer-discard-capabili.patch has style
> >> problems, please review.
> >> 
> >> NOTE: If any of the errors are false positives, please report
> >>   them to the maintainer, see CHECKPATCH in MAINTAINERS.
> >> 
> >> 
> >> 
> >> Tested by: Laurence Oberman 
> >> Signed-off-by: Laurence Oberman 
> >> ---
> >>  Documentation/scsi/tcm_qla2xxx.txt |   22 ++
> >>  drivers/scsi/qla2xxx/Kconfig   |9 +
> >>  drivers/scsi/qla2xxx/tcm_qla2xxx.c |   20 
> >>  drivers/scsi/qla2xxx/tcm_qla2xxx.h |1 +
> >>  4 files changed, 52 insertions(+), 0 deletions(-)
> >>  create mode 100644 Documentation/scsi/tcm_qla2xxx.txt
> >> 
> >> diff --git a/Documentation/scsi/tcm_qla2xxx.txt
> >> b/Documentation/scsi/tcm_qla2xxx.txt
> >> new file mode 100644
> >> index 000..c3a670a
> >> --- /dev/null
> >> +++ b/Documentation/scsi/tcm_qla2xxx.txt
> >> @@ -0,0 +1,22 @@
> >> +tcm_qla2xxx jam_host attribute
> >> +--
> >> +There is now a new module endpoint atribute called jam_host
> >> +attribute: jam_host: boolean=0/1
> >> +This attribute and accompanying code is only included if the
> >> +Kconfig parameter TCM_QLA2XXX_DEBUG is set to Y
> >> +By default this jammer code and functionality is disabled
> >> +
> >> +Use this attribute to control the discarding of SCSI commands to a
> >> +selected host.
> >> +This may be useful for testing error handling and simulating slow drain
> >> +and other fabric issues.
> >> +
> >> +Setting a boolean of 1 for the jam_host attribute for a particular host
> >> + will discard the commands for that host.
> >> +Reset back to 0 to stop the jamming.
> >> +
> >> +Enable host 4 to be jammed
> >> +echo 1 >
> >> /sys/kernel/config/target/qla2xxx/21:00:00:24:ff:27:8f:ae/tpgt_1/attrib/jam_host
> >> +
> >> +Disable jamming on host 4
> >> +echo 0 >
> >> /sys/kernel/config/target/qla2xxx/21:00:00:24:ff:27:8f:ae/tpgt_1/attrib/jam_host
> >> diff --git a/drivers/scsi/qla2xxx/Kconfig b/drivers/scsi/qla2xxx/Kconfig
> >> index 10aa18b..67c0d5a 100644
> >> --- a/drivers/scsi/qla2xxx/Kconfig
> >> +++ b/drivers/scsi/qla2xxx/Kconfig
> >> @@ -36,3 +36,12 @@ config TCM_QLA2XXX
> >>default n
> >>---help---
> >>Say Y here to enable the TCM_QLA2XXX fabric module for QLogic 24xx+
> >>series
> >>target mode HBAs
> >> +
> >> +if TCM_QLA2XXX
> >> +config TCM_QLA2XXX_DEBUG
> >> +  bool "TCM_QLA2XXX fabric module DEBUG mode for QLogic 24xx+ series
> >> target
> >> mode HBAs"
> >> +  default n
> >> +  ---help---
> >> +  Say Y here to enable the TCM_QLA2XXX fabric module DEBUG for QLogic
> >> 24xx+
> >> series target mode HBAs
> >> +  This will include code to enable the SCSI command jammer
> >> +endif
> >> diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> >> b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> >> index 1808a01..948224e 100644
> >> --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> >> +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> >> @@ -457,6 +457,10 @@ static int tcm_qla2xxx_handle_cmd(scsi_qla_host_t
> >> *vha,
> >> struct qla_tgt_cmd *cmd,
> >>struct se_cmd *se_cmd = &cmd->se_cmd;
> >>struct se_session *se_sess;
> >>struct qla_tgt_sess *sess;
> >> +#ifdef CONFIG_TCM_QLA2XXX_DEBUG
> >> +  struct se_portal_group *se_tpg;
> >> +  struct tcm_qla2xxx_tpg *tpg;
> >> +#endif
> >>int flags = TARGET_SCF_ACK_KREF;
> >>  
> >>if (bidi)
> >> @@ -477,6 +481,15 @@ static int tcm_qla2xxx_handle_cmd(scsi_qla_host_t
> >> *vha,
> >> struct qla_tgt_cmd *cmd,
> >>return -EINVAL;
> >>}
> >>  
> >> +#ifdef CONFIG_TCM_QLA2XXX_DEBUG
> >> +  se_tpg = se_sess->se_tpg;
> >> +  tpg = container_of(se_tpg, struct tcm_qla2xxx_tpg, se_tpg);
> >> +  if (unlikely(tpg->tpg_attrib.jam_host)) {
> >> +  /* return, and dont run target_submit_cmd,discarding command */
> >> +  return 0

[PATCH] target: Fix missing complete during ABORT_TASK + CMD_T_FABRIC_STOP

2016-05-25 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

During transport_generic_free_cmd() with a concurrent TMR
ABORT_TASK and shutdown CMD_T_FABRIC_STOP bit set, the
caller will be blocked on se_cmd->cmd_wait_stop completion
until the final kref_put() -> target_release_cmd_kref()
has been invoked to call complete().

However, when ABORT_TASK is completed with FUNCTION_COMPLETE
in core_tmr_abort_task(), the aborted se_cmd will have already
been removed from se_sess->sess_cmd_list via list_del_init().

This results in target_release_cmd_kref() hitting the
legacy list_empty() == true check, invoking ->release_cmd()
but skipping complete() to wakeup se_cmd->cmd_wait_stop
blocked earlier in transport_generic_free_cmd() code.

To address this bug, it's safe to go ahead and drop the
original list_empty() check so that fabric_stop invokes
the complete() as expected, since list_del_init() can
safely be used on a empty list.

Cc: Mike Christie 
Cc: Quinn Tran 
Cc: Himanshu Madhani 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: sta...@vger.kernel.org # 3.14+
Signed-off-by: Nicholas Bellinger 
---
 drivers/target/target_core_transport.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index f3e93dd..019d34f 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2578,12 +2578,6 @@ static void target_release_cmd_kref(struct kref *kref)
bool fabric_stop;
 
spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
-   if (list_empty(&se_cmd->se_cmd_list)) {
-   spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
-   target_free_cmd_mem(se_cmd);
-   se_cmd->se_tfo->release_cmd(se_cmd);
-   return;
-   }
 
spin_lock(&se_cmd->t_state_lock);
fabric_stop = (se_cmd->transport_state & CMD_T_FABRIC_STOP);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Converting ipr to use ata_port_operations->error_handler

2016-05-25 Thread Tejun Heo
Hello, Brian.

So, of all the ata drivers, ipr seems to be the only driver which
doesn't implement ata_port_opeations->error_handler and thus depends
on the old error handling path. I'm wondering whether it'd be possible
to convert ipr to implement ->error_handler and drop the old path.
Would that be difficult?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] mpt3sas - avoid mpt3sas_transport_port_add NULL parent_dev

2016-05-25 Thread Joe Lawrence
If _scsih_sas_host_add's call to mpt3sas_config_get_sas_iounit_pg0
fails, ioc->sas_hba.parent_dev may be left uninitialized.  A later
device probe could invoke mpt3sas_transport_port_add which will call
sas_port_alloc_num [scsi_transport_sas] with a NULL parent_dev pointer.

Signed-off-by: Joe Lawrence 
---
 drivers/scsi/mpt3sas/mpt3sas_transport.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c 
b/drivers/scsi/mpt3sas/mpt3sas_transport.c
index 6a84b82d71bb..ff93286bc32f 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
@@ -705,6 +705,11 @@ mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, 
u16 handle,
goto out_fail;
}
 
+   if (!sas_node->parent_dev) {
+   pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n",
+   ioc->name, __FILE__, __LINE__, __func__);
+   goto out_fail;
+   }
port = sas_port_alloc_num(sas_node->parent_dev);
if ((sas_port_add(port))) {
pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n",
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] mpt3sas - set num_phys after allocating phy[] space

2016-05-25 Thread Joe Lawrence
In _scsih_sas_host_add, the number of HBA phys are determined and then
later used to allocate an array of struct  _sas_phy's.  If the routine
sets ioc->sas_hba.num_phys, but then fails to allocate the
ioc->sas_hba.phy array (by kcalloc error or other intermediate
error/exit path), ioc->sas_hba is left in a dangerous state: all readers
of ioc->sas_hba.phy[] do so by indexing it from 0..ioc->sas_hba.num_phys
without checking that the space was ever allocated.

Modify _scsih_sas_host_add to set ioc->sas_hba.num_phys only after
successfully allocating ioc->sas_hba.phy[].

Signed-off-by: Joe Lawrence 
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index f2139e5604a3..6e36d20c9e0b 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -4893,13 +4893,22 @@ _scsih_sas_host_add(struct MPT3SAS_ADAPTER *ioc)
u16 ioc_status;
u16 sz;
u8 device_missing_delay;
+   u8 num_phys;
 
-   mpt3sas_config_get_number_hba_phys(ioc, &ioc->sas_hba.num_phys);
-   if (!ioc->sas_hba.num_phys) {
+   mpt3sas_config_get_number_hba_phys(ioc, &num_phys);
+   if (!num_phys) {
pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n",
ioc->name, __FILE__, __LINE__, __func__);
return;
}
+   ioc->sas_hba.phy = kcalloc(num_phys,
+   sizeof(struct _sas_phy), GFP_KERNEL);
+   if (!ioc->sas_hba.phy) {
+   pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n",
+   ioc->name, __FILE__, __LINE__, __func__);
+   goto out;
+   }
+   ioc->sas_hba.num_phys = num_phys;
 
/* sas_iounit page 0 */
sz = offsetof(Mpi2SasIOUnitPage0_t, PhyData) + (ioc->sas_hba.num_phys *
@@ -4959,13 +4968,6 @@ _scsih_sas_host_add(struct MPT3SAS_ADAPTER *ioc)
MPI2_SASIOUNIT1_REPORT_MISSING_TIMEOUT_MASK;
 
ioc->sas_hba.parent_dev = &ioc->shost->shost_gendev;
-   ioc->sas_hba.phy = kcalloc(ioc->sas_hba.num_phys,
-   sizeof(struct _sas_phy), GFP_KERNEL);
-   if (!ioc->sas_hba.phy) {
-   pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n",
-   ioc->name, __FILE__, __LINE__, __func__);
-   goto out;
-   }
for (i = 0; i < ioc->sas_hba.num_phys ; i++) {
if ((mpt3sas_config_get_phy_pg0(ioc, &mpi_reply, &phy_pg0,
i))) {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] blk mq: take ref to q when running it

2016-05-25 Thread Mike Christie
On 05/25/2016 10:53 AM, Bart Van Assche wrote:
> On 05/25/2016 12:54 AM, mchri...@redhat.com wrote:
>> If the __blk_mq_run_hw_queue is a result of a async run or retry
>> then we do not have a reference to the queue. At this time,
>> blk_cleanup_queue could begin tearing it down while the queue_rq
>> function is being run.
>>
>> This patch just has us do a blk_queue_enter call.
>>
>> Signed-off-by: Mike Christie 
>> ---
>>  block/blk-mq.c | 10 +-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 7df9c92..5958a02 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -738,8 +738,13 @@ static void __blk_mq_run_hw_queue(struct
>> blk_mq_hw_ctx *hctx)
>>
>>  WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask));
>>
>> -if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
>> +if (blk_queue_enter(q, false)) {
>> +blk_mq_run_hw_queue(hctx, true);
>>  return;
>> +}
>> +
>> +if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
>> +goto exit_queue;
>>
>>  hctx->run++;
>>
>> @@ -832,6 +837,9 @@ static void __blk_mq_run_hw_queue(struct
>> blk_mq_hw_ctx *hctx)
>>   **/
>>  blk_mq_run_hw_queue(hctx, true);
>>  }
>> +
>> +exit_queue:
>> +blk_queue_exit(q);
>>  }
> 
> Hello Mike,
> 
> blk_cleanup_queue() waits until delay_work and run_work have finished
> inside blk_sync_queue(). The code in blk_cleanup_queue() before the
> blk_sync_queue() call should be safe to execute concurrently with
> queue_rq(). Or did I perhaps miss something?
> 

My concern was when blk_mq_run_hw_queue is run with async=false then we
are not always run from the work queue or a under another
blk_queue_enter call already.

For example, in the scsi cases where we call
scsi_run_queue->blk_mq_start_stopped_hw_queues we can be in soft irq
context, some driver thread or the scsi eh thread.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] _scsih_sas_host_add early exits can crash system

2016-05-25 Thread Joe Lawrence
There are many error paths in _scsih_sas_host_add that lead to an early
exit and a few that leave IOC resources uninitialized, setting the stage
for a later crash.

This can be emulated using a systemtap script like:

  % stap -g -e \
  'probe module("mpt3sas").function("mpt3sas_config_get_sas_iounit_pg0").return 
{ $return = -1 }'

to force early exit, while remove/re-adding an MPT3 adapter:

  % lspci -D | grep MPT
  :54:00.0 Mass storage controller: LSI Logic / Symbios Logic SAS3008 
PCI-Express Fusion-MPT SAS-3 (rev 02)

  % SYSFS=$(find /sys/devices -name :54:00.0)
  % SYSFS_PARENT=$(dirname $SYSFS)

  % echo 1 > $SYSFS/remove
  % sleep 1m
  % echo 1 > $SYSFS_PARENT/rescan

These two patches fix:
  1) referencing unallocated ioc->sas_hba.phy[] space
  2) passing a NULL ioc->sas_hba.parent_dev to the scsi_transport_sas
 layer.

Note: these changes don't improve or retry adapter initialization, but
  only try to prevent the system from crashing

Joe Lawrence (2):
  mpt3sas - set num_phys after allocating phy[] space
  mpt3sas - avoid mpt3sas_transport_port_add NULL parent_dev

 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 20 +++-
 drivers/scsi/mpt3sas/mpt3sas_transport.c |  5 +
 2 files changed, 16 insertions(+), 9 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] blk mq: take ref to q when running it

2016-05-25 Thread Mike Christie
On 05/25/2016 02:15 PM, Mike Christie wrote:
> On 05/25/2016 10:53 AM, Bart Van Assche wrote:
>> On 05/25/2016 12:54 AM, mchri...@redhat.com wrote:
>>> If the __blk_mq_run_hw_queue is a result of a async run or retry
>>> then we do not have a reference to the queue. At this time,
>>> blk_cleanup_queue could begin tearing it down while the queue_rq
>>> function is being run.
>>>
>>> This patch just has us do a blk_queue_enter call.
>>>
>>> Signed-off-by: Mike Christie 
>>> ---
>>>  block/blk-mq.c | 10 +-
>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 7df9c92..5958a02 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -738,8 +738,13 @@ static void __blk_mq_run_hw_queue(struct
>>> blk_mq_hw_ctx *hctx)
>>>
>>>  WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask));
>>>
>>> -if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
>>> +if (blk_queue_enter(q, false)) {
>>> +blk_mq_run_hw_queue(hctx, true);
>>>  return;
>>> +}
>>> +
>>> +if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
>>> +goto exit_queue;
>>>
>>>  hctx->run++;
>>>
>>> @@ -832,6 +837,9 @@ static void __blk_mq_run_hw_queue(struct
>>> blk_mq_hw_ctx *hctx)
>>>   **/
>>>  blk_mq_run_hw_queue(hctx, true);
>>>  }
>>> +
>>> +exit_queue:
>>> +blk_queue_exit(q);
>>>  }
>>
>> Hello Mike,
>>
>> blk_cleanup_queue() waits until delay_work and run_work have finished
>> inside blk_sync_queue(). The code in blk_cleanup_queue() before the
>> blk_sync_queue() call should be safe to execute concurrently with
>> queue_rq(). Or did I perhaps miss something?
>>
> 
> My concern was when blk_mq_run_hw_queue is run with async=false then we
> are not always run from the work queue or a under another
> blk_queue_enter call already.
> 
> For example, in the scsi cases where we call
> scsi_run_queue->blk_mq_start_stopped_hw_queues we can be in soft irq
> context, some driver thread or the scsi eh thread.

Doh, ok. I guess if my analysis of the code is correct, then I should
just move the blk_queue_enter call into blk_mq_run_hw_queue's
async=false code path. I am not sure what I was thinking.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] block: add queue reset support

2016-05-25 Thread Mike Christie
On 05/25/2016 11:13 AM, Bart Van Assche wrote:
> On 05/25/2016 12:55 AM, mchri...@redhat.com wrote:
>> +spin_lock_irq(q->queue_lock);
>> +wait_event_lock_irq(q->reset_wq,
>> +!queue_flag_test_and_set(QUEUE_FLAG_RESETTING, q),
>> +*q->queue_lock);
>> +if (blk_queue_dead(q)) {
>> +rc = -ENODEV;
>> +spin_unlock_irq(q->queue_lock);
>> +goto done;
>> +}
>> +spin_unlock_irq(q->queue_lock);
> 
> Is the flag QUEUE_FLAG_RESETTING used to serialize blk_reset_queue()
> calls? If so, please consider using a mutex instead. Lockdep can detect
> lock inversion for mutexes and spinlocks but not for serialization that
> is based on wait_event_lock_irq() loops.

Will do.

> 
>> +if (q->mq_ops) {
>> +blk_mq_stop_hw_queues(q);
>> +blk_mq_freeze_queue(q);
>> +
>> +eh_rc = q->mq_ops->reset(q);
>> +
>> +blk_mq_unfreeze_queue(q);
>> +blk_mq_start_stopped_hw_queues(q, true);
>> +} else if (q->reset_fn) {
>> +spin_lock_irq(q->queue_lock);
>> +blk_stop_queue(q);
>> +spin_unlock_irq(q->queue_lock);
>> +
>> +while (q->request_fn_active)
>> +msleep(10);
>> +
>> +eh_rc = q->reset_fn(q);
>> +
>> +spin_lock_irq(q->queue_lock);
>> +blk_start_queue(q);
>> +spin_unlock_irq(q->queue_lock);
>> +}
> 
> Some of the above code is similar to some of the code in
> scsi_internal_device_block() and scsi_internal_device_unblock(). Have
> you considered to avoid this duplication by introducing helpers in the
> block layer for blocking and unblocking queues?
> 

Will do. Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko

2016-05-25 Thread Steve Wise
> From: Or Gerlitz [mailto:gerlitz...@gmail.com]
> 
> On Tue, May 24, 2016 at 9:40 AM, Nicholas A. Bellinger
>  wrote:
> > Hi Or & Co,
> > On Wed, 2016-05-18 at 14:45 +0300, Or Gerlitz wrote:
> >> On Sat, Apr 30, 2016 at 6:54 PM, Or Gerlitz  wrote:
> >> > On Tue, Apr 19, 2016 at 9:30 PM, Varun Prakash  wrote:
> >> >> cxgbit.h - This file contains data structure
> >> >> definitions for cxgbit.ko.
> >> >>
> >> >> cxgbit_lro.h - This file contains data structure
> >> >> definitions for LRO support.
> >> >>
> >> >> cxgbit_main.c - This file contains code for
> >> >> registering with iscsi target transport and
> >> >> cxgb4 driver.
> >> >>
> >> >> cxgbit_cm.c - This file contains code for
> >> >> connection management.
> >> >>
> >> >> cxgbit_target.c - This file contains code
> >> >> for processing iSCSI PDU.
> >> >>
> >> >> cxgbit_ddp.c - This file contains code for
> >> >> Direct Data Placement.
> >> >
> >> > Wait,
> >> >
> >> > You are adding many K's LOCs to handle things like CM (connection
> >> > management), DDP and LRO. But your upstream solution must be using CM
> >> > and DDP (and LRO as well) for the HW offloaded initiator side as well,
> >> > not to mention the iWARP side of things.
> >> >
> >> > There must be some way to refactor things instead of repeating the
> >> > same bits over and over, thoughts?
> 
> >> The author haven't responded... where that this stands from your point of 
> >> view?
> 
> > For an initial merge, I don't have an objection to this series wrt
> > drivers/target/iscsi/* improvements + prerequisites, and new standalone
> > cxgbit iscsit_transport driver.
> >
> > That said, there are areas between cxgbi + cxgbit code that can be made
> > common as you've pointed out.  The Cheliso folks have mentioned off-list
> > that cxgbi as-is in mainline does not support LRO, and that the majority
> > of DDP logic is shared between initiator + target.
> >
> > Are there specific pieces of logic in DDP or iWARP for cxgb* that you'd
> > like to see Varun + Co pursue as common code in v4.8+..?
> 
> Hi Nic,
> 
> As I wrote above, I have good reasons to believe that there are few K
> LOCs of duplication
> between this series to the chelsio hw iscsi initiator or the chelsio
> iwarp driver or both (triple).
> 
> Ys, I'd like to see a public response from Varun and Co on this valid
> reviewer comment
> before you proceed with this series, makes sense? There's no point to
> duplicate the same
> code in the kernel again and again. **Even** if there's one
> duplication now, we don't want
> another one.
> 
> Or.

Hey Or and Nicholas,

I've been staying out of this directly due to my workload.  But I will now 
take a look at where and how we can refactor to reduce the code duplication. 
Here are some thoughts with a quick glance:

DDP is not used by iWARP.  DDP is for direct placement attempt on a TCP 
streaming mode connection.  RDMA has its own way of doing this that doesn’t 
utilize the same HW resources.  So I would say that the DDP usage could be 
refactored such that there is a set of helper functions to do the common 
logic, like managing page pod adapter resources.  An example would be 
cxgb4i.c:ddp_ppod_write_idata(), and cxgbit_ppod_write_idata() from patch 
13.

Connection Management (CM) can definitely be refactored to share helper 
functions for setting up/tearing down TCP connections.  But the iWARP logic 
is very different from iscsi once the TCP connection is setup and the 
connection moved into RDMA mode.  So the sharing of logic would be in 
sending various CPL messages to the adapter to connect/accept/close/abort 
connections.  Having said that, the initiator and target wouldn't share a 
lot by virtue of the initiator only doing active side connection setup, and 
the target only doing passive side connection setup.  So refactor would be 
common services that iw_cxgb4, cxgbi4, and cxgbit all use.An example 
would be: iw_cxgb4/cm.c:send_connect(), and cxgb4i.c/send_act_open_req().

I didn't look at LRO at this point.

Anyway, none of these are particularly difficult, but will require 
significant effort and time.  The current cxgbit series has had a lot of 
internal review and testing by the Chelsio iscsi team, and it would be great 
if this refactoring can be deferred with the promise that we will get it 
done for the 4.8 merge window.  Thoughts?

Steve.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] block: add queue reset support

2016-05-25 Thread Bart Van Assche

On 05/25/2016 12:55 AM, mchri...@redhat.com wrote:

+   spin_lock_irq(q->queue_lock);
+   wait_event_lock_irq(q->reset_wq,
+   !queue_flag_test_and_set(QUEUE_FLAG_RESETTING, q),
+   *q->queue_lock);
+   if (blk_queue_dead(q)) {
+   rc = -ENODEV;
+   spin_unlock_irq(q->queue_lock);
+   goto done;
+   }
+   spin_unlock_irq(q->queue_lock);


Is the flag QUEUE_FLAG_RESETTING used to serialize blk_reset_queue() 
calls? If so, please consider using a mutex instead. Lockdep can detect 
lock inversion for mutexes and spinlocks but not for serialization that 
is based on wait_event_lock_irq() loops.



+   if (q->mq_ops) {
+   blk_mq_stop_hw_queues(q);
+   blk_mq_freeze_queue(q);
+
+   eh_rc = q->mq_ops->reset(q);
+
+   blk_mq_unfreeze_queue(q);
+   blk_mq_start_stopped_hw_queues(q, true);
+   } else if (q->reset_fn) {
+   spin_lock_irq(q->queue_lock);
+   blk_stop_queue(q);
+   spin_unlock_irq(q->queue_lock);
+
+   while (q->request_fn_active)
+   msleep(10);
+
+   eh_rc = q->reset_fn(q);
+
+   spin_lock_irq(q->queue_lock);
+   blk_start_queue(q);
+   spin_unlock_irq(q->queue_lock);
+   }


Some of the above code is similar to some of the code in 
scsi_internal_device_block() and scsi_internal_device_unblock(). Have 
you considered to avoid this duplication by introducing helpers in the 
block layer for blocking and unblocking queues?


Thanks,

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] blk mq: take ref to q when running it

2016-05-25 Thread Bart Van Assche

On 05/25/2016 12:54 AM, mchri...@redhat.com wrote:

If the __blk_mq_run_hw_queue is a result of a async run or retry
then we do not have a reference to the queue. At this time,
blk_cleanup_queue could begin tearing it down while the queue_rq
function is being run.

This patch just has us do a blk_queue_enter call.

Signed-off-by: Mike Christie 
---
 block/blk-mq.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7df9c92..5958a02 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -738,8 +738,13 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx 
*hctx)

WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask));

-   if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
+   if (blk_queue_enter(q, false)) {
+   blk_mq_run_hw_queue(hctx, true);
return;
+   }
+
+   if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
+   goto exit_queue;

hctx->run++;

@@ -832,6 +837,9 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx 
*hctx)
 **/
blk_mq_run_hw_queue(hctx, true);
}
+
+exit_queue:
+   blk_queue_exit(q);
 }


Hello Mike,

blk_cleanup_queue() waits until delay_work and run_work have finished 
inside blk_sync_queue(). The code in blk_cleanup_queue() before the 
blk_sync_queue() call should be safe to execute concurrently with 
queue_rq(). Or did I perhaps miss something?


Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko

2016-05-25 Thread Or Gerlitz
On Tue, May 24, 2016 at 9:40 AM, Nicholas A. Bellinger
 wrote:
> Hi Or & Co,
> On Wed, 2016-05-18 at 14:45 +0300, Or Gerlitz wrote:
>> On Sat, Apr 30, 2016 at 6:54 PM, Or Gerlitz  wrote:
>> > On Tue, Apr 19, 2016 at 9:30 PM, Varun Prakash  wrote:
>> >> cxgbit.h - This file contains data structure
>> >> definitions for cxgbit.ko.
>> >>
>> >> cxgbit_lro.h - This file contains data structure
>> >> definitions for LRO support.
>> >>
>> >> cxgbit_main.c - This file contains code for
>> >> registering with iscsi target transport and
>> >> cxgb4 driver.
>> >>
>> >> cxgbit_cm.c - This file contains code for
>> >> connection management.
>> >>
>> >> cxgbit_target.c - This file contains code
>> >> for processing iSCSI PDU.
>> >>
>> >> cxgbit_ddp.c - This file contains code for
>> >> Direct Data Placement.
>> >
>> > Wait,
>> >
>> > You are adding many K's LOCs to handle things like CM (connection
>> > management), DDP and LRO. But your upstream solution must be using CM
>> > and DDP (and LRO as well) for the HW offloaded initiator side as well,
>> > not to mention the iWARP side of things.
>> >
>> > There must be some way to refactor things instead of repeating the
>> > same bits over and over, thoughts?

>> The author haven't responded... where that this stands from your point of 
>> view?

> For an initial merge, I don't have an objection to this series wrt
> drivers/target/iscsi/* improvements + prerequisites, and new standalone
> cxgbit iscsit_transport driver.
>
> That said, there are areas between cxgbi + cxgbit code that can be made
> common as you've pointed out.  The Cheliso folks have mentioned off-list
> that cxgbi as-is in mainline does not support LRO, and that the majority
> of DDP logic is shared between initiator + target.
>
> Are there specific pieces of logic in DDP or iWARP for cxgb* that you'd
> like to see Varun + Co pursue as common code in v4.8+..?

Hi Nic,

As I wrote above, I have good reasons to believe that there are few K
LOCs of duplication
between this series to the chelsio hw iscsi initiator or the chelsio
iwarp driver or both (triple).

Ys, I'd like to see a public response from Varun and Co on this valid
reviewer comment
before you proceed with this series, makes sense? There's no point to
duplicate the same
code in the kernel again and again. **Even** if there's one
duplication now, we don't want
another one.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


sd: ZBC Update Read/Write Commands to SCSI spec

2016-05-25 Thread Petros Koutoupis
This patch is based on:
git.kernel.org:/pub/scm/linux/kernel/git/hare/scsi-devel branch zbc.v6

According to the Zoned Block Commands (ZBC) Revision 5 SCSI specification,
a Host Managed and Host Aware device is limited to support at a minimum the
READ (16) and WRITE (16) read/write CDBs. This means that a drive manufacturer
is not obligated to support anything else. We shouldn't assume that we can
send other read/write commands and should instead limit sd to only the
mandatory list highlighted in the specification.

Note, I did not place the a check for zoned devices in the DIF-enabled
conditional above, under the assumption that if the media supports Type-2
then it should also support the 32-byte commands. Thoughts?

Signed-off-by: Petros Koutoupis 

--- scsi-devel/drivers/scsi/sd.c.orig   2016-05-12 23:21:04.31400 -0500
+++ scsi-devel/drivers/scsi/sd.c2016-05-25 08:55:21.715506056 -0500
@@ -1159,7 +1159,8 @@ static int sd_setup_read_write_cmnd(stru
    SCpnt->cmnd[29] = (unsigned char) (this_count >> 16) & 0xff;
    SCpnt->cmnd[30] = (unsigned char) (this_count >> 8) & 0xff;
    SCpnt->cmnd[31] = (unsigned char) this_count & 0xff;
-   } else if (sdp->use_16_for_rw || (this_count > 0x)) {
+   } else if (sdp->use_16_for_rw || (this_count > 0x) ||
+      sdkp->zoned == 1 || sdp->type == TYPE_ZBC) {
    SCpnt->cmnd[0] += READ_16 - READ_6;
    SCpnt->cmnd[1] = protect | ((rq->cmd_flags & REQ_FUA) ? 0x8 : 
0);
    SCpnt->cmnd[2] = sizeof(block) > 4 ? (unsigned char) (block >> 
56) & 0xff : 0;
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] ibmvscsis: clean up functions

2016-05-25 Thread Joe Perches
On Wed, 2016-05-25 at 09:17 -0500, Bryant G. Ly wrote:
> From: bryantly 

Please use your whole name here and for your sign-off like:

From: Bryant G. Ly 
Signed-off-by: Bryant G. Ly 

> This patch removes forward declarations and re-organizes the
> functions within the driver. This patch also fixes MAINTAINERS
> for ibmvscsis.

trivial note:

> diff --git a/drivers/scsi/ibmvscsi/ibmvscsis.c 
> b/drivers/scsi/ibmvscsi/ibmvscsis.c
[]
>  static inline long h_copy_rdma(s64 length, u64 sliobn, u64 slioba,
>      u64 dliobn, u64 dlioba)
>  {
> 

Functions like this would be less indented and less
line wrapped for 80 columns if they were written:

if (!se_cmd->residual_count)
return;

[unindented one level...]

etc...

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] ibmvscsis: Addressing Bart's comments

2016-05-25 Thread Bryant G. Ly
From: bryantly 

This patch contains cleaning up the code for styling and also addresses Bart's 
comments.

Signed-off-by: bryantly 
---
 MAINTAINERS   |   4 +-
 drivers/scsi/Kconfig  |  36 +--
 drivers/scsi/Makefile |   4 +-
 drivers/scsi/ibmvscsi/Makefile|   3 +-
 drivers/scsi/ibmvscsi/ibmvscsis.c | 450 ++
 drivers/scsi/ibmvscsi/ibmvscsis.h |  40 ++--
 drivers/scsi/ibmvscsi/libsrp.c| 386 
 drivers/scsi/ibmvscsi/libsrp.h|  91 
 drivers/scsi/libsrp.c | 387 
 include/scsi/libsrp.h |  95 
 10 files changed, 737 insertions(+), 759 deletions(-)
 create mode 100644 drivers/scsi/ibmvscsi/libsrp.c
 create mode 100644 drivers/scsi/ibmvscsi/libsrp.h
 delete mode 100644 drivers/scsi/libsrp.c
 delete mode 100644 include/scsi/libsrp.h

diff --git a/MAINTAINERS b/MAINTAINERS
index b520e6c..a11905c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5387,9 +5387,9 @@ L:linux-scsi@vger.kernel.org
 L: target-de...@vger.kernel.org
 S: Supported
 F: drivers/scsi/ibmvscsi/ibmvscsis.c
-F:  drivers/scsi/ibmvscsi/ibmvscsis.h
+F: drivers/scsi/ibmvscsi/ibmvscsis.h
 F: drivers/scsi/libsrp.h
-F:  drivers/scsi/libsrp.c
+F: drivers/scsi/libsrp.c
 
 IBM Power Virtual FC Device Drivers
 M: Tyrel Datwyler 
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 6adf8f1..1221f6b 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -848,18 +848,21 @@ config SCSI_IBMVSCSI
  module will be called ibmvscsi.
 
 config SCSI_IBMVSCSIS
-   tristate "IBM Virtual SCSI Server support"
-   depends on PPC_PSERIES && SCSI_SRP && TARGET_CORE
-   help
- This is the IBM POWER Virtual SCSI Target Server
+   tristate "IBM Virtual SCSI Server support"
+   depends on PPC_PSERIES && SCSI_SRP && TARGET_CORE
+   help
+ This is the IBM POWER Virtual SCSI Target Server
+ This driver uses the SRP protocol for communication betwen servers
+ guest and/or the host that run on the same server.
+ More information on VSCSI protocol can be found at www.power.org
 
-  The userspace component needed to initialize the driver and
- documentation can be found:
+ The userspace component needed to initialize the driver and
+ documentation can be found:
 
-  https://github.com/powervm/ibmvscsis
+ https://github.com/powervm/ibmvscsis
 
-  To compile this driver as a module, choose M here: the
- module will be called ibmvstgt.
+ To compile this driver as a module, choose M here: the
+ module will be called ibmvstgt.
 
 config SCSI_IBMVFC
tristate "IBM Virtual FC support"
@@ -1743,14 +1746,17 @@ config SCSI_PM8001
  based host adapters.
 
 config SCSI_SRP
-   tristate "SCSI RDMA Protocol helper library"
-   depends on SCSI && PCI
-   help
- This scsi srp module is a library for ibmvscsi target driver.
+   tristate "SCSI RDMA Protocol helper library"
+   depends on SCSI && PCI
+   help
+ This SCSI SRP module is a library for ibmvscsi target driver.
+ This module can only be used by SRP drivers that utilize synchronous
+ data transfers and not by SRP drivers that use asynchronous.
+
  If you wish to use SRP target drivers, say Y.
 
- To compile this driver as a module, choose M here. The module will
- be called libsrp.
+ To compile this driver as a module, choose M here. The module will
+ be called libsrp.
 
 config SCSI_BFA_FC
tristate "Brocade BFA Fibre Channel Support"
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 8692dd4..9dfa4da 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -127,9 +127,9 @@ obj-$(CONFIG_SCSI_LASI700)  += 53c700.o lasi700.o
 obj-$(CONFIG_SCSI_SNI_53C710)  += 53c700.o sni_53c710.o
 obj-$(CONFIG_SCSI_NSP32)   += nsp32.o
 obj-$(CONFIG_SCSI_IPR) += ipr.o
-obj-$(CONFIG_SCSI_SRP)  += libsrp.o
+obj-$(CONFIG_SCSI_SRP) += ibmvscsi/
 obj-$(CONFIG_SCSI_IBMVSCSI)+= ibmvscsi/
-obj-$(CONFIG_SCSI_IBMVSCSIS)+= ibmvscsi/
+obj-$(CONFIG_SCSI_IBMVSCSIS)   += ibmvscsi/
 obj-$(CONFIG_SCSI_IBMVFC)  += ibmvscsi/
 obj-$(CONFIG_SCSI_HPTIOP)  += hptiop.o
 obj-$(CONFIG_SCSI_STEX)+= stex.o
diff --git a/drivers/scsi/ibmvscsi/Makefile b/drivers/scsi/ibmvscsi/Makefile
index b241a567..e7a1663 100644
--- a/drivers/scsi/ibmvscsi/Makefile
+++ b/drivers/scsi/ibmvscsi/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_SCSI_IBMVSCSI)+= ibmvscsi.o
-obj-$(CONFIG_SCSI_IBMVSCSIS)+= ibmvscsis.o
+obj-$(CONFIG_SCSI_IBMVSCSIS)   += ibmvscsis.o
 obj-$(CONFIG_SCSI_IBMVFC)  += ibmvfc.o
+obj-$(CONFIG_SCSI_SRP)  += libsrp.o
diff --git a/drivers/scsi/ibmvscsi/ibmvscsis.c 
b/drivers/scsi/ibmvscsi

[PATCH 3/3] ibmvscsis: clean up functions

2016-05-25 Thread Bryant G. Ly
From: bryantly 

This patch removes forward declarations and re-organizes the
functions within the driver. This patch also fixes MAINTAINERS
for ibmvscsis.

Signed-off-by: bryantly 
---
 MAINTAINERS   |4 +-
 drivers/scsi/ibmvscsi/ibmvscsis.c | 2709 ++---
 2 files changed, 1315 insertions(+), 1398 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a11905c..3f09a15 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5388,8 +5388,8 @@ L:target-de...@vger.kernel.org
 S: Supported
 F: drivers/scsi/ibmvscsi/ibmvscsis.c
 F: drivers/scsi/ibmvscsi/ibmvscsis.h
-F: drivers/scsi/libsrp.h
-F: drivers/scsi/libsrp.c
+F: drivers/scsi/ibmvscsi/libsrp.c
+F: drivers/scsi/ibmvscsi/libsrp.h
 
 IBM Power Virtual FC Device Drivers
 M: Tyrel Datwyler 
diff --git a/drivers/scsi/ibmvscsi/ibmvscsis.c 
b/drivers/scsi/ibmvscsi/ibmvscsis.c
index 6f9e9ba..610f4f5 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsis.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsis.c
@@ -62,88 +62,17 @@
 
 #define SRP_RSP_SENSE_DATA_LEN 18
 
+static const char ibmvscsis_driver_name[] = "ibmvscsis";
+static char system_id[SYS_ID_NAME_LEN] = "";
+static char partition_name[PARTITION_NAMELEN] = "UNKNOWN";
+static unsigned int partition_number = -1;
+
 static struct workqueue_struct *vtgtd;
 static unsigned max_vdma_size = MAX_H_COPY_RDMA;
 
 static DEFINE_SPINLOCK(ibmvscsis_dev_lock);
 static LIST_HEAD(ibmvscsis_dev_list);
 
-static int ibmvscsis_probe(struct vio_dev *vdev,
-  const struct vio_device_id *id);
-static void ibmvscsis_dev_release(struct device *dev);
-static void ibmvscsis_modify_rep_luns(struct se_cmd *se_cmd);
-static void ibmvscsis_modify_std_inquiry(struct se_cmd *se_cmd);
-static int read_dma_window(struct vio_dev *vdev,
-  struct ibmvscsis_adapter *adapter);
-static char *ibmvscsis_get_fabric_name(void);
-static char *ibmvscsis_get_fabric_wwn(struct se_portal_group *se_tpg);
-static u16 ibmvscsis_get_tag(struct se_portal_group *se_tpg);
-static u32 ibmvscsis_get_default_depth(struct se_portal_group *se_tpg);
-static int ibmvscsis_check_true(struct se_portal_group *se_tpg);
-static int ibmvscsis_check_false(struct se_portal_group *se_tpg);
-static u32 ibmvscsis_tpg_get_inst_index(struct se_portal_group *se_tpg);
-static int ibmvscsis_check_stop_free(struct se_cmd *se_cmd);
-static void ibmvscsis_release_cmd(struct se_cmd *se_cmd);
-static int ibmvscsis_shutdown_session(struct se_session *se_sess);
-static void ibmvscsis_close_session(struct se_session *se_sess);
-static u32 ibmvscsis_sess_get_index(struct se_session *se_sess);
-static int ibmvscsis_write_pending(struct se_cmd *se_cmd);
-static int ibmvscsis_write_pending_status(struct se_cmd *se_cmd);
-static void ibmvscsis_set_default_node_attrs(struct se_node_acl *nacl);
-static int ibmvscsis_get_cmd_state(struct se_cmd *se_cmd);
-static int ibmvscsis_queue_data_in(struct se_cmd *se_cmd);
-static int ibmvscsis_queue_status(struct se_cmd *se_cmd);
-static void ibmvscsis_queue_tm_rsp(struct se_cmd *se_cmd);
-static void ibmvscsis_aborted_task(struct se_cmd *se_cmd);
-static struct se_wwn *ibmvscsis_make_tport(struct target_fabric_configfs *tf,
-  struct config_group *group,
-  const char *name);
-static void ibmvscsis_drop_tport(struct se_wwn *wwn);
-static struct se_portal_group *ibmvscsis_make_tpg(struct se_wwn *wwn,
- struct config_group *group,
- const char *name);
-static void ibmvscsis_drop_tpg(struct se_portal_group *se_tpg);
-static int ibmvscsis_remove(struct vio_dev *vdev);
-static ssize_t system_id_show(struct device *dev,
- struct device_attribute *attr,
- char *buf);
-static ssize_t partition_number_show(struct device *dev,
-struct device_attribute *attr,
-char *buf);
-static ssize_t unit_address_show(struct device *dev,
-struct device_attribute *attr, char *buf);
-static int get_system_info(void);
-static irqreturn_t ibmvscsis_interrupt(int dummy, void *data);
-static int process_srp_iu(struct iu_entry *iue);
-static void process_iu(struct viosrp_crq *crq,
-  struct ibmvscsis_adapter *adapter);
-static void process_crq(struct viosrp_crq *crq,
-   struct ibmvscsis_adapter *adapter);
-static void handle_crq(struct work_struct *work);
-static int ibmvscsis_reset_crq_queue(struct ibmvscsis_adapter *adapter);
-static void crq_queue_destroy(struct ibmvscsis_adapter *adapter);
-static inline struct viosrp_crq *next_crq(struct crq_queue *queue);
-static int send_iu(struct iu_entry *iue, u64 length, u8 format);
-static int send_adapter_info(struct iu_entry *iue,
- 

IBM VSCSI Target Driver Initial Patch Sets

2016-05-25 Thread Bryant G. Ly
This patch series addresses comments by Joe with runing checkpatch with a
--strict. It cleans up all the misc styling and removes all the forward
declarations.

The patch also addresses all of Bart's comments besides the preallocation of
buffers before IO starts and the merging of unpack_lun with scsiluntoint.
Those will come in later patch sets.

[PATCH 2/3] ibmvscsis: Addressing Bart's comments
[PATCH 3/3] ibmvscsis: clean up functions

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] USB: uas: Fix slave queue_depth not being set

2016-05-25 Thread Tom Yan
What if a UAS bridge requires specific SCSI command (e.g. UNMAP) to be
issued unqueued/untagged? Would track_queue_depth help?

On 25 May 2016 at 19:04, Hans de Goede  wrote:
> Hi,
>
>
> On 24-05-16 14:44, James Bottomley wrote:
>>
>> On Tue, 2016-05-24 at 08:53 +0200, Hans de Goede wrote:
>>>
>>> Hi,
>>>
>>> On 23-05-16 19:36, James Bottomley wrote:

 On Mon, 2016-05-23 at 13:49 +0200, Hans de Goede wrote:
>
> Commit 198de51dbc34 ("USB: uas: Limit qdepth at the scsi-host
> level")
> removed the scsi_change_queue_depth() call from
> uas_slave_configure() assuming that the slave would inherit the
> host's queue_depth, which that commit sets to the same value.
>
> This is incorrect, without the scsi_change_queue_depth() call the
> slave's queue_depth defaults to 1, introducing a performance
> regression.
>
> This commit restores the call, fixing the performance regression.
>
> Cc: sta...@vger.kernel.org
> Fixes: 198de51dbc34 ("USB: uas: Limit qdepth at the scsi-host
> level")
> Reported-by: Tom Yan 
> Signed-off-by: Hans de Goede 
> ---
>  drivers/usb/storage/uas.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/usb/storage/uas.c
> b/drivers/usb/storage/uas.c
> index 16bc679..ecc7d4b 100644
> --- a/drivers/usb/storage/uas.c
> +++ b/drivers/usb/storage/uas.c
> @@ -835,6 +835,7 @@ static int uas_slave_configure(struct
> scsi_device
> *sdev)
> if (devinfo->flags & US_FL_BROKEN_FUA)
> sdev->broken_fua = 1;
>
> +   scsi_change_queue_depth(sdev, devinfo->qdepth - 2);


 Are you sure about this?  For spinning rust, experiments imply that
 the optimal queue depth per device is somewhere between 2 and 4.
  Obviously that's not true for SSDs, so it depends on your use
 case.  Plus, for ATA NCQ devices (which I believe most UAS is
 bridged to) you have a maximum NCQ depth of 31.
>>>
>>>
>>> So this value is the same as host.can_queue, and is what uas has
>>> always used, basically this says it is ok to queue as much as the
>>> bridge can handle. We've seen a few rare multi-lun devices, but
>>> typically almost all uas devices have one lun, what I really want to
>>> do here is give a maximum and let say the sd driver lower that if it
>>> is sub-optimal.
>>
>>
>> If that's what you actually want, you should be setting sdev
>> ->max_queue_depth and .track_queue_depth = 1 in the template.
>
>
> Hmm, I've been looking into this, but that does not seem right.
>
> max_queue_depth is never set by drivers, it is set to sdev->queue_depth
> in scsi_scan.c: scsi_add_lun() after calling the host drivers'
> slave_configure callback. So it seems that the right way to set
> max_queue_depth is to actually set queue_depth, or iow restore the
> call to scsi_change_queue_depth() as the patch we're discussing does.
>
> As for track_queue_depth = 1 that seems to be only set by some drivers
> under drivers/scsi and is never set by any drivers under drivers/ata,
> and we're almost exclusively dealing with sata disks in uas. It seems
> that track_queue_depth = 1 is mostly used for iscsi and fibre channel
> iow enterprise class storage stuff, so looking at existing drivers
> usage of this flag using it does not seem a good idea.
>
> Anyways this patch is a (partial) revert of a previous bug-fix patch
> (which has also gone to stable) so for now I would really like to
> move forward with this patch and get it upstream and in stable
> to fix the performance regressions people are seeing caused by
> me wrongly dropping the scsi_change_queue_depth() call.
>
> Then if we want to tweak the queuing further we can do that
> on top of this fix, and put that in next and let it get some testing
> first.
>
> So are you ok with moving this patch forward ?
>
> Regards,
>
> Hans
>
>
>
>
>> You might also need to add calls to scsi_track_queue_full() but only if
>> the devices aren't responding QUEUE_FULL correctly.
>>
>> James
>>
>>> Also notice that uas is used a lot with ssd-s, that is mostly what
>>> I want to optimize for, but it is definitely also used with spinning
>>> rust.
>>>
>>> And yes almost all uas devices are bridged sata devices (this may
>>> change in the near future though, with ssd-s specifically designed
>>> for usb-3 attachment, although sofar these all seem to use an
>>> embbeded sata bridge), so from this pov an upper limit of 31 makes
>>> sense, I guess, but I've not seen any bridges which actually do more
>>> then 32 streams anyways.
>>>
>>> Still this is a bug-fix patch, essentially a partial revert, to
>>> address performance regressions, so lets get this out as is and take
>>> our time to come up with some tweaks (if necessary) for the say 4.8.
>>>
 There's a good reason why you don't want a queue deeper than you
 can handle: it tends to interfere with writeback because you build
 up a lot of pending I/O in t

Re: [PATCH] USB: uas: Fix slave queue_depth not being set

2016-05-25 Thread Hans de Goede

Hi,

On 24-05-16 14:44, James Bottomley wrote:

On Tue, 2016-05-24 at 08:53 +0200, Hans de Goede wrote:

Hi,

On 23-05-16 19:36, James Bottomley wrote:

On Mon, 2016-05-23 at 13:49 +0200, Hans de Goede wrote:

Commit 198de51dbc34 ("USB: uas: Limit qdepth at the scsi-host
level")
removed the scsi_change_queue_depth() call from
uas_slave_configure() assuming that the slave would inherit the
host's queue_depth, which that commit sets to the same value.

This is incorrect, without the scsi_change_queue_depth() call the
slave's queue_depth defaults to 1, introducing a performance
regression.

This commit restores the call, fixing the performance regression.

Cc: sta...@vger.kernel.org
Fixes: 198de51dbc34 ("USB: uas: Limit qdepth at the scsi-host
level")
Reported-by: Tom Yan 
Signed-off-by: Hans de Goede 
---
 drivers/usb/storage/uas.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/storage/uas.c
b/drivers/usb/storage/uas.c
index 16bc679..ecc7d4b 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -835,6 +835,7 @@ static int uas_slave_configure(struct
scsi_device
*sdev)
if (devinfo->flags & US_FL_BROKEN_FUA)
sdev->broken_fua = 1;

+   scsi_change_queue_depth(sdev, devinfo->qdepth - 2);


Are you sure about this?  For spinning rust, experiments imply that
the optimal queue depth per device is somewhere between 2 and 4.
 Obviously that's not true for SSDs, so it depends on your use
case.  Plus, for ATA NCQ devices (which I believe most UAS is
bridged to) you have a maximum NCQ depth of 31.


So this value is the same as host.can_queue, and is what uas has
always used, basically this says it is ok to queue as much as the
bridge can handle. We've seen a few rare multi-lun devices, but
typically almost all uas devices have one lun, what I really want to
do here is give a maximum and let say the sd driver lower that if it
is sub-optimal.


If that's what you actually want, you should be setting sdev
->max_queue_depth and .track_queue_depth = 1 in the template.


Hmm, I've been looking into this, but that does not seem right.

max_queue_depth is never set by drivers, it is set to sdev->queue_depth
in scsi_scan.c: scsi_add_lun() after calling the host drivers'
slave_configure callback. So it seems that the right way to set
max_queue_depth is to actually set queue_depth, or iow restore the
call to scsi_change_queue_depth() as the patch we're discussing does.

As for track_queue_depth = 1 that seems to be only set by some drivers
under drivers/scsi and is never set by any drivers under drivers/ata,
and we're almost exclusively dealing with sata disks in uas. It seems
that track_queue_depth = 1 is mostly used for iscsi and fibre channel
iow enterprise class storage stuff, so looking at existing drivers
usage of this flag using it does not seem a good idea.

Anyways this patch is a (partial) revert of a previous bug-fix patch
(which has also gone to stable) so for now I would really like to
move forward with this patch and get it upstream and in stable
to fix the performance regressions people are seeing caused by
me wrongly dropping the scsi_change_queue_depth() call.

Then if we want to tweak the queuing further we can do that
on top of this fix, and put that in next and let it get some testing
first.

So are you ok with moving this patch forward ?

Regards,

Hans




You might also need to add calls to scsi_track_queue_full() but only if
the devices aren't responding QUEUE_FULL correctly.

James


Also notice that uas is used a lot with ssd-s, that is mostly what
I want to optimize for, but it is definitely also used with spinning
rust.

And yes almost all uas devices are bridged sata devices (this may
change in the near future though, with ssd-s specifically designed
for usb-3 attachment, although sofar these all seem to use an
embbeded sata bridge), so from this pov an upper limit of 31 makes
sense, I guess, but I've not seen any bridges which actually do more
then 32 streams anyways.

Still this is a bug-fix patch, essentially a partial revert, to
address performance regressions, so lets get this out as is and take
our time to come up with some tweaks (if necessary) for the say 4.8.


There's a good reason why you don't want a queue deeper than you
can handle: it tends to interfere with writeback because you build
up a lot of pending I/O in the queue which can't be issued (it's
very similar to why bufferbloat is a problem in networks).  In
theory, as long as your devices return the correct indicator
(QUEUE_FULL status), we'll handle most of this in the mid-layer by
plugging the block queue, but given what I've seen from UAS
devices, that's less than probable.


So any smart ideas how to be nicer to spinning rust, without
negatively impacting ssd-s? As said if I've to choice I think we
should chose optimizing ssd-s, as that is where uas is used a lot
(although usb  attached harddisks are switching over to it too).

Note I just checked the 1TB sata/ahci 

mpt3sas: memory allocation for firmware upgrade DMA memory question

2016-05-25 Thread Johannes Thumshirn
Hi Chaitra and Suganath,

I've got a question regarding mpt3sas' memory allocation used when doing a
firmware upgrade. Currently you're doing a pci_alloc_consitent() which tries
to allocate memory via GFP_ATOMIC. This memory then is passed as a single
element on a sg_list.

Jeff reported it returned -ENOMEM on his Server due to highly fragmented
memory.

Is it required to have the memory for the DMA operation contiguous, or can I
just allocate several non-contiguous junks of memory and map it to a sg_list?

If not, is GFP_ATMOIC really needed? I've converted the driver to use
GFP_KERNEL but I'm a bit reluctant to test below patch on real hardware to not
brick the HBA.

Thanks,
Johannes

RFC patch for GFP_KERNEL allocation, though splitting into multiple sg mapped
elements is the preferred fix here:

>From 06e63654d887df7f740dc5abcb40d441a8da7fa5 Mon Sep 17 00:00:00 2001
From: Johannes Thumshirn 
Date: Tue, 24 May 2016 17:25:59 +0200
Subject: [RFC PATCH] mpt3sas: Don't do atomic memory allocations for firmware
 update DMA

Currently mpt3sas uses pci_alloc_consistent() to allocate memory for the DMA
used to do firmware updates. pci_alloc_consistent() in turn uses GFP_ATOMIC
allocations. On a host with high memory fragmention this can lead to page
allocation failures, as the DMA buffer holds the complete firmware update and
thus can need page allocations of higher orders.

As the firmware update code path may sleep, convert allocation to a normal
kzalloc() call with GFP_KERNEL and map it to the DMA buffers.

Reported-by: Jeff Mahoney 
Signed-off-by: Johannes Thumshirn 
---
 drivers/scsi/mpt3sas/mpt3sas_ctl.c | 39 --
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_ctl.c 
b/drivers/scsi/mpt3sas/mpt3sas_ctl.c
index 7d00f09..14be3cf 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_ctl.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_ctl.c
@@ -751,8 +751,7 @@ _ctl_do_mpt_command(struct MPT3SAS_ADAPTER *ioc, struct 
mpt3_ioctl_command karg,
 
/* obtain dma-able memory for data transfer */
if (data_out_sz) /* WRITE */ {
-   data_out = pci_alloc_consistent(ioc->pdev, data_out_sz,
-   &data_out_dma);
+   data_out = kzalloc(data_out_sz, GFP_KERNEL);
if (!data_out) {
pr_err("failure at %s:%d/%s()!\n", __FILE__,
__LINE__, __func__);
@@ -760,6 +759,14 @@ _ctl_do_mpt_command(struct MPT3SAS_ADAPTER *ioc, struct 
mpt3_ioctl_command karg,
mpt3sas_base_free_smid(ioc, smid);
goto out;
}
+   data_out_dma = pci_map_single(ioc->pdev, data_out,
+ data_out_sz, PCI_DMA_TODEVICE);
+   if (dma_mapping_error(&ioc->pdev->dev, data_out_dma)) {
+   ret = -EINVAL;
+   mpt3sas_base_free_smid(ioc, smid);
+   goto out_free_data_out;
+   }
+
if (copy_from_user(data_out, karg.data_out_buf_ptr,
data_out_sz)) {
pr_err("failure at %s:%d/%s()!\n", __FILE__,
@@ -771,8 +778,7 @@ _ctl_do_mpt_command(struct MPT3SAS_ADAPTER *ioc, struct 
mpt3_ioctl_command karg,
}
 
if (data_in_sz) /* READ */ {
-   data_in = pci_alloc_consistent(ioc->pdev, data_in_sz,
-   &data_in_dma);
+   data_in = kzalloc(data_in_sz, GFP_KERNEL);
if (!data_in) {
pr_err("failure at %s:%d/%s()!\n", __FILE__,
__LINE__, __func__);
@@ -780,6 +786,13 @@ _ctl_do_mpt_command(struct MPT3SAS_ADAPTER *ioc, struct 
mpt3_ioctl_command karg,
mpt3sas_base_free_smid(ioc, smid);
goto out;
}
+   data_in_dma = pci_map_single(ioc->pdev, data_in,
+data_in_sz, PCI_DMA_FROMDEVICE);
+   if (dma_mapping_error(&ioc->pdev->dev, data_in_dma)) {
+   ret = -EINVAL;
+   mpt3sas_base_free_smid(ioc, smid);
+   goto out_free_data_in;
+   }
}
 
psge = (void *)request + (karg.data_sge_offset*4);
@@ -1013,13 +1026,19 @@ _ctl_do_mpt_command(struct MPT3SAS_ADAPTER *ioc, struct 
mpt3_ioctl_command karg,
  out:
 
/* free memory associated with sg buffers */
-   if (data_in)
-   pci_free_consistent(ioc->pdev, data_in_sz, data_in,
-   data_in_dma);
+   if (data_in) {
+   pci_unmap_single(ioc->pdev, data_in_dma,
+data_in_sz, PCI_DMA_TODEVICE);
+out_free_data_in:
+   kfree(data_in);
+   }
 
-   if (data_out)
-   pci_free_consistent(ioc->pdev, data_out_sz, data_out,
-   data_out_dma);
+   if (data_out

Re: [PATCHv3 0/2] target: make location of /var/targets configurable

2016-05-25 Thread Zhu Lingshan

Hi experts,

I think these patches are great, and I am ready to help in user space.

Thanks,
BR
Zhu Lingshan

On 05/09/2016 09:17 AM, Lee Duncan wrote:

On 04/14/2016 06:18 PM, Lee Duncan wrote:

These patches make the location of "/var/target" configurable,
though it still defauls to "/var/target".

This "target database directory" can only be changed
after the target_core_mod loads but before any
fabric drivers are loaded, and must be the pathname
of an existing directory.

This configuration is accomplished via the configfs
top-level target attribute "dbroot", i.e. dumping
out "/sys/kernel/config/target/dbroot" will normally
return "/var/target". Writing to this attribute
changes the loation where the kernel looks for the
target database.

The first patch creates this configurable value for
the "dbroot", and the second patch modifies users
of this directory to use this new attribute.

Changes from v2:
  * Add locking around access to target driver list

Changes from v1:
  * Only allow changing target DB root before it
can be used by others
  * Validate that new DB root is a valid directory

Lee Duncan (2):
   target: make target db location configurable
   target: use new "dbroot" target attribute

  drivers/target/target_core_alua.c |  6 ++--
  drivers/target/target_core_configfs.c | 62 +++
  drivers/target/target_core_internal.h |  6 
  drivers/target/target_core_pr.c   |  2 +-
  4 files changed, 72 insertions(+), 4 deletions(-)


Ping?


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] iscsi initiator: support eh_async_device_reset_handler

2016-05-25 Thread mchristi
From: Mike Christie 

Like a lot of drivers, iscsi already supported SG_SCSI_RESET_DEVICE
and did not need the host stopped, so supporting
eh_async_device_reset_handler just required some renames, and
changing the callout argument from a scsi_cmnd to a scsi_device.

Signed-off-by: Mike Christie 
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |  2 +-
 drivers/scsi/be2iscsi/be_main.c  | 10 +-
 drivers/scsi/bnx2i/bnx2i_iscsi.c |  2 +-
 drivers/scsi/cxgbi/cxgb3i/cxgb3i.c   |  2 +-
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c   |  2 +-
 drivers/scsi/iscsi_tcp.c |  2 +-
 drivers/scsi/libiscsi.c  | 16 
 include/scsi/libiscsi.h  |  2 +-
 8 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c 
b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 80b6bed..244e330 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -992,7 +992,7 @@ static struct scsi_host_template iscsi_iser_sht = {
.max_sectors= ISER_DEF_MAX_SECTORS,
.cmd_per_lun= ISER_DEF_CMD_PER_LUN,
.eh_abort_handler   = iscsi_eh_abort,
-   .eh_device_reset_handler= iscsi_eh_device_reset,
+   .eh_async_device_reset_handler= iscsi_eh_device_reset,
.eh_target_reset_handler = iscsi_eh_recover_target,
.target_alloc   = iscsi_target_alloc,
.use_clustering = ENABLE_CLUSTERING,
diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index f05e773..fd1dd20 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -294,7 +294,7 @@ static int beiscsi_eh_abort(struct scsi_cmnd *sc)
return iscsi_eh_abort(sc);
 }
 
-static int beiscsi_eh_device_reset(struct scsi_cmnd *sc)
+static int beiscsi_eh_device_reset(struct scsi_device *sdev)
 {
struct iscsi_task *abrt_task;
struct beiscsi_io_task *abrt_io_task;
@@ -309,7 +309,7 @@ static int beiscsi_eh_device_reset(struct scsi_cmnd *sc)
int rc;
 
/* invalidate iocbs */
-   cls_session = starget_to_session(scsi_target(sc->device));
+   cls_session = starget_to_session(scsi_target(sdev));
session = cls_session->dd_data;
spin_lock_bh(&session->frwd_lock);
if (!session->leadconn || session->state != ISCSI_STATE_LOGGED_IN) {
@@ -329,7 +329,7 @@ static int beiscsi_eh_device_reset(struct scsi_cmnd *sc)
if (!abrt_task->sc || abrt_task->state == ISCSI_TASK_FREE)
continue;
 
-   if (sc->device->lun != abrt_task->sc->device->lun)
+   if (sdev->lun != abrt_task->sc->device->lun)
continue;
 
/* Invalidate WRB Posted for this Task */
@@ -371,7 +371,7 @@ static int beiscsi_eh_device_reset(struct scsi_cmnd *sc)
if (rc != -EBUSY)
pci_free_consistent(phba->ctrl.pdev, nonemb_cmd.size,
nonemb_cmd.va, nonemb_cmd.dma);
-   return iscsi_eh_device_reset(sc);
+   return iscsi_eh_device_reset(sdev);
 }
 
 static ssize_t beiscsi_show_boot_tgt_info(void *data, int type, char *buf)
@@ -560,7 +560,7 @@ static struct scsi_host_template beiscsi_sht = {
.slave_configure = beiscsi_slave_configure,
.target_alloc = iscsi_target_alloc,
.eh_abort_handler = beiscsi_eh_abort,
-   .eh_device_reset_handler = beiscsi_eh_device_reset,
+   .eh_async_device_reset_handler = beiscsi_eh_device_reset,
.eh_target_reset_handler = iscsi_eh_session_reset,
.shost_attrs = beiscsi_attrs,
.sg_tablesize = BEISCSI_SGLIST_ELEMENTS,
diff --git a/drivers/scsi/bnx2i/bnx2i_iscsi.c b/drivers/scsi/bnx2i/bnx2i_iscsi.c
index 7289437..e73ee8d 100644
--- a/drivers/scsi/bnx2i/bnx2i_iscsi.c
+++ b/drivers/scsi/bnx2i/bnx2i_iscsi.c
@@ -2260,7 +2260,7 @@ static struct scsi_host_template bnx2i_host_template = {
.proc_name  = "bnx2i",
.queuecommand   = iscsi_queuecommand,
.eh_abort_handler   = iscsi_eh_abort,
-   .eh_device_reset_handler = iscsi_eh_device_reset,
+   .eh_async_device_reset_handler = iscsi_eh_device_reset,
.eh_target_reset_handler = iscsi_eh_recover_target,
.change_queue_depth = scsi_change_queue_depth,
.target_alloc   = iscsi_target_alloc,
diff --git a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c 
b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
index e22a268..f70e9f5 100644
--- a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
+++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
@@ -91,7 +91,7 @@ static struct scsi_host_template cxgb3i_host_template = {
.max_sectors= 0x,
.cmd_per_lun= ISCSI_DEF_CMD_PER_LUN,
.eh_abort_handler = iscsi_eh_abort,
-   .eh_device_reset_handler = iscsi_eh_device_reset,
+   .eh_async_device_reset_handler = iscsi_eh_device_reset,
.eh_target_reset_handl

[PATCH 1/5] blk mq: take ref to q when running it

2016-05-25 Thread mchristi
From: Mike Christie 

If the __blk_mq_run_hw_queue is a result of a async run or retry
then we do not have a reference to the queue. At this time,
blk_cleanup_queue could begin tearing it down while the queue_rq
function is being run.

This patch just has us do a blk_queue_enter call.

Signed-off-by: Mike Christie 
---
 block/blk-mq.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7df9c92..5958a02 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -738,8 +738,13 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx 
*hctx)
 
WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask));
 
-   if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
+   if (blk_queue_enter(q, false)) {
+   blk_mq_run_hw_queue(hctx, true);
return;
+   }
+
+   if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
+   goto exit_queue;
 
hctx->run++;
 
@@ -832,6 +837,9 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx 
*hctx)
 **/
blk_mq_run_hw_queue(hctx, true);
}
+
+exit_queue:
+   blk_queue_exit(q);
 }
 
 /*
-- 
2.7.2

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] target: call queue reset if supported

2016-05-25 Thread mchristi
From: Mike Christie 

Instead of waiting for commands, map the lun reset operation
to a queue reset.

We do not check the result of blk_reset_queue because if
it works then we need to wait for the bio/request completions
and if it failed we might as wait and hope like we did before.

Signed-off-by: Mike Christie 
---
 drivers/target/target_core_file.c| 12 
 drivers/target/target_core_iblock.c  |  8 
 drivers/target/target_core_pscsi.c   |  8 
 drivers/target/target_core_tmr.c |  3 +++
 include/target/target_core_backend.h |  1 +
 5 files changed, 32 insertions(+)

diff --git a/drivers/target/target_core_file.c 
b/drivers/target/target_core_file.c
index 75f0f08..d6af1f2 100644
--- a/drivers/target/target_core_file.c
+++ b/drivers/target/target_core_file.c
@@ -97,6 +97,17 @@ static struct se_device *fd_alloc_device(struct se_hba *hba, 
const char *name)
return &fd_dev->dev;
 }
 
+static int fd_reset_device(struct se_device *dev)
+{
+   struct fd_dev *fd_dev = FD_DEV(dev);
+   struct inode *inode;
+
+   inode = fd_dev->fd_file->f_mapping->host;
+   if (!S_ISBLK(inode->i_mode))
+   return -EOPNOTSUPP;
+   return blk_reset_queue(bdev_get_queue(inode->i_bdev));
+}
+
 static int fd_configure_device(struct se_device *dev)
 {
struct fd_dev *fd_dev = FD_DEV(dev);
@@ -813,6 +824,7 @@ static const struct target_backend_ops fileio_ops = {
.attach_hba = fd_attach_hba,
.detach_hba = fd_detach_hba,
.alloc_device   = fd_alloc_device,
+   .reset_device   = fd_reset_device,
.configure_device   = fd_configure_device,
.free_device= fd_free_device,
.parse_cdb  = fd_parse_cdb,
diff --git a/drivers/target/target_core_iblock.c 
b/drivers/target/target_core_iblock.c
index 7c4efb4..0a7dd59 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -79,6 +79,13 @@ static struct se_device *iblock_alloc_device(struct se_hba 
*hba, const char *nam
return &ib_dev->dev;
 }
 
+static int iblock_reset_device(struct se_device *dev)
+{
+   struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
+
+   return blk_reset_queue(bdev_get_queue(ib_dev->ibd_bd));
+}
+
 static int iblock_configure_device(struct se_device *dev)
 {
struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
@@ -848,6 +855,7 @@ static const struct target_backend_ops iblock_ops = {
.detach_hba = iblock_detach_hba,
.alloc_device   = iblock_alloc_device,
.configure_device   = iblock_configure_device,
+   .reset_device   = iblock_reset_device,
.free_device= iblock_free_device,
.parse_cdb  = iblock_parse_cdb,
.set_configfs_dev_params = iblock_set_configfs_dev_params,
diff --git a/drivers/target/target_core_pscsi.c 
b/drivers/target/target_core_pscsi.c
index de18790..fa0505b 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -455,6 +455,13 @@ static int pscsi_create_type_other(struct se_device *dev,
return 0;
 }
 
+static int pscsi_reset_device(struct se_device *dev)
+{
+   struct pscsi_dev_virt *pdv = PSCSI_DEV(dev);
+
+   return blk_reset_queue(pdv->pdv_sd->request_queue);
+}
+
 static int pscsi_configure_device(struct se_device *dev)
 {
struct se_hba *hba = dev->se_hba;
@@ -1131,6 +1138,7 @@ static const struct target_backend_ops pscsi_ops = {
.detach_hba = pscsi_detach_hba,
.pmode_enable_hba   = pscsi_pmode_enable_hba,
.alloc_device   = pscsi_alloc_device,
+   .reset_device   = pscsi_reset_device,
.configure_device   = pscsi_configure_device,
.free_device= pscsi_free_device,
.transport_complete = pscsi_transport_complete,
diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 4f229e7..609ec53 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -431,6 +431,9 @@ int core_tmr_lun_reset(
(preempt_and_abort_list) ? "Preempt" : "TMR",
dev->transport->name, tas);
 
+   if (dev->transport->reset_device(dev))
+   dev->transport->reset_device(dev);
+
core_tmr_drain_tmr_list(dev, tmr, preempt_and_abort_list);
core_tmr_drain_state_list(dev, prout_cmd, tmr_sess, tas,
preempt_and_abort_list);
diff --git a/include/target/target_core_backend.h 
b/include/target/target_core_backend.h
index 28ee5c2..a055e4f 100644
--- a/include/target/target_core_backend.h
+++ b/include/target/target_core_backend.h
@@ -17,6 +17,7 @@ struct target_backend_ops {
 
struct se_device *(*alloc_device)(struct se_hba *, const char *);
int (*configure_device)(struct se_device *);
+   int (*reset_device)(struct se_device *);
   

[PATCH 2/5] block: add queue reset support

2016-05-25 Thread mchristi
From: Mike Christie 

This adds a request_queue/mq_ops callout which when called
should force the completion/failure of requests that have been
dequeued by the driver. Requests can be completed normally
or should be failed with -EINTR.

On success the reset callout should complete/fail dequeued
requests and then return BLK_EH_HANDLED.

If the reset callout fails, it should return BLK_EH_NOT_HANDLED.

Signed-off-by: Mike Christie 
---
 block/blk-core.c   |  8 ++
 block/blk-settings.c   |  6 +
 block/blk-timeout.c| 68 ++
 include/linux/blk-mq.h |  5 
 include/linux/blkdev.h |  8 ++
 5 files changed, 95 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 2475b1c7..2aeac9c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -575,6 +575,9 @@ void blk_cleanup_queue(struct request_queue *q)
if (!q->mq_ops)
__blk_drain_queue(q, true);
queue_flag_set(QUEUE_FLAG_DEAD, q);
+
+   /* wait for resets that might have started as result of drain */
+   wait_event_lock_irq(q->reset_wq, !blk_queue_resetting(q), *lock);
spin_unlock_irq(lock);
 
/* for synchronous bio-based driver finish in-flight integrity i/o */
@@ -728,6 +731,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
 
kobject_init(&q->kobj, &blk_queue_ktype);
 
+   init_waitqueue_head(&q->reset_wq);
mutex_init(&q->sysfs_lock);
spin_lock_init(&q->__queue_lock);
 
@@ -850,6 +854,7 @@ blk_init_allocated_queue(struct request_queue *q, 
request_fn_proc *rfn,
 
INIT_WORK(&q->timeout_work, blk_timeout_work);
q->request_fn   = rfn;
+   q->reset_fn = NULL,
q->prep_rq_fn   = NULL;
q->unprep_rq_fn = NULL;
q->queue_flags  |= QUEUE_FLAG_DEFAULT;
@@ -2619,6 +2624,9 @@ bool blk_update_request(struct request *req, int error, 
unsigned int nr_bytes)
case -ENODATA:
error_type = "critical medium";
break;
+   case -EINTR:
+   error_type = "critical command";
+   break;
case -EIO:
default:
error_type = "I/O";
diff --git a/block/blk-settings.c b/block/blk-settings.c
index f679ae1..1d529ba 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -71,6 +71,12 @@ void blk_queue_rq_timed_out(struct request_queue *q, 
rq_timed_out_fn *fn)
 }
 EXPORT_SYMBOL_GPL(blk_queue_rq_timed_out);
 
+void blk_queue_reset(struct request_queue *q, reset_fn *fn)
+{
+   q->reset_fn = fn;
+}
+EXPORT_SYMBOL_GPL(blk_queue_reset);
+
 void blk_queue_lld_busy(struct request_queue *q, lld_busy_fn *fn)
 {
q->lld_busy_fn = fn;
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index a30441a..96b73786 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "blk.h"
 #include "blk-mq.h"
@@ -172,6 +173,73 @@ void blk_abort_request(struct request *req)
 }
 EXPORT_SYMBOL_GPL(blk_abort_request);
 
+/**
+ * blk_reset_queue - force completion of requests executing in queue
+ * @q: request queue to reset
+ *
+ * On success the driver returns BLK_EH_HANDLED from the callout and
+ * either complete requests successfully with 0 or if abnormally completed
+ * with the error code -EINTR.
+ *
+ * On failure the driver returns BLK_EH_NOT_HANDLED, and requests may still
+ * be executing.
+ */
+int blk_reset_queue(struct request_queue *q)
+{
+   enum blk_eh_timer_return eh_rc;
+   int rc;
+
+   spin_lock_irq(q->queue_lock);
+   wait_event_lock_irq(q->reset_wq,
+   !queue_flag_test_and_set(QUEUE_FLAG_RESETTING, q),
+   *q->queue_lock);
+   if (blk_queue_dead(q)) {
+   rc = -ENODEV;
+   spin_unlock_irq(q->queue_lock);
+   goto done;
+   }
+   spin_unlock_irq(q->queue_lock);
+
+   if (q->mq_ops) {
+   blk_mq_stop_hw_queues(q);
+   blk_mq_freeze_queue(q);
+
+   eh_rc = q->mq_ops->reset(q);
+
+   blk_mq_unfreeze_queue(q);
+   blk_mq_start_stopped_hw_queues(q, true);
+   } else if (q->reset_fn) {
+   spin_lock_irq(q->queue_lock);
+   blk_stop_queue(q);
+   spin_unlock_irq(q->queue_lock);
+
+   while (q->request_fn_active)
+   msleep(10);
+
+   eh_rc = q->reset_fn(q);
+
+   spin_lock_irq(q->queue_lock);
+   blk_start_queue(q);
+   spin_unlock_irq(q->queue_lock);
+   } else {
+   rc = -EOPNOTSUPP;
+   goto done;
+   }
+
+   if (eh_rc == BLK_EH_HANDLED)
+   rc = 0;
+   else
+   rc = -EIO;
+
+done:
+   spin_lock_irq(q->queue_lo

[PATCH 0/5] block/target queue/LUN reset support

2016-05-25 Thread mchristi
Currently, for SCSI LUN_RESETs the target layer can only wait on
bio/requests it has sent. This normally results in the LUN_RESET
timing out on the initiator side and that SCSI error handler
escalating to something more disruptive.

To fix this, the following patches add a block layer helper and
callout to reset a request queue which the target layer can use
to force drivers to complete/fail executing requests.

Patches were made over Jens's block tree's for-next branch.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] scsi: add new async device reset support

2016-05-25 Thread mchristi
From: Mike Christie 

Currently, if the SCSI eh runs then before we do a LUN_RESET
we stop the host. This patch and the block layer one before it
begin to add infrastructure to be able to do a LUN_RESET and
eventually do a transport level recovery without having to stop the
host.

For LUn-reset, this patch adds a new callout, eh_async_device_reset_handler,
which works similar to how LLDs handle SG_SCSI_RESET_DEVICE where the
LLD manages the commands that are affected.

eh_async_device_reset_handler:

The LLD should perform a LUN RESET that affects all commands
that have been accepted by its queuecommand callout for the
device passed in to the callout. While the reset handler is running,
queuecommand will not be running or called for the device.

Unlike eh_device_reset_handler, queuecommand may still be
called for other devices, and the LLD must call scsi_done for the
commands that have been affected by the reset.

If SUCCESS or FAST_IO_FAIL is returned, the scsi_cmnds cleaned up
must be failed with DID_ABORT.

Signed-off-by: Mike Christie 
---
 drivers/scsi/scsi_error.c | 31 ---
 drivers/scsi/scsi_lib.c   |  6 ++
 drivers/scsi/scsi_priv.h  |  1 +
 include/scsi/scsi_host.h  | 17 +
 4 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 984ddcb..cec2dfb 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -853,16 +853,41 @@ static int scsi_try_bus_device_reset(struct scsi_cmnd 
*scmd)
 {
int rtn;
struct scsi_host_template *hostt = scmd->device->host->hostt;
+   struct scsi_device *sdev = scmd->device;
 
-   if (!hostt->eh_device_reset_handler)
+   if (!hostt->eh_device_reset_handler &&
+   !hostt->eh_async_device_reset_handler)
return FAILED;
 
-   rtn = hostt->eh_device_reset_handler(scmd);
+   if (hostt->eh_device_reset_handler) {
+   rtn = hostt->eh_device_reset_handler(scmd);
+   } else {
+   if (!blk_reset_queue(sdev->request_queue))
+   rtn = SUCCESS;
+   else
+   rtn = FAILED;
+   }
if (rtn == SUCCESS)
-   __scsi_report_device_reset(scmd->device, NULL);
+   __scsi_report_device_reset(sdev, NULL);
return rtn;
 }
 
+enum blk_eh_timer_return scsi_reset_queue(struct request_queue *q)
+{
+   struct scsi_device *sdev = q->queuedata;
+   struct scsi_host_template *hostt = sdev->host->hostt;
+   int rtn;
+
+   if (!hostt->eh_async_device_reset_handler)
+   return -EOPNOTSUPP;
+
+   rtn = hostt->eh_async_device_reset_handler(sdev);
+   if (rtn == SUCCESS || rtn == FAST_IO_FAIL)
+   return BLK_EH_HANDLED;
+
+   return BLK_EH_NOT_HANDLED;
+}
+
 /**
  * scsi_try_to_abort_cmd - Ask host to abort a SCSI command
  * @hostt: SCSI driver host template
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8106515..11374dd 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -779,6 +779,10 @@ static int __scsi_error_from_host_byte(struct scsi_cmnd 
*cmd, int result)
set_host_byte(cmd, DID_OK);
error = -ENODATA;
break;
+   case DID_ABORT:
+   set_host_byte(cmd, DID_OK);
+   error = -EINTR;
+   break;
default:
error = -EIO;
break;
@@ -2159,6 +2163,7 @@ struct request_queue *scsi_alloc_queue(struct scsi_device 
*sdev)
blk_queue_softirq_done(q, scsi_softirq_done);
blk_queue_rq_timed_out(q, scsi_times_out);
blk_queue_lld_busy(q, scsi_lld_busy);
+   blk_queue_reset(q, scsi_reset_queue);
return q;
 }
 
@@ -2167,6 +2172,7 @@ static struct blk_mq_ops scsi_mq_ops = {
.queue_rq   = scsi_queue_rq,
.complete   = scsi_softirq_done,
.timeout= scsi_timeout,
+   .reset  = scsi_reset_queue,
.init_request   = scsi_init_request,
.exit_request   = scsi_exit_request,
 };
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 27b4d0a..2e03168 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -67,6 +67,7 @@ extern void scsi_exit_devinfo(void);
 
 /* scsi_error.c */
 extern void scmd_eh_abort_handler(struct work_struct *work);
+extern enum blk_eh_timer_return scsi_reset_queue(struct request_queue *q);
 extern enum blk_eh_timer_return scsi_times_out(struct request *req);
 extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index fcfa3d7..532deb5 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -146,6 +146,23 @@ struct scsi_host_template {
 */
int (* eh_abort_handler)(struct scsi_cmnd *);
int (* eh_device_reset_handler)