date:20140919

[ANNOUNCE]: SCST 3.0 released

2014-09-19 Thread Vladislav Bolkhovitin


Hi All,

I'm glad to announce that SCST 3.0 has just been released. This release includes SCST 
core, target drivers iSCSI-SCST for iSCSI, including iSER support (thanks to 
Mellanox!), qla2x00t for QLogic Fibre Channel adapters, ib_srpt for InfiniBand SRP, 
fcst for FCoE and scst_local for local loopback-like access as well as SCST management 
utility scstadmin. Also separately you can download from Emulex development portal 
stable and fully functional target driver for the current generation of Emulex Fibre 
Channel adapters.


SCST is alternative SCSI target stack for Linux. SCST allows creation of sophisticated 
storage devices, which provide advanced functionality, like replication, thin 
provisioning, deduplication, high availability, automatic backup, etc. Majority of 
recently developed SAN appliances, especially higher end ones, are SCST based. It might 
well be that your favorite storage appliance running SCST in the firmware.


More info about SCST and its modules you can find on: 
http://scst.sourceforge.net

Thanks to all who made it happen!

Vlad
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/4] target: Add a user-passthrough backstore

2014-09-19 Thread Andy Grover


On 09/19/2014 04:51 PM, Alex Elsayed wrote:


Not sure I follow..  How does the proposed passthrough mode prevent
someone from emulating OSDs, media changers, optical disks or anything
else in userspace with TCMU..?

The main thing that the above comments highlight is why attempting to
combine the existing in-kernel emulation with a userspace backend
providing it's own emulation can open up a number of problems with
mismatched state between the two.


It doesn't prevent it, but it _does_ put it in the exact same place as PSCSI
regarding the warnings on the wiki. It means that if someone wants to
implement (say) the optical disc or OSD CDBs, they then lose out on ALUA &co
unless they implement it themselves - which seems unnecessary and painful,
since those should really be disjoint. In particular, an OSD backed by RADOS
objects could be a very nice thing indeed, _and_ could really benefit from
ALUA.


Some possible solutions:

1) The first time TCMU sees an opcode it passes it up and notes whether 
it is handled or not. If it was handled then future cmds with that 
opcode are passed up but not retried in the kernel. If it was not 
handled then it and all future commands with that opcode are handled by 
the kernel and not passed up.


2) Same as #1 but TCMU operates on SCSI spec granularity - e.g. handling 
any SSC opcode means userspace must handle all SSC commands.


3) Like #2 but define opcode groupings that must all be implemented 
together, independent of the specifications.


4) Have passthrough mode set at creation, but with more than two modes, 
either grouped by SCSI spec or our own groupings.


5) Never pass up certain opcodes, like the ALUA ones or whatever.


Have a good weekend -- Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/4] target: Add a user-passthrough backstore

2014-09-19 Thread Alex Elsayed

Nicholas A. Bellinger wrote:

> On Fri, 2014-09-19 at 14:43 -0700, Alex Elsayed wrote:
>> Nicholas A. Bellinger wrote:
>> 
>> 
>> > So the idea of allowing the in-kernel CDB emulation to run after
>> > user-space has returned unsupported opcode is problematic for a couple
>> > of different reasons.
>> > 
>> > First, if the correct feature bits in standard INQUIRY + EVPD INQUIRY,
>> > etc are not populated by user-space to match what in-kernel CDB
>> > emulation actually supports, this can result in potentially undefined
>> > results initiator side.
>> > 
>> > Also for example, if user-space decided to emulate only a subset of PR
>> > operations, and leaves the rest of it up to the in-kernel emulation,
>> > there's not really a way to sync current state between the two, which
>> > also can end up with undefined results.
>> > 
>> > So that said, I think a saner approach would be two define two modes of
>> > operation for TCMU:
>> > 
>> >*) Passthrough Mode: All CDBs are passed to user-space, and no
>> >   in-kernel emulation is done in the event of an unsupported
>> >   opcode response.
>> > 
>> >*) I/O Mode: I/O related CDBs are passed into user-space, but
>> >   all control CDBs continue to be processed by in-kernel emulation.
>> >   This effectively limits operation to TYPE_DISK, but with this
>> >   mode it's probably OK to assume this.
>> > 
>> > This seems like the best trade-off between flexibility when everything
>> > should be handled by user-space, vs. functionality when only block
>> > remapping of I/O is occurring in user-space code.
>> 
>> The problem there is that the first case has all the issues of pscsi and
>> simply becomes a performance optimization over tgt+iscsi client+pscsi and
>> the latter case excludes the main use cases I'm interested in - OSDs,
>> media changers, optical discs (the biggest thing for me), and so forth.
>> 
>> One of the main things I want to do with this is hook up a plugin that
>> uses libmirage to handle various optical disc image formats.
>> 
> 
> Not sure I follow..  How does the proposed passthrough mode prevent
> someone from emulating OSDs, media changers, optical disks or anything
> else in userspace with TCMU..?
> 
> The main thing that the above comments highlight is why attempting to
> combine the existing in-kernel emulation with a userspace backend
> providing it's own emulation can open up a number of problems with
> mismatched state between the two.

It doesn't prevent it, but it _does_ put it in the exact same place as PSCSI 
regarding the warnings on the wiki. It means that if someone wants to 
implement (say) the optical disc or OSD CDBs, they then lose out on ALUA &co 
unless they implement it themselves - which seems unnecessary and painful, 
since those should really be disjoint. In particular, an OSD backed by RADOS 
objects could be a very nice thing indeed, _and_ could really benefit from 
ALUA.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] scsi: fix kconfig dependency warnings for SCSI_FC_ATTRS

2014-09-19 Thread Randy Dunlap

From: Randy Dunlap 

Fix kconfig dependency warnings which can lead to build errors:

warning: (SCSI_BNX2X_FCOE && LIBFCOE && TCM_QLA2XXX) selects LIBFC which has 
unmet direct dependencies (SCSI_LOWLEVEL && SCSI && SCSI_FC_ATTRS)

warning: (FCOE && FCOE_FNIC) selects LIBFCOE which has unmet direct 
dependencies (SCSI_LOWLEVEL && SCSI && SCSI_FC_ATTRS)

Signed-off-by: Randy Dunlap 
---
 drivers/scsi/Kconfig |3 +++
 drivers/scsi/qla2xxx/Kconfig |2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

--- linux-next-20140918.orig/drivers/scsi/Kconfig
+++ linux-next-20140918/drivers/scsi/Kconfig
@@ -601,6 +601,7 @@ config LIBFC
 
 config LIBFCOE
tristate "LibFCoE module"
+   depends on SCSI_FC_ATTRS
select LIBFC
---help---
  Library for Fibre Channel over Ethernet module
@@ -608,6 +609,7 @@ config LIBFCOE
 config FCOE
tristate "FCoE module"
depends on PCI
+   depends on SCSI_FC_ATTRS
select LIBFCOE
---help---
  Fibre Channel over Ethernet module
@@ -615,6 +617,7 @@ config FCOE
 config FCOE_FNIC
tristate "Cisco FNIC Driver"
depends on PCI && X86
+   depends on SCSI_FC_ATTRS
select LIBFCOE
help
  This is support for the Cisco PCI-Express FCoE HBA.
--- linux-next-20140918.orig/drivers/scsi/qla2xxx/Kconfig
+++ linux-next-20140918/drivers/scsi/qla2xxx/Kconfig
@@ -30,7 +30,7 @@ config SCSI_QLA_FC
 
 config TCM_QLA2XXX
tristate "TCM_QLA2XXX fabric module for Qlogic 2xxx series target mode 
HBAs"
-   depends on SCSI_QLA_FC && TARGET_CORE
+   depends on SCSI_QLA_FC && TARGET_CORE && SCSI_FC_ATTRS
select LIBFC
select BTREE
default n
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/4] target: Add a user-passthrough backstore

2014-09-19 Thread Nicholas A. Bellinger

On Fri, 2014-09-19 at 14:43 -0700, Alex Elsayed wrote:
> Nicholas A. Bellinger wrote:
> 
> 
> > So the idea of allowing the in-kernel CDB emulation to run after
> > user-space has returned unsupported opcode is problematic for a couple
> > of different reasons.
> > 
> > First, if the correct feature bits in standard INQUIRY + EVPD INQUIRY,
> > etc are not populated by user-space to match what in-kernel CDB
> > emulation actually supports, this can result in potentially undefined
> > results initiator side.
> > 
> > Also for example, if user-space decided to emulate only a subset of PR
> > operations, and leaves the rest of it up to the in-kernel emulation,
> > there's not really a way to sync current state between the two, which
> > also can end up with undefined results.
> > 
> > So that said, I think a saner approach would be two define two modes of
> > operation for TCMU:
> > 
> >*) Passthrough Mode: All CDBs are passed to user-space, and no
> >   in-kernel emulation is done in the event of an unsupported
> >   opcode response.
> > 
> >*) I/O Mode: I/O related CDBs are passed into user-space, but
> >   all control CDBs continue to be processed by in-kernel emulation.
> >   This effectively limits operation to TYPE_DISK, but with this mode
> >   it's probably OK to assume this.
> > 
> > This seems like the best trade-off between flexibility when everything
> > should be handled by user-space, vs. functionality when only block
> > remapping of I/O is occurring in user-space code.
> 
> The problem there is that the first case has all the issues of pscsi and 
> simply becomes a performance optimization over tgt+iscsi client+pscsi and 
> the latter case excludes the main use cases I'm interested in - OSDs, media 
> changers, optical discs (the biggest thing for me), and so forth.
> 
> One of the main things I want to do with this is hook up a plugin that uses 
> libmirage to handle various optical disc image formats.
> 

Not sure I follow..  How does the proposed passthrough mode prevent
someone from emulating OSDs, media changers, optical disks or anything
else in userspace with TCMU..?

The main thing that the above comments highlight is why attempting to
combine the existing in-kernel emulation with a userspace backend
providing it's own emulation can open up a number of problems with
mismatched state between the two.

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[GIT PULL] target fixes for v3.17-rc6

2014-09-19 Thread Nicholas A. Bellinger

Hi Linus,

Here are the target pending fixes for v3.17-rc6.  Please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git master

Included are Sagi's long overdue fixes related to iser-target shutdown,
along with a couple of fixes from Sebastian related to ALUA Referrals
changes that when in during the v3.14 time-frame.

Also included are a few iscsi-target fixes, most recently of which where
found during Joern's Coverity scanning of target code.

Thanks,

--nab

Joern Engel (1):
  iscsi-target: avoid NULL pointer in iscsi_copy_param_list failure

Nicholas Bellinger (2):
  iscsi-target: Ignore ICF_GOT_LAST_DATAOUT during Data-Out ITT lookup
  iscsi-target: Fix memory corruption in
iscsit_logout_post_handler_diffcid

Sagi Grimberg (4):
  Target/iser: Get isert_conn reference once got to connected_handler
  Target/iser: Don't put isert_conn inside disconnected handler
  Target/iser: Avoid calling rdma_disconnect twice
  Target/iser: Fix initiator_depth and responder_resources

Sebastian Herbszt (2):
  target: Fix user data segment multiplier in spc_emulate_evpd_b3()
  target: Fix inverted logic in SE_DEV_ALUA_SUPPORT_STATE_STORE

 drivers/infiniband/ulp/isert/ib_isert.c|   20 +++-
 drivers/target/iscsi/iscsi_target.c|4 +++-
 drivers/target/iscsi/iscsi_target_parameters.c |2 +-
 drivers/target/iscsi/iscsi_target_util.c   |2 ++
 drivers/target/target_core_configfs.c  |2 +-
 drivers/target/target_core_spc.c   |2 +-
 6 files changed, 19 insertions(+), 13 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/4] target: Add a user-passthrough backstore

2014-09-19 Thread Alex Elsayed

Nicholas A. Bellinger wrote:


> So the idea of allowing the in-kernel CDB emulation to run after
> user-space has returned unsupported opcode is problematic for a couple
> of different reasons.
> 
> First, if the correct feature bits in standard INQUIRY + EVPD INQUIRY,
> etc are not populated by user-space to match what in-kernel CDB
> emulation actually supports, this can result in potentially undefined
> results initiator side.
> 
> Also for example, if user-space decided to emulate only a subset of PR
> operations, and leaves the rest of it up to the in-kernel emulation,
> there's not really a way to sync current state between the two, which
> also can end up with undefined results.
> 
> So that said, I think a saner approach would be two define two modes of
> operation for TCMU:
> 
>*) Passthrough Mode: All CDBs are passed to user-space, and no
>   in-kernel emulation is done in the event of an unsupported
>   opcode response.
> 
>*) I/O Mode: I/O related CDBs are passed into user-space, but
>   all control CDBs continue to be processed by in-kernel emulation.
>   This effectively limits operation to TYPE_DISK, but with this mode
>   it's probably OK to assume this.
> 
> This seems like the best trade-off between flexibility when everything
> should be handled by user-space, vs. functionality when only block
> remapping of I/O is occurring in user-space code.

The problem there is that the first case has all the issues of pscsi and 
simply becomes a performance optimization over tgt+iscsi client+pscsi and 
the latter case excludes the main use cases I'm interested in - OSDs, media 
changers, optical discs (the biggest thing for me), and so forth.

One of the main things I want to do with this is hook up a plugin that uses 
libmirage to handle various optical disc image formats.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/4] target: Add a user-passthrough backstore

2014-09-19 Thread Nicholas A. Bellinger

Hi Andy,

A few comments are inline below.

On Mon, 2014-09-15 at 16:12 -0700, Andy Grover wrote:
> Add a LIO storage engine that presents commands to userspace for execution.
> This would allow more complex backstores to be implemented out-of-kernel,
> and also make experimentation a-la FUSE (but at the SCSI level -- "SUSE"?)
> possible.
> 
> It uses a mmap()able UIO device per LUN to share a command ring and data
> area. The commands are raw SCSI CDBs and iovs for in/out data. The command
> ring is also reused for returning scsi command status and optional sense
> data.
> 
> This implementation is based on Shaohua Li's earlier version but heavily
> modified. Differences include:
> 
> * Shared memory allocated by kernel, not locked-down user pages
> * Single ring for command request and response
> * Offsets instead of embedded pointers
> * Generic SCSI CDB passthrough instead of per-cmd specialization in ring
>   format.
> * Uses UIO device instead of anon_file passed in mailbox.
> * Optional in-kernel handling of some commands.
> 

Shaohua, do you have any comments are these changes..?  It's not clear
that all of these changes are beneficial, so I'd really like to hear
input given your the original author of this code.

> The main reason for these differences is to permit greater resiliency
> if the user process dies or hangs.
> 
> Things not yet implemented (on purpose):
> 
> * Zero copy. The data area is flexible enough to allow page flipping or
>   backend-allocated pages to be used by fabrics, but it's not clear these
>   are performance wins. Can come later.
> * Out-of-order command completion by userspace. Possible to add by just
>   allowing userspace to change cmd_id in rsp cmd entries, but currently
>   not supported.
> * No locks between kernel cmd submission and completion routines. Sounds
>   like it's possible, but this can come later.
> * Sparse allocation of mmaped area. Current code vmallocs the whole thing.
>   If the mapped area was larger and not fully mapped then the driver would
>   have more freedom to change cmd and data area sizes based on demand.
> 
> Current code open issues:
> 
> * The use of idrs may be overkill -- we maybe can replace them with a
>   simple counter to generate cmd_ids, and a hash table to get a cmd_id's
>   associated pointer.
> * Use of a free-running counter for cmd ring instead of explicit modulo
>   math. This would require power-of-2 cmd ring size.
> 
> Signed-off-by: Andy Grover 
> ---
>  drivers/target/Kconfig |5 +
>  drivers/target/Makefile|1 +
>  drivers/target/target_core_transport.c |4 +
>  drivers/target/target_core_user.c  | 1169 
> 
>  include/uapi/linux/Kbuild  |1 +
>  include/uapi/linux/target_core_user.h  |  142 
>  6 files changed, 1322 insertions(+)
>  create mode 100644 drivers/target/target_core_user.c
>  create mode 100644 include/uapi/linux/target_core_user.h
> 



> +
> +/* Core requires execute_rw be set, but just return unsupported */
> +static sense_reason_t
> +tcmu_retry_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> +   enum dma_data_direction data_direction)
> +{
> + return TCM_UNSUPPORTED_SCSI_OPCODE;
> +}
> +
> +static struct sbc_ops tcmu_retry_ops = {
> + .execute_rw = tcmu_retry_rw,
> +};
> +
> +static void tcmu_work(struct work_struct *work)
> +{
> + struct tcmu_cmd *cmd = container_of(work, struct tcmu_cmd, work);
> + struct se_cmd *se_cmd = cmd->se_cmd;
> +
> + target_execute_cmd(se_cmd);
> + kmem_cache_free(tcmu_cmd_cache, cmd);
> +}
> +
> +static void tcmu_emulate_cmd(struct tcmu_cmd *cmd)
> +{
> + struct se_cmd *se_cmd = cmd->se_cmd;
> + sense_reason_t ret;
> + unsigned long flags;
> +
> + /* Re-run parsing to set execute_cmd to value for possible emulation */
> + se_cmd->execute_cmd = NULL;
> +
> + /*
> +  * Can't optionally call generic_request_failure if flags indicate it's
> +  * still being handled by us.
> + */
> + spin_lock_irqsave(&se_cmd->t_state_lock, flags);
> + se_cmd->transport_state &= ~CMD_T_BUSY|CMD_T_SENT;
> + spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
> +
> + ret = sbc_parse_cdb(se_cmd, &tcmu_retry_ops);
> + if (ret == TCM_NO_SENSE && se_cmd->execute_cmd) {
> + schedule_work(&cmd->work);
> + } else {
> + /* Can't emulate. */
> + transport_generic_request_failure(se_cmd, ret);
> + kmem_cache_free(tcmu_cmd_cache, cmd);
> + }
> +}

So the idea of allowing the in-kernel CDB emulation to run after
user-space has returned unsupported opcode is problematic for a couple
of different reasons.

First, if the correct feature bits in standard INQUIRY + EVPD INQUIRY,
etc are not populated by user-space to match what in-kernel CDB
emulation actually supports, this can result in potentially undefined
results initiator side.

Also for exa

Re: boot stall regression due to blk-mq: use percpu_ref for mq usage count

2014-09-19 Thread Jens Axboe

On 09/19/2014 05:38 AM, Christoph Hellwig wrote:
> Hi Jens, hi Tejun,
> 
> I've seen multi-second boot stalls in one of my KVM setups during
> the initial scsi scan:
> 
> [0.949892] scsi host0: Virtio SCSI HBA
> [1.007864] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK1.1. 
> PQ: 0 ANSI: 5
> [1.021299] scsi 0:0:1:0: Direct-Access QEMU QEMU HARDDISK1.1. 
> PQ: 0 ANSI: 5
> [1.520356] tsc: Refined TSC clocksource calibration: 2491.910 MHz
> 
> 
> 
> [   16.186549] sd 0:0:0:0: Attached scsi generic sg0 type 0
> [   16.190478] sd 0:0:1:0: Attached scsi generic sg1 type 0
> [   16.194099] osd: LOADED open-osd 0.2.1
> [   16.203202] sd 0:0:0:0: [sda] 31457280 512-byte logical blocks: (16.1 
> GB/15.0 GiB)
> [   16.208478] sd 0:0:0:0: [sda] Write Protect is off
> [   16.211439] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, 
> doesn't support DPO or FUA
> [   16.218771] sd 0:0:1:0: [sdb] 31457280 512-byte logical blocks: (16.1 
> GB/15.0 GiB)
> [   16.223264] sd 0:0:1:0: [sdb] Write Protect is off
> [   16.225682] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, 
> doesn't support DPO or FUA
> 
> 
> 
> I've tracked this down to "blk-mq: use percpu_ref for mq usage count" in
> a rather painful way as that one introduced enough other regressions
> to mess up bisect.
> 
> If I revert the following commits:
> 
> dd840087086f3b93ac20f7472b4fca59aff7b79f
> cddd5d17642cc6881352732693c2ae6930e9ce65
> add703fda981b9719d37f371498b9f129acbd997
> 
> which are the above mentioned commit and two fixes to it the problem goes
> away.

Thanks for bisecting this, I ran into something that I think is the same
issue about a week ago. Tejun, any ideas before I dig into this one?

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 7/8] IB/srp: Separate target and channel variables

2014-09-19 Thread Sagi Grimberg

On 9/19/2014 3:59 PM, Bart Van Assche wrote:
> Changes in this patch:
> - Move channel variables into a new structure (struct srp_rdma_ch).
> - cm_id and completion handler context pointer are now of type
>srp_rdma_ch * insteoad of srp_target_port *.
>

s/insteoad/instead

> No functionality is changed.
>

errr... long patch

I'll review it more thoroughly later...

> Signed-off-by: Bart Van Assche 
> ---
>   drivers/infiniband/ulp/srp/ib_srp.c | 707 
> +++-
>   drivers/infiniband/ulp/srp/ib_srp.h |  59 +--
>   2 files changed, 417 insertions(+), 349 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
> b/drivers/infiniband/ulp/srp/ib_srp.c
> index fd88fb8..9feeea1 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -125,8 +125,8 @@ MODULE_PARM_DESC(dev_loss_tmo,
>
>   static void srp_add_one(struct ib_device *device);
>   static void srp_remove_one(struct ib_device *device);
> -static void srp_recv_completion(struct ib_cq *cq, void *target_ptr);
> -static void srp_send_completion(struct ib_cq *cq, void *target_ptr);
> +static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr);
> +static void srp_send_completion(struct ib_cq *cq, void *ch_ptr);
>   static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event 
> *event);
>
>   static struct scsi_transport_template *ib_srp_transport_template;
> @@ -262,7 +262,7 @@ static int srp_init_qp(struct srp_target_port *target,
>
>   ret = ib_find_pkey(target->srp_host->srp_dev->dev,
> target->srp_host->port,
> -   be16_to_cpu(target->path.pkey),
> +   be16_to_cpu(target->pkey),
> &attr->pkey_index);
>   if (ret)
>   goto out;
> @@ -283,18 +283,23 @@ out:
>   return ret;
>   }
>
> -static int srp_new_cm_id(struct srp_target_port *target)
> +static int srp_new_cm_id(struct srp_rdma_ch *ch)
>   {
> + struct srp_target_port *target = ch->target;
>   struct ib_cm_id *new_cm_id;
>
>   new_cm_id = ib_create_cm_id(target->srp_host->srp_dev->dev,
> -srp_cm_handler, target);
> +srp_cm_handler, ch);
>   if (IS_ERR(new_cm_id))
>   return PTR_ERR(new_cm_id);
>
> - if (target->cm_id)
> - ib_destroy_cm_id(target->cm_id);
> - target->cm_id = new_cm_id;
> + if (ch->cm_id)
> + ib_destroy_cm_id(ch->cm_id);
> + ch->cm_id = new_cm_id;
> + ch->path.sgid = target->sgid;
> + ch->path.dgid = target->orig_dgid;
> + ch->path.pkey = target->pkey;
> + ch->path.service_id = target->service_id;
>
>   return 0;
>   }
> @@ -443,8 +448,9 @@ static struct srp_fr_pool *srp_alloc_fr_pool(struct 
> srp_target_port *target)
>dev->max_pages_per_mr);
>   }
>
> -static int srp_create_target_ib(struct srp_target_port *target)
> +static int srp_create_ch_ib(struct srp_rdma_ch *ch)
>   {
> + struct srp_target_port *target = ch->target;
>   struct srp_device *dev = target->srp_host->srp_dev;
>   struct ib_qp_init_attr *init_attr;
>   struct ib_cq *recv_cq, *send_cq;
> @@ -458,15 +464,15 @@ static int srp_create_target_ib(struct srp_target_port 
> *target)
>   if (!init_attr)
>   return -ENOMEM;
>
> - recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, target,
> -   target->queue_size, target->comp_vector);
> + recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, ch,
> +   target->queue_size, ch->comp_vector);
>   if (IS_ERR(recv_cq)) {
>   ret = PTR_ERR(recv_cq);
>   goto err;
>   }
>
> - send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, target,
> -   m * target->queue_size, target->comp_vector);
> + send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, ch,
> +   m * target->queue_size, ch->comp_vector);
>   if (IS_ERR(send_cq)) {
>   ret = PTR_ERR(send_cq);
>   goto err_recv_cq;
> @@ -502,9 +508,9 @@ static int srp_create_target_ib(struct srp_target_port 
> *target)
>   "FR pool allocation failed (%d)\n", ret);
>   goto err_qp;
>   }
> - if (target->fr_pool)
> - srp_destroy_fr_pool(target->fr_pool);
> - target->fr_pool = fr_pool;
> + if (ch->fr_pool)
> + srp_destroy_fr_pool(ch->fr_pool);
> + ch->fr_pool = fr_pool;
>   } else if (!dev->use_fast_reg && dev->has_fmr) {
>   fmr_pool = srp_alloc_fmr_pool(target);
>   if (IS_ERR(fmr_pool)) {
> @@ -513,21 +519,21 @@ static int srp_create_target_ib(struct srp_target_port 
> *target)
>   "FMR pool allocation failed (%d)\n", ret);
>   goto err_qp;
>   }
> - if (target->fmr_pool)
> - ib_destroy_fmr_pool(target->fmr_pool);
> - target->fmr_pool = fmr_pool;
> + if (ch->fmr_pool)
> + ib_destroy_fmr_pool(ch->fmr_pool);
> + ch->fmr_pool = fmr_pool;
>   }
>
> - if (target->qp)
> - ib_destroy_qp(target->qp);
> - if (target->recv_cq)
> - ib_destroy_cq(target->recv_cq);
> - if (target->send_cq)
> - ib_destroy_cq(target->send_cq);
> + if (ch->qp)
> + ib_destroy_qp(ch->qp);
> + if (ch->recv_cq)
> + ib_destroy_cq(ch->recv_cq);
> + if (ch->send_cq)
> + ib_destroy_cq(ch->send_cq);
>
> - target->qp = qp;
> - target->recv_cq = recv_cq;
> - target->send_cq = send_cq;
> + ch->qp = qp;
> + ch->recv_cq =

Re: [PATCH RFC 0/8] IB/srp: Add multichannel support

2014-09-19 Thread Jens Axboe

On 09/19/2014 06:55 AM, Bart Van Assche wrote:
> Hello,
> 
> Although the SRP protocol supports multichannel operation, although
> since considerable time RDMA HCA's are available that support multiple
> completion vectors and although multichannel operation yields better
> performance than using a single channel, the Linux SRP initiator does
> not yet support multichannel operation. While adding multichannel
> support in the SRP initiator I encountered a few challenges of which I
> think these need wider discussion. The topics I would like invite wider
> discussion about are as follows:
> - How to avoid unneeded inter-socket cache traffic. Should the blk-mq
>   layer e.g. assign CPU cores to hardware queues such that all CPU cores
>   associated with a single hardware queue reside on the same CPU socket?
>   (see also patch 1/8)

Right now these are deliberately symmetric, and hence can result in
hardware queues being left unused. Whether it makes sense to make this
change or not, I think will greatly depend on how many queues are
available. There's a crossover point where having more queues doesn't
help, and it can even make things worse (interrupt load, etc). For your
specific case or 4 queues, it probably DOES make sense to use them all,
however.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/8] IB/srp: Remove stale connection retry mechanism

2014-09-19 Thread Sagi Grimberg

On 9/19/2014 3:58 PM, Bart Van Assche wrote:
> Attempting to connect three times may be insufficient after an
> initiator system that was using multiple RDMA channels tries to
> relogin. Additionally, this login retry mechanism is a workaround
> for particular behavior of the IB/CM. Since the srp_daemon retries
> a failed login attempt anyway, remove the stale connection retry
> mechanism.
>

I agree, this work-around doesn't even always work for some targets. In
case the RDMA stack restarted while IB ports were down and srp initiator
didn't manage to teardown the connections, some targets are keeping
cm_ids online returning "stale connection" rejects longer then expected.

let user-space retry from scratch...

Reviewed-by: Sagi Grimberg 

> Signed-off-by: Bart Van Assche 
> ---
>   drivers/infiniband/ulp/srp/ib_srp.c | 16 +++-
>   1 file changed, 3 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
> b/drivers/infiniband/ulp/srp/ib_srp.c
> index d3c712f..9608e7a 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -904,7 +904,6 @@ static void srp_rport_delete(struct srp_rport *rport)
>
>   static int srp_connect_target(struct srp_target_port *target)
>   {
> - int retries = 3;
>   int ret;
>
>   WARN_ON_ONCE(target->connected);
> @@ -945,19 +944,10 @@ static int srp_connect_target(struct srp_target_port 
> *target)
>   break;
>
>   case SRP_STALE_CONN:
> - /* Our current CM id was stale, and is now in timewait.
> - * Try to reconnect with a new one.
> - */
> - if (!retries-- || srp_new_cm_id(target)) {
> - shost_printk(KERN_ERR, target->scsi_host, PFX
> - "giving up on stale connection\n");
> - target->status = -ECONNRESET;
> - return target->status;
> - }
> -
>   shost_printk(KERN_ERR, target->scsi_host, PFX
> - "retrying stale connection\n");
> - break;
> + "giving up on stale connection\n");
> + target->status = -ECONNRESET;
> + return target->status;
>
>   default:
>   return target->status;
>
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/8] scsi-mq: Add support for multiple hardware queues

2014-09-19 Thread Christoph Hellwig

On Fri, Sep 19, 2014 at 09:05:53PM +0300, Sagi Grimberg wrote:
> I think this patch should be squashed with passing LLD hctx patch (in
> whatever form it ends up).

Agreed.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Christoph Hellwig

On Fri, Sep 19, 2014 at 11:33:15AM -0600, Jens Axboe wrote:
> That'd be fine as well. The mapping is cheap, though, but it would make
> sense to have an appropriate way to just pass it in like it happens for
> ->queue_rq() for native blk-mq drivers.

I think just passing the hw_ctx is fine.  But I don't want drivers to
have to implement both methods, so we should make sure the new method
works for both the blk-mq and !blk-mq case.  Note that once thing the
new method should get is a bool last paramter similar to the one I
added to the queue_rq method in the block tree tree.

It might be worth it to simply do a global search and replace and pass
a hw_ctx method to all instances, too.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/8] IB/srp: Move ib_destroy_cm_id() call into srp_free_ch_ib()

2014-09-19 Thread Sagi Grimberg

On 9/19/2014 3:58 PM, Bart Van Assche wrote:
> The patch that adds multichannel support into the SRP initiator
> driver introduces an additional call to srp_free_ch_ib(). This
> patch helps to keep that later patch simple.
>
> Signed-off-by: Bart Van Assche 
> ---
>   drivers/infiniband/ulp/srp/ib_srp.c | 11 ++-
>   1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
> b/drivers/infiniband/ulp/srp/ib_srp.c
> index 62d2a18..d3c712f 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -555,6 +555,11 @@ static void srp_free_target_ib(struct srp_target_port 
> *target)
>   struct srp_device *dev = target->srp_host->srp_dev;
>   int i;
>
> + if (target->cm_id) {
> + ib_destroy_cm_id(target->cm_id);
> + target->cm_id = NULL;
> + }
> +
>   if (dev->use_fast_reg) {
>   if (target->fr_pool)
>   srp_destroy_fr_pool(target->fr_pool);
> @@ -868,7 +873,6 @@ static void srp_remove_target(struct srp_target_port 
> *target)
>   scsi_remove_host(target->scsi_host);
>   srp_stop_rport_timers(target->rport);
>   srp_disconnect_target(target);
> - ib_destroy_cm_id(target->cm_id);
>   srp_free_target_ib(target);
>   cancel_work_sync(&target->tl_err_work);
>   srp_rport_put(target->rport);
> @@ -3043,7 +3047,7 @@ static ssize_t srp_create_target(struct device *dev,
>   if (ret) {
>   shost_printk(KERN_ERR, target->scsi_host,
>   PFX "Connection failed\n");
> - goto err_cm_id;
> + goto err_free_ib;
>   }
>
>   ret = srp_add_target(host, target);
> @@ -3067,9 +3071,6 @@ out:
>   err_disconnect:
>   srp_disconnect_target(target);
>
> -err_cm_id:
> - ib_destroy_cm_id(target->cm_id);
> -
>   err_free_ib:
>   srp_free_target_ib(target);
>
>

Looks good.

Reviewed-by: Sagi Grimberg 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/8] scsi-mq: Add support for multiple hardware queues

2014-09-19 Thread Sagi Grimberg

On 9/19/2014 3:57 PM, Bart Van Assche wrote:
> Allow a SCSI LLD to declare how many hardware queues it supports
> by setting Scsi_Host.nr_hw_queues before calling scsi_add_host().
>
> Note: it is assumed that each hardware queue has a queue depth of
> shost->can_queue. In other words, the total queue depth per host
> is (number of hardware queues) * (shost->can_queue).
>
> Signed-off-by: Bart Van Assche 
> Cc: Christoph Hellwig 
> ---
>   drivers/scsi/scsi_lib.c  | 2 +-
>   include/scsi/scsi_host.h | 4 
>   2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index d837dc1..b0b6117 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2071,7 +2071,7 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>
>   memset(&shost->tag_set, 0, sizeof(shost->tag_set));
>   shost->tag_set.ops = &scsi_mq_ops;
> - shost->tag_set.nr_hw_queues = 1;
> + shost->tag_set.nr_hw_queues = shost->nr_hw_queues ? : 1;
>   shost->tag_set.queue_depth = shost->can_queue;
>   shost->tag_set.cmd_size = cmd_size;
>   shost->tag_set.numa_node = NUMA_NO_NODE;
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index ba20347..0a867d9 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -638,6 +638,10 @@ struct Scsi_Host {
>   short unsigned int sg_prot_tablesize;
>   unsigned int max_sectors;
>   unsigned long dma_boundary;
> + /*
> + * In scsi-mq mode, the number of hardware queues supported by the LLD.
> + */
> + unsigned nr_hw_queues;
>   /*
>   * Used to assign serial numbers to the cmds.
>   * Protected by the host lock.
>

I think this patch should be squashed with passing LLD hctx patch (in
whatever form it ends up).
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC 0/8] IB/srp: Add multichannel support

2014-09-19 Thread Sagi Grimberg

On 9/19/2014 3:56 PM, Bart Van Assche wrote:
> [PATCH 1/8] blk-mq: Use all available hardware queues
>
> Suppose that a system has two CPU sockets, three cores per socket,
> that it does not support hyperthreading and that four hardware
> queues are provided by a block driver. With the current algorithm
> this will lead to the following assignment of CPU cores to hardware
> queues:
>
>HWQ 0: 0 1
>HWQ 1: 2 3
>HWQ 2: 4 5
>HWQ 3: (none)
>
> This patch changes the queue assignment into:
>
>HWQ 0: 0 1
>HWQ 1: 2
>HWQ 2: 3 4
>HWQ 3: 5
>
> In other words, this patch has the following three effects:
> - All four hardware queues are used instead of only three.
> - CPU cores are spread more evenly over hardware queues. For the
>above example the range of the number of CPU cores associated
>with a single HWQ is reduced from [0..2] to [1..2].
> - If the number of HWQ's is a multiple of the number of CPU sockets
>it is now guaranteed that all CPU cores associated with a single
>HWQ reside on the same CPU socket.
>
> Signed-off-by: Bart Van Assche 
> Cc: Jens Axboe 
> Cc: Christoph Hellwig 
> Cc: Ming Lei 
> ---
>   block/blk-mq-cpumap.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 1065d7c..8e56455 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -17,7 +17,7 @@
>   static int cpu_to_queue_index(unsigned int nr_cpus, unsigned int nr_queues,
>const int cpu)
>   {
> - return cpu / ((nr_cpus + nr_queues - 1) / nr_queues);
> + return cpu * nr_queues / nr_cpus;
>   }
>
>   static int get_first_sibling(unsigned int cpu)
>

Seems reasonable enough.

Reviewed-by: Sagi Grimberg 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Jens Axboe

On 09/19/2014 11:30 AM, Sagi Grimberg wrote:
> On 9/19/2014 6:38 PM, Jens Axboe wrote:
>> On 09/19/2014 09:35 AM, Bart Van Assche wrote:
>>> On 09/19/14 17:27, Ming Lei wrote:
 On Fri, Sep 19, 2014 at 11:21 PM, Bart Van Assche  
 wrote:
> On 09/19/14 16:28, Ming Lei wrote:
>>
>> On Fri, Sep 19, 2014 at 9:00 PM, Bart Van Assche 
>> wrote:
>>>
>>> @@ -2643,7 +2754,8 @@ static struct scsi_host_template srp_template = {
>>>.proc_name  = DRV_NAME,
>>>.slave_configure= srp_slave_configure,
>>>.info   = srp_target_info,
>>> -   .queuecommand   = srp_queuecommand,
>>> +   .queuecommand   = srp_sq_queuecommand,
>>> +   .mq_queuecommand= srp_mq_queuecommand,
>>
>>
>> Another choice is to obtain hctx from request directly, then mq can
>> reuse the .queuecommand interface too.
>
>
> Hello Ming,
>
> Is the hctx information already available in the request data structure ? 
> I
> have found a mq_ctx member but no hctx member. Did I perhaps overlook
> something ?

 You are right, but the mq_ctx can be mapped to hctx like below way:

 ctx = rq->mq_ctx;
 hctx = q->mq_ops->map_queue(q, ctx->cpu);
>>>
>>> Hello Ming,
>>>
>>> Had you already noticed that the blk_mq_ctx data structure is a private
>>> data structure (declared in block/blk-mq.h) and hence not available to
>>> SCSI LLDs ? However, what might be possible is to cache the hctx pointer
>>> in the request structure, e.g. like in the (completely untested) patch
>>> below.
>>
>> ctx was meant to be private, unfortunately it's leaked a bit into other
>> parts of block/. But it's still private within that, at least.
>>
>> Lets not add more stuff to struct request, it's already way too large.
>> We could add an exported
>>
>> struct blk_mq_hw_ctx *blk_mq_request_to_hctx(struct request *rq)
>> {
>> struct request_queue *q = rq->q;
>>
>> return q->mq_ops->map_queue(q, rq->mq_ctx->cpu);
>> }
>>
>> for this.
>>
> 
> Hey,
> 
> So I agree that we shouldn't overload struct request with another
> pointer, but it also seems a bit unnecessary to map again just to get
> the hctx. Since in the future we would like LLDs to use scsi-mq why not
> modify existing .queuecommand to pass hctx (or even better
> hctx->driver_data) and for now LLDs won't use it. Once they choose to,
> it will be available to them.

That'd be fine as well. The mapping is cheap, though, but it would make
sense to have an appropriate way to just pass it in like it happens for
->queue_rq() for native blk-mq drivers.


-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Sagi Grimberg

On 9/19/2014 6:38 PM, Jens Axboe wrote:
> On 09/19/2014 09:35 AM, Bart Van Assche wrote:
>> On 09/19/14 17:27, Ming Lei wrote:
>>> On Fri, Sep 19, 2014 at 11:21 PM, Bart Van Assche  
>>> wrote:
 On 09/19/14 16:28, Ming Lei wrote:
>
> On Fri, Sep 19, 2014 at 9:00 PM, Bart Van Assche 
> wrote:
>>
>> @@ -2643,7 +2754,8 @@ static struct scsi_host_template srp_template = {
>>.proc_name  = DRV_NAME,
>>.slave_configure= srp_slave_configure,
>>.info   = srp_target_info,
>> -   .queuecommand   = srp_queuecommand,
>> +   .queuecommand   = srp_sq_queuecommand,
>> +   .mq_queuecommand= srp_mq_queuecommand,
>
>
> Another choice is to obtain hctx from request directly, then mq can
> reuse the .queuecommand interface too.


 Hello Ming,

 Is the hctx information already available in the request data structure ? I
 have found a mq_ctx member but no hctx member. Did I perhaps overlook
 something ?
>>>
>>> You are right, but the mq_ctx can be mapped to hctx like below way:
>>>
>>> ctx = rq->mq_ctx;
>>> hctx = q->mq_ops->map_queue(q, ctx->cpu);
>>
>> Hello Ming,
>>
>> Had you already noticed that the blk_mq_ctx data structure is a private
>> data structure (declared in block/blk-mq.h) and hence not available to
>> SCSI LLDs ? However, what might be possible is to cache the hctx pointer
>> in the request structure, e.g. like in the (completely untested) patch
>> below.
>
> ctx was meant to be private, unfortunately it's leaked a bit into other
> parts of block/. But it's still private within that, at least.
>
> Lets not add more stuff to struct request, it's already way too large.
> We could add an exported
>
> struct blk_mq_hw_ctx *blk_mq_request_to_hctx(struct request *rq)
> {
> struct request_queue *q = rq->q;
>
> return q->mq_ops->map_queue(q, rq->mq_ctx->cpu);
> }
>
> for this.
>

Hey,

So I agree that we shouldn't overload struct request with another
pointer, but it also seems a bit unnecessary to map again just to get
the hctx. Since in the future we would like LLDs to use scsi-mq why not
modify existing .queuecommand to pass hctx (or even better
hctx->driver_data) and for now LLDs won't use it. Once they choose to,
it will be available to them.

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Jens Axboe

On 09/19/2014 09:35 AM, Bart Van Assche wrote:
> On 09/19/14 17:27, Ming Lei wrote:
>> On Fri, Sep 19, 2014 at 11:21 PM, Bart Van Assche  wrote:
>>> On 09/19/14 16:28, Ming Lei wrote:

 On Fri, Sep 19, 2014 at 9:00 PM, Bart Van Assche 
 wrote:
>
> @@ -2643,7 +2754,8 @@ static struct scsi_host_template srp_template = {
>   .proc_name  = DRV_NAME,
>   .slave_configure= srp_slave_configure,
>   .info   = srp_target_info,
> -   .queuecommand   = srp_queuecommand,
> +   .queuecommand   = srp_sq_queuecommand,
> +   .mq_queuecommand= srp_mq_queuecommand,


 Another choice is to obtain hctx from request directly, then mq can
 reuse the .queuecommand interface too.
>>>
>>>
>>> Hello Ming,
>>>
>>> Is the hctx information already available in the request data structure ? I
>>> have found a mq_ctx member but no hctx member. Did I perhaps overlook
>>> something ?
>>
>> You are right, but the mq_ctx can be mapped to hctx like below way:
>>
>> ctx = rq->mq_ctx;
>> hctx = q->mq_ops->map_queue(q, ctx->cpu);
> 
> Hello Ming,
> 
> Had you already noticed that the blk_mq_ctx data structure is a private
> data structure (declared in block/blk-mq.h) and hence not available to
> SCSI LLDs ? However, what might be possible is to cache the hctx pointer
> in the request structure, e.g. like in the (completely untested) patch
> below.

ctx was meant to be private, unfortunately it's leaked a bit into other
parts of block/. But it's still private within that, at least.

Lets not add more stuff to struct request, it's already way too large.
We could add an exported

struct blk_mq_hw_ctx *blk_mq_request_to_hctx(struct request *rq)
{
struct request_queue *q = rq->q;

return q->mq_ops->map_queue(q, rq->mq_ctx->cpu);
}

for this.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Bart Van Assche

On 09/19/14 17:27, Ming Lei wrote:
> On Fri, Sep 19, 2014 at 11:21 PM, Bart Van Assche  wrote:
>> On 09/19/14 16:28, Ming Lei wrote:
>>>
>>> On Fri, Sep 19, 2014 at 9:00 PM, Bart Van Assche 
>>> wrote:

 @@ -2643,7 +2754,8 @@ static struct scsi_host_template srp_template = {
   .proc_name  = DRV_NAME,
   .slave_configure= srp_slave_configure,
   .info   = srp_target_info,
 -   .queuecommand   = srp_queuecommand,
 +   .queuecommand   = srp_sq_queuecommand,
 +   .mq_queuecommand= srp_mq_queuecommand,
>>>
>>>
>>> Another choice is to obtain hctx from request directly, then mq can
>>> reuse the .queuecommand interface too.
>>
>>
>> Hello Ming,
>>
>> Is the hctx information already available in the request data structure ? I
>> have found a mq_ctx member but no hctx member. Did I perhaps overlook
>> something ?
> 
> You are right, but the mq_ctx can be mapped to hctx like below way:
> 
> ctx = rq->mq_ctx;
> hctx = q->mq_ops->map_queue(q, ctx->cpu);

Hello Ming,

Had you already noticed that the blk_mq_ctx data structure is a private
data structure (declared in block/blk-mq.h) and hence not available to
SCSI LLDs ? However, what might be possible is to cache the hctx pointer
in the request structure, e.g. like in the (completely untested) patch
below.

[PATCH] blk-mq: Cache hardware context pointer in struct request

Cache the hardware context pointer in struct request such that block
drivers can determine which hardware queue has been selected by
reading rq->mq_hctx->queue_num. This information is needed to select
the appropriate hardware queue in e.g. SCSI LLDs.
---
 block/blk-flush.c  |  6 +-
 block/blk-mq.c | 16 +---
 include/linux/blkdev.h |  1 +
 3 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 3cb5e9e..698812d 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -327,13 +327,9 @@ static void flush_data_end_io(struct request *rq, int 
error)
 static void mq_flush_data_end_io(struct request *rq, int error)
 {
struct request_queue *q = rq->q;
-   struct blk_mq_hw_ctx *hctx;
-   struct blk_mq_ctx *ctx;
+   struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
unsigned long flags;
 
-   ctx = rq->mq_ctx;
-   hctx = q->mq_ops->map_queue(q, ctx->cpu);
-
/*
 * After populating an empty queue, kick it to avoid stall.  Read
 * the comment in flush_end_io().
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 383ea0c..8e428fe 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -210,6 +210,7 @@ __blk_mq_alloc_request(struct blk_mq_alloc_data *data, int 
rw)
}
 
rq->tag = tag;
+   rq->mq_hctx = data->hctx;
blk_mq_rq_ctx_init(data->q, data->ctx, rq, rw);
return rq;
}
@@ -267,12 +268,10 @@ static void __blk_mq_free_request(struct blk_mq_hw_ctx 
*hctx,
 void blk_mq_free_request(struct request *rq)
 {
struct blk_mq_ctx *ctx = rq->mq_ctx;
-   struct blk_mq_hw_ctx *hctx;
-   struct request_queue *q = rq->q;
+   struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
ctx->rq_completed[rq_is_sync(rq)]++;
 
-   hctx = q->mq_ops->map_queue(q, ctx->cpu);
__blk_mq_free_request(hctx, ctx, rq);
 }
 
@@ -287,10 +286,10 @@ void blk_mq_free_request(struct request *rq)
 void blk_mq_clone_flush_request(struct request *flush_rq,
struct request *orig_rq)
 {
-   struct blk_mq_hw_ctx *hctx =
-   orig_rq->q->mq_ops->map_queue(orig_rq->q, orig_rq->mq_ctx->cpu);
+   struct blk_mq_hw_ctx *hctx = orig_rq->mq_hctx;
 
flush_rq->mq_ctx = orig_rq->mq_ctx;
+   flush_rq->mq_hctx = hctx;
flush_rq->tag = orig_rq->tag;
memcpy(blk_mq_rq_to_pdu(flush_rq), blk_mq_rq_to_pdu(orig_rq),
hctx->cmd_size);
@@ -956,6 +955,7 @@ void blk_mq_insert_request(struct request *rq, bool 
at_head, bool run_queue,
rq->mq_ctx = ctx = current_ctx;
 
hctx = q->mq_ops->map_queue(q, ctx->cpu);
+   rq->mq_hctx = hctx;
 
if (rq->cmd_flags & (REQ_FLUSH | REQ_FUA) &&
!(rq->cmd_flags & (REQ_FLUSH_SEQ))) {
@@ -1001,6 +1001,7 @@ static void blk_mq_insert_requests(struct request_queue 
*q,
rq = list_first_entry(list, struct request, queuelist);
list_del_init(&rq->queuelist);
rq->mq_ctx = ctx;
+   rq->mq_hctx = hctx;
__blk_mq_insert_request(hctx, rq, false);
}
spin_unlock(&ctx->lock);
@@ -1475,6 +1476,8 @@ static int blk_mq_hctx_cpu_offline(struct blk_mq_hw_ctx 
*hctx, int cpu)
return NOTIFY_OK;
 
ctx = blk_mq_get_ctx(q);
+   hctx = q->mq_ops->map_queue(q, ctx->cpu);
+
spin_lock(&ctx->lock);
 
while (!list_empt

Re: [PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Ming Lei

On Fri, Sep 19, 2014 at 11:21 PM, Bart Van Assche  wrote:
> On 09/19/14 16:28, Ming Lei wrote:
>>
>> On Fri, Sep 19, 2014 at 9:00 PM, Bart Van Assche 
>> wrote:
>>>
>>> @@ -2643,7 +2754,8 @@ static struct scsi_host_template srp_template = {
>>>  .proc_name  = DRV_NAME,
>>>  .slave_configure= srp_slave_configure,
>>>  .info   = srp_target_info,
>>> -   .queuecommand   = srp_queuecommand,
>>> +   .queuecommand   = srp_sq_queuecommand,
>>> +   .mq_queuecommand= srp_mq_queuecommand,
>>
>>
>> Another choice is to obtain hctx from request directly, then mq can
>> reuse the .queuecommand interface too.
>
>
> Hello Ming,
>
> Is the hctx information already available in the request data structure ? I
> have found a mq_ctx member but no hctx member. Did I perhaps overlook
> something ?

You are right, but the mq_ctx can be mapped to hctx like below way:

ctx = rq->mq_ctx;
hctx = q->mq_ops->map_queue(q, ctx->cpu);

Thanks,
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Bart Van Assche


On 09/19/14 16:28, Ming Lei wrote:

On Fri, Sep 19, 2014 at 9:00 PM, Bart Van Assche  wrote:

@@ -2643,7 +2754,8 @@ static struct scsi_host_template srp_template = {
 .proc_name  = DRV_NAME,
 .slave_configure= srp_slave_configure,
 .info   = srp_target_info,
-   .queuecommand   = srp_queuecommand,
+   .queuecommand   = srp_sq_queuecommand,
+   .mq_queuecommand= srp_mq_queuecommand,


Another choice is to obtain hctx from request directly, then mq can
reuse the .queuecommand interface too.


Hello Ming,

Is the hctx information already available in the request data structure 
? I have found a mq_ctx member but no hctx member. Did I perhaps 
overlook something ?


Thanks,

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Ming Lei

On Fri, Sep 19, 2014 at 9:00 PM, Bart Van Assche  wrote:
> Improve performance by using multiple RDMA/RC channels per SCSI host
> for communicating with an SRP target.
>
> Signed-off-by: Bart Van Assche 
> ---
>  Documentation/ABI/stable/sysfs-driver-ib_srp |  25 +-
>  drivers/infiniband/ulp/srp/ib_srp.c  | 337 
> ---
>  drivers/infiniband/ulp/srp/ib_srp.h  |  20 +-
>  3 files changed, 287 insertions(+), 95 deletions(-)
>
> diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp 
> b/Documentation/ABI/stable/sysfs-driver-ib_srp
> index b9688de..d5a459e 100644
> --- a/Documentation/ABI/stable/sysfs-driver-ib_srp
> +++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
> @@ -55,12 +55,12 @@ Description:Interface for making ib_srp connect 
> to a new target.
>   only safe with partial memory descriptor list support 
> enabled
>   (allow_ext_sg=1).
> * comp_vector, a number in the range 0..n-1 specifying the
> - MSI-X completion vector. Some HCA's allocate multiple (n)
> - MSI-X vectors per HCA port. If the IRQ affinity masks of
> - these interrupts have been configured such that each MSI-X
> - interrupt is handled by a different CPU then the comp_vector
> - parameter can be used to spread the SRP completion workload
> - over multiple CPU's.
> + MSI-X completion vector of the first RDMA channel. Some
> + HCA's allocate multiple (n) MSI-X vectors per HCA port. If
> + the IRQ affinity masks of these interrupts have been
> + configured such that each MSI-X interrupt is handled by a
> + different CPU then the comp_vector parameter can be used to
> + spread the SRP completion workload over multiple CPU's.
> * tl_retry_count, a number in the range 2..7 specifying the
>   IB RC retry count.
> * queue_size, the maximum number of commands that the
> @@ -88,6 +88,13 @@ Description: Whether ib_srp is allowed to include a 
> partial memory
> descriptor list in an SRP_CMD when communicating with an SRP
> target.
>
> +What:  /sys/class/scsi_host/host/ch_count
> +Date:  November 1, 2014
> +KernelVersion: 3.18
> +Contact:   linux-r...@vger.kernel.org
> +Description:   Number of RDMA channels used for communication with the SRP
> +   target.
> +
>  What:  /sys/class/scsi_host/host/cmd_sg_entries
>  Date:  May 19, 2011
>  KernelVersion: 2.6.39
> @@ -95,6 +102,12 @@ Contact:linux-r...@vger.kernel.org
>  Description:   Maximum number of data buffer descriptors that may be sent to
> the target in a single SRP_CMD request.
>
> +What:  /sys/class/scsi_host/host/comp_vector
> +Date:  September 2, 2013
> +KernelVersion: 3.11
> +Contact:   linux-r...@vger.kernel.org
> +Description:   Completion vector used for the first RDMA channel.
> +
>  What:  /sys/class/scsi_host/host/dgid
>  Date:  June 17, 2006
>  KernelVersion: 2.6.17
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
> b/drivers/infiniband/ulp/srp/ib_srp.c
> index 9feeea1..58ca618 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -123,6 +123,16 @@ MODULE_PARM_DESC(dev_loss_tmo,
>  " if fast_io_fail_tmo has not been set. \"off\" means that"
>  " this functionality is disabled.");
>
> +static unsigned ch_count;
> +module_param(ch_count, uint, 0444);
> +MODULE_PARM_DESC(ch_count,
> +"Number of RDMA channels to use for communication with an 
> SRP"
> +" target. Using more than one channel improves performance"
> +" if the HCA supports multiple completion vectors. The"
> +" default value is the minimum of four times the number of"
> +" online CPU sockets and the number of completion vectors"
> +" supported by the HCA.");
> +
>  static void srp_add_one(struct ib_device *device);
>  static void srp_remove_one(struct ib_device *device);
>  static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr);
> @@ -556,17 +566,32 @@ err:
>   * Note: this function may be called without srp_alloc_iu_bufs() having been
>   * invoked. Hence the ch->[rt]x_ring checks.
>   */
> -static void srp_free_ch_ib(struct srp_rdma_ch *ch)
> +static void srp_free_ch_ib(struct srp_target_port *target,
> +  struct srp_rdma_ch *ch)
>  {
> -   struct srp_target_port *target = ch->target;
> struct srp_device *dev = target->srp_host->srp_dev;
> int i;
>
> +   if (!ch->target)
> +   return;
> +
> +   /*
> +* Avoid that the SCSI error handler tries to use this channel after
> +* it has been freed. Th

Re: [PATCH] scsi-mq: fix hw queue hang caused by timeout

2014-09-19 Thread Jens Axboe

On 09/19/2014 08:18 AM, Ming Lei wrote:
> On Fri, Sep 19, 2014 at 9:07 PM, Ming Lei  wrote:
>> On Fri, Sep 19, 2014 at 1:03 AM, Jens Axboe  wrote:
>>> On 2014-09-18 10:35, Christoph Hellwig wrote:

 On Thu, Sep 18, 2014 at 11:59:10PM +0800, Ming Lei wrote:
>
> If there are two requests or more timed out, the dispatch queue
> is put into stopped state and never be recoverd, and there
> is no such problem in non-mq mode.
>
> This patch trys to recover the stopped queue when the queue
> becomes unbusy, then the following retries can move on.
>
> Basically this patch maintains same behavior for this situation
> with non-mq mode.


 This looks somewhat similar to the issues that Doug reported, and I
 remember
 when he was last running into boot problems it was timeout related, too.

 As far as the implementation is concerned I think the correct fix is
 to clear the BLK_MQ_S_STOPPED queue flags in blk_mq_kick_requeue_list.
>>>
>>>
>>> Since that's the kick part of the requeue, auto-starting the queue for that
>>> makes a lot of sense. I say that's the way we go.
>>
>> Yeah, that looks better.
>>
>> But it doesn't work after the simple change, and I need to
>> investigate further.
> 
> It is because of the timer miss, now it starts to work.

Excellent. I think most new issues should be fixed in for-linus for
inclusion in this round. It's much bigger than I hoped for this late in
the cycle, but lots of us have run a lot of testing, so that's not a
huge worry.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] scsi-mq: fix hw queue hang caused by timeout

2014-09-19 Thread Ming Lei

On Fri, Sep 19, 2014 at 9:07 PM, Ming Lei  wrote:
> On Fri, Sep 19, 2014 at 1:03 AM, Jens Axboe  wrote:
>> On 2014-09-18 10:35, Christoph Hellwig wrote:
>>>
>>> On Thu, Sep 18, 2014 at 11:59:10PM +0800, Ming Lei wrote:

 If there are two requests or more timed out, the dispatch queue
 is put into stopped state and never be recoverd, and there
 is no such problem in non-mq mode.

 This patch trys to recover the stopped queue when the queue
 becomes unbusy, then the following retries can move on.

 Basically this patch maintains same behavior for this situation
 with non-mq mode.
>>>
>>>
>>> This looks somewhat similar to the issues that Doug reported, and I
>>> remember
>>> when he was last running into boot problems it was timeout related, too.
>>>
>>> As far as the implementation is concerned I think the correct fix is
>>> to clear the BLK_MQ_S_STOPPED queue flags in blk_mq_kick_requeue_list.
>>
>>
>> Since that's the kick part of the requeue, auto-starting the queue for that
>> makes a lot of sense. I say that's the way we go.
>
> Yeah, that looks better.
>
> But it doesn't work after the simple change, and I need to
> investigate further.

It is because of the timer miss, now it starts to work.

Thanks,
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[GIT PULL] SCSI fixes for 3.17-rc5

2014-09-19 Thread James Bottomley

This is a set of three fixes.  One represents a nasty shared tag map
regression (another inverted condition) caused by recent SCSI MQ
patches, one is a longstanding potential buffer overrun in the iscsi
data buffer and the final one is a use after free for the rare
bidirectional commands.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes

The short changelog is:

Christoph Hellwig (1):
  fix regression that accidentally disabled block-based tcq

Daniel Gryniewicz (1):
  fix for bidi use after free

Mike Christie (1):
  libiscsi: fix potential buffer overrun in __iscsi_conn_send_pdu

And the diffstat:

 drivers/scsi/libiscsi.c | 10 ++
 drivers/scsi/scsi_lib.c |  5 +++--
 include/scsi/scsi_tcq.h |  2 +-
 3 files changed, 14 insertions(+), 3 deletions(-)

With full diff below.

James

---

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index ea025e4..191b597 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -717,11 +717,21 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct 
iscsi_hdr *hdr,
return NULL;
}
 
+   if (data_size > ISCSI_DEF_MAX_RECV_SEG_LEN) {
+   iscsi_conn_printk(KERN_ERR, conn, "Invalid buffer len 
of %u for login task. Max len is %u\n", data_size, ISCSI_DEF_MAX_RECV_SEG_LEN);
+   return NULL;
+   }
+
task = conn->login_task;
} else {
if (session->state != ISCSI_STATE_LOGGED_IN)
return NULL;
 
+   if (data_size != 0) {
+   iscsi_conn_printk(KERN_ERR, conn, "Can not send data 
buffer of len %u for op 0x%x\n", data_size, opcode);
+   return NULL;
+   }
+
BUG_ON(conn->c_stage == ISCSI_CONN_INITIAL_STAGE);
BUG_ON(conn->c_stage == ISCSI_CONN_STOPPED);
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d837dc1..aaea4b9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -733,12 +733,13 @@ static bool scsi_end_request(struct request *req, int 
error,
} else {
unsigned long flags;
 
+   if (bidi_bytes)
+   scsi_release_bidi_buffers(cmd);
+
spin_lock_irqsave(q->queue_lock, flags);
blk_finish_request(req, error);
spin_unlock_irqrestore(q->queue_lock, flags);
 
-   if (bidi_bytes)
-   scsi_release_bidi_buffers(cmd);
scsi_release_buffers(cmd);
scsi_next_command(cmd);
}
diff --git a/include/scsi/scsi_tcq.h b/include/scsi/scsi_tcq.h
index cdcc90b..e645835 100644
--- a/include/scsi/scsi_tcq.h
+++ b/include/scsi/scsi_tcq.h
@@ -68,7 +68,7 @@ static inline void scsi_activate_tcq(struct scsi_device 
*sdev, int depth)
return;
 
if (!shost_use_blk_mq(sdev->host) &&
-   blk_queue_tagged(sdev->request_queue))
+   !blk_queue_tagged(sdev->request_queue))
blk_queue_init_tags(sdev->request_queue, depth,
sdev->host->bqt);
 



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] scsi-mq: fix hw queue hang caused by timeout

2014-09-19 Thread Ming Lei

On Fri, Sep 19, 2014 at 1:03 AM, Jens Axboe  wrote:
> On 2014-09-18 10:35, Christoph Hellwig wrote:
>>
>> On Thu, Sep 18, 2014 at 11:59:10PM +0800, Ming Lei wrote:
>>>
>>> If there are two requests or more timed out, the dispatch queue
>>> is put into stopped state and never be recoverd, and there
>>> is no such problem in non-mq mode.
>>>
>>> This patch trys to recover the stopped queue when the queue
>>> becomes unbusy, then the following retries can move on.
>>>
>>> Basically this patch maintains same behavior for this situation
>>> with non-mq mode.
>>
>>
>> This looks somewhat similar to the issues that Doug reported, and I
>> remember
>> when he was last running into boot problems it was timeout related, too.
>>
>> As far as the implementation is concerned I think the correct fix is
>> to clear the BLK_MQ_S_STOPPED queue flags in blk_mq_kick_requeue_list.
>
>
> Since that's the kick part of the requeue, auto-starting the queue for that
> makes a lot of sense. I say that's the way we go.

Yeah, that looks better.

But it doesn't work after the simple change, and I need to
investigate further.

Thanks,
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 7/8] IB/srp: Separate target and channel variables

2014-09-19 Thread Bart Van Assche

Changes in this patch:
- Move channel variables into a new structure (struct srp_rdma_ch).
- cm_id and completion handler context pointer are now of type
  srp_rdma_ch * insteoad of srp_target_port *.

No functionality is changed.

Signed-off-by: Bart Van Assche 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 707 +++-
 drivers/infiniband/ulp/srp/ib_srp.h |  59 +--
 2 files changed, 417 insertions(+), 349 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index fd88fb8..9feeea1 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -125,8 +125,8 @@ MODULE_PARM_DESC(dev_loss_tmo,
 
 static void srp_add_one(struct ib_device *device);
 static void srp_remove_one(struct ib_device *device);
-static void srp_recv_completion(struct ib_cq *cq, void *target_ptr);
-static void srp_send_completion(struct ib_cq *cq, void *target_ptr);
+static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr);
+static void srp_send_completion(struct ib_cq *cq, void *ch_ptr);
 static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event);
 
 static struct scsi_transport_template *ib_srp_transport_template;
@@ -262,7 +262,7 @@ static int srp_init_qp(struct srp_target_port *target,
 
ret = ib_find_pkey(target->srp_host->srp_dev->dev,
   target->srp_host->port,
-  be16_to_cpu(target->path.pkey),
+  be16_to_cpu(target->pkey),
   &attr->pkey_index);
if (ret)
goto out;
@@ -283,18 +283,23 @@ out:
return ret;
 }
 
-static int srp_new_cm_id(struct srp_target_port *target)
+static int srp_new_cm_id(struct srp_rdma_ch *ch)
 {
+   struct srp_target_port *target = ch->target;
struct ib_cm_id *new_cm_id;
 
new_cm_id = ib_create_cm_id(target->srp_host->srp_dev->dev,
-   srp_cm_handler, target);
+   srp_cm_handler, ch);
if (IS_ERR(new_cm_id))
return PTR_ERR(new_cm_id);
 
-   if (target->cm_id)
-   ib_destroy_cm_id(target->cm_id);
-   target->cm_id = new_cm_id;
+   if (ch->cm_id)
+   ib_destroy_cm_id(ch->cm_id);
+   ch->cm_id = new_cm_id;
+   ch->path.sgid = target->sgid;
+   ch->path.dgid = target->orig_dgid;
+   ch->path.pkey = target->pkey;
+   ch->path.service_id = target->service_id;
 
return 0;
 }
@@ -443,8 +448,9 @@ static struct srp_fr_pool *srp_alloc_fr_pool(struct 
srp_target_port *target)
  dev->max_pages_per_mr);
 }
 
-static int srp_create_target_ib(struct srp_target_port *target)
+static int srp_create_ch_ib(struct srp_rdma_ch *ch)
 {
+   struct srp_target_port *target = ch->target;
struct srp_device *dev = target->srp_host->srp_dev;
struct ib_qp_init_attr *init_attr;
struct ib_cq *recv_cq, *send_cq;
@@ -458,15 +464,15 @@ static int srp_create_target_ib(struct srp_target_port 
*target)
if (!init_attr)
return -ENOMEM;
 
-   recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, target,
-  target->queue_size, target->comp_vector);
+   recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, ch,
+  target->queue_size, ch->comp_vector);
if (IS_ERR(recv_cq)) {
ret = PTR_ERR(recv_cq);
goto err;
}
 
-   send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, target,
-  m * target->queue_size, target->comp_vector);
+   send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, ch,
+  m * target->queue_size, ch->comp_vector);
if (IS_ERR(send_cq)) {
ret = PTR_ERR(send_cq);
goto err_recv_cq;
@@ -502,9 +508,9 @@ static int srp_create_target_ib(struct srp_target_port 
*target)
 "FR pool allocation failed (%d)\n", ret);
goto err_qp;
}
-   if (target->fr_pool)
-   srp_destroy_fr_pool(target->fr_pool);
-   target->fr_pool = fr_pool;
+   if (ch->fr_pool)
+   srp_destroy_fr_pool(ch->fr_pool);
+   ch->fr_pool = fr_pool;
} else if (!dev->use_fast_reg && dev->has_fmr) {
fmr_pool = srp_alloc_fmr_pool(target);
if (IS_ERR(fmr_pool)) {
@@ -513,21 +519,21 @@ static int srp_create_target_ib(struct srp_target_port 
*target)
 "FMR pool allocation failed (%d)\n", ret);
goto err_qp;
}
-   if (target->fmr_pool)
-   ib_destroy_fmr_pool(target->fmr_pool);
-   target->fmr_pool = fmr_pool;
+   if (ch

[PATCH 8/8] IB/srp: Add multichannel support

2014-09-19 Thread Bart Van Assche

Improve performance by using multiple RDMA/RC channels per SCSI host
for communicating with an SRP target.

Signed-off-by: Bart Van Assche 
---
 Documentation/ABI/stable/sysfs-driver-ib_srp |  25 +-
 drivers/infiniband/ulp/srp/ib_srp.c  | 337 ---
 drivers/infiniband/ulp/srp/ib_srp.h  |  20 +-
 3 files changed, 287 insertions(+), 95 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp 
b/Documentation/ABI/stable/sysfs-driver-ib_srp
index b9688de..d5a459e 100644
--- a/Documentation/ABI/stable/sysfs-driver-ib_srp
+++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
@@ -55,12 +55,12 @@ Description:Interface for making ib_srp connect to 
a new target.
  only safe with partial memory descriptor list support enabled
  (allow_ext_sg=1).
* comp_vector, a number in the range 0..n-1 specifying the
- MSI-X completion vector. Some HCA's allocate multiple (n)
- MSI-X vectors per HCA port. If the IRQ affinity masks of
- these interrupts have been configured such that each MSI-X
- interrupt is handled by a different CPU then the comp_vector
- parameter can be used to spread the SRP completion workload
- over multiple CPU's.
+ MSI-X completion vector of the first RDMA channel. Some
+ HCA's allocate multiple (n) MSI-X vectors per HCA port. If
+ the IRQ affinity masks of these interrupts have been
+ configured such that each MSI-X interrupt is handled by a
+ different CPU then the comp_vector parameter can be used to
+ spread the SRP completion workload over multiple CPU's.
* tl_retry_count, a number in the range 2..7 specifying the
  IB RC retry count.
* queue_size, the maximum number of commands that the
@@ -88,6 +88,13 @@ Description: Whether ib_srp is allowed to include a partial 
memory
descriptor list in an SRP_CMD when communicating with an SRP
target.
 
+What:  /sys/class/scsi_host/host/ch_count
+Date:  November 1, 2014
+KernelVersion: 3.18
+Contact:   linux-r...@vger.kernel.org
+Description:   Number of RDMA channels used for communication with the SRP
+   target.
+
 What:  /sys/class/scsi_host/host/cmd_sg_entries
 Date:  May 19, 2011
 KernelVersion: 2.6.39
@@ -95,6 +102,12 @@ Contact:linux-r...@vger.kernel.org
 Description:   Maximum number of data buffer descriptors that may be sent to
the target in a single SRP_CMD request.
 
+What:  /sys/class/scsi_host/host/comp_vector
+Date:  September 2, 2013
+KernelVersion: 3.11
+Contact:   linux-r...@vger.kernel.org
+Description:   Completion vector used for the first RDMA channel.
+
 What:  /sys/class/scsi_host/host/dgid
 Date:  June 17, 2006
 KernelVersion: 2.6.17
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 9feeea1..58ca618 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -123,6 +123,16 @@ MODULE_PARM_DESC(dev_loss_tmo,
 " if fast_io_fail_tmo has not been set. \"off\" means that"
 " this functionality is disabled.");
 
+static unsigned ch_count;
+module_param(ch_count, uint, 0444);
+MODULE_PARM_DESC(ch_count,
+"Number of RDMA channels to use for communication with an SRP"
+" target. Using more than one channel improves performance"
+" if the HCA supports multiple completion vectors. The"
+" default value is the minimum of four times the number of"
+" online CPU sockets and the number of completion vectors"
+" supported by the HCA.");
+
 static void srp_add_one(struct ib_device *device);
 static void srp_remove_one(struct ib_device *device);
 static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr);
@@ -556,17 +566,32 @@ err:
  * Note: this function may be called without srp_alloc_iu_bufs() having been
  * invoked. Hence the ch->[rt]x_ring checks.
  */
-static void srp_free_ch_ib(struct srp_rdma_ch *ch)
+static void srp_free_ch_ib(struct srp_target_port *target,
+  struct srp_rdma_ch *ch)
 {
-   struct srp_target_port *target = ch->target;
struct srp_device *dev = target->srp_host->srp_dev;
int i;
 
+   if (!ch->target)
+   return;
+
+   /*
+* Avoid that the SCSI error handler tries to use this channel after
+* it has been freed. The SCSI error handler can namely continue
+* trying to perform recovery actions after scsi_remove_host()
+* returned.
+*/
+   ch->target = NULL;
+
if (ch->cm_id) {
ib_destroy_cm_id(ch->cm_id);
ch

[PATCH 6/8] IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning

2014-09-19 Thread Bart Van Assche

If a cable is pulled during LUN scanning it can happen that the
SRP rport and the SCSI host have been created but no LUNs have been
added to the SCSI host. Since multipathd only sends SCSI commands
to a SCSI target if one or more SCSI devices are present and since
there is no keepalive mechanism for IB queue pairs this means that
after a LUN scan failed and after a reconnect has succeeded no
data will be sent over the QP and hence that a subsequent cable
pull will not be detected. Avoid this by not creating an rport or
SCSI host if a cable is pulled during a SCSI LUN scan.

Note: so far the above behavior has only been observed with the
kernel module parameter ch_count set to a value >= 2.

Signed-off-by: Bart Van Assche 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 56 +++--
 drivers/infiniband/ulp/srp/ib_srp.h |  1 +
 2 files changed, 48 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 9608e7a..fd88fb8 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -,6 +,10 @@ static int srp_rport_reconnect(struct srp_rport *rport)
int i, ret;
 
srp_disconnect_target(target);
+
+   if (target->state == SRP_TARGET_SCANNING)
+   return -ENODEV;
+
/*
 * Now get a new local CM ID so that we avoid confusing the target in
 * case things are really fouled up. Doing so also ensures that all CM
@@ -2607,11 +2611,23 @@ static struct scsi_host_template srp_template = {
.shost_attrs= srp_host_attrs
 };
 
+static int srp_sdev_count(struct Scsi_Host *host)
+{
+   struct scsi_device *sdev;
+   int c = 0;
+
+   shost_for_each_device(sdev, host)
+   c++;
+
+   return c;
+}
+
 static int srp_add_target(struct srp_host *host, struct srp_target_port 
*target)
 {
struct srp_rport_identifiers ids;
struct srp_rport *rport;
 
+   target->state = SRP_TARGET_SCANNING;
sprintf(target->target_name, "SRP.T10:%016llX",
 (unsigned long long) be64_to_cpu(target->id_ext));
 
@@ -2634,11 +2650,26 @@ static int srp_add_target(struct srp_host *host, struct 
srp_target_port *target)
list_add_tail(&target->list, &host->target_list);
spin_unlock(&host->target_lock);
 
-   target->state = SRP_TARGET_LIVE;
-
scsi_scan_target(&target->scsi_host->shost_gendev,
 0, target->scsi_id, SCAN_WILD_CARD, 0);
 
+   if (!target->connected || target->qp_in_error) {
+   shost_printk(KERN_INFO, target->scsi_host,
+PFX "SCSI scan failed - removing SCSI host\n");
+   srp_queue_remove_work(target);
+   goto out;
+   }
+
+   pr_debug(PFX "%s: SCSI scan succeeded - detected %d LUNs\n",
+dev_name(&target->scsi_host->shost_gendev),
+srp_sdev_count(target->scsi_host));
+
+   spin_lock_irq(&target->lock);
+   if (target->state == SRP_TARGET_SCANNING)
+   target->state = SRP_TARGET_LIVE;
+   spin_unlock_irq(&target->lock);
+
+out:
return 0;
 }
 
@@ -3044,13 +3075,20 @@ static ssize_t srp_create_target(struct device *dev,
if (ret)
goto err_disconnect;
 
-   shost_printk(KERN_DEBUG, target->scsi_host, PFX
-"new target: id_ext %016llx ioc_guid %016llx pkey %04x 
service_id %016llx sgid %pI6 dgid %pI6\n",
-be64_to_cpu(target->id_ext),
-be64_to_cpu(target->ioc_guid),
-be16_to_cpu(target->path.pkey),
-be64_to_cpu(target->service_id),
-target->path.sgid.raw, target->path.dgid.raw);
+   /* Protects against concurrent srp_remove_target() invocation. */
+   scsi_host_get(target->scsi_host);
+
+   if (target->state != SRP_TARGET_REMOVED) {
+   shost_printk(KERN_DEBUG, target->scsi_host, PFX
+"new target: id_ext %016llx ioc_guid %016llx pkey 
%04x service_id %016llx sgid %pI6 dgid %pI6\n",
+be64_to_cpu(target->id_ext),
+be64_to_cpu(target->ioc_guid),
+be16_to_cpu(target->path.pkey),
+be64_to_cpu(target->service_id),
+target->path.sgid.raw, target->orig_dgid);
+   }
+
+   scsi_host_put(target->scsi_host);
 
ret = count;
 
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h 
b/drivers/infiniband/ulp/srp/ib_srp.h
index e46ecb1..00c7c48 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -73,6 +73,7 @@ enum {
 };
 
 enum srp_target_state {
+   SRP_TARGET_SCANNING,
SRP_TARGET_LIVE,
SRP_TARGET_REMOVED,
 };
-- 
1.8.4.5

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the

[PATCH 5/8] IB/srp: Remove stale connection retry mechanism

2014-09-19 Thread Bart Van Assche

Attempting to connect three times may be insufficient after an
initiator system that was using multiple RDMA channels tries to
relogin. Additionally, this login retry mechanism is a workaround
for particular behavior of the IB/CM. Since the srp_daemon retries
a failed login attempt anyway, remove the stale connection retry
mechanism.

Signed-off-by: Bart Van Assche 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index d3c712f..9608e7a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -904,7 +904,6 @@ static void srp_rport_delete(struct srp_rport *rport)
 
 static int srp_connect_target(struct srp_target_port *target)
 {
-   int retries = 3;
int ret;
 
WARN_ON_ONCE(target->connected);
@@ -945,19 +944,10 @@ static int srp_connect_target(struct srp_target_port 
*target)
break;
 
case SRP_STALE_CONN:
-   /* Our current CM id was stale, and is now in timewait.
-* Try to reconnect with a new one.
-*/
-   if (!retries-- || srp_new_cm_id(target)) {
-   shost_printk(KERN_ERR, target->scsi_host, PFX
-"giving up on stale connection\n");
-   target->status = -ECONNRESET;
-   return target->status;
-   }
-
shost_printk(KERN_ERR, target->scsi_host, PFX
-"retrying stale connection\n");
-   break;
+"giving up on stale connection\n");
+   target->status = -ECONNRESET;
+   return target->status;
 
default:
return target->status;
-- 
1.8.4.5

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/8] IB/srp: Move ib_destroy_cm_id() call into srp_free_ch_ib()

2014-09-19 Thread Bart Van Assche

The patch that adds multichannel support into the SRP initiator
driver introduces an additional call to srp_free_ch_ib(). This
patch helps to keep that later patch simple.

Signed-off-by: Bart Van Assche 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 62d2a18..d3c712f 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -555,6 +555,11 @@ static void srp_free_target_ib(struct srp_target_port 
*target)
struct srp_device *dev = target->srp_host->srp_dev;
int i;
 
+   if (target->cm_id) {
+   ib_destroy_cm_id(target->cm_id);
+   target->cm_id = NULL;
+   }
+
if (dev->use_fast_reg) {
if (target->fr_pool)
srp_destroy_fr_pool(target->fr_pool);
@@ -868,7 +873,6 @@ static void srp_remove_target(struct srp_target_port 
*target)
scsi_remove_host(target->scsi_host);
srp_stop_rport_timers(target->rport);
srp_disconnect_target(target);
-   ib_destroy_cm_id(target->cm_id);
srp_free_target_ib(target);
cancel_work_sync(&target->tl_err_work);
srp_rport_put(target->rport);
@@ -3043,7 +3047,7 @@ static ssize_t srp_create_target(struct device *dev,
if (ret) {
shost_printk(KERN_ERR, target->scsi_host,
 PFX "Connection failed\n");
-   goto err_cm_id;
+   goto err_free_ib;
}
 
ret = srp_add_target(host, target);
@@ -3067,9 +3071,6 @@ out:
 err_disconnect:
srp_disconnect_target(target);
 
-err_cm_id:
-   ib_destroy_cm_id(target->cm_id);
-
 err_free_ib:
srp_free_target_ib(target);
 
-- 
1.8.4.5

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/8] scsi-mq: Pass hctx to low-level SCSI drivers

2014-09-19 Thread Bart Van Assche

Low-level drivers (LLDs) need to know which hardware context has been
selected by the block layer. Hence pass this information to SCSI LLDs.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/scsi/scsi.c  |  7 +--
 drivers/scsi/scsi_lib.c  |  4 ++--
 drivers/scsi/scsi_priv.h |  2 +-
 include/scsi/scsi_host.h | 14 ++
 4 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index d81f3cc..9bd4bd0 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -640,9 +640,10 @@ EXPORT_SYMBOL(scsi_cmd_get_serial);
  * Return: nonzero return request was rejected and device's queue needs to be
  * plugged.
  */
-int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
+int scsi_dispatch_cmd(struct blk_mq_hw_ctx *hctx, struct scsi_cmnd *cmd)
 {
struct Scsi_Host *host = cmd->device->host;
+   struct scsi_host_template *hostt = host->hostt;
int rtn = 0;
 
atomic_inc(&cmd->device->iorequest_cnt);
@@ -701,7 +702,9 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
}
 
trace_scsi_dispatch_cmd_start(cmd);
-   rtn = host->hostt->queuecommand(host, cmd);
+   rtn = hctx && hostt->mq_queuecommand ?
+   hostt->mq_queuecommand(hctx, cmd) :
+   hostt->queuecommand(host, cmd);
if (rtn) {
trace_scsi_dispatch_cmd_error(cmd, rtn);
if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b0b6117..6303648 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1747,7 +1747,7 @@ static void scsi_request_fn(struct request_queue *q)
 * Dispatch the command to the low-level driver.
 */
cmd->scsi_done = scsi_done;
-   rtn = scsi_dispatch_cmd(cmd);
+   rtn = scsi_dispatch_cmd(NULL, cmd);
if (rtn) {
scsi_queue_insert(cmd, rtn);
spin_lock_irq(q->queue_lock);
@@ -1889,7 +1889,7 @@ static int scsi_queue_rq(struct blk_mq_hw_ctx *hctx, 
struct request *req)
scsi_init_cmd_errh(cmd);
cmd->scsi_done = scsi_mq_done;
 
-   reason = scsi_dispatch_cmd(cmd);
+   reason = scsi_dispatch_cmd(hctx, cmd);
if (reason) {
scsi_set_blocked(cmd, reason);
ret = BLK_MQ_RQ_QUEUE_BUSY;
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 12b8e1b..f4115f6 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -29,7 +29,7 @@ extern int scsi_init_hosts(void);
 extern void scsi_exit_hosts(void);
 
 /* scsi.c */
-extern int scsi_dispatch_cmd(struct scsi_cmnd *cmd);
+extern int scsi_dispatch_cmd(struct blk_mq_hw_ctx *hctx, struct scsi_cmnd 
*cmd);
 extern int scsi_setup_command_freelist(struct Scsi_Host *shost);
 extern void scsi_destroy_command_freelist(struct Scsi_Host *shost);
 #ifdef CONFIG_SCSI_LOGGING
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 0a867d9..c695f1c 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -133,6 +133,20 @@ struct scsi_host_template {
int (* queuecommand)(struct Scsi_Host *, struct scsi_cmnd *);
 
/*
+* scsi-mq version of queuecommand(). Must be provided by LLDDs that
+* provide multiple hardware queues and that need to know on which
+* hardware context a command has been queued by the block layer.
+*
+* Note: even if an LLDD provides an mq_queuecommand callback function
+* it still has to provide a queuecommand callback function. The SCSI
+* error handler namely can invoke the queuecommand callback function
+* even if scsi-mq is enabled.
+*
+* STATUS: OPTIONAL
+*/
+   int (* mq_queuecommand)(struct blk_mq_hw_ctx *hctx, struct scsi_cmnd *);
+
+   /*
 * This is an error handling strategy routine.  You don't need to
 * define one of these if you don't want to - there is a default
 * routine that is present that should work in most cases.  For those
-- 
1.8.4.5

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/8] scsi-mq: Add support for multiple hardware queues

2014-09-19 Thread Bart Van Assche

Allow a SCSI LLD to declare how many hardware queues it supports
by setting Scsi_Host.nr_hw_queues before calling scsi_add_host().

Note: it is assumed that each hardware queue has a queue depth of
shost->can_queue. In other words, the total queue depth per host
is (number of hardware queues) * (shost->can_queue).

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/scsi/scsi_lib.c  | 2 +-
 include/scsi/scsi_host.h | 4 
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d837dc1..b0b6117 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2071,7 +2071,7 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 
memset(&shost->tag_set, 0, sizeof(shost->tag_set));
shost->tag_set.ops = &scsi_mq_ops;
-   shost->tag_set.nr_hw_queues = 1;
+   shost->tag_set.nr_hw_queues = shost->nr_hw_queues ? : 1;
shost->tag_set.queue_depth = shost->can_queue;
shost->tag_set.cmd_size = cmd_size;
shost->tag_set.numa_node = NUMA_NO_NODE;
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index ba20347..0a867d9 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -638,6 +638,10 @@ struct Scsi_Host {
short unsigned int sg_prot_tablesize;
unsigned int max_sectors;
unsigned long dma_boundary;
+   /*
+* In scsi-mq mode, the number of hardware queues supported by the LLD.
+*/
+   unsigned nr_hw_queues;
/* 
 * Used to assign serial numbers to the cmds.
 * Protected by the host lock.
-- 
1.8.4.5

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC 0/8] IB/srp: Add multichannel support

2014-09-19 Thread Bart Van Assche

[PATCH 1/8] blk-mq: Use all available hardware queues

Suppose that a system has two CPU sockets, three cores per socket,
that it does not support hyperthreading and that four hardware
queues are provided by a block driver. With the current algorithm
this will lead to the following assignment of CPU cores to hardware
queues:

  HWQ 0: 0 1
  HWQ 1: 2 3
  HWQ 2: 4 5
  HWQ 3: (none)

This patch changes the queue assignment into:

  HWQ 0: 0 1
  HWQ 1: 2
  HWQ 2: 3 4
  HWQ 3: 5

In other words, this patch has the following three effects:
- All four hardware queues are used instead of only three.
- CPU cores are spread more evenly over hardware queues. For the
  above example the range of the number of CPU cores associated
  with a single HWQ is reduced from [0..2] to [1..2].
- If the number of HWQ's is a multiple of the number of CPU sockets
  it is now guaranteed that all CPU cores associated with a single
  HWQ reside on the same CPU socket.

Signed-off-by: Bart Van Assche 
Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Ming Lei 
---
 block/blk-mq-cpumap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 1065d7c..8e56455 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -17,7 +17,7 @@
 static int cpu_to_queue_index(unsigned int nr_cpus, unsigned int nr_queues,
  const int cpu)
 {
-   return cpu / ((nr_cpus + nr_queues - 1) / nr_queues);
+   return cpu * nr_queues / nr_cpus;
 }
 
 static int get_first_sibling(unsigned int cpu)
-- 
1.8.4.5

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH RFC 0/8] IB/srp: Add multichannel support

2014-09-19 Thread Bart Van Assche


Hello,

Although the SRP protocol supports multichannel operation, although 
since considerable time RDMA HCA's are available that support multiple 
completion vectors and although multichannel operation yields better 
performance than using a single channel, the Linux SRP initiator does 
not yet support multichannel operation. While adding multichannel 
support in the SRP initiator I encountered a few challenges of which I 
think these need wider discussion. The topics I would like invite wider 
discussion about are as follows:

- How to avoid unneeded inter-socket cache traffic. Should the blk-mq
  layer e.g. assign CPU cores to hardware queues such that all CPU cores
  associated with a single hardware queue reside on the same CPU socket?
  (see also patch 1/8)
- How to pass the hardware context selected by the block layer to the
  SCSI LLD queuecommand() callback function ? (see also patches 2/8 and
  3/8).
- Which approach should a SCSI LLD follow for selection of an MSI-X
  completion vector to ensure that the interrupt handler is invoked on
  the same CPU socket as the blk-mq hardware context data structures ?
  As one can see patch 8/8 relies on the assumption that completion
  vectors have been spread evenly over CPU sockets. If a HCA e.g.
  supports eight completion vectors then that means that in a system
  with two CPU sockets vectors 0-3 are associated with a CPU core on
  the first CPU socket and vectors 4-7 with a CPU core on the second CPU
  socket.

The patches in this series are:
0001-blk-mq-Use-all-available-hardware-queues.patch
0002-scsi-mq-Add-support-for-multiple-hardware-queues.patch
0003-scsi-mq-Pass-hctx-to-low-level-SCSI-drivers.patch
0004-IB-srp-Move-ib_destroy_cm_id-call-into-srp_free_ch_i.patch
0005-IB-srp-Remove-stale-connection-retry-mechanism.patch
0006-IB-srp-Avoid-that-I-O-hangs-due-to-a-cable-pull-duri.patch
0007-IB-srp-Separate-target-and-channel-variables.patch
0008-IB-srp-Add-multichannel-support.patch

Note: a predecessor of this patch series has been used to measure the 
performance of Christoph's scsi-mq patches that have been merged in 
kernel version v3.17-rc1.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 20/22] scsi: align logging messages

2014-09-19 Thread Hannes Reinecke


On 09/19/2014 01:35 PM, Christoph Hellwig wrote:

On Fri, Sep 19, 2014 at 08:26:29AM +0200, Hannes Reinecke wrote:

I would rather use 'scmd->tag', as this should be a straight copy
of scmd->request->tag, but we wouldn't need to reference the request
when doing so.


Yes, for now scmd->tag is correct as drivers might not be using the
block level tagging.  Once we get rid of the legacy request path I
plan to entirely remove scmd->tag, though.


But yeah, I was planning on doing the same, only I'd rather include
it right a the start like

sd 2:0:0:3 [sdv] tag#1: abort scheduled


Sounds fine.


and print out a mapping when I/O is submitted

sd 2:0:0:3 [sdv] tag#1: scmd 0x880420891470


I don't really see a need for that.


That's mainly for debugging; it gives you a nice starting point
for debugging crash dumps.
Otherwise it's really hard to get to the actual commands.

Unless you have a better idea ...

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

boot stall regression due to blk-mq: use percpu_ref for mq usage count

2014-09-19 Thread Christoph Hellwig

Hi Jens, hi Tejun,

I've seen multi-second boot stalls in one of my KVM setups during
the initial scsi scan:

[0.949892] scsi host0: Virtio SCSI HBA
[1.007864] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK1.1. 
PQ: 0 ANSI: 5
[1.021299] scsi 0:0:1:0: Direct-Access QEMU QEMU HARDDISK1.1. 
PQ: 0 ANSI: 5
[1.520356] tsc: Refined TSC clocksource calibration: 2491.910 MHz



[   16.186549] sd 0:0:0:0: Attached scsi generic sg0 type 0
[   16.190478] sd 0:0:1:0: Attached scsi generic sg1 type 0
[   16.194099] osd: LOADED open-osd 0.2.1
[   16.203202] sd 0:0:0:0: [sda] 31457280 512-byte logical blocks: (16.1 
GB/15.0 GiB)
[   16.208478] sd 0:0:0:0: [sda] Write Protect is off
[   16.211439] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, 
doesn't support DPO or FUA
[   16.218771] sd 0:0:1:0: [sdb] 31457280 512-byte logical blocks: (16.1 
GB/15.0 GiB)
[   16.223264] sd 0:0:1:0: [sdb] Write Protect is off
[   16.225682] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, 
doesn't support DPO or FUA



I've tracked this down to "blk-mq: use percpu_ref for mq usage count" in
a rather painful way as that one introduced enough other regressions
to mess up bisect.

If I revert the following commits:

dd840087086f3b93ac20f7472b4fca59aff7b79f
cddd5d17642cc6881352732693c2ae6930e9ce65
add703fda981b9719d37f371498b9f129acbd997

which are the above mentioned commit and two fixes to it the problem goes
away.

My qemu command line is below:

kvm \
-m 2048 \
-smp 1 \
-kernel arch/x86/boot/bzImage \
-append "root=/dev/vda console=tty0 console=ttyS0,115200n8 
scsi_mod.use_blk_mq=Y" \
-nographic \
-drive 
if=virtio,file=/work/images/debian.qcow2,cache=none,serial="test1234" \
-drive if=none,id=test,file=/work/images/test.img,cache=none,aio=native 
\
-drive 
if=none,id=scratch,file=/work/images/scratch.img,cache=none,aio=native \
-device virtio-scsi-pci,id=scsi \
-device scsi-hd,drive=test \
-device scsi-hd,drive=scratch \
-drive 
file=/work/images/debian-7.3.0-amd64-netinst.iso,index=2,media=cdrom
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 20/22] scsi: align logging messages

2014-09-19 Thread Christoph Hellwig

On Fri, Sep 19, 2014 at 08:26:29AM +0200, Hannes Reinecke wrote:
> I would rather use 'scmd->tag', as this should be a straight copy
> of scmd->request->tag, but we wouldn't need to reference the request
> when doing so.

Yes, for now scmd->tag is correct as drivers might not be using the
block level tagging.  Once we get rid of the legacy request path I
plan to entirely remove scmd->tag, though.

> But yeah, I was planning on doing the same, only I'd rather include
> it right a the start like
> 
> sd 2:0:0:3 [sdv] tag#1: abort scheduled

Sounds fine.

> and print out a mapping when I/O is submitted
> 
> sd 2:0:0:3 [sdv] tag#1: scmd 0x880420891470

I don't really see a need for that.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

40 matches

Mail list logo