Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko

2016-07-12 Thread Or Gerlitz
On Fri, Jul 8, 2016 at 9:36 PM, Steve Wise  wrote:
>> > > In first series libcxgb.ko will have common
>> > > iSCSI DDP Page Pod Manager that will be shared
>> > > by three Chelsio iSCSI drivers
>> > > cxgb3i, cxgb4i, cxgbit.
>> >
>> > cool
>> >
>> > > In subsequent series I will add common connection
>> > > management and other hardware specific common code
>> > > in this module.
>> >
>> > any chance to get that ready for 4.8 too?
>>
>> I am working on common connection management, it is a
>> big change as it involves multiple modules, so it is
>> difficult to get it ready for 4.8.
>
> Let's shoot for 4.9 for the common CM.  Nice work on the PPOD unification!


Taking into account that this was a reviewer comment that basically
should have been addressed for
the initial merge into 4.7 and that there's more time for 4.8 work
[1], could you try and get that for 4.8?

[1] https://lwn.net/Articles/694055/
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko

2016-07-07 Thread Or Gerlitz
On Wed, Jul 6, 2016 at 8:17 PM, Varun Prakash  wrote:
> On Wed, Jul 06, 2016 at 12:24:43AM +0300, Or Gerlitz wrote:
>> On Thu, May 26, 2016 at 9:58 PM, Varun Prakash  wrote:
>>
>> > Hi Or, Nicholas and Steve
>> > Thank you for the feedback and apologies for the delay in my response.
>>
>> > I agree that we can refactor initiator, target and iwarp drivers to
>> > reduce code duplication as Steve has mentioned. We will work on this for
>> > 4.8 merge window.
>>
>> Nic, Varun, was anything done on this matter? we're on rc6
>
> Yes, I am working on this, I will post first series
> within two days, it will add common library module libcxgb.ko.

thanks for the update, the plan was to get that info 4.8-rc1,
and as we're after rc6 there's not much time left for submitting code
for 4.8, so doing that sooner rather than later would be good here.

> In first series libcxgb.ko will have common
> iSCSI DDP Page Pod Manager that will be shared
> by three Chelsio iSCSI drivers
> cxgb3i, cxgb4i, cxgbit.

cool

> In subsequent series I will add common connection
> management and other hardware specific common code
> in this module.

any chance to get that ready for 4.8 too?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko

2016-07-05 Thread Or Gerlitz
On Thu, May 26, 2016 at 9:58 PM, Varun Prakash  wrote:

> Hi Or, Nicholas and Steve
> Thank you for the feedback and apologies for the delay in my response.

> On Wed, May 25, 2016 at 09:55:04PM -0700, Nicholas A. Bellinger wrote:

>> Varun + Co have made common improvements between existing software
>> iscsi-target RX + TX PDU handling and their new driver, and no further
>> comments for these prerequisites have been made.

>> The additional improvements discussed here so far are cxgb* specific,
>> and will not effect other drivers, and will not change configfs user ABI
>> layout compatibility within /sys/kernel/config/target/iscsi/.

>> That being the case, I think it's a reasonable starting point for
>> mainline users to consume target ISO on cxgb hardware, and for Chelsio
>> to make further common code improvements across their existing host
>> drivers.

> I agree that we can refactor initiator, target and iwarp drivers to
> reduce code duplication as Steve has mentioned. We will work on this for
> 4.8 merge window.

Nic, Varun, was anything done on this matter? we're on rc6

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko

2016-05-25 Thread Or Gerlitz
On Wed, May 25, 2016 at 8:40 PM, Steve Wise  wrote:
>> From: Or Gerlitz [mailto:gerlitz...@gmail.com]
>>
>> On Tue, May 24, 2016 at 9:40 AM, Nicholas A. Bellinger
>>  wrote:
>> > Hi Or & Co,
>> > On Wed, 2016-05-18 at 14:45 +0300, Or Gerlitz wrote:
>> >> On Sat, Apr 30, 2016 at 6:54 PM, Or Gerlitz  wrote:
>> >> > On Tue, Apr 19, 2016 at 9:30 PM, Varun Prakash  
>> >> > wrote:
>> >> >> cxgbit.h - This file contains data structure
>> >> >> definitions for cxgbit.ko.
>> >> >>
>> >> >> cxgbit_lro.h - This file contains data structure
>> >> >> definitions for LRO support.
>> >> >>
>> >> >> cxgbit_main.c - This file contains code for
>> >> >> registering with iscsi target transport and
>> >> >> cxgb4 driver.
>> >> >>
>> >> >> cxgbit_cm.c - This file contains code for
>> >> >> connection management.
>> >> >>
>> >> >> cxgbit_target.c - This file contains code
>> >> >> for processing iSCSI PDU.
>> >> >>
>> >> >> cxgbit_ddp.c - This file contains code for
>> >> >> Direct Data Placement.
>> >> >
>> >> > Wait,
>> >> >
>> >> > You are adding many K's LOCs to handle things like CM (connection
>> >> > management), DDP and LRO. But your upstream solution must be using CM
>> >> > and DDP (and LRO as well) for the HW offloaded initiator side as well,
>> >> > not to mention the iWARP side of things.
>> >> >
>> >> > There must be some way to refactor things instead of repeating the
>> >> > same bits over and over, thoughts?
>>
>> >> The author haven't responded... where that this stands from your point of 
>> >> view?
>>
>> > For an initial merge, I don't have an objection to this series wrt
>> > drivers/target/iscsi/* improvements + prerequisites, and new standalone
>> > cxgbit iscsit_transport driver.
>> >
>> > That said, there are areas between cxgbi + cxgbit code that can be made
>> > common as you've pointed out.  The Cheliso folks have mentioned off-list
>> > that cxgbi as-is in mainline does not support LRO, and that the majority
>> > of DDP logic is shared between initiator + target.
>> >
>> > Are there specific pieces of logic in DDP or iWARP for cxgb* that you'd
>> > like to see Varun + Co pursue as common code in v4.8+..?
>>
>> Hi Nic,
>>
>> As I wrote above, I have good reasons to believe that there are few K
>> LOCs of duplication
>> between this series to the chelsio hw iscsi initiator or the chelsio
>> iwarp driver or both (triple).
>>
>> Ys, I'd like to see a public response from Varun and Co on this valid
>> reviewer comment
>> before you proceed with this series, makes sense? There's no point to
>> duplicate the same
>> code in the kernel again and again. **Even** if there's one
>> duplication now, we don't want
>> another one.
>>
>> Or.
>
> Hey Or and Nicholas,
>
> I've been staying out of this directly due to my workload.  But I will now
> take a look at where and how we can refactor to reduce the code duplication.
> Here are some thoughts with a quick glance:
>
> DDP is not used by iWARP.  DDP is for direct placement attempt on a TCP
> streaming mode connection.  RDMA has its own way of doing this that doesn’t
> utilize the same HW resources.  So I would say that the DDP usage could be
> refactored such that there is a set of helper functions to do the common
> logic, like managing page pod adapter resources.  An example would be
> cxgb4i.c:ddp_ppod_write_idata(), and cxgbit_ppod_write_idata() from patch
> 13.

sounds good

> Connection Management (CM) can definitely be refactored to share helper
> functions for setting up/tearing down TCP connections.  But the iWARP logic
> is very different from iscsi once the TCP connection is setup and the
> connection moved into RDMA mode.  So the sharing of logic would be in
> sending various CPL messages to the adapter to connect/accept/close/abort
> connections.

Ditto

> Having said that, the initiator and target wouldn't share a
> lot by virtue of the initiator only doing active side connection setup, and
> the target only doing passive side connection setup.

Well, I am not sure. It's like saying that if you break the IB/RoCE CM
to active side and p

Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko

2016-05-25 Thread Or Gerlitz
On Tue, May 24, 2016 at 9:40 AM, Nicholas A. Bellinger
 wrote:
> Hi Or & Co,
> On Wed, 2016-05-18 at 14:45 +0300, Or Gerlitz wrote:
>> On Sat, Apr 30, 2016 at 6:54 PM, Or Gerlitz  wrote:
>> > On Tue, Apr 19, 2016 at 9:30 PM, Varun Prakash  wrote:
>> >> cxgbit.h - This file contains data structure
>> >> definitions for cxgbit.ko.
>> >>
>> >> cxgbit_lro.h - This file contains data structure
>> >> definitions for LRO support.
>> >>
>> >> cxgbit_main.c - This file contains code for
>> >> registering with iscsi target transport and
>> >> cxgb4 driver.
>> >>
>> >> cxgbit_cm.c - This file contains code for
>> >> connection management.
>> >>
>> >> cxgbit_target.c - This file contains code
>> >> for processing iSCSI PDU.
>> >>
>> >> cxgbit_ddp.c - This file contains code for
>> >> Direct Data Placement.
>> >
>> > Wait,
>> >
>> > You are adding many K's LOCs to handle things like CM (connection
>> > management), DDP and LRO. But your upstream solution must be using CM
>> > and DDP (and LRO as well) for the HW offloaded initiator side as well,
>> > not to mention the iWARP side of things.
>> >
>> > There must be some way to refactor things instead of repeating the
>> > same bits over and over, thoughts?

>> The author haven't responded... where that this stands from your point of 
>> view?

> For an initial merge, I don't have an objection to this series wrt
> drivers/target/iscsi/* improvements + prerequisites, and new standalone
> cxgbit iscsit_transport driver.
>
> That said, there are areas between cxgbi + cxgbit code that can be made
> common as you've pointed out.  The Cheliso folks have mentioned off-list
> that cxgbi as-is in mainline does not support LRO, and that the majority
> of DDP logic is shared between initiator + target.
>
> Are there specific pieces of logic in DDP or iWARP for cxgb* that you'd
> like to see Varun + Co pursue as common code in v4.8+..?

Hi Nic,

As I wrote above, I have good reasons to believe that there are few K
LOCs of duplication
between this series to the chelsio hw iscsi initiator or the chelsio
iwarp driver or both (triple).

Ys, I'd like to see a public response from Varun and Co on this valid
reviewer comment
before you proceed with this series, makes sense? There's no point to
duplicate the same
code in the kernel again and again. **Even** if there's one
duplication now, we don't want
another one.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko

2016-05-18 Thread Or Gerlitz
On Sat, Apr 30, 2016 at 6:54 PM, Or Gerlitz  wrote:
> On Tue, Apr 19, 2016 at 9:30 PM, Varun Prakash  wrote:
>> cxgbit.h - This file contains data structure
>> definitions for cxgbit.ko.
>>
>> cxgbit_lro.h - This file contains data structure
>> definitions for LRO support.
>>
>> cxgbit_main.c - This file contains code for
>> registering with iscsi target transport and
>> cxgb4 driver.
>>
>> cxgbit_cm.c - This file contains code for
>> connection management.
>>
>> cxgbit_target.c - This file contains code
>> for processing iSCSI PDU.
>>
>> cxgbit_ddp.c - This file contains code for
>> Direct Data Placement.
>
> Wait,
>
> You are adding many K's LOCs to handle things like CM (connection
> management), DDP and LRO. But your upstream solution must be using CM
> and DDP (and LRO as well) for the HW offloaded initiator side as well,
> not to mention the iWARP side of things.
>
> There must be some way to refactor things instead of repeating the
> same bits over and over, thoughts?

Nic,

The author haven't responded... where that this stands from your point of view?

Or.

>>  drivers/target/iscsi/Kconfig|2 +
>>  drivers/target/iscsi/Makefile   |1 +
>>  drivers/target/iscsi/cxgbit/Kconfig |7 +
>>  drivers/target/iscsi/cxgbit/Makefile|6 +
>>  drivers/target/iscsi/cxgbit/cxgbit.h|  353 +
>>  drivers/target/iscsi/cxgbit/cxgbit_cm.c | 2086 
>> +++
>> drivers/target/iscsi/cxgbit/cxgbit_ddp.c|  325 +
>> drivers/target/iscsi/cxgbit/cxgbit_lro.h|   72 +
>> drivers/target/iscsi/cxgbit/cxgbit_main.c   |  701 +
>>  drivers/target/iscsi/cxgbit/cxgbit_target.c | 1561 
>> 10 files changed, 5114 insertions(+)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko

2016-04-30 Thread Or Gerlitz
On Tue, Apr 19, 2016 at 9:30 PM, Varun Prakash  wrote:
> cxgbit.h - This file contains data structure
> definitions for cxgbit.ko.
>
> cxgbit_lro.h - This file contains data structure
> definitions for LRO support.
>
> cxgbit_main.c - This file contains code for
> registering with iscsi target transport and
> cxgb4 driver.
>
> cxgbit_cm.c - This file contains code for
> connection management.
>
> cxgbit_target.c - This file contains code
> for processing iSCSI PDU.
>
> cxgbit_ddp.c - This file contains code for
> Direct Data Placement.

Wait,

You are adding many K's LOCs to handle things like CM (connection
management), DDP and LRO. But your upstream solution must be using CM
and DDP (and LRO as well) for the HW offloaded initiator side as well,
not to mention the iWARP side of things.

There must be some way to refactor things instead of repeating the
same bits over and over, thoughts?

Or.

>  drivers/target/iscsi/Kconfig|2 +
>  drivers/target/iscsi/Makefile   |1 +
>  drivers/target/iscsi/cxgbit/Kconfig |7 +
>  drivers/target/iscsi/cxgbit/Makefile|6 +
>  drivers/target/iscsi/cxgbit/cxgbit.h|  353 +
>  drivers/target/iscsi/cxgbit/cxgbit_cm.c | 2086 
> +++
> drivers/target/iscsi/cxgbit/cxgbit_ddp.c|  325 +
> drivers/target/iscsi/cxgbit/cxgbit_lro.h|   72 +
> drivers/target/iscsi/cxgbit/cxgbit_main.c   |  701 +
>  drivers/target/iscsi/cxgbit/cxgbit_target.c | 1561 
> 10 files changed, 5114 insertions(+)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] IB/iser: set max_segment_size

2016-04-28 Thread Or Gerlitz
On Tue, Apr 12, 2016 at 5:13 PM, Christoph Hellwig  wrote:
> So that we don't overflow the number of MR segments allocated because
> we have to split on SGL segment into multiple MR segments.
>
> Signed-off-by: Christoph Hellwig 

nit, but please fix


IB/iser: set [...] --> IB/iser: Set max segment size in the SCSI host template

I prefer to avoid names of variables and functions in commit titles
for the iser initiator driver.

I find the usage of higher language very beneficial for maintenance  and such.

Sagi, I would appreciate if you avoid acking patches lacking this
practice for iser-I,
specifically we want people to use capital letter in the 1st word that
follows that IB/iser: prefix - ok?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 05/10] iser: Have initiator and target to share protocol structures and definitions

2015-11-18 Thread Or Gerlitz

On 11/16/2015 6:37 PM, Sagi Grimberg wrote:

+/**
+ * struct iser_hello - iSER Hello header
+ *
+ * @opcode:   opcode (must be set to ISER_HELLO)
+ * @max_min_ver:  maximum and minimum iser versions
+ * @iser_ird: iSER IRD
+ * @rsvd: reserved
+ */
+struct iser_hello {
+   u8  opcode;
+   u8  max_min_ver;
+   u16 iser_ird;
+   u8  rsvd[20];
+} __packed;
+
+/**
+ * struct iser_hello_rep - iSER Hello reply header
+ *
+ * @opcode_rej:   opcode (must be set to ISER_HELLORPLY)
+ *lower bit is reject bit
+ * @max_cur_ver:  maximum and current iser versions
+ * @iser_ord: iSER ORD
+ * @rsvd: reserved
+ */
+struct iser_hello_rep {
+   u8  opcode_rej;
+   u8  max_cur_ver;
+   u16 iser_ord;
+   u8  rsvd[20];
+} __packed;
+


I don't see the point to include these two defs, we don't use them and 
Steve even got iser to work

over iwarp without them, so why care? we should only leave


+#define ISER_HELLO 0x20
+#define ISER_HELLORPLY 0x30

to allow warnings on them if we get such packets
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] iscsi: fix regression caused by session lock patch

2015-11-18 Thread Or Gerlitz
On Mon, Nov 16, 2015 at 7:30 PM, Michael Christie  wrote:
>> On Nov 15, 2015, at 4:10 AM, Or Gerlitz  wrote:
>> On Fri, Nov 13, 2015 at 6:51 PM, Mike Christie  wrote:
>>> On 11/13/2015 09:06 AM, Or Gerlitz wrote:

>> After the locking change, adding a task to any of the connection
>> mgmtqueue, cmdqueue, or requeue lists is under the session forward lock.
>>
>> Removing tasks from any of these lists in iscsi_data_xmit is under
>> the session forward lock and **before** calling down to the transport
>> to handle the task.
>>
>> The iscsi_complete_task helper was added by Mike's commit
>> 3bbaaad95fd38ded "[SCSI] libiscsi: handle cleanup task races"
>> and is indeed typically called under the backward lock && has this section
>>
>> +   if (!list_empty(&task->running))
>> +   list_del_init(&task->running);
>>
>> which per my reading of the code never comes into play, can you comment?

> I had sent this to Sagi and your mellanox email the other day:

> The bug occurs when a target completes a command while we are still
> processing it. If we are doing a WRITE and the iscsi_task
> is on the cmdqueue because we are handling a R2T.

Mike,

I failed to find how an iscsi_task can be added again to the cmdqueue list,
nor how it can be added to the requeue list without the right locking, nor how
can an iscsi_task be on either of these lists when iscsi_complete_task
is invoked.

Specifically, my reading of the code says that these are the events over time:

t1. queuecommand :: we put the task on the cmdqueue list
(libiscsi.c:1741) - under fwd lock

t2. iscsi_data_xmit :: we remove the task from the cmdqueue list
(libiscsi.c:1537) - under fwd lock

when the R2T flow happens, we do the following

t3. iscsi_tcp_hdr_dissect --> iscsi_tcp_r2t_rsp --> iscsi_requeue_task ::
put the task on the requeue list -- the call to iscsi_tcp_r2t_rsp is
under the fwd lock

t4. iscsi_data_xmit :: we remove the task from the requeue list
(libiscsi.c: 1578) - under fwd lock

Do you agree to t1...t4 being what's possible for a given task? or I
missed something?

>> The target shouldn't
>> send a Check Condition at this time, but some do. If that happens, then
>> iscsi_queuecommand could be adding a new task to the cmdqueue, while the
>> recv path is handling the CC for the task with the outsanding R2T.  The
>> recv path iscsi_complete_task call sees that task it on the cmdqueue and
>> deletes it from the list at the same time iscsi_queuecommand is adding a new 
>> task.


>> This should not happen per the iscsi spec. There is some wording about
>> waiting to finish the sequence in progress, but targets goof this up.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] iscsi: fix regression caused by session lock patch

2015-11-17 Thread Or Gerlitz
On Mon, Nov 16, 2015 at 7:30 PM, Michael Christie  wrote:
>> On Nov 15, 2015, at 4:10 AM, Or Gerlitz  wrote:

>> After the locking change, adding a task to any of the connection
>> mgmtqueue, cmdqueue, or requeue lists is under the session forward lock.
>>
>> Removing tasks from any of these lists in iscsi_data_xmit is under
>> the session forward lock and **before** calling down to the transport
>> to handle the task.
>>
>> The iscsi_complete_task helper was added by Mike's commit
>> 3bbaaad95fd38ded "[SCSI] libiscsi: handle cleanup task races"
>> and is indeed typically called under the backward lock && has this section
>>
>> +   if (!list_empty(&task->running))
>> +   list_del_init(&task->running);
>> which per my reading of the code never comes into play, can you comment?

>> The bug occurs when a target completes a command while we are still
>> processing it. If we are doing a WRITE and the iscsi_task
>> is on the cmdqueue because we are handling a R2T. The target shouldn't
>> send a Check Condition at this time, but some do. If that happens, then
>> iscsi_queuecommand could be adding a new task to the cmdqueue, while the
>> recv path is handling the CC for the task with the outsanding R2T.  The
>> recv path iscsi_complete_task call sees that task it on the cmdqueue and
>> deletes it from the list at the same time iscsi_queuecommand is adding a new
>> task.

So we're now a bit beyond trivial bug and

>> This should not happen per the iscsi spec. There is some wording about
>> waiting to finish the sequence in progress, but targets goof this up.

we have target/s that violate the spec, this is life, but can explain
why it took us 18m to get
bug report. Can you provide few point pointers into the relevant code
pieces that one need
to look at to realize what's going on there? was this code added
before or after the patch?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ib_srp: initialize dma_length in srp_map_idb

2015-11-15 Thread Or Gerlitz
On Sun, Nov 15, 2015, Sagi Grimberg  wrote:
> On 15/11/2015 19:59, Christoph Hellwig wrote:

>> Without this sg_dma_len will return 0 on architectures tha have
>> the dma_length field.

and what wrong with that?

Christoph, probably typo here? "tha" needs to be "that"

>> Signed-off-by: Christoph Hellwig 
>> ---
>>   drivers/infiniband/ulp/srp/ib_srp.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
>> b/drivers/infiniband/ulp/srp/ib_srp.c
>> index 32f7962..445c0a6 100644
>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>> @@ -1520,6 +1520,9 @@ static int srp_map_idb(struct srp_rdma_ch *ch,
>> struct srp_request *req,
>> state.sg_nents = 1;
>> sg_set_buf(idb_sg, req->indirect_desc, idb_len);
>> idb_sg->dma_address = req->indirect_dma_addr; /* hack! */
>> +#ifdef CONFIG_NEED_SG_DMA_LENGTH
>> +   idb_sg->dma_length = idb_sg->length;  /* hack^2 */
>> +#endif
>
>
> :)
>
> We should really get this properly map/unmap per IO at some point.
> Probably do it in both code paths...

Sagi, can you please elaborate a little further on the problem, srpt
WA, what do we do in isert and what is the proposed not WA solution?

> Having said that,
> Looks fine,
> Reviewed-by: Sagi Grimberg 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] iscsi: fix regression caused by session lock patch

2015-11-15 Thread Or Gerlitz
On Fri, Nov 13, 2015 at 6:51 PM, Mike Christie  wrote:
> On 11/13/2015 09:06 AM, Or Gerlitz wrote:
>>> The patch has caused multiple regressions, did not even compile when
>>> > sent to me, and was poorly reviewed and I have not heard from you guys
>>> > in a week. Given the issues the patch has had and the current time, I do
>>> > not feel comfortable with it anymore. I want to re-review it and fix it
>>> > up when there is more time.
>> Mike (Hi),
>>
>> It's a complex patch that touches all the iscsi transports, and yes,
>> when it was send to you the 1st time, there was build error on one of
>> the offload transports (bad! but happens) and yes, as you pointed, one
>> static checker fix + one bug fix for it went upstream after this has
>> been merged, happens too.
>
> A patch should not cause this many issues.
>
>> What makes you say it was poorly reviewed?
>
> I just did not do a good job at looking at the patch. I should have
> caught all of these issues.
>
> - The bnx2i cleanup_task bug should have been obvious, especially for me
> because I had commented about the back lock and the abort path.
>
> - This oops, was so basic. Incorrect locking around a linked list being
> accessed from 2 threads is really one of those 1st year kernel
> programmer things.

Mike, Chris

After the locking change, adding a task to any of the connection
mgmtqueue, cmdqueue, or requeue lists is under the session forward lock.

Removing tasks from any of these lists in iscsi_data_xmit is under
the session forward lock and **before** calling down to the transport
to handle the task.

The iscsi_complete_task helper was added by Mike's commit
3bbaaad95fd38ded "[SCSI] libiscsi: handle cleanup task races"
and is indeed typically called under the backward lock && has this section

+   if (!list_empty(&task->running))
+   list_del_init(&task->running);

which per my reading of the code never comes into play, can you comment?

Lets address this area before we move to the others claims made on the patch.

Again, the patch is around for ~18 months, since 3.15, and no deep
complaints so far, lets
not jump to conclusions.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 9/9] IB/iser: Convert to CQ abstraction

2015-11-15 Thread Or Gerlitz

On 11/13/2015 3:46 PM, Christoph Hellwig wrote:

From: Sagi Grimberg


Care to sparse some text here to assist a reviewer and future bisections?!

I have asked multiple times to avoid empty change-logs for patches in 
this driver.




Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
---
  drivers/infiniband/ulp/iser/iscsi_iser.h |  68 ---
  drivers/infiniband/ulp/iser/iser_initiator.c | 142 ++-
  drivers/infiniband/ulp/iser/iser_memory.c|  21 ++-
  drivers/infiniband/ulp/iser/iser_verbs.c | 258 ++-
  4 files changed, 209 insertions(+), 280 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/9] IB/iser: Use a dedicated descriptor for login

2015-11-15 Thread Or Gerlitz

On 11/13/2015 3:46 PM, Christoph Hellwig wrote:

From: Sagi Grimberg

Makes better sense and we'll need it later with CQ abstraction.
iser switch login bufs to void


Sagi, few quick comments on this patch, please address for next version..

The 2nd sentence of the change-log needs better phrasing.

also multiple checkpatch hits on the patch, please fix

CHECK: Please don't use multiple blank lines
#26: FILE: drivers/infiniband/ulp/iser/iscsi_iser.h:329:

+

WARNING: __packed is preferred over __attribute__((packed))
#42: FILE: drivers/infiniband/ulp/iser/iscsi_iser.h:345:
+} __attribute__((packed));

CHECK: Please don't use multiple blank lines
#44: FILE: drivers/infiniband/ulp/iser/iscsi_iser.h:347:
+
+

CHECK: Alignment should match open parenthesis
#161: FILE: drivers/infiniband/ulp/iser/iser_initiator.c:209:
+   if (ib_dma_mapping_error(device->ib_device,
+   desc->req_dma))

CHECK: Alignment should match open parenthesis
#172: FILE: drivers/infiniband/ulp/iser/iser_initiator.c:220:
+   if (ib_dma_mapping_error(device->ib_device,
+   desc->rsp_dma))






--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/9] move blk_iopoll to limit and make it generally available

2015-11-15 Thread Or Gerlitz
On Sun, Nov 15, 2015 at 10:48 AM, Sagi Grimberg
 wrote:
> Or is correct,
>
> I have attempted to convert iser to use blk_iopoll in the past, however
> I've seen inconsistent performance and latency skews (comparing to
> tasklets iser is using today). This was manifested in IOPs test cases
> where I ran multiple threads with higher queue-depth and not in
> sanitized pure latency (QD=1) test cases. Unfortunately I didn't have
> the time to pick it up since.
>
> I do have every intention of testing it again with this. If it still
> exist we will need to find the root-cause of it before converting
> drivers to use it.

Good, this way (inconsistent performance and latency skews) or another
(all shines up) -- please
let us know your findings, best through commenting within V > 0 the
cover letter posts of this series
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/9] move blk_iopoll to limit and make it generally available

2015-11-13 Thread Or Gerlitz
On Fri, Nov 13, 2015 at 3:46 PM, Christoph Hellwig  wrote:
> The new name is irq_poll as iopoll is already taken.  Better suggestions
> welcome.

Sagi (or Christoph if you can address that),

@ some pointer over the last 18 months there was a port done at
mellanox for iser to use blk-iopoll and AFAIR it didn't work well or
didn't work at all. Can you tell now what was the problem and how did
you address it at your generalization?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] iscsi: fix regression caused by session lock patch

2015-11-13 Thread Or Gerlitz
On Thu, Nov 12, 2015 at 10:58 PM, Mike Christie  wrote:
> On 11/12/2015 06:03 AM, Sagi Grimberg wrote:
>>> The bug is caused by this patch:
>>>
>>> 659743b02c411075b26601725947b21df0bb29c8
>>>
>>> which allowed the task lists to be manipulated under different locks
>>> in the xmit and completion path.

>>> To fix the oops this patch just reverts that patch. It also reverts
>>> these 2 patches for regressions that were also a result of that patch:

>> Whoa now Mike, this is a major change. Can we take a step back and think
>> about this for a second?

> The issue has been on the open-iscsi list for a week! You are on the
> list still right? Or is even ccd on the thread.

The email you sent me a week ago also cc-ed open-iscsi hence was
routed by a rule in my mailer that made it to land in my open-iscsi
subscription folder which is don't visit much nowadays. Only when you
posted to linux-scsi we saw that and responded within few hours, to
begin with.

>> Can you provide a more detailed analysis of why is this list corruption
>> triggered? What scenario was not honoring the locking policy?

> Basic locking around a linked list bug. iscsi_queuecommand adds it under
> the frwd lock and iscsi_complete_task was taking it off the back_lock.

>> This code has been running reliably in our labs for a long time now
>> (both iser and tcp).

> The patch has caused multiple regressions, did not even compile when
> sent to me, and was poorly reviewed and I have not heard from you guys
> in a week. Given the issues the patch has had and the current time, I do
> not feel comfortable with it anymore. I want to re-review it and fix it
> up when there is more time.

Mike (Hi),

It's a complex patch that touches all the iscsi transports, and yes,
when it was send to you the 1st time, there was build error on one of
the offload transports (bad! but happens) and yes, as you pointed, one
static checker fix + one bug fix for it went upstream after this has
been merged, happens too.

What makes you say it was poorly reviewed?

And now after few years in upstream a possibly real bug was found
(happens), why rush and revert? lets see if we can fix.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] IB/srp: Remove stale connection retry mechanism

2014-09-20 Thread Or Gerlitz
On Fri, Sep 19, 2014 at 3:58 PM, Bart Van Assche  wrote:
> Attempting to connect three times may be insufficient after an
> initiator system that was using multiple RDMA channels tries to
> relogin. Additionally, this login retry mechanism is a workaround
> for particular behavior of the IB/CM.

Can you be more specific re the particular behavior of the IB CM?
added Sean, the CM maintainer.

> Since the srp_daemon retries
> a failed login attempt anyway, remove the stale connection retry
> mechanism.
>
> Signed-off-by: Bart Van Assche 
> ---
>  drivers/infiniband/ulp/srp/ib_srp.c | 16 +++-
>  1 file changed, 3 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
> b/drivers/infiniband/ulp/srp/ib_srp.c
> index d3c712f..9608e7a 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -904,7 +904,6 @@ static void srp_rport_delete(struct srp_rport *rport)
>
>  static int srp_connect_target(struct srp_target_port *target)
>  {
> -   int retries = 3;
> int ret;
>
> WARN_ON_ONCE(target->connected);
> @@ -945,19 +944,10 @@ static int srp_connect_target(struct srp_target_port 
> *target)
> break;
>
> case SRP_STALE_CONN:
> -   /* Our current CM id was stale, and is now in 
> timewait.
> -* Try to reconnect with a new one.
> -*/
> -   if (!retries-- || srp_new_cm_id(target)) {
> -   shost_printk(KERN_ERR, target->scsi_host, PFX
> -"giving up on stale 
> connection\n");
> -   target->status = -ECONNRESET;
> -   return target->status;
> -   }
> -
> shost_printk(KERN_ERR, target->scsi_host, PFX
> -"retrying stale connection\n");
> -   break;
> +"giving up on stale connection\n");
> +   target->status = -ECONNRESET;
> +   return target->status;
>
> default:
> return target->status;
> --
> 1.8.4.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/3] scsi_cmnd: Introduce scsi_transfer_length helper

2014-06-10 Thread Or Gerlitz
On Tue, Jun 10, 2014 at 10:02 PM, Martin K. Petersen
 wrote:
>> "Sagi" == Sagi Grimberg  writes:
>
> +static inline unsigned scsi_prot_length(unsigned data_length,
> +   unsigned sector_size)
> +{
> +   switch (sector_size) {
> +   case 512:
> +   return (data_length >> 9) * 8;
> +   case 1024:
> +   return (data_length >> 10) * 8;
> +   case 2048:
> +   return (data_length >> 11) * 8;
> +   case 4096:
> +   return (data_length >> 12) * 8;
> +   default:
> +   return (data_length >> ilog2(sector_size)) * 8;
> +   }
> +}
> +
> +static inline unsigned scsi_transfer_length(struct scsi_cmnd *cmd)
> +{
> +   unsigned data_length;
> +
> +   if (cmd->sc_data_direction == DMA_FROM_DEVICE) {
> +   data_length = scsi_in(cmd)->length;
> +   if (scsi_get_prot_op(cmd) ==  SCSI_PROT_NORMAL ||
> +   scsi_get_prot_op(cmd) ==  SCSI_PROT_READ_INSERT)
> +   return data_length;
> +   } else {
> +   data_length = scsi_out(cmd)->length;
> +   if (scsi_get_prot_op(cmd) ==  SCSI_PROT_NORMAL ||
> +   scsi_get_prot_op(cmd) ==  SCSI_PROT_WRITE_STRIP)
> +   return data_length;
> +   }
> +
> +   /* Protection information exists on the wire */
> +   return data_length + scsi_prot_length(data_length,
> + cmd->device->sector_size);
> +}
>
> Let's do this for 3.16:

Just to make sure, by 3.16 you also mean 3.15.y, right?
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Include protection information in iscsi header

2014-06-03 Thread Or Gerlitz
On Tue, Jun 3, 2014 at 9:16 AM, Roland Dreier  wrote:
> On Sun, Jun 1, 2014 at 9:19 AM, Sagi Grimberg  wrote:
>> Although these patches involve 3 subsystems with different
>> maintainers (scsi, iser, target) I would prefer seeing these
>> patches included together.
>
> Why?  Because they break wire compatibility?
>
> I hate to say it but even if they're merged at the same time, you
> can't guarantee that targets and initiators will be updated together.

Guys, this all deals with code merged in 3.15-rc1, and (thanks god and
linus that -rc8 took place this cycle) 3.15 isn't out yet!! -- so we
just need to act quickly and this (having the fix in 3.15 or if too
late in 3.15.1) would be OK
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-3.15] scsi/libiscsi: Fix static checker warning on bh locking

2014-03-30 Thread Or Gerlitz
From: Shlomo Pongratz 

Commit 659743b "[SCSI] libiscsi: Reduce locking contention in fast path" 
introduced a
new smatch warning on libiscsi.c "iscsi_xmit_task() warn: inconsistent returns
bottom_half:: locked (1410 [(-61)]) unlocked (1425 [0], 1425 
[s32min-(-1),1-s32max])",
which we can eliminate by using non bh locking on the nested spin_lock call.

Reported-by: Dan Carpenter 
Signed-off-by: Shlomo Pongratz 
Signed-off-by: Or Gerlitz 
---

 drivers/scsi/libiscsi.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 5b8605c..5087957 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1411,9 +1411,9 @@ static int iscsi_xmit_task(struct iscsi_conn *conn)
conn->task = NULL;
}
/* regular RX path uses back_lock */
-   spin_lock_bh(&conn->session->back_lock);
+   spin_lock(&conn->session->back_lock);
__iscsi_put_task(task);
-   spin_unlock_bh(&conn->session->back_lock);
+   spin_unlock(&conn->session->back_lock);
return rc;
 }
 
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1] Target/iser: Fail SCSI WRITE command if device detected integrity error

2014-03-18 Thread Or Gerlitz

On 18/03/2014 14:52, Sagi Grimberg wrote:

If during data-transfer a data-integrity error was detected we
must fail the command with CHECK_CONDITION and not execute
the command.

Cnages from v0:
- Added reported-by Or gerlitz

Signed-off-by: Sagi Grimberg
Reported-by: Or Gerlitz
---


s/Cnages/Changes

and put the history below  the --- line, it need not go into the 
upstream change-log

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 11/13] SCSI/libiscsi: Add check_protection callback for transports

2014-03-17 Thread Or Gerlitz
On Mon, Mar 17, 2014, Sagi Grimberg  wrote:
> On 3/17/2014 6:59 PM, Mike Christie wrote:

>> This patch is ok, but happened to the patch for the xmit task path?

> Hey Mike, Thanks for your Ack on this.
> The xmit_task fix was posted as a separate patch since it is not specific
> to this set (relevant also for existing data dma_mapping in xmit_task).
> Care to pick it up? http://marc.info/?l=linux-scsi&m=139404160818424&w=2

Hi Roland, provided that Mike Christie acked the libiscsi patch from
this series, and I acked the iser patches and you had no other
comments on the series -- can you pick this for 3.15? The merge window
is coming soon, if for some reason this will not work for you, I'd
like Nic to push that through his tree along with the iser target DIF
patches, so need your asap reply, please.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] Target/dif: Introduce protection-passthough-only mode

2014-03-17 Thread Or Gerlitz
On Mon, Mar 17, 2014 at 12:52 PM, Sagi Grimberg  wrote:

> Target/dif: Introduce protection-passthough-only mode

s/passthough/passthrough/
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 11/13] SCSI/libiscsi: Add check_protection callback for transports

2014-03-17 Thread Or Gerlitz
On Wed, Mar 5, 2014 at 7:43 PM, Sagi Grimberg  wrote:
>
> iSCSI needs to be at least aware that a task involves protection
> information. In case it does, after the transaction completed
> libiscsi will ask the transport to check the protection status
> of the transaction.
>
> Unlike transport errors, DIF errors should not prevent successful
> completion of the transaction from the transport point of view,
> but should be escelated to scsi mid-layer when constructing the
> scsi result and sense data.
>
> check_protection routine will return the ascq corresponding to the
> DIF error that occured (or 0 if no error happened).
>
> return ascq:
> - 0x1: GUARD_CHECK_FAILED
> - 0x2: APPTAG_CHECK_FAILED
> - 0x3: REFTAG_CHECK_FAILED

Hi Mike, just to remove doubt, this patch is OK with you, right? can
you please ack it, so we know we're all good for the 3.15 merge window

Or.

>
> Signed-off-by: Sagi Grimberg 
> Signed-off-by: Alex Tabachnik 
> ---
>  drivers/scsi/libiscsi.c |   32 
>  include/scsi/libiscsi.h |4 
>  include/scsi/scsi_transport_iscsi.h |1 +
>  3 files changed, 37 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 4046241..3c11acf 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -395,6 +395,10 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task 
> *task)
> if (rc)
> return rc;
> }
> +
> +   if (scsi_get_prot_op(sc) != SCSI_PROT_NORMAL)
> +   task->protected = true;
> +
> if (sc->sc_data_direction == DMA_TO_DEVICE) {
> unsigned out_len = scsi_out(sc)->length;
> struct iscsi_r2t_info *r2t = &task->unsol_r2t;
> @@ -823,6 +827,33 @@ static void iscsi_scsi_cmd_rsp(struct iscsi_conn *conn, 
> struct iscsi_hdr *hdr,
>
> sc->result = (DID_OK << 16) | rhdr->cmd_status;
>
> +   if (task->protected) {
> +   sector_t sector;
> +   u8 ascq;
> +
> +   /**
> +* Transports that didn't implement check_protection
> +* callback but still published T10-PI support to scsi-mid
> +* deserve this BUG_ON.
> +**/
> +   BUG_ON(!session->tt->check_protection);
> +
> +   ascq = session->tt->check_protection(task, §or);
> +   if (ascq) {
> +   sc->result = DRIVER_SENSE << 24 |
> +SAM_STAT_CHECK_CONDITION;
> +   scsi_build_sense_buffer(1, sc->sense_buffer,
> +   ILLEGAL_REQUEST, 0x10, ascq);
> +   sc->sense_buffer[7] = 0xc; /* Additional sense length 
> */
> +   sc->sense_buffer[8] = 0;   /* Information desc type */
> +   sc->sense_buffer[9] = 0xa; /* Additional desc length 
> */
> +   sc->sense_buffer[10] = 0x80; /* Validity bit */
> +
> +   put_unaligned_be64(sector, &sc->sense_buffer[12]);
> +   goto out;
> +   }
> +   }
> +
> if (rhdr->response != ISCSI_STATUS_CMD_COMPLETED) {
> sc->result = DID_ERROR << 16;
> goto out;
> @@ -1567,6 +1598,7 @@ static inline struct iscsi_task 
> *iscsi_alloc_task(struct iscsi_conn *conn,
> task->have_checked_conn = false;
> task->last_timeout = jiffies;
> task->last_xfer = jiffies;
> +   task->protected = false;
> INIT_LIST_HEAD(&task->running);
> return task;
>  }
> diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
> index 309f513..1457c26 100644
> --- a/include/scsi/libiscsi.h
> +++ b/include/scsi/libiscsi.h
> @@ -133,6 +133,10 @@ struct iscsi_task {
> unsigned long   last_xfer;
> unsigned long   last_timeout;
> boolhave_checked_conn;
> +
> +   /* T10 protection information */
> +   boolprotected;
> +
> /* state set/tested under session->lock */
> int state;
> atomic_trefcount;
> diff --git a/include/scsi/scsi_transport_iscsi.h 
> b/include/scsi/scsi_transport_iscsi.h
> index 88640a4..2555ee5 100644
> --- a/include/scsi/scsi_transport_iscsi.h
> +++ b/include/scsi/scsi_transport_iscsi.h
> @@ -167,6 +167,7 @@ struct iscsi_transport {
>  struct iscsi_bus_flash_conn *fnode_conn);
> int (*logout_flashnode_sid) (struct iscsi_cls_session *cls_sess);
> int (*get_host_stats) (struct Scsi_Host *shost, char *buf, int len);
> +   u8 (*check_protection)(struct iscsi_task *task, sector_t *sector);
>  };
>
>  /*
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More maj

Re: [PATCH v2 00/13] T10-PI support for iSER initiator

2014-03-10 Thread Or Gerlitz
On Wed, Mar 5, 2014 at 7:43 PM, Sagi Grimberg  wrote:
> Hey Roland, Nic, Mike and Co
>
> This patchset adds T10 protection information offload support over
> RDMA signature verbs API. This set, along with the iSER target set,
> allow end-to-end protection information passthrough and validation.
> The patchset was tested against Linux SCSI target with iSER DIF
> support applied.
>
> iSER T10-PI support enablement is currently controlled with
> module parameters (similar to lpfc for example) which make them
> global. In the next phase we can consider passing these parameters
> from iscsid nl messages, which would make them per-connection.
>
> The approach I took with respect to escalating protection information
> errors was minimal iSCSI intervention in protection information affairs.
> I added libiscsi a hook asking the transport to check the protection
> information status, and construct the proper sense data in case of errors
> (the alternative of letting the transport to construct sense data seemed
> much less appealing).
>
> Note that this patchset comes on top of a pending patch for iSER to suppress
> fastreg completions (http://marc.info/?l=linux-rdma&m=139047309831997&w=2).
>
> v0 patches are available in target-pending git repo (branch rdma-dif) and
> passed 0-DAY testing.
>
> Roland, I would like to hear your feedback on this.

Acked-By: Or Gerlitz 

Roland, given that Sagi addressed all the feedback from the reviewers,
specifically from Mike Christie, and has my iser maintainer
ack on the series, can you please go ahead and apply that for 3.15?
>
> The set is ordered by the following:
> - Preparation patches (non/minor functionality changes).
> - Add protection information execution support.
> - Add protection information status check facilities.
> - Publish T10-DIF support to SCSI mid-layer according to
>   IB device capabilities.
>
> Changes from v1:
> - Removed extra space (BUG_ON).
> - Dropped DID_ABORT from sc result for data integrity errors.
> - Fixed failed sector report.
>
> Changes from v0:
> - Fix protection information dma registration for unaligned scatterlists
>   which may happen when the block layer merges bios.
> - Don't fail connections on devices without DIF support - warn and continue
>   without DIF.
> - reword FR -> FastReg
>
> Alex Tabachnik (2):
>   IB/iser: Introduce pi_enable, pi_guard module parameters
>   IB/iser: Initialize T10-PI resources
>
> Sagi Grimberg (11):
>   IB/iser: Avoid FRWR notation, use fastreg instead
>   IB/iser: Push the decision what memory key to use into fast_reg_mr
> routine
>   IB/iser: Move fast_reg_descriptor initialization to a function
>   IB/iser: Keep IB device attributes under iser_device
>   IB/iser: Replace fastreg descriptor valid bool with indicators
> container
>   IB/iser: Generalize iser_unmap_task_data and
> finalize_rdma_unaligned_sg
>   IB/iser: Generalize fall_to_bounce_buf routine
>   IB/iser: Support T10-PI operations
>   SCSI/libiscsi: Add check_protection callback for transports
>   IB/iser: Implement check_protection
>   IB/iser: Publish T10-PI support to SCSI midlayer
>
>  drivers/infiniband/ulp/iser/iscsi_iser.c |   46 +++-
>  drivers/infiniband/ulp/iser/iscsi_iser.h |   71 -
>  drivers/infiniband/ulp/iser/iser_initiator.c |   98 +-
>  drivers/infiniband/ulp/iser/iser_memory.c|  445 
> +++---
>  drivers/infiniband/ulp/iser/iser_verbs.c |  287 -
>  drivers/scsi/libiscsi.c  |   32 ++
>  include/scsi/libiscsi.h  |4 +
>  include/scsi/scsi_transport_iscsi.h  |1 +
>  8 files changed, 771 insertions(+), 213 deletions(-)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 10/13] IB/iser: Support T10-PI operations

2014-03-04 Thread Or Gerlitz

On 04/03/2014 16:44, Sagi Grimberg wrote:
@@ -1707,10 +1707,17 @@ int iscsi_queuecommand(struct Scsi_Host 
*host, struct scsi_cmnd *sc)

goto prepd_fault;
}
}
-   if (session->tt->xmit_task(task)) {
-   session->cmdsn--;
-   reason = FAILURE_SESSION_NOT_READY;
-   goto prepd_reject;
+
+   reason = session->tt->xmit_task(task);
+   if (reason) {
+   if (reason == -ENOMEM || reason == -EAGAIN) {
+   session->cmdsn--;


I am pretty sure this has to be done anyway, no matter why the 
xmit_task callback failed


Even if we abort? this just follows the same logic as 
iscsi_prep_scsi_cmd_pdu error flow.


yes, take a 2nd look on iscsi_prep_scsi_cmd_pdu and you'll see that all 
the possible error cases take place **before** session->cmdsn is 
incremented

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 10/13] IB/iser: Support T10-PI operations

2014-03-04 Thread Or Gerlitz

On 04/03/2014 11:59, Sagi Grimberg wrote:

On 3/4/2014 11:38 AM, Or Gerlitz wrote:

On 03/03/2014 06:44, Mike Christie wrote:
The xmit_task callout does handle failures like EINVAL. If the above 
map
calls fail then you would get infinite retries. You would currently 
want

to do the mapping in the init_task callout instead.

If it makes it easier on the driver implementation then it is ok to
modify the xmit_task callers so that they handle multiple error codes
for drivers like iser that have the xmit_task callout called from
iscsi_queuecommand.


Mike,

After looking on the code with Sagi,  it seems to us that the correct 
way to go here, would be to enhance in iscsi_queuecommand the 
processing of the result returned by session->tt->xmit_task(task) to 
behave in a similar manner to how the return value of 
iscsi_prep_scsi_cmd_pdu() is treated. E.g for errors such as ENOMEM 
and EGAIN take the "reject" flow which would cause the SCSI midlayer 
to retry the command and for other return values go to the "fault" 
flow which will cause the ML to abort the command.


Or.



Yes, we were thinking about the following:
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1707,10 +1707,17 @@ int iscsi_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *sc)

goto prepd_fault;
}
}
-   if (session->tt->xmit_task(task)) {
-   session->cmdsn--;
-   reason = FAILURE_SESSION_NOT_READY;
-   goto prepd_reject;
+
+   reason = session->tt->xmit_task(task);
+   if (reason) {
+   if (reason == -ENOMEM || reason == -EAGAIN) {
+   session->cmdsn--;


I am pretty sure this has to be done anyway, no matter why the xmit_task 
callback failed



+ reason = FAILURE_SESSION_NOT_READY;
+   goto prepd_reject;
+   } else {
+   sc->result = DID_ABORT << 16;
+   goto prepd_fault;
+   }
}
} else {
list_add_tail(&task->running, &conn->cmdqueue);

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 10/13] IB/iser: Support T10-PI operations

2014-03-04 Thread Or Gerlitz

On 03/03/2014 06:44, Mike Christie wrote:

The xmit_task callout does handle failures like EINVAL. If the above map
calls fail then you would get infinite retries. You would currently want
to do the mapping in the init_task callout instead.

If it makes it easier on the driver implementation then it is ok to
modify the xmit_task callers so that they handle multiple error codes
for drivers like iser that have the xmit_task callout called from
iscsi_queuecommand.


Mike,

After looking on the code with Sagi,  it seems to us that the correct 
way to go here, would be to enhance in iscsi_queuecommand the processing 
of the result returned by session->tt->xmit_task(task) to behave in a 
similar manner to how the return value of iscsi_prep_scsi_cmd_pdu() is 
treated. E.g for errors such as ENOMEM and EGAIN take the "reject" flow 
which would cause the SCSI midlayer to retry the command and for other 
return values go to the "fault" flow which will cause the ML to abort 
the command.


Or.


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] iser-target: Fix post_send_buf_count for RDMA READ/WRITE

2014-03-03 Thread Or Gerlitz

On 04/03/2014 02:01, Nicholas A. Bellinger wrote:

This is necessary because even though IB_SEND_SIGNALED is
not set for RDMA WRITEs + READs, during a QP failure event
the work requests will be returned with exception status
from the TX completion queue.


Impossible... for rdma reads we must ask for completing, since we should 
write the data
for the back-end, I assume it's just wrong mentioning of rdma-read here, 
right?

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux rdma 3.14 merge plans

2014-01-28 Thread Or Gerlitz
On Wed, Jan 22, 2014, Sagi Grimberg  wrote:
> On 1/22/2014 2:43 AM, Roland Dreier wrote:
>> On Tue, Jan 21, 2014, Or Gerlitz  wrote:

>>> Roland, ping! the signature patches were posted > three months ago. We
>>> deserve a response from the maintainer that goes beyond "I need to
>>> think on that". Responsiveness was stated by Linus to be the #1 requirement
>>> from kernel maintainers.

> Hi Roland, I'll try to respond here. removing LKML and adding Linux-scsi.

Sorry, it seems we will not getting responses unless coping LKML, so
lets do that again -- Roland, below is the detailed response Sagi
wrote you following your "adds a bunch of complexity to support a
feature no one really cares about" comment, can we get you to respond
on that?

>> Or, I'm not sure what response you're after from me.  Linus has also
>> said that maintainers should say "no" a lot more
>> (http://lwn.net/Articles/571995/) so maybe you want me to say, "No, I
>> won't merge this patch set, since it adds a bunch of complexity to
>> support a feature no one really cares about."

> 1. I disagree about no-one cares about DIF/DIX. We are witnessing growing
> interests in this especially for RDMA.
> 2. We put a lot of efforts to avoid complexity here and plug-in as simple as
> possible.
> Application that will choose to use DIF will implement only 3 steps:
> a. allocate signature enabled MR.
> b. register signature enabled MR with DIF attributes (via post_send) and
> then do RDMA.
> c. check MR status after transaction is completed (_lightweight_ verb that
> can be called from interrupt context).

>> Is that it?  (And yes I am skeptical about this stuff -- I work at an 
>> enterprise
>> storage company and even here it's hard to find anyone who cares about
>> DIF/DIX, especially offload features that stop it from being end-to-end)


> 1. RDMA verbs are _NOT_ stopping DIF from being end-to-end.
> OS (or SCSI in our specific case) passes LLD 2 scatterlists: data {block1,
> block2, block3,...}, and protection {DIF1, DIF2, DIF3}.
> LLD is required to verify the data integrity (block guards) and to
> interleave over the wire {block1, DIF1, block2, DIF2}.
> You must support that in HW, you rather iSER/SRP will use giant copy's to
> interleave by itself? or in case OS asked LLD
> to INSERT DIF iSER/SRP will compute CRC for each data-block? RDMA storage
> ULPs are transports - they should have no business with
> data processing.

> 2. HW DIF offload also gives you protection across the PCI. the
> data-validation is done (hopefully offloaded) also
> when data+protection are written to the back-end device. end-to-end is
> preserved.

> 3. SAS & FC have T10-PI offload. This is just adding RDMA into the game.
> With this set of verbs iSER, SRP, FCoE Initiators and targets will be able
> to support T10-PI.


>> I'm sure you're not expecting me to say, "Sure, I'll merge it without
>> understanding the problem it's solving

> Problem: T10-PI offload support for RDMA based initiators. Supporting
> end-to-end data integrity while sustaining high RDMA performance.


>>   or how it's doing that,"

> How it's doing that:
> - We introduce a new type of memory region that posses protection attributes
> suited for data integrity offload.
> - We Introduce a new fast registration method that can bind all the relevant
> info for verify/generate of protection information:
>   * describe if/how to interleave data with protection.
>   * describe what method of data integrity is used (DIF type X, CRC, XOR...)
> and the seeds that HW should start calculation from.
>   * describe how to verify the data.
> - We Introduce a new lightweight check of the data-integrity status to check
> if there were any integrity errors and get information on them.

> Note: We made MR allocation routine generic enough to lay a framework to
> unite all MR allocation
> methods (get_dma_mr, alloc_fast_reg_mr, reg_phys, reg_user_mr, fmrs, and
> probably more in the future...).
> We defined ib_create_mr that can actually get mr_init_attr which can be
> easily extended as opposed to the specific calls exists today.
> So I would say this even reduces complexity.

> Hope this helps,
> Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/11] IB/isert: Initialize T10-PI resources

2014-01-12 Thread Or Gerlitz

On 12/01/2014 14:41, Sagi Grimberg wrote:

--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -48,11 +48,21 @@ struct iser_tx_desc {
 struct ib_send_wr send_wr;
  } __packed;

+struct pi_context {
+   struct ib_mr   *prot_mr;
+   bool prot_key_valid;
+   struct ib_fast_reg_page_list   *prot_frpl;
+   struct ib_mr   *sig_mr;
+   bool sig_key_valid;
+};
+
  struct fast_reg_descriptor {
-   struct list_headlist;
-   struct ib_mr*data_mr;
-   struct ib_fast_reg_page_list*data_frpl;
-   boolvalid;
+   struct list_headlist;
+   struct ib_mr   *data_mr;
+   bool data_key_valid;
+   struct ib_fast_reg_page_list   *data_frpl;
+   boolprotected;

no need for many bools in one structure... each one needs a bit,
correct? so embed them in one variable


I figured it will be more explicit this way. protected boolean 
indicates if we should check the data-integrity status, and the other 
3 indicates if the relevant MR is valid (no need to execute local 
invalidation). Do you think I should compact it somehow? usually 
xxx_valid booleans will align together although not always. 


I didn't note there are so many booleans there... sure, my personal 
preference is compaction, e.g have one field names flags and multiple 
bit locations marking the different flags e.g


FAST_REG_DESC_VALID
FAST_REG_DESC_DATA_KEY_VALID
FAST_REG_DESC_PROTECTED
etc
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/11] IB/isert: Accept RDMA_WRITE completions

2014-01-11 Thread Or Gerlitz
On Thu, Jan 9, 2014 at 6:40 PM, Sagi Grimberg  wrote:
> In case of protected transactions, we will need to check the
> protection status of the transaction before sending SCSI response.
> So be ready for RDMA_WRITE completions. currently we don't ask
> for these completions, but for T10-PI we will.
> @@ -1721,9 +1735,9 @@ __isert_send_completion(struct iser_tx_desc *tx_desc,
>   isert_conn, ib_dev);
> break;
> case ISER_IB_RDMA_WRITE:
> -   pr_err("isert_send_completion: Got ISER_IB_RDMA_WRITE\n");
> -   dump_stack();
> -   break;
> +   pr_debug("isert_send_completion: Got ISER_IB_RDMA_WRITE\n");
> +   atomic_dec(&isert_conn->post_send_buf_count);
> +   isert_completion_rdma_write(tx_desc, isert_cmd);

are we doing fall through here? why?
> case ISER_IB_RDMA_READ:
> pr_debug("isert_send_completion: Got ISER_IB_RDMA_READ:\n");


>
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/11] IB/isert: Initialize T10-PI resources

2014-01-11 Thread Or Gerlitz
On Thu, Jan 9, 2014 at 6:40 PM, Sagi Grimberg  wrote:
> @@ -557,8 +629,14 @@ isert_connect_request(struct rdma_cm_id *cma_id, struct 
> rdma_cm_event *event)
> goto out_mr;
> }
>
> +   if (pi_support && !device->pi_capable) {
> +   pr_err("Protection information requested but not 
> supported\n");
> +   ret = -EINVAL;
> +   goto out_mr;
> +   }
> +
> if (device->use_fastreg) {
> -   ret = isert_conn_create_fastreg_pool(isert_conn);
> +   ret = isert_conn_create_fastreg_pool(isert_conn, pi_support);

just a nit, the pi_support bit can be looked up from the isert_conn
struct, isn't it?

> if (ret) {
> pr_err("Conn: %p failed to create fastreg pool\n",
>isert_conn);
> @@ -566,7 +644,7 @@ isert_connect_request(struct rdma_cm_id *cma_id, struct 
> rdma_cm_event *event)
> }
> }
>
> -   ret = isert_conn_setup_qp(isert_conn, cma_id);
> +   ret = isert_conn_setup_qp(isert_conn, cma_id, pi_support);
> if (ret)
> goto out_conn_dev;
>
> @@ -2193,7 +2271,7 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc,
> pagelist_len = isert_map_fr_pagelist(ib_dev, sg_start, sg_nents,
>  
> &fr_desc->data_frpl->page_list[0]);
>
> -   if (!fr_desc->valid) {
> +   if (!fr_desc->data_key_valid) {
> memset(&inv_wr, 0, sizeof(inv_wr));
> inv_wr.opcode = IB_WR_LOCAL_INV;
> inv_wr.ex.invalidate_rkey = fr_desc->data_mr->rkey;
> @@ -2225,7 +2303,7 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc,
> pr_err("fast registration failed, ret:%d\n", ret);
> return ret;
> }
> -   fr_desc->valid = false;
> +   fr_desc->data_key_valid = false;
>
> ib_sge->lkey = fr_desc->data_mr->lkey;
> ib_sge->addr = fr_desc->data_frpl->page_list[0] + page_off;
> diff --git a/drivers/infiniband/ulp/isert/ib_isert.h 
> b/drivers/infiniband/ulp/isert/ib_isert.h
> index 708a069..fab8b50 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.h
> +++ b/drivers/infiniband/ulp/isert/ib_isert.h
> @@ -48,11 +48,21 @@ struct iser_tx_desc {
> struct ib_send_wr send_wr;
>  } __packed;
>
> +struct pi_context {
> +   struct ib_mr   *prot_mr;
> +   boolprot_key_valid;
> +   struct ib_fast_reg_page_list   *prot_frpl;
> +   struct ib_mr   *sig_mr;
> +   boolsig_key_valid;
> +};
> +
>  struct fast_reg_descriptor {
> -   struct list_headlist;
> -   struct ib_mr*data_mr;
> -   struct ib_fast_reg_page_list*data_frpl;
> -   boolvalid;
> +   struct list_headlist;
> +   struct ib_mr   *data_mr;
> +   booldata_key_valid;
> +   struct ib_fast_reg_page_list   *data_frpl;
> +   boolprotected;

no need for many bools in one structure... each one needs a bit,
correct? so embed them in one variable

> +   struct pi_context  *pi_ctx;
>  };



>
>  struct isert_rdma_wr {
> @@ -140,6 +150,7 @@ struct isert_cq_desc {
>
>  struct isert_device {
> int use_fastreg;
> +   boolpi_capable;

this one (and its such) is/are derived from the ib device
capabilities, so I would suggest to keep a copy of the caps instead of
derived bools

> int cqs_used;
> int refcount;
> int cq_active_qps[ISERT_MAX_CQ];
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser

2013-11-11 Thread Or Gerlitz

On 12/11/2013 06:34, Nicholas A. Bellinger wrote:

Once iscsi_conn_queue_work() is invoked here to start process context
execution of iscsi_xmitworker() -> iscsi_data_xmit() code, AFAICT there
is no logic in place within iscsi_data_xmit() to honor the last received
MaxCmdSN.

Or to put it another way: what is preventing iscsi_data_xmit() from
completely draining both conn->cmdqueue + conn->requeue, even when the
CmdSN window has potentially been closed again..?


Guys,

Note that the iser initiator transport uses the pass-through command 
submission mode of libiscsi, that is

iscsi_conn_queue_work isn't called from queuecommand at all.

This is b/c we call iscsi_host_allocwith xmit_can_sleep = 0. Hence no 
workqueue is used for the command processing/submission over the wire, 
just a call toiscsi_prep_scsi_cmd_pdu and following that to iser's 
xmit_task callbackwhich isiscsi_iser_task_xmit that calls 
iser_send_command, etc.


Mike, Nic is not using the new locking framework patches for libiscsi, 
as you know they are not upstream

yet...

Or.


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] SCSI/libiscsi: Restructure iscsi_tcp r2t response logic

2013-10-27 Thread Or Gerlitz
From: Shlomo Pongratz 

Restructure the iscsi_tcp_r2t_rsp routine in order to avoid allocating
r2t from r2tpool.queue and returning it back in case the parameters
rhdr->data_length and or rhdr->data_offset prohibit the requing.

Since the values of these parameters are known prior to the allocation,
we can pre-check and thus avoid futile allocations.

Signed-off-by: Shlomo Pongratz 
Signed-off-by: Or Gerlitz 
---
 drivers/scsi/libiscsi_tcp.c |   43 ++-
 1 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/libiscsi_tcp.c b/drivers/scsi/libiscsi_tcp.c
index 1d58d53..7f59073 100644
--- a/drivers/scsi/libiscsi_tcp.c
+++ b/drivers/scsi/libiscsi_tcp.c
@@ -529,6 +529,8 @@ static int iscsi_tcp_r2t_rsp(struct iscsi_conn *conn, 
struct iscsi_task *task)
struct iscsi_r2t_rsp *rhdr = (struct iscsi_r2t_rsp *)tcp_conn->in.hdr;
struct iscsi_r2t_info *r2t;
int r2tsn = be32_to_cpu(rhdr->r2tsn);
+   u32 data_length;
+   u32 data_offset;
int rc;
 
if (tcp_conn->in.datalen) {
@@ -554,40 +556,39 @@ static int iscsi_tcp_r2t_rsp(struct iscsi_conn *conn, 
struct iscsi_task *task)
return 0;
}
 
-   rc = kfifo_out(&tcp_task->r2tpool.queue, (void*)&r2t, sizeof(void*));
-   if (!rc) {
-   iscsi_conn_printk(KERN_ERR, conn, "Could not allocate R2T. "
- "Target has sent more R2Ts than it "
- "negotiated for or driver has leaked.\n");
-   return ISCSI_ERR_PROTO;
-   }
-
-   r2t->exp_statsn = rhdr->statsn;
-   r2t->data_length = be32_to_cpu(rhdr->data_length);
-   if (r2t->data_length == 0) {
+   data_length = be32_to_cpu(rhdr->data_length);
+   if (data_length == 0) {
iscsi_conn_printk(KERN_ERR, conn,
  "invalid R2T with zero data len\n");
-   kfifo_in(&tcp_task->r2tpool.queue, (void*)&r2t,
-   sizeof(void*));
return ISCSI_ERR_DATALEN;
}
 
-   if (r2t->data_length > session->max_burst)
+   if (data_length > session->max_burst)
ISCSI_DBG_TCP(conn, "invalid R2T with data len %u and max "
  "burst %u. Attempting to execute request.\n",
- r2t->data_length, session->max_burst);
+ data_length, session->max_burst);
 
-   r2t->data_offset = be32_to_cpu(rhdr->data_offset);
-   if (r2t->data_offset + r2t->data_length > scsi_out(task->sc)->length) {
+   data_offset = be32_to_cpu(rhdr->data_offset);
+   if (data_offset + data_length > scsi_out(task->sc)->length) {
iscsi_conn_printk(KERN_ERR, conn,
  "invalid R2T with data len %u at offset %u "
- "and total length %d\n", r2t->data_length,
- r2t->data_offset, scsi_out(task->sc)->length);
-   kfifo_in(&tcp_task->r2tpool.queue, (void*)&r2t,
-   sizeof(void*));
+ "and total length %d\n", data_length,
+ data_offset, scsi_out(task->sc)->length);
return ISCSI_ERR_DATALEN;
}
 
+   rc = kfifo_out(&tcp_task->r2tpool.queue, (void*)&r2t, sizeof(void*));
+   if (!rc) {
+   iscsi_conn_printk(KERN_ERR, conn, "Could not allocate R2T. "
+ "Target has sent more R2Ts than it "
+ "negotiated for or driver has leaked.\n");
+   return ISCSI_ERR_PROTO;
+   }
+
+   r2t->exp_statsn = rhdr->statsn;
+   r2t->data_length = data_length;
+   r2t->data_offset = data_offset;
+
r2t->ttt = rhdr->ttt; /* no flip */
r2t->datasn = 0;
r2t->sent = 0;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] SCSI/libiscsi: Reduce locking contention in fast path

2013-10-27 Thread Or Gerlitz
From: Shlomo Pongratz 

Replace the session lock with two locks, a forward lock and
a backwards lock named frwd_lock and back_lock respectively.

The forward lock protects resources that change while sending a
request to the target, such as cmdsn, queued_cmdsn, and allocating
task from the commands' pool with kfifo_out.

The backward lock protects resources that change while processing
a response or in error path, such as cmdsn_exp, cmdsn_max, and
returning tasks to the commands' pool with kfifo_in.

Under a steady state fast-path situation, that is when one
or more processes/threads submit IO to an iscsi device and
a single kernel upcall (e.g softirq) is dealing with processing
of responses without errors, this patch eliminates the contention
between the queuecommand()/request response/scsi_done() flows
associated with iscsi sessions.

Between the forward and the backward locks exists a strict locking hierarchy.
The mutual exclusion zone protected by the forward lock can enclose the
mutual exclusion zone protected by the backward lock but not vice versa.

For example, in iscsi_conn_teardown or in iscsi_xmit_data when there is a 
failure and __iscsi_put_task is called, the backward lock is taken while the
forward lock is still taken. On the other hand, if in the RX path a nop is
to be sent, for example in iscsi_handle_reject or __iscsi_complete_pdu than
the forward lock is released and the backward lock is taken for the duration
of iscsi_send_nopout, later the backward lock is released and the forward
lock is retaken.

libiscsi_tcp uses two kernel fifos the r2t pool and the r2t queue.

The insertion and deletion from these queues didn't corespond to the
assumption taken by the new forward/backwards session locking paradigm.

That is, in iscsi_tcp_clenup_task which belongs to the RX (backwards)
path, r2t is taken out from r2t queue and inserted to the r2t pool.
In iscsi_tcp_get_curr_r2t which belong to the TX (forward) path, r2t
is also inserted to the r2t pool and another r2t is pulled from r2t queue.

Only in iscsi_tcp_r2t_rsp which is called in the RX path but can requeue
to the TX path, r2t is taken from the r2t pool and inserted to the r2t queue.

In order to cope with this situation, two spin locks were added,
pool2queue and queue2pool. The former protects extracting from the
r2t pool and inserting to the r2t queue, and the later protects the
extracing from the r2t queue and inserting to the r2t pool.

Signed-off-by: Shlomo Pongratz 
Signed-off-by: Or Gerlitz 
---
 drivers/scsi/be2iscsi/be_main.c  |   26 +++---
 drivers/scsi/bnx2i/bnx2i_hwi.c   |   46 
 drivers/scsi/bnx2i/bnx2i_iscsi.c |8 +-
 drivers/scsi/iscsi_tcp.c |   22 ++--
 drivers/scsi/libiscsi.c  |  214 +
 drivers/scsi/libiscsi_tcp.c  |   28 +++--
 drivers/scsi/qla4xxx/ql4_isr.c   |4 +-
 include/scsi/libiscsi.h  |   17 ++-
 include/scsi/libiscsi_tcp.h  |2 +
 9 files changed, 206 insertions(+), 161 deletions(-)

diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index 1f37505..1fb16b1 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -232,20 +232,20 @@ static int beiscsi_eh_abort(struct scsi_cmnd *sc)
cls_session = starget_to_session(scsi_target(sc->device));
session = cls_session->dd_data;
 
-   spin_lock_bh(&session->lock);
+   spin_lock_bh(&session->frwd_lock);
if (!aborted_task || !aborted_task->sc) {
/* we raced */
-   spin_unlock_bh(&session->lock);
+   spin_unlock_bh(&session->fwrd_lock);
return SUCCESS;
}
 
aborted_io_task = aborted_task->dd_data;
if (!aborted_io_task->scsi_cmnd) {
/* raced or invalid command */
-   spin_unlock_bh(&session->lock);
+   spin_unlock_bh(&session->fwrd_lock);
return SUCCESS;
}
-   spin_unlock_bh(&session->lock);
+   spin_unlock_bh(&session->fwrd_lock);
/* Invalidate WRB Posted for this Task */
AMAP_SET_BITS(struct amap_iscsi_wrb, invld,
  aborted_io_task->pwrb_handle->pwrb,
@@ -307,9 +307,9 @@ static int beiscsi_eh_device_reset(struct scsi_cmnd *sc)
/* invalidate iocbs */
cls_session = starget_to_session(scsi_target(sc->device));
session = cls_session->dd_data;
-   spin_lock_bh(&session->lock);
+   spin_lock_bh(&session->frwd_lock);
if (!session->leadconn || session->state != ISCSI_STATE_LOGGED_IN) {
-   spin_unlock_bh(&session->lock);
+   spin_unlock_bh(&session->frwd_lock);
return FAILED;
}
conn = session->leadconn;
@@ -338,7 +338,7 @@ static int beiscsi_eh_device_reset(struct scsi_cmnd *sc)
nu

[PATCH 0/3] SCSI/libiscsi: Reduce locking contention in fast path

2013-10-27 Thread Or Gerlitz
Hi James, 

This series is about reducing locking contention in the IO submission/response 
processing 
fast path of libiscsi and the various iscsi transports usage of libiscsi code.

We replace the session lock with two locks, a forward lock and a backwards lock 
named frwd_lock and back_lock respectively.

The forward lock protects resources that change while sending a request to the 
target, such as cmdsn, queued_cmdsn, and allocating task from the commands' 
pool with kfifo_out.

The backward lock protects resources that change while processing a response or 
in error path, such as cmdsn_exp, cmdsn_max, and returning tasks to the 
commands' 
pool with kfifo_in. 

The 1st patch in the series is a restructuring patch for iscsi_tcp r2t response
logic, the 2nd is the main patch and the 3rd is cleanup asked by Mike who 
reviewed
the whole series when we posted in over the open-iscsi mailing list.

Under a "steady state" fast-path situation, that is when one or more 
processes/threads 
submit IO to an iscsi device and a single kernel upcall (e.g softirq) is 
dealing 
with processing  of responses without errors, this patch eliminates the 
contention
between the queuecommand()/request response/scsi_done() associated with iscsi 
sessions.

Or and Shlomo.

Shlomo Pongratz (3):
  SCSI/libiscsi: Restructure iscsi_tcp r2t response logic
  SCSI/libiscsi: Reduce locking contention in fast path
  SCSI/libiscsi: Remove unneeded code

 drivers/scsi/be2iscsi/be_main.c  |   26 ++--
 drivers/scsi/bnx2i/bnx2i_hwi.c   |   46 
 drivers/scsi/bnx2i/bnx2i_iscsi.c |8 +-
 drivers/scsi/iscsi_tcp.c |   22 ++--
 drivers/scsi/libiscsi.c  |  226 --
 drivers/scsi/libiscsi_tcp.c  |   71 +++-
 drivers/scsi/qla4xxx/ql4_isr.c   |4 +-
 include/scsi/libiscsi.h  |   17 ++-
 include/scsi/libiscsi_tcp.h  |2 +
 9 files changed, 228 insertions(+), 194 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] SCSI/libiscsi: Remove unneeded code

2013-10-27 Thread Or Gerlitz
From: Shlomo Pongratz 

We never will have a closed window and something on one of those lists.

Signed-off-by: Mike Christie 
Signed-off-by: Shlomo Pongratz 
---
 drivers/scsi/libiscsi.c |   12 
 1 files changed, 0 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 9c28c46..dd66d8a 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -108,18 +108,6 @@ static void __iscsi_update_cmdsn(struct iscsi_session 
*session,
if (exp_cmdsn != session->exp_cmdsn &&
!iscsi_sna_lt(exp_cmdsn, session->exp_cmdsn))
session->exp_cmdsn = exp_cmdsn;
-
-   if (max_cmdsn != session->max_cmdsn &&
-   !iscsi_sna_lt(max_cmdsn, session->max_cmdsn)) {
-   session->max_cmdsn = max_cmdsn;
-   /*
-* if the window closed with IO queued, then kick the
-* xmit thread
-*/
-   if (!list_empty(&session->leadconn->cmdqueue) ||
-   !list_empty(&session->leadconn->mgmtqueue))
-   iscsi_conn_queue_work(session->leadconn);
-   }
 }
 
 void iscsi_update_cmdsn(struct iscsi_session *session, struct iscsi_nopin *hdr)
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] IB/iser: Add Discovery support

2013-08-08 Thread Or Gerlitz
To run discovery over iSER we need to advertize the CAP_TEXT_NEGO capability
towards user space. Also need to make sure the login RX buffer is posted when
SendTargets TEXT PDUs are sent. For that end, we use a setting of the
ISCSI_PARAM_DISCOVERY_SESS iscsi param as an indication that this is
discovery session.

Signed-off-by: Or Gerlitz 
---

Hi James, 

I submit this patch through your tree since it depends on the below iscsi
patches (which are acknowledged by Mike) who are targted to 3.12

Or.

[PATCH V2 1/4] scsi_transport_iscsi: Exporting new attrs for iscsi session and 
connection in sysfs
http://marc.info/?l=linux-scsi&m=137267405429329&w=2

[PATCH V2 2/4] libiscsi: Exporting new attrs for iscsi session and connection 
in sysfs
http://marc.info/?l=linux-scsi&m=137267405529330&w=2

 drivers/infiniband/ulp/iser/iscsi_iser.c |3 ++-
 drivers/infiniband/ulp/iser/iser_initiator.c |   11 ++-
 drivers/scsi/libiscsi.c  |5 +
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c 
b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 2e84ef8..1ec78bd 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -672,6 +672,7 @@ static umode_t iser_attr_is_visible(int param_type, int 
param)
case ISCSI_PARAM_TGT_RESET_TMO:
case ISCSI_PARAM_IFACE_NAME:
case ISCSI_PARAM_INITIATOR_NAME:
+   case ISCSI_PARAM_DISCOVERY_SESS:
return S_IRUGO;
default:
return 0;
@@ -701,7 +702,7 @@ static struct scsi_host_template iscsi_iser_sht = {
 static struct iscsi_transport iscsi_iser_transport = {
.owner  = THIS_MODULE,
.name   = "iser",
-   .caps   = CAP_RECOVERY_L0 | CAP_MULTI_R2T,
+   .caps   = CAP_RECOVERY_L0 | CAP_MULTI_R2T | 
CAP_TEXT_NEGO,
/* session management */
.create_session = iscsi_iser_session_create,
.destroy_session= iscsi_iser_session_destroy,
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c 
b/drivers/infiniband/ulp/iser/iser_initiator.c
index b6d81a8..b31fa1d 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -234,6 +234,7 @@ void iser_free_rx_descriptors(struct iser_conn *ib_conn)
 static int iser_post_rx_bufs(struct iscsi_conn *conn, struct iscsi_hdr *req)
 {
struct iscsi_iser_conn *iser_conn = conn->dd_data;
+   struct iscsi_session *session = conn->session;
 
iser_dbg("req op %x flags %x\n", req->opcode, req->flags);
/* check if this is the last login - going to full feature phase */
@@ -248,7 +249,13 @@ static int iser_post_rx_bufs(struct iscsi_conn *conn, 
struct iscsi_hdr *req)
WARN_ON(iser_conn->ib_conn->post_recv_buf_count != 1);
WARN_ON(atomic_read(&iser_conn->ib_conn->post_send_buf_count) != 0);
 
-   iser_dbg("Initially post: %d\n", ISER_MIN_POSTED_RX);
+   if (session->discovery_sess) {
+   iser_info("Discovery session, re-using login RX buffer\n");
+   return 0;
+   } else
+   iser_info("Normal session, posting batch of RX %d buffers\n",
+ ISER_MIN_POSTED_RX);
+
/* Initial post receive buffers */
if (iser_post_recvm(iser_conn->ib_conn, ISER_MIN_POSTED_RX))
return -ENOMEM;
@@ -425,6 +432,8 @@ int iser_send_control(struct iscsi_conn *conn,
}
 
if (task == conn->login_task) {
+   iser_dbg("op %x dsl %lx, posting login rx buffer\n",
+task->hdr->opcode, data_seg_len);
err = iser_post_recvl(iser_conn->ib_conn);
if (err)
goto send_control_error;
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 86153e0..f17a692 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -3170,6 +3170,7 @@ int iscsi_set_param(struct iscsi_cls_conn *cls_conn,
 {
struct iscsi_conn *conn = cls_conn->dd_data;
struct iscsi_session *session = conn->session;
+   int val;
 
switch(param) {
case ISCSI_PARAM_FAST_ABORT:
@@ -3264,6 +3265,10 @@ int iscsi_set_param(struct iscsi_cls_conn *cls_conn,
case ISCSI_PARAM_DISCOVERY_PARENT_TYPE:
return iscsi_switch_str_param(&session->discovery_parent_type,
  buf);
+   case ISCSI_PARAM_DISCOVERY_SESS:
+   sscanf(buf, "%d", &val);
+   session->discovery_sess = !!val;
+   break;
default:
return -ENOSYS;
}
-- 
1.7.1

--
To unsubscribe from this lis

Re: [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11

2013-07-03 Thread Or Gerlitz

On 03/07/2013 15:41, Bart Van Assche wrote:


[...]

Bart,


The individual patches in this series are as follows:
0001-IB-srp-Fix-remove_one-crash-due-to-resource-exhausti.patch
0002-IB-srp-Fix-race-between-srp_queuecommand-and-srp_cla.patch
0003-IB-srp-Avoid-that-srp_reset_host-is-skipped-after-a-.patch
0004-IB-srp-Fail-I-O-fast-if-target-offline.patch
0005-IB-srp-Skip-host-settle-delay.patch
0006-IB-srp-Maintain-a-single-connection-per-I_T-nexus.patch
0007-IB-srp-Keep-rport-as-long-as-the-IB-transport-layer.patch
0008-scsi_transport_srp-Add-transport-layer-error-handlin.patch
0009-IB-srp-Add-srp_terminate_io.patch
0010-IB-srp-Use-SRP-transport-layer-error-recovery.patch
0011-IB-srp-Start-timers-if-a-transport-layer-error-occur.patch
0012-IB-srp-Fail-SCSI-commands-silently.patch
0013-IB-srp-Make-HCA-completion-vector-configurable.patch
0014-IB-srp-Make-transport-layer-retry-count-configurable.patch
0015-IB-srp-Bump-driver-version-and-release-date.patch


Some of these patches were already picked by Roland (SB), I would 
suggest that you

post V4 and drop the ones which were accepted.

e8ca413 IB/srp: Bump driver version and release date
4b5e5f4 IB/srp: Make HCA completion vector configurable
96fc248 IB/srp: Maintain a single connection per I_T nexus
99e1c13 IB/srp: Fail I/O fast if target offline
2742c1d IB/srp: Skip host settle delay
086f44f IB/srp: Avoid skipping srp_reset_host() after a transport error
1fe0cb8 IB/srp: Fix remove_one crash due to resource exhaustion

Also, Would help if you use the --cover-letter of git format-patch and
the resulted cover letter  (patch 0/N) as it has standard content which
you can enhance and place your additions.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] target fixes for v3.10-rc2

2013-05-22 Thread Or Gerlitz
On Thu, May 16, 2013 at 7:53 AM, Nicholas A. Bellinger
 wrote:

> Yes, that's the patch included in the PULL request here:
> http://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?h=queue&id=52c07423a819091b0fe9abbf26977098b996f85b

Cool!

>
> Btw, It still needs to be enabled in userspace, but appending a
> rd_nullio=1 to the /control parameters with the following lio-utils
> patch should work for your testing purposes atm..

are you merging this into the lio-utils git tree and maybe sparing a
word on that in the documentation (where?) we'd like to point people
to these bits and guidelines.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] target fixes for v3.10-rc2

2013-05-15 Thread Or Gerlitz
On Wed, May 15, 2013 at 5:50 PM, Nicholas A. Bellinger
 wrote:
> A handful of fixes + minor changes this time around, along with one
> important >= v3.9 regression fix for IBLOCK backends.  The highlights include:

Hi Nic,

Are you pushing also a patch which would allow to actually test NULL
based RAM target (e.g target backend that does no actual IO)

Or.


> Nicholas Bellinger (5):
>   iscsi-target: Fix NULL pointer dereference in iscsit_send_reject
>   target/rd: Add ramdisk bit for NULLIO operation
>   iscsi-target: Fix typos in RDMAEXTENSIONS macro usage
>   MAINTAINERS: Update target git tree URL
>   target/iblock: Fix WCE=1 + DPOFUA=1 backend WRITE regression
>
> Shlomo Pongratz (1):
>   iscsi-target: Fix processing of OOO commands
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: initial LIO iSER performance numbers [was: GIT PULL] target updates for v3.10-rc1)

2013-05-02 Thread Or Gerlitz
On Thu, May 2, 2013 at 10:31 PM, Nicholas A. Bellinger
 wrote:

>> We used RAMDISK_MCP backend which was patched to act as NULL device, so
>> we can test the raw iSER wire performance.

> Btw, I'll be including a similar patch to allow for RAMDISK_NULL to be
> configured as a NULL device mode.

yep, that would be very helpful, so people can do that sort of testing
without hacks...

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: initial LIO iSER performance numbers [was: GIT PULL] target updates for v3.10-rc1)

2013-05-02 Thread Or Gerlitz

On 30/04/2013 05:59, Nicholas A. Bellinger wrote:

Hello Linus!

Here are the target pending changes for the v3.10-rc1 merge window.

Please go ahead and pull from:

   git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git 
for-next-merge

The highlights this round include:

  - Add fileio support for WRITE_SAME w/ UNMAP=1 discard (asias)
  - Add fileio support for UNMAP discard (asias)
  - Add tcm_vhost hotplug support to work with upstream QEMU
vhost-scsi-pci code (asias + mst)
  - Check for aborted sequence in tcm_fc response path (mdr)
  - Add initial iscsit_transport support into iscsi-target code (nab)
  - Refactor iscsi-target RX PDU logic + export request PDU handling  (nab)
  - Refactor iscsi-target TX queue logic + export response PDU creation (nab)
  - Add new iSCSI Extentions for RDMA (ISER) target driver (Or + nab)

The biggest changes revolve around iscsi-target refactoring in order to
support the iser-target driver.  This includes the conversion of the
iscsi-target data-path to use modern se_cmd->cmd_kref counting, and
allowing transport independent aspects of RX/TX PDU request/response
handling be shared across existing traditional iscsi-target code, and
the new iser-target code.


Hi Nic, everyone,

So LIO iser target code is now merged into Linus tree, and will be in 
kernel 3.10, exciting!


Here's some data on raw performance numbers we were able to get with the 
LIO iser code.


For single initiator and single lun, block sizes varying over the range 
1KB,2KB... 128KB

doing random read:

1KB 227,870K
2KB 458,099K
4KB 909,761K
8KB 1,679,922K
16KB 3,233,753K
32KB 4,905,139K
64KB 5,294,873K
128KB 5,565,235K

When enlarging the number of luns and still with single initiator, for 
1KB randomreads we get:


1 LUN  = 230k IOPS
2 LUNs = 420k IOPS
4 LUNs = 740k IOPS

When enlarging the number of initiators, and each having four lunswe get 
for 1KB random reads:


1 initiator  x 4 LUNs = 740k  IOPS
2 initiators x 4 LUNs = 1480k IOPS
3 initiators x 4 LUNs = 1570k IOPS

So all in all, things scale pretty nicely, and we observe a some bottleneck
in the IOPS rate around 1.6 Million IOPS, so there's where to improve...

Here's the fio command line used by the initiators

$ fio --cpumask=0xfc --rw=randread --bs=1k --numjobs=2 --iodepth=128 
--runtime=62 --time_based --size=1073741824k --loops=1 --ioengine=libaio 
--direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 
--norandommap --group_reporting --exitall --name 
dev-sdb-randread-1k-2thr-libaio-128iodepth-62sec --filename=/dev/sdb


And some details on the setup:

The nodes are HP ProLiant DL380p Gen8 with the following CPU: Intel(R) 
Xeon(R) CPU E5-2650 0 @ 2.00GHz
two NUMA nodes with eight cores each, 32GB RAM, PCI express gen3 8x, the 
HCA being Mellanox ConnectX3 with firmware 2.11.500


The target node was running upstream kernel and the initiators RHEL 6.3 
kernel, all X86_64


We used RAMDISK_MCP backend which was patched to act as NULL device, so 
we can test the raw iSER wire performance.


Or.



Thanks to Or Gerlitz + Mellanox for supporting the iser-target development 
effort!

Thank you,

--nab

Andy Grover (2):
   target/iscsi: Remove chap_set_random()
   target/iscsi: Use ISCSI_LOGIN_CURRENT/NEXT_STAGE macros

Asias He (10):
   target/file: Add WRITE_SAME w/ UNMAP=1 emulation support
   target/file: Add UNMAP emulation support
   target/file: Add fd_do_unmap() helper
   target/iblock: Add iblock_do_unmap() helper
   target: Add sbc_execute_unmap() helper
   target/file: Set is_nonrot attribute
   tcm_vhost: Refactor the lock nesting rule
   tcm_vhost: Add hotplug/hotunplug support
   tcm_vhost: Add ioctl to get and set events missed flag
   tcm_vhost: Enable VIRTIO_SCSI_F_HOTPLUG

Jörn Engel (2):
   qla2xxx: Remove unused function
   target: Change default sense key of NOT_READY

Mark Rustad (1):
   tcm_fc: Check for aborted sequence

Nicholas Bellinger (9):
   target: Add export of target_get_sess_cmd symbol
   iscsi-target: Add iscsit_transport API template
   iscsi-target: Initial traditional TCP conversion to iscsit_transport
   iscsi-target: Add iser-target parameter keys + setup during login
   iscsi-target: Add per transport iscsi_cmd alloc/free
   iscsi-target: Refactor RX PDU logic + export request PDU handling
   iscsi-target: Refactor TX queue logic + export response PDU creation
   iscsi-target: Add iser network portal attribute
   iser-target: Add iSCSI Extensions for RDMA (iSER) target driver

Wei Yongjun (1):
   tcm_fc: using kfree_rcu() to simplify the code

  drivers/infiniband/Kconfig |1 +
  drivers/infiniband/Makefile|1 +
  drivers/infiniband/ulp/isert/Kconfig   |5 +
  drivers/infiniband/ulp/isert/Makefile  |2 +
  drivers/infiniband/ulp/isert/ib_isert.c| 2281 
  drivers/infiniband/ulp/isert/ib_isert.h|  138 ++
  drivers/infiniband/ulp/isert/isert_proto.h 

Re: [RFC-v4 9/9] iser-target: Add iSCSI Extensions for RDMA (iSER) target driver

2013-04-21 Thread Or Gerlitz

On 12/04/2013 23:52, Nicholas A. Bellinger wrote:

From: Nicholas Bellinger

This patch adds support for iSCSI Extensions for RDMA target mode,
and includes CQ pooling per isert_device context distributed across
multiple active iser target sessions.


Cool, feel free to ad my S.O.B here

Signed-off-by: Or Gerlitz 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v3 9/9] iser-target: Add iSCSI Extensions for RDMA (iSER) target driver

2013-04-04 Thread Or Gerlitz

On 04/04/2013 10:24, Nicholas A. Bellinger wrote:

+static int
+isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
+{
+   struct isert_cmd *isert_cmd = container_of(cmd,
+   struct isert_cmd, iscsi_cmd);
+   struct isert_conn *isert_conn = (struct isert_conn *)conn->context;
+   struct ib_send_wr *send_wr = &isert_cmd->tx_desc.send_wr;
+   struct iscsi_scsi_rsp *hdr = (struct iscsi_scsi_rsp *)
+   &isert_cmd->tx_desc.iscsi_header;
+
+   isert_create_send_desc(isert_conn, isert_cmd, &isert_cmd->tx_desc);
+   iscsit_build_rsp_pdu(cmd, conn, true, hdr);
+   isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc);
+   /*
+* Attach SENSE DATA payload to iSCSI Response PDU
+*/
+   if (cmd->se_cmd.sense_buffer &&
+   ((cmd->se_cmd.se_cmd_flags & SCF_TRANSPORT_TASK_SENSE) ||
+   (cmd->se_cmd.se_cmd_flags & SCF_EMULATED_TASK_SENSE))) {
+   struct ib_device *ib_dev = isert_conn->conn_cm_id->device;
+   struct ib_sge *tx_dsg = &isert_cmd->tx_desc.tx_sg[1];
+   u32 padding, sense_len;
+
+   put_unaligned_be16(cmd->se_cmd.scsi_sense_length,
+  cmd->sense_buffer);
+   cmd->se_cmd.scsi_sense_length += sizeof(__be16);
+
+   padding = -(cmd->se_cmd.scsi_sense_length) & 3;
+   hton24(hdr->dlength, (u32)cmd->se_cmd.scsi_sense_length);
+   sense_len = cmd->se_cmd.scsi_sense_length + padding;
+
+   isert_cmd->sense_buf_dma = ib_dma_map_single(ib_dev,
+   (void *)cmd->sense_buffer, sense_len,
+   DMA_TO_DEVICE);
+
+   isert_cmd->sense_buf_len = sense_len;
+   ib_dma_sync_single_for_cpu(ib_dev, isert_cmd->sense_buf_dma,
+  sense_len, DMA_TO_DEVICE);
+   ib_dma_sync_single_for_device(ib_dev, isert_cmd->sense_buf_dma,
+ sense_len, DMA_TO_DEVICE);
+


you just called dma_map_single, and not going to touch the buffer before 
posting it to the wire,
there's no point to sync it for the cpu and for the device, remove these 
calls.




+   tx_dsg->addr = isert_cmd->sense_buf_dma;
+   tx_dsg->length   = sense_len;
+   tx_dsg->lkey = isert_conn->conn_mr->lkey;
+   isert_cmd->tx_desc.num_sge = 2;
+   }
+
+   isert_init_send_wr(isert_cmd, send_wr);
+
+   pr_debug("Posting SCSI Response IB_WR_SEND >>\n");
+
+   return isert_post_response(isert_conn, isert_cmd);
+}


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v3 9/9] iser-target: Add iSCSI Extensions for RDMA (iSER) target driver

2013-04-04 Thread Or Gerlitz

On 04/04/2013 10:24, Nicholas A. Bellinger wrote:


+#define ISER_RECV_DATA_SEG_LEN  8192
+#define ISER_RX_PAYLOAD_SIZE(ISER_HEADERS_LEN + ISER_RECV_DATA_SEG_LEN)
[...]
+#define ISER_RX_PAD_SIZE   (16384 - (ISER_RX_PAYLOAD_SIZE + \
+   sizeof(u64) + sizeof(struct ib_sge)))


We're eating here too much ram for the pad, you need 8K + something, so 
the pad can count down
from 12K and not 16K which means each such element will consume three 
pages and not four.



+struct iser_rx_desc {
+   struct iser_hdr iser_header;
+   struct iscsi_hdr iscsi_header;
+   chardata[ISER_RECV_DATA_SEG_LEN];
+   u64 dma_addr;
+   struct ib_sge   rx_sg;
+   charpad[ISER_RX_PAD_SIZE];
+} __packed;
+
+struct isert_rx_desc {
+   struct isert_conn   *desc_conn;
+   struct work_struct  desc_work;
+   struct iser_rx_desc desc;
+} __packed;


You have way enough room in the pad field of struct iser_rx_desc to 
place there the two fields
added by struct isert_rx_desc (and you only use struct iser_rx_desc from 
within isert_rx_desc) --> any reason

not to unify  them?



Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v3 9/9] iser-target: Add iSCSI Extensions for RDMA (iSER) target driver

2013-04-04 Thread Or Gerlitz

On 04/04/2013 10:24, Nicholas A. Bellinger wrote:

+
+void isert_cq_tx_callback(struct ib_cq *, void *);
+void isert_cq_rx_callback(struct ib_cq *, void *);
+void isert_free_rx_descriptors(struct isert_conn *);

any reason not to have these as static functions (same for isert_cq_rx_work)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 11/12] iser-target: Add logic for core

2013-04-02 Thread Or Gerlitz
On Sat, Mar 23, 2013 at 1:55 AM, Nicholas A. Bellinger
 wrote:
> +static int
> +isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
> +{
> +   struct isert_cmd *isert_cmd = container_of(cmd,
> +   struct isert_cmd, iscsi_cmd);
> +   struct isert_conn *isert_conn = (struct isert_conn *)conn->context;
> +   struct ib_send_wr *send_wr = &isert_cmd->tx_desc.send_wr, *wr_failed;
> +   struct iscsi_scsi_rsp *hdr = (struct iscsi_scsi_rsp *)
> +   &isert_cmd->tx_desc.iscsi_header;
> +   int ret;
> +
> +   isert_create_send_desc(isert_conn, isert_cmd, &isert_cmd->tx_desc);
> +   iscsit_build_rsp_pdu(cmd, conn, true, hdr);
> +   isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc);
> +   /*
> +* Attach SENSE DATA payload to iSCSI Response PDU
> +*/
> +   if (cmd->se_cmd.sense_buffer &&
> +   ((cmd->se_cmd.se_cmd_flags & SCF_TRANSPORT_TASK_SENSE) ||
> +   (cmd->se_cmd.se_cmd_flags & SCF_EMULATED_TASK_SENSE))) {
> +   struct ib_device *ib_dev = isert_conn->conn_cm_id->device;
> +   struct ib_sge *tx_dsg = &isert_cmd->tx_desc.tx_sg[1];
> +   u32 padding, sense_len;
> +
> +   put_unaligned_be16(cmd->se_cmd.scsi_sense_length,
> +  cmd->sense_buffer);
> +   cmd->se_cmd.scsi_sense_length += sizeof(__be16);
> +
> +   padding = -(cmd->se_cmd.scsi_sense_length) & 3;
> +   hton24(hdr->dlength, (u32)cmd->se_cmd.scsi_sense_length);
> +   sense_len = cmd->se_cmd.scsi_sense_length + padding;
> +
> +   isert_cmd->sense_buf_dma = ib_dma_map_single(ib_dev,
> +   (void *)cmd->sense_buffer, sense_len,
> +   DMA_TO_DEVICE);
> +
> +   isert_cmd->sense_buf_len = sense_len;
> +   ib_dma_sync_single_for_cpu(ib_dev, isert_cmd->sense_buf_dma,
> +  sense_len, DMA_TO_DEVICE);
> +   ib_dma_sync_single_for_device(ib_dev, 
> isert_cmd->sense_buf_dma,
> + sense_len, DMA_TO_DEVICE);
> +
> +   tx_dsg->addr= isert_cmd->sense_buf_dma;
> +   tx_dsg->length  = sense_len;
> +   tx_dsg->lkey= isert_conn->conn_mr->lkey;
> +   isert_cmd->tx_desc.num_sge = 2;
> +   }
> +
> +   isert_cmd->rdma_wr.iser_ib_op = ISER_IB_SEND;

[...]
> +   send_wr->wr_id = (unsigned long)&isert_cmd->tx_desc;
> +   send_wr->opcode = IB_WR_SEND;
> +   send_wr->send_flags = IB_SEND_SIGNALED;
> +   send_wr->sg_list = &isert_cmd->tx_desc.tx_sg[0];
> +   send_wr->num_sge = isert_cmd->tx_desc.num_sge;
> +   send_wr->next = NULL;
[...]

These seven lines are repeated 3-5 times below, a quick question and suggestion:

1. can't we do it beforehand?
2. we can move to helper function and call it when needed.


















> +isert_put_tm_rsp(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
> +{
> +   struct isert_cmd *isert_cmd = container_of(cmd,
> +   struct isert_cmd, iscsi_cmd);
> +   struct isert_conn *isert_conn = (struct isert_conn *)conn->context;
> +   struct ib_send_wr *send_wr = &isert_cmd->tx_desc.send_wr, *wr_failed;
> +   int ret;
> +
> +   isert_create_send_desc(isert_conn, isert_cmd, &isert_cmd->tx_desc);
> +   iscsit_build_task_mgt_rsp(cmd, conn, (struct iscsi_tm_rsp *)
> + &isert_cmd->tx_desc.iscsi_header);
> +   isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc);
> +
> +   isert_cmd->rdma_wr.iser_ib_op = ISER_IB_SEND;
> +   send_wr->wr_id = (unsigned long)&isert_cmd->tx_desc;
> +   send_wr->opcode = IB_WR_SEND;
> +   send_wr->send_flags = IB_SEND_SIGNALED;
> +   send_wr->sg_list = &isert_cmd->tx_desc.tx_sg[0];
> +   send_wr->num_sge = isert_cmd->tx_desc.num_sge;
> +   send_wr->next = NULL;
> +
> +   pr_debug("Posting Task Management Response IB_WR_SEND 
> >>\n");
> +
> +   atomic_inc(&isert_conn->post_send_buf_count);
> +
> +   ret = ib_post_send(isert_conn->conn_qp, &isert_cmd->tx_desc.send_wr,
> +   &wr_failed);
> +   if (ret) {
> +   pr_err("isert_put_tm_rsp() failed to post wr: %d\n", ret);
> +   atomic_dec(&isert_conn->post_send_buf_count);
> +   return ret;
> +   }
> +   return 0;
> +}
> +
> +static int
> +isert_build_rdma_wr(struct isert_conn *isert_conn, struct isert_cmd 
> *isert_cmd,
> +   struct ib_sge *ib_sge, struct ib_send_wr *send_wr,
> +   u32 data_left, u32 offset)
> +{
> +   struct iscsi_cmd *cmd = &isert_cmd->iscsi_cmd;
> +   struct scatterlist *sg_start, *tmp_sg;
> +   struct ib_device *ib_dev = isert_conn->conn_cm_id->device;
> +   u32 sg_off, pag

Re: [RFC-v2 10/12] iser-target: Add logic for verbs

2013-04-02 Thread Or Gerlitz
On Sat, Mar 23, 2013 at 1:55 AM, Nicholas A. Bellinger
 wrote:

+   device->dev_mr = ib_get_dma_mr(device->dev_pd,
+   IB_ACCESS_LOCAL_WRITE |
+   IB_ACCESS_REMOTE_WRITE |
+   IB_ACCESS_REMOTE_READ);


remove IB_ACCESS_REMOTE_yyy access  flags, you're not letting anyone
do remote rdma to this memory region
> +/***
> + * This file contains iSCSI extentions for RDMA (iSER) Verbs
> + *
> + * (c) Copyright 2013 RisingTide Systems LLC.
> + *
> + * Nicholas A. Bellinger 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + 
> /
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> +#include "isert_proto.h"
> +#include "isert_base.h"
> +#include "isert_core.h"
> +
> +#defineISERT_MAX_CONN  8
> +#define ISER_MAX_RX_CQ_LEN (ISERT_QP_MAX_RECV_DTOS * ISERT_MAX_CONN)
> +#define ISER_MAX_TX_CQ_LEN (ISERT_QP_MAX_REQ_DTOS  * ISERT_MAX_CONN)
> +
> +static DEFINE_MUTEX(device_list_mutex);
> +static LIST_HEAD(device_list);
> +
> +static void
> +isert_qp_event_callback(struct ib_event *e, void *context)
> +{
> +   struct isert_conn *isert_conn = (struct isert_conn *)context;
> +
> +   pr_err("isert_qp_event_callback event: %d\n", e->event);
> +   switch (e->event) {
> +   case IB_EVENT_COMM_EST:
> +   rdma_notify(isert_conn->conn_cm_id, IB_EVENT_COMM_EST);
> +   break;
> +   default:
> +   break;
> +   }
> +}
> +
> +static int
> +isert_query_device(struct ib_device *ib_dev, struct ib_device_attr *devattr)
> +{
> +   int ret;
> +
> +   ret = ib_query_device(ib_dev, devattr);
> +   if (ret) {
> +   pr_err("ib_query_device() failed: %d\n", ret);
> +   return ret;
> +   }
> +   pr_debug("devattr->max_sge: %d\n", devattr->max_sge);
> +   pr_debug("devattr->max_sge_rd: %d\n", devattr->max_sge_rd);
> +
> +   return 0;
> +}
> +
> +static int
> +isert_conn_setup_qp(struct isert_conn *isert_conn, struct rdma_cm_id *cma_id)
> +{
> +   struct isert_device *device = isert_conn->conn_device;
> +   struct ib_qp_init_attr attr;
> +   struct ib_device_attr devattr;
> +   int ret, index, min_index = 0;
> +
> +   memset(&devattr, 0, sizeof(struct ib_device_attr));
> +   ret = isert_query_device(cma_id->device, &devattr);
> +   if (ret)
> +   return ret;
> +
> +   mutex_lock(&device_list_mutex);
> +   for (index = 0; index < device->cqs_used; index++)
> +   if (device->cq_active_qps[index] <
> +   device->cq_active_qps[min_index])
> +   min_index = index;
> +   device->cq_active_qps[min_index]++;
> +   pr_debug("isert_conn_setup_qp: Using min_index: %d\n", min_index);
> +   mutex_unlock(&device_list_mutex);
> +
> +   memset(&attr, 0, sizeof(struct ib_qp_init_attr));
> +   attr.event_handler = isert_qp_event_callback;
> +   attr.qp_context = isert_conn;
> +   attr.send_cq = device->dev_tx_cq[min_index];
> +   attr.recv_cq = device->dev_rx_cq[min_index];
> +   attr.cap.max_send_wr = ISERT_QP_MAX_REQ_DTOS;
> +   attr.cap.max_recv_wr = ISERT_QP_MAX_RECV_DTOS;
> +   /*
> +* FIXME: Use devattr.max_sge - 2 for max_send_sge as
> +* work-around for RDMA_READ..
> +*/
> +   attr.cap.max_send_sge = devattr.max_sge - 2;
> +   isert_conn->max_sge = attr.cap.max_send_sge;
> +
> +   attr.cap.max_recv_sge = 1;
> +   attr.sq_sig_type = IB_SIGNAL_REQ_WR;
> +   attr.qp_type = IB_QPT_RC;
> +
> +   pr_debug("isert_conn_setup_qp cma_id->device: %p\n",
> +cma_id->device);
> +   pr_debug("isert_conn_setup_qp conn_pd->device: %p\n",
> +isert_conn->conn_pd->device);
> +
> +   ret = rdma_create_qp(cma_id, isert_conn->conn_pd, &attr);
> +   if (ret) {
> +   pr_err("rdma_create_qp failed for cma_id %d\n", ret);
> +   return ret;
> +   }
> +   isert_conn->conn_qp = cma_id->qp;
> +   pr_debug("rdma_create_qp() returned success 
> >.\n");
> +
> +   return 0;
> +}
> +
> +static void
> +isert_cq_event_callback(struct ib_event *e, void *context)
> +{
> +   pr_debug("isert_cq_event_callback event: %d\n", e->e

Re: [RFC-v2 10/12] iser-target: Add logic for verbs

2013-04-02 Thread Or Gerlitz
On Sat, Mar 23, 2013 at 1:55 AM, Nicholas A. Bellinger
 wrote:
[...]
> +static void
> +isert_qp_event_callback(struct ib_event *e, void *context)
> +{
> +   struct isert_conn *isert_conn = (struct isert_conn *)context;
> +
> +   pr_err("isert_qp_event_callback event: %d\n", e->event);
> +   switch (e->event) {
> +   case IB_EVENT_COMM_EST:
> +   rdma_notify(isert_conn->conn_cm_id, IB_EVENT_COMM_EST);
> +   break;
> +   default:
> +   break;
> +   }
> +}
[...]
> +static void
> +isert_cq_event_callback(struct ib_event *e, void *context)
> +{
> +   pr_debug("isert_cq_event_callback event: %d\n", e->event);
> +
> +   switch (e->event) {
> +   case IB_EVENT_QP_LAST_WQE_REACHED:
> +   pr_warn("Reached TX IB_EVENT_QP_LAST_WQE_REACHED:\n");
> +   break;
> +   default:
> +   pr_warn("Unknown e->event; %d\n", e->event);
> +   break;
> +   }
> +}

This is QP not CQ event, move the case for it to QP event hander
isert_qp_event_callback




















> +   ib_destroy_cq(device->dev_tx_cq[i]);
> +   device->dev_rx_cq[i] = NULL;
> +   device->dev_tx_cq[i] = NULL;
> +   }
> +
> +   ib_dereg_mr(device->dev_mr);
> +   ib_dealloc_pd(device->dev_pd);
> +   kfree(device->cq_desc);
> +}
> +
> +static void
> +isert_device_try_release(struct isert_device *device)
> +{
> +   mutex_lock(&device_list_mutex);
> +   device->refcount--;
> +   if (!device->refcount) {
> +   isert_free_device_ib_res(device);
> +   list_del(&device->dev_node);
> +   kfree(device);
> +   }
> +   mutex_unlock(&device_list_mutex);
> +}
> +
> +static struct isert_device *
> +isert_device_find_by_ib_dev(struct rdma_cm_id *cma_id)
> +{
> +   struct isert_device *device;
> +
> +   mutex_lock(&device_list_mutex);
> +   list_for_each_entry(device, &device_list, dev_node) {
> +   if (device->ib_device->node_guid == 
> cma_id->device->node_guid) {
> +   device->refcount++;
> +   mutex_unlock(&device_list_mutex);
> +   return device;
> +   }
> +   }
> +
> +   device = kzalloc(sizeof(struct isert_device), GFP_KERNEL);
> +   if (!device) {
> +   mutex_unlock(&device_list_mutex);
> +   return NULL;
> +   }
> +
> +   INIT_LIST_HEAD(&device->dev_node);
> +
> +   device->ib_device = cma_id->device;
> +   if (isert_create_device_ib_res(device)) {
> +   kfree(device);
> +   mutex_unlock(&device_list_mutex);
> +   return NULL;
> +   }
> +
> +   device->refcount++;
> +   list_add_tail(&device->dev_node, &device_list);
> +   mutex_unlock(&device_list_mutex);
> +
> +   return device;
> +}
> +
> +static int
> +isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
> +{
> +   struct iscsi_np *np = cma_id->context;
> +   struct isert_np *isert_np = np->np_context;
> +   struct isert_conn *isert_conn;
> +   struct isert_device *device;
> +   struct ib_device *ib_dev = cma_id->device;
> +   int ret;
> +
> +   pr_debug("Entering isert_connect_request cma_id: %p, context: %p\n",
> +cma_id, cma_id->context);
> +
> +   isert_conn = kzalloc(sizeof(struct isert_conn), GFP_KERNEL);
> +   if (!isert_conn) {
> +   pr_err("Unable to allocate isert_conn\n");
> +   return -ENOMEM;
> +   }
> +   isert_conn->state = ISER_CONN_INIT;
> +   INIT_LIST_HEAD(&isert_conn->conn_accept_node);
> +   init_completion(&isert_conn->conn_login_comp);
> +   init_waitqueue_head(&isert_conn->conn_wait);
> +   kref_init(&isert_conn->conn_kref);
> +   kref_get(&isert_conn->conn_kref);
> +
> +   cma_id->context = isert_conn;
> +   isert_conn->conn_cm_id = cma_id;
> +   isert_conn->responder_resources = 
> event->param.conn.responder_resources;
> +   isert_conn->initiator_depth = event->param.conn.initiator_depth;
> +   pr_debug("Using responder_resources: %u initiator_depth: %u\n",
> +isert_conn->responder_resources, 
> isert_conn->initiator_depth);
> +
> +   isert_conn->login_buf = kzalloc(ISCSI_DEF_MAX_RECV_SEG_LEN +
> +   ISER_RX_LOGIN_SIZE, GFP_KERNEL);
> +   if (!isert_conn->login_buf) {
> +   pr_err("Unable to allocate isert_conn->login_buf\n");
> +   ret = -ENOMEM;
> +   goto out;
> +   }
> +
> +   isert_conn->login_req_buf = isert_conn->login_buf;
> +   isert_conn->login_rsp_buf = isert_conn->login_buf +
> +   ISCSI_DEF_MAX_RECV_SEG_LEN;
> +   pr_debug("Set login_buf: %p login_req_buf: %p login_rsp_buf: %p\n",
> +isert_conn->login_buf, isert_conn->login_req_buf,
> +  

Re: [RFC-v2 11/12] iser-target: Add logic for core

2013-04-02 Thread Or Gerlitz

On 23/03/2013 01:55, Nicholas A. Bellinger wrote:

+++ b/drivers/infiniband/ulp/isert/isert_core.h
@@ -0,0 +1,11 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+
+extern void iser_cq_tx_tasklet(unsigned long);
+extern void isert_cq_tx_callback(struct ib_cq *, void *);
+extern void iser_cq_rx_tasklet(unsigned long);
+extern void isert_cq_rx_callback(struct ib_cq *, void *);
+extern void isert_free_rx_descriptors(struct isert_conn *);


no need for externs here too, agree? also, any reason for these two 
header files not

to be merged into one or into one of the other header files?


diff --git a/drivers/infiniband/ulp/isert/isert_verbs.h 
b/drivers/infiniband/ulp/isert/isert_verbs.h
new file mode 100644
index 000..da7924d
--- /dev/null
+++ b/drivers/infiniband/ulp/isert/isert_verbs.h
@@ -0,0 +1,5 @@
+extern void isert_connect_release(struct isert_conn *);
+extern void isert_put_conn(struct isert_conn *);
+extern int isert_cma_handler(struct rdma_cm_id *, struct rdma_cm_event *);
+extern int isert_post_recv(struct isert_conn *, u32);
+extern int isert_post_send(struct isert_conn *, struct iser_tx_desc *);


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 10/12] iser-target: Add logic for verbs

2013-04-01 Thread Or Gerlitz

On 23/03/2013 01:55, Nicholas A. Bellinger wrote:

+++ b/drivers/infiniband/ulp/isert/isert_verbs.h
@@ -0,0 +1,5 @@
+extern void isert_connect_release(struct isert_conn *);
+extern void isert_put_conn(struct isert_conn *);
+extern int isert_cma_handler(struct rdma_cm_id *, struct rdma_cm_event *);
+extern int isert_post_recv(struct isert_conn *, u32);
+extern int isert_post_send(struct isert_conn *, struct iser_tx_desc *);


why use extern here? maybe a left over from V1?
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 10/11] iser-target: Add logic for core

2013-03-14 Thread Or Gerlitz

On 08/03/2013 03:45, Nicholas A. Bellinger wrote:

+void
+iser_cq_tx_tasklet(unsigned long data)
+{
+   struct isert_conn *isert_conn = (struct isert_conn *)data;
+   struct ib_cq *tx_cq = isert_conn->conn_tx_cq;
+   struct iser_tx_desc *tx_desc;
+   struct ib_wc wc;
+
+   while (ib_poll_cq(tx_cq, 1, &wc) == 1) {
+   tx_desc = (struct iser_tx_desc *)(unsigned long)wc.wr_id;
+
+   if (wc.status == IB_WC_SUCCESS) {
+   isert_send_completion(tx_desc, isert_conn);
+   } else {
+   pr_debug("TX wc.status != IB_WC_SUCCESS 
>>\n");
+   isert_dump_ib_wc(&wc);
+   atomic_dec(&isert_conn->post_send_buf_count);
+   isert_cq_comp_err(tx_desc, isert_conn);
+   }
+   }
+
+   ib_req_notify_cq(tx_cq, IB_CQ_NEXT_COMP);
+}
+
+void
+isert_cq_tx_callback(struct ib_cq *cq, void *context)
+{
+   struct isert_conn *isert_conn = context;
+
+   tasklet_schedule(&isert_conn->conn_tx_tasklet);
+}
+
+void
+iser_cq_rx_tasklet(unsigned long data)
+{
+   struct isert_conn *isert_conn = (struct isert_conn *)data;
+   struct ib_cq *rx_cq = isert_conn->conn_rx_cq;
+   struct iser_rx_desc *rx_desc;
+   struct ib_wc wc;
+   unsigned long xfer_len;
+
+   while (ib_poll_cq(rx_cq, 1, &wc) == 1) {
+   rx_desc = (struct iser_rx_desc *)(unsigned long)wc.wr_id;
+
+   if (wc.status == IB_WC_SUCCESS) {
+   xfer_len = (unsigned long)wc.byte_len;
+   isert_rx_completion(rx_desc, isert_conn, xfer_len);
+   } else {
+   pr_debug("RX wc.status != IB_WC_SUCCESS 
>>\n");
+   if (wc.status != IB_WC_WR_FLUSH_ERR)
+   isert_dump_ib_wc(&wc);
+
+   isert_conn->post_recv_buf_count--;
+   isert_cq_comp_err(NULL, isert_conn);
+   }
+   }
+
+   ib_req_notify_cq(rx_cq, IB_CQ_NEXT_COMP);
+}


We currently have here the following sequence of calls


isert_cq_rx_callback --> tasklet_schedule ...  --> ...
ib_poll_cq -->  isert_rx_completion --> isert_rx_queue_desc --> 
isert_rx_queue_desc --> queue_work (context switch)


isert_cq_tx_callback --> tasklet_schedule ...  --> ...
ib_poll_cq -->  isert_send_completion --> isert_completion_rdma_read 
-->  queue_work (context switch)


which means we have one context switch from the CQ callback to tasklet 
and then a PER IO context switch
from the tasklet to a kernel thread context which you might need for IO 
submission into the backing store.


This can be optimized by having one context switch from the isert cq 
callbacks to a kernel thread, with the work
item being "do polling" and then from the cq polling code do the 
submission into the backing store without

any further context switches.

Or.




+
+void
+isert_cq_rx_callback(struct ib_cq *cq, void *context)
+{
+   struct isert_conn *isert_conn = context;
+
+   tasklet_schedule(&isert_conn->conn_rx_tasklet);
+}


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 09/11] iser-target: Add logic for verbs

2013-03-14 Thread Or Gerlitz

On 08/03/2013 03:45, Nicholas A. Bellinger wrote:

isert_conn->conn_rx_cq = ib_create_cq(ib_dev, isert_cq_rx_callback,
+   isert_cq_event_callback,
+   (void *)isert_conn,
+   ISER_MAX_RX_CQ_LEN, 0);

[...]

+
+   isert_conn->conn_tx_cq = ib_create_cq(ib_dev, isert_cq_tx_callback,
+   isert_cq_event_callback,
+   (void *)isert_conn,
+   ISER_MAX_TX_CQ_LEN, 0);


Two comments here:

1. This code always attaches the CQ to vector #0 of the IB device. 
Typically vectors are associated with IRQs and IRQs are
"affiniated" with cores, so processing of completions for all 
connections will be carried out on single core at a given time.


What can be done here is to use different vectors for different CQs. The 
possible vector values are from 0 to ib_dev->num_comp_vectors - 1


2. A dedicated CQ pair is used per connection which might not scale well.

What can be done here, is to use a global context per IB device which 
holds a pool of CQs (created on different vectors, per the previous item),
and for each new connection, attach its QP to a CQ pair from this pool, 
e.g as done in drivers/infiniband/ulp/iser/iser_verbs.c :: 
iser_create_device_ib_res()
note that FWIW this context can hold also a global PD and DMA_MR objects 
to be used by all the connections opened on the device. The context can 
be opened on demand (1st connection that hits this IB device) and closed 
on non-demand (last connection).



Or.



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 09/11] iser-target: Add logic for verbs

2013-03-14 Thread Or Gerlitz

On 08/03/2013 03:45, Nicholas A. Bellinger wrote:

+++ b/drivers/infiniband/ulp/isert/isert_verbs.c
+#define ISERT_ADDR_ROUTE_TIMEOUT 1000


remove this define, its irrelevant and you don't use that anywhere


+static void
+isert_qp_event_callback(struct ib_event *e, void *context)
+{
+   pr_err("isert_qp_event_callback event: %d\n", e->event);
+}


To be on the safe side, when the event is IB_EVENT_COMM_EST (which means 
that the login request was received by the HCA before the connection was 
fully established), call rdma_notify (id, IB_EVENT_COMM_EST) with the id 
being the one pointed from qp->qp_context->isert_conn




+
+static int
+isert_query_device(struct ib_device *ib_dev, struct ib_device_attr *devattr)
+{
+   int ret;
+
+   ret = ib_query_device(ib_dev, devattr);
+   if (ret) {
+   pr_err("ib_query_device() failed: %d\n", ret);
+   return ret;
+   }
+   pr_debug("devattr->max_mr_size: 0x%016Lx\n", devattr->max_mr_size);


running user space "ibv_devinfo -v" will give the same effect... no need 
to dump this here, maybe

except for max_sge that you need.


+   pr_debug("devattr->page_size_cap: 0x%016Lx\n", devattr->page_size_cap);
+   pr_debug("devattr->max_qp: %d\n", devattr->max_qp);
+   pr_debug("devattr->max_qp_wr: %d\n", devattr->max_qp_wr);
+   pr_debug("devattr->device_cap_flags: 0x%08x\n", 
devattr->device_cap_flags);
+   pr_debug("devattr->max_sge: %d\n", devattr->max_sge);
+   pr_debug("devattr->max_sge_rd: %d\n", devattr->max_sge_rd);
+   pr_debug("devattr->max_cq: %d\n", devattr->max_cq);
+   pr_debug("devattr->max_cqe: %d\n", devattr->max_cqe);
+   pr_debug("devattr->max_mr: %d\n", devattr->max_mr);
+   pr_debug("devattr->max_pd: %d\n", devattr->max_pd);
+   pr_debug("devattr->max_rdd: %d\n", devattr->max_rdd);
+   pr_debug("devattr->max_mw: %d\n", devattr->max_mw);
+   pr_debug("devattr->max_srq: %d\n", devattr->max_srq);
+   pr_debug("devattr->max_srq_wr: %d\n", devattr->max_srq_wr);
+   pr_debug("devattr->max_srq_sge: %d\n", devattr->max_srq_sge);
+
+   return 0;
+}

[...]

+
+
+int
+isert_cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
+{
+   int ret = 0;
+
+   pr_debug("isert_cma_handler: event %d status %d conn %p id %p\n",
+   event->event, event->status, cma_id->context, cma_id);
+
+   switch (event->event) {
+   case RDMA_CM_EVENT_CONNECT_REQUEST:
+   pr_debug("RDMA_CM_EVENT_CONNECT_REQUEST: >>>\n");
+   ret = isert_connect_request(cma_id, event);
+   break;
+   case RDMA_CM_EVENT_ESTABLISHED:
+   pr_debug("RDMA_CM_EVENT_ESTABLISHED >>\n");
+   isert_connected_handler(cma_id);
+   break;
+   case RDMA_CM_EVENT_DISCONNECTED:
+   pr_debug("RDMA_CM_EVENT_DISCONNECTED: >>\n");
+   isert_disconnected_handler(cma_id);
+   break;
+   case RDMA_CM_EVENT_DEVICE_REMOVAL:
+   case RDMA_CM_EVENT_ADDR_CHANGE:
+   break;
+   case RDMA_CM_EVENT_ADDR_ERROR:
+   case RDMA_CM_EVENT_ROUTE_ERROR:
+   case RDMA_CM_EVENT_CONNECT_ERROR:
+   default:
+   pr_err("Unknown RDMA CMA event: %d\n", event->event);
+   break;
+   }
+


ADDR_ERROR and ROUTE_ERROR you can remove from here, since you don't 
call rdma_resolve_addr nor rdma_resolve_route, they

will not be delivered to you.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 10/11] iser-target: Add logic for core

2013-03-14 Thread Or Gerlitz

On 08/03/2013 03:45, Nicholas A. Bellinger wrote:

+void
+isert_dump_ib_wc(struct ib_wc *wc)
+{
+   pr_debug("wc->wr_id: %llu\n", wc->wr_id);
+   pr_debug("wc->status: 0x%08x\n", wc->status);


This helper is called for a CQ completion with error, but when this 
happens all the WC fields except for the wr_id and the status aren't 
defined, so there's no point in dumping them (can be terribly misleading)



+   pr_debug("wc->opcode: 0x%08x\n", wc->opcode);
+   pr_debug("wc->vendor_err: 0x%08x\n", wc->vendor_err);
+   pr_debug("wc->byte_len: %u\n", wc->byte_len);
+   pr_debug("wc->qp: %p\n", wc->qp);
+   pr_debug("wc->src_qp: %u\n", wc->src_qp);
+   pr_debug("wc->wc_flags: 0x%08x\n", wc->wc_flags);
+   pr_debug("wc->pkey_index: %hu\n", wc->pkey_index);
+   pr_debug("wc->slid: %hu\n", wc->slid);
+   pr_debug("wc->sl: 0x%02x\n", wc->sl);
+   pr_debug("wc->dlid_path_bits: 0x%02x\n", wc->dlid_path_bits);
+   pr_debug("wc->port_num: 0x%02x\n", wc->port_num);
+}
+
+void
+iser_cq_tx_tasklet(unsigned long data)
+{
+   struct isert_conn *isert_conn = (struct isert_conn *)data;
+   struct ib_cq *tx_cq = isert_conn->conn_tx_cq;
+   struct iser_tx_desc *tx_desc;
+   struct ib_wc wc;
+
+   while (ib_poll_cq(tx_cq, 1, &wc) == 1) {
+   tx_desc = (struct iser_tx_desc *)(unsigned long)wc.wr_id;
+
+   if (wc.status == IB_WC_SUCCESS) {
+   isert_send_completion(tx_desc, isert_conn);
+   } else {
+   pr_debug("TX wc.status != IB_WC_SUCCESS 
>>\n");
+   isert_dump_ib_wc(&wc);
+   atomic_dec(&isert_conn->post_send_buf_count);
+   isert_cq_comp_err(tx_desc, isert_conn);
+   }
+   }
+
+   ib_req_notify_cq(tx_cq, IB_CQ_NEXT_COMP);
+}
+
+void
+isert_cq_tx_callback(struct ib_cq *cq, void *context)
+{
+   struct isert_conn *isert_conn = context;
+
+   tasklet_schedule(&isert_conn->conn_tx_tasklet);
+}
+
+void
+iser_cq_rx_tasklet(unsigned long data)
+{
+   struct isert_conn *isert_conn = (struct isert_conn *)data;
+   struct ib_cq *rx_cq = isert_conn->conn_rx_cq;
+   struct iser_rx_desc *rx_desc;
+   struct ib_wc wc;
+   unsigned long xfer_len;
+
+   while (ib_poll_cq(rx_cq, 1, &wc) == 1) {
+   rx_desc = (struct iser_rx_desc *)(unsigned long)wc.wr_id;
+
+   if (wc.status == IB_WC_SUCCESS) {
+   xfer_len = (unsigned long)wc.byte_len;
+   isert_rx_completion(rx_desc, isert_conn, xfer_len);
+   } else {
+   pr_debug("RX wc.status != IB_WC_SUCCESS 
>>\n");
+   if (wc.status != IB_WC_WR_FLUSH_ERR)
+   isert_dump_ib_wc(&wc);
+
+   isert_conn->post_recv_buf_count--;
+   isert_cq_comp_err(NULL, isert_conn);
+   }
+   }
+
+   ib_req_notify_cq(rx_cq, IB_CQ_NEXT_COMP);
+}


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 00/11] Add support for iSCSI Extentions for RDMA (ISER) target

2013-03-14 Thread Or Gerlitz

On 08/03/2013 03:45, Nicholas A. Bellinger wrote:

This series is first RFC for iSCSI Extentions for RDMA (ISER) target
support with existing iscsi-target TCP based socket code for a future v3.10 
merge.

This code is available in git here:
   git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git 
iser_target-rfcv1

Ths includes a basic iscsit_transport API that allows different transports
to reside under a single iscsi-target configfs control plane, using an
pre-defined network portal attribute to enable a rdma_cm listener on top
of existing ipoib portals.


Hi Nic,

Here are quick few nits (...) which need to be sorted out for the next 
version, I've picked them from running the patches through checkpatch 
and sparse / gcc warnings checks


 target-pending]# git format-patch -o /tmp/lio-iser-rfc 
7b745c84a9f4ad62db4b67053fbceb5d706451af..

/tmp/lio-iser-rfc/0001-iscsi-target-Add-iscsit_transport-API-template.patch
/tmp/lio-iser-rfc/0002-iscsi-target-Initial-traditional-TCP-conversion-to-i.patch
/tmp/lio-iser-rfc/0003-iscsi-target-Add-iser-target-parameter-keys-setup-du.patch
/tmp/lio-iser-rfc/0004-iscsi-target-Add-per-transport-iscsi_cmd-alloc-free.patch
/tmp/lio-iser-rfc/0005-iscsi-target-Refactor-RX-PDU-logic-export-request-PD.patch
/tmp/lio-iser-rfc/0006-iscsi-target-Refactor-TX-queue-logic-export-response.patch
/tmp/lio-iser-rfc/0007-iscsi-target-Add-iser-network-portal-attribute.patch
/tmp/lio-iser-rfc/0008-iser-target-Add-base-proto-includes.patch
/tmp/lio-iser-rfc/0009-iser-target-Add-logic-for-verbs.patch
/tmp/lio-iser-rfc/0010-iser-target-Add-logic-for-core.patch
/tmp/lio-iser-rfc/0011-iser-target-Add-Makefile-Kconfig.patch

target-pending]# ./scripts/checkpatch.pl --strict /tmp/lio-iser-rfc/* | 
grep total

total: 0 errors, 8 warnings, 0 checks, 142 lines checked
total: 1 errors, 14 warnings, 6 checks, 1097 lines checked
total: 0 errors, 15 warnings, 7 checks, 299 lines checked
total: 0 errors, 0 warnings, 0 checks, 106 lines checked
total: 0 errors, 13 warnings, 4 checks, 795 lines checked
total: 1 errors, 12 warnings, 2 checks, 877 lines checked
total: 0 errors, 5 warnings, 0 checks, 83 lines checked
total: 7 errors, 12 warnings, 1 checks, 170 lines checked
total: 1 errors, 9 warnings, 14 checks, 481 lines checked
total: 1 errors, 39 warnings, 43 checks, 1732 lines checked
total: 0 errors, 0 warnings, 0 checks, 21 lines checked

drivers/infiniband/ulp/isert/isert_core.c:774:1: warning: symbol 
'isert_dump_ib_wc' was not declared. Should it be static?
drivers/infiniband/ulp/isert/isert_verbs.c:341:1: warning: symbol 
'isert_put_conn' was not declared. Should it be static?
drivers/infiniband/ulp/isert/isert_verbs.c:375:1: warning: symbol 
'isert_cma_handler' was not declared. Should it be static?
drivers/infiniband/ulp/isert/isert_verbs.c:416:1: warning: symbol 
'isert_post_recv' was not declared. Should it be static?
drivers/infiniband/ulp/isert/isert_verbs.c:451:1: warning: symbol 
'isert_post_send' was not declared. Should it be static?


drivers/infiniband/ulp/isert/isert_core.c: In function ?isert_rx_do_work?:
drivers/infiniband/ulp/isert/isert_core.c:481: warning: variable ?rc? 
set but not used



Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 01/11] iscsi-target: Add iscsit_transport API template

2013-03-08 Thread Or Gerlitz
On Fri, Mar 8, 2013 at 6:14 AM, Roland Dreier  wrote:
> Nicholas A. Bellinger  wrote:
> > +EXPORT_SYMBOL(iscsit_get_transport);

> It's not clear to me why this needs to be exported.  Who would use it
> outside the core iscsi target module?

Yep, as Nic noted, we're adding here an iscsi transport concept e.g in
the same manner Mike did libiscsi back in 2005/6 when the iser
initiator was pushed. This allows for multiple iscsi flavours to use a
common code  for common functionality. In the initiator area initially
there were iscsi tcp and iser, later few iscsi HW offloads were merged
too. Same story here. I think that the point is whether or not these
APIs are needed, since once we agree on that, we need an header file
and exporting of functions. As libiscsi.h resided under include/ it
makes sense to me for this include to be located there too.

Or.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/11] First pass at merging Bart's HA work

2012-12-07 Thread Or Gerlitz
On Thu, Dec 6, 2012 at 4:27 PM, Or Gerlitz  wrote:
[...]
> looking on the current locks in the system, we see that this kworker task
> holds four locks, but none of them seems  to be mutually held by another task,

That was ofcourse a wrong assertion, as a lock can't be mutually held
by two tasks...
I wanted to say that from the general view of which locks are being
held there's no obvious deadlock here, since none of the other locks
holders relates to the block/scsi layers...
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/11] First pass at merging Bart's HA work

2012-12-06 Thread Or Gerlitz

On 06/12/2012 17:04, Bart Van Assche wrote:

On 12/06/12 15:27, Or Gerlitz wrote:
The core problem here seems to be that scsi_remove_host simply never 
ends.


Hello Or,

The later patches in the srp-ha patch series avoided such behavior by 
checking whether the connection between SRP initiator and target is 
unique, and by removing duplicate SCSI hosts for which the transport 
layer failed.  Unfortunately these patches are still under review. 
Unless someone can come up with a better solution I will post a patch 
one of the next days that makes ib_srp again fail all commands after 
host removal started. That will avoid spending a long time doing error 
recovery.


Also, you might have noticed that Hannes Reinecke reported a few days 
ago that the SCSI error handler may need a lot of time for other 
transport types - this behavior is not SRP specific.



I'm not sure what to you exactly refer by duplicated SCSI hosts in this 
context or why we have them. Again, at the time we've took the stack 
traces snapshot from the system none of the SCSI EH threads was active, 
so I'm not sure either your comment about spending long time in the 
error recovery flow, as the flow we've run into seems to simply wait 
forever.


Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/11] First pass at merging Bart's HA work

2012-12-06 Thread Or Gerlitz

On 06/12/2012 16:10, Bart Van Assche wrote:

On 12/05/12 22:32, Or Gerlitz wrote:
On Wed, Dec 5, 2012 at 8:50 PM, Bart Van Assche  
wrote:

[...]

The only way to make I/O work reliably if a failure can occur at the
transport layer is to use multipathd on top of ib_srp. If a 
connection fails
for some reason, then the SRP SCSI host will be removed after the 
SCSI error

handler has finished with its error recovery strategy. And once the
transport layer is operational again and srp_daemon detects that the
initiator is no longer logged in srp_daemon will make ib_srp log in 
again.

multipathd will then cause I/O to continue over the new path.


Claim basically understood and agreed however, does this also hold
when the link is back again, that is can't SRP login via this single
path also when there's no multipath on top?


As far as I can remember the behavior of ib_srp has always been to try 
to reconnect once to the SRP target after the SCSI error handler 
kicked in. Other SCSI LLDs, e.g. the iSCSI initiator, can be 
configured to keep trying to reconnect after a transport layer 
failure. That has the advantage that the SCSI host number remains the 
same after reconnecting succeeded as before reconnecting started.




Bart,

The core problem here seems to be that scsi_remove_host simply never ends.

Observing all the tasks in the system (e.g using "echo t > 
/proc/sysrq-trigger"), we've noted that
none of the SCSI EH are currently running, that is for all of them their 
trace is the following



scsi_eh_0   S  0   380  2 0x
 88042c31be08 0046 88042c31bfd8 00014380
 88042c31a010 00014380 00014380 00014380
 88042c31bfd8 00014380 88042f5be5c0 88042bb48c40
Call Trace:
 [] ? scsi_unjam_host+0x1f0/0x1f0
 [] schedule+0x29/0x70
 [] scsi_error_handler+0x75/0x1c0
 [] ? scsi_unjam_host+0x1f0/0x1f0
 [] kthread+0xee/0x100
 [] ? __init_kthread_worker+0x70/0x70
 [] ret_from_fork+0x7c/0xb0
 [] ? __init_kthread_worker+0x70/0x70

However, the flow starting in srp_remove_target hangs somewhere in the
block layer waiting for something to happen

worker/11:1D  0   163  2 0x
 88082be6f738 0046 88082be6ffd8 00014380
 88082be6e010 00014380 00014380 00014380
 88082be6ffd8 00014380 88042f5ba580 88082be6c1c0
Call Trace:
 [] schedule+0x29/0x70
 [] schedule_timeout+0x14f/0x240
 [] ? lock_timer_base+0x70/0x70
 [] wait_for_common+0x11b/0x170
 [] ? try_to_wake_up+0x300/0x300
 [] wait_for_completion_timeout+0x13/0x20
 [] blk_execute_rq+0x133/0x1c0
 [] ? get_request+0x210/0x3d0
 [] scsi_execute+0xe8/0x180
 [] scsi_execute_req+0xa7/0x110
 [] sd_sync_cache+0xd8/0x130 [sd_mod]
 [] ? __dev_printk+0x3e/0x90
 [] ? dev_printk+0x45/0x50
 [] sd_shutdown+0xd0/0x150 [sd_mod]
 [] sd_remove+0x7c/0xc0 [sd_mod]
 [] __device_release_driver+0x7c/0xe0
 [] device_release_driver+0x2f/0x50
 [] bus_remove_device+0x126/0x190
 [] device_del+0x14b/0x250
 [] __scsi_remove_device+0x1b8/0x1d0
 [] scsi_forget_host+0xf6/0x110
 [] scsi_remove_host+0x108/0x1e0
 [] srp_remove_target+0xb8/0x150 [ib_srp]
 [] srp_remove_work+0x64/0xa0 [ib_srp]
 [] process_one_work+0x1c2/0x4a0
 [] ? process_one_work+0x150/0x4a0
 [] ? srp_remove_target+0x150/0x150 [ib_srp]
 [] worker_thread+0x12e/0x370
 [] ? manage_workers+0x180/0x180
 [] kthread+0xee/0x100
 [] ? __init_kthread_worker+0x70/0x70
 [] ret_from_fork+0x7c/0xb0
 [] ? __init_kthread_worker+0x70/0x70


looking on the current locks in the system, we see that this kworker task
holds four locks, but none of them seems  to be mutually held by another 
task,



Showing all locks held in the system:
4 locks held by kworker/11:1/163:
 #0:  (events_long){.+.+.+}, at: [] 
process_one_work+0x150/0x4a0
 #1:  ((&target->remove_work)){+.+.+.}, at: [] 
process_one_work+0x150/0x4a0
 #2:  (&shost->scan_mutex){+.+.+.}, at: [] 
scsi_remove_host+0x34/0x1e0
 #3:  (&__lockdep_no_validate__){..}, at: [] 
device_release_driver+0x27/0x50

1 lock held by bash/6298:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [] 
n_tty_read+0x58e/0x960

1 lock held by mingetty/6319:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [] 
n_tty_read+0x58e/0x960

1 lock held by mingetty/6321:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [] 
n_tty_read+0x58e/0x960

1 lock held by mingetty/6323:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [] 
n_tty_read+0x58e/0x960

1 lock held by mingetty/6325:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [] 
n_tty_read+0x58e/0x960

1 lock held by mingetty/6327:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [] 
n_tty_read+0x58e/0x960

1 lock held by mingetty/6329:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [] 
n_tty_read+0x58e/0x960

1 lock held by agetty/6337:
 #0:  (&tty->atomic_read_lock){+.+...}, at: [] 
n_tty_read+0x58e/0x9

Re: [PATCH 00/11] First pass at merging Bart's HA work

2012-12-05 Thread Or Gerlitz
On Wed, Dec 5, 2012 at 8:50 PM, Bart Van Assche  wrote:
[...]
> The only way to make I/O work reliably if a failure can occur at the
> transport layer is to use multipathd on top of ib_srp. If a connection fails
> for some reason, then the SRP SCSI host will be removed after the SCSI error
> handler has finished with its error recovery strategy. And once the
> transport layer is operational again and srp_daemon detects that the
> initiator is no longer logged in srp_daemon will make ib_srp log in again.
> multipathd will then cause I/O to continue over the new path.

Claim basically understood and agreed however, does this also hold
when the link is back again, that is can't SRP login via this single
path also when there's no multipath on top?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/11] First pass at merging Bart's HA work

2012-12-05 Thread Or Gerlitz
On Fri, Nov 30, 2012 at 4:21 AM, David Dillow  wrote:
[...]
> Modulo a few style issues (braces around one line if branches, etc.) and
> having three state variables vs one, I can live with everything up to
> aabfa852acd27962 at git://github.com/bvanassche/linux.git#srp-ha. Those
> two are small things that can be fixed later and are not worth holding
> things up any further.
>
> I'll try to spend some time on the final four patches tomorrow afternoon.

Dave, Bart

My colleague Alex Turin  tried  today the bits as
they appear in Roland's kernel.org tree / for-next branch up to commit
 fb57e1dbbd4 and here's some feedback

Basically, what he did was connecting  to a target, next take down the
IB port on the initiator side, and issue some IOs (dd if=/dev/sdb
of=/dev/null count=1)

Our recollection of events from the logs (below) is the following

1. queued command get completion status 5

2. as part of error handling srp_reset_host() was called,

3. srp_reset_host() calls to srp_reconnect_target() which fails cause
port is down.

4. srp_reconnect_target() on failure calls to srp_queue_remove_work()
which sets
target->status to SRP_TARGET_REMOVED.

5.srp_reset_host() called second time. it calls to
srp_reconnect_target() but target->state == SRP_TARGET_REMOVED.
srp_reconnect_target() checks if target->state != SRP_TARGET_LIVE and
return -EAGAIN.

This probably means that even after enabling port it will still fail
to reconnect?

Or.


Dec  5 16:19:13 rsws42 kernel: scsi host7: ib_srp: failed send status 5
Dec  5 16:19:42 rsws42 kernel: scsi host7: SRP abort called
Dec  5 16:19:42 rsws42 kernel: scsi host7: SRP reset_device called
Dec  5 16:19:42 rsws42 kernel: scsi host7: ib_srp: SRP reset_host called
Dec  5 16:19:43 rsws42 kernel: scsi host7: ib_srp: Got failed path rec
status -110
Dec  5 16:19:43 rsws42 kernel: scsi host7: ib_srp: Path record query failed
Dec  5 16:19:43 rsws42 kernel: scsi host7: ib_srp: reconnect failed
(-110), removing target port.
Dec  5 16:19:43 rsws42 kernel: sd 7:0:0:11: Device offlined - not
ready after error recovery
Dec  5 16:19:43 rsws42 kernel: sd 7:0:0:11: [sdb] Synchronizing SCSI cache
Dec  5 16:20:45 rsws42 kernel: scsi host7: SRP abort called
Dec  5 16:20:50 rsws42 kernel: scsi host7: SRP abort called
Dec  5 16:21:05 rsws42 kernel: scsi host7: SRP abort called
Dec  5 16:21:10 rsws42 kernel: scsi host7: SRP reset_device called
Dec  5 16:21:15 rsws42 kernel: scsi host7: ib_srp: SRP reset_host called
Dec  5 16:21:15 rsws42 kernel: sd 7:0:0:11: Device offlined - not
ready after error recovery
Dec  5 16:21:15 rsws42 kernel: sd 7:0:0:11: Device offlined - not
ready after error recovery

repeating part:

Dec  5 16:22:17 rsws42 kernel: scsi host7: SRP abort called
Dec  5 16:22:22 rsws42 kernel: scsi host7: SRP abort called
Dec  5 16:22:37 rsws42 kernel: scsi host7: SRP abort called
Dec  5 16:22:42 rsws42 kernel: scsi host7: SRP reset_device called
Dec  5 16:22:47 rsws42 kernel: scsi host7: ib_srp: SRP reset_host called
Dec  5 16:22:47 rsws42 kernel: sd 7:0:0:11: Device offlined - not
ready after error recovery
Dec  5 16:22:47 rsws42 kernel: sd 7:0:0:11: Device offlined - not
ready after error recovery
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/11] First pass at merging Bart's HA work

2012-11-26 Thread Or Gerlitz

On 27/11/2012 06:04, David Dillow wrote:
We can push it through James's tree if need be, but Bart's code is 
pretty self-contained, and going through the SCSI tree will introduce 
merge dependencies. It'd be much easier to push it all through the 
RDMA tree


Yep, this makes sense to me even without taking into account the time 
left for the merge window to open,
the patches have been around for long time and relate directly to the 
SRP code in the IB subsystem, there's no point in introducing merge 
dependencies where it can be avoided.


Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/11] First pass at merging Bart's HA work

2012-11-25 Thread Or Gerlitz
On Mon, Nov 26, 2012 at 6:44 AM, David Dillow  wrote:
> One may also pull this series from github:
> git pull git://github.com/dillow/srp-initiator.git ha-merge-v1

Hi Dave,

The kernel maintainers file specifies the following tree
git://git.kernel.org/pub/scm/linux/kernel/git/dad/srp-initiator.git
for you -- I assume you've moved to github during the kernel.org
sitedown period, could you go back to kernel.org? its nice to have the
same look and feel e.g through git web for the IB, SRP, Networking,
SCSI etc trees.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html