Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko
On Fri, Jul 8, 2016 at 9:36 PM, Steve Wise wrote: >> > > In first series libcxgb.ko will have common >> > > iSCSI DDP Page Pod Manager that will be shared >> > > by three Chelsio iSCSI drivers >> > > cxgb3i, cxgb4i, cxgbit. >> > >> > cool >> > >> > > In subsequent series I will add common connection >> > > management and other hardware specific common code >> > > in this module. >> > >> > any chance to get that ready for 4.8 too? >> >> I am working on common connection management, it is a >> big change as it involves multiple modules, so it is >> difficult to get it ready for 4.8. > > Let's shoot for 4.9 for the common CM. Nice work on the PPOD unification! Taking into account that this was a reviewer comment that basically should have been addressed for the initial merge into 4.7 and that there's more time for 4.8 work [1], could you try and get that for 4.8? [1] https://lwn.net/Articles/694055/ -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko
On Wed, Jul 6, 2016 at 8:17 PM, Varun Prakash wrote: > On Wed, Jul 06, 2016 at 12:24:43AM +0300, Or Gerlitz wrote: >> On Thu, May 26, 2016 at 9:58 PM, Varun Prakash wrote: >> >> > Hi Or, Nicholas and Steve >> > Thank you for the feedback and apologies for the delay in my response. >> >> > I agree that we can refactor initiator, target and iwarp drivers to >> > reduce code duplication as Steve has mentioned. We will work on this for >> > 4.8 merge window. >> >> Nic, Varun, was anything done on this matter? we're on rc6 > > Yes, I am working on this, I will post first series > within two days, it will add common library module libcxgb.ko. thanks for the update, the plan was to get that info 4.8-rc1, and as we're after rc6 there's not much time left for submitting code for 4.8, so doing that sooner rather than later would be good here. > In first series libcxgb.ko will have common > iSCSI DDP Page Pod Manager that will be shared > by three Chelsio iSCSI drivers > cxgb3i, cxgb4i, cxgbit. cool > In subsequent series I will add common connection > management and other hardware specific common code > in this module. any chance to get that ready for 4.8 too? Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko
On Thu, May 26, 2016 at 9:58 PM, Varun Prakash wrote: > Hi Or, Nicholas and Steve > Thank you for the feedback and apologies for the delay in my response. > On Wed, May 25, 2016 at 09:55:04PM -0700, Nicholas A. Bellinger wrote: >> Varun + Co have made common improvements between existing software >> iscsi-target RX + TX PDU handling and their new driver, and no further >> comments for these prerequisites have been made. >> The additional improvements discussed here so far are cxgb* specific, >> and will not effect other drivers, and will not change configfs user ABI >> layout compatibility within /sys/kernel/config/target/iscsi/. >> That being the case, I think it's a reasonable starting point for >> mainline users to consume target ISO on cxgb hardware, and for Chelsio >> to make further common code improvements across their existing host >> drivers. > I agree that we can refactor initiator, target and iwarp drivers to > reduce code duplication as Steve has mentioned. We will work on this for > 4.8 merge window. Nic, Varun, was anything done on this matter? we're on rc6 Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko
On Wed, May 25, 2016 at 8:40 PM, Steve Wise wrote: >> From: Or Gerlitz [mailto:gerlitz...@gmail.com] >> >> On Tue, May 24, 2016 at 9:40 AM, Nicholas A. Bellinger >> wrote: >> > Hi Or & Co, >> > On Wed, 2016-05-18 at 14:45 +0300, Or Gerlitz wrote: >> >> On Sat, Apr 30, 2016 at 6:54 PM, Or Gerlitz wrote: >> >> > On Tue, Apr 19, 2016 at 9:30 PM, Varun Prakash >> >> > wrote: >> >> >> cxgbit.h - This file contains data structure >> >> >> definitions for cxgbit.ko. >> >> >> >> >> >> cxgbit_lro.h - This file contains data structure >> >> >> definitions for LRO support. >> >> >> >> >> >> cxgbit_main.c - This file contains code for >> >> >> registering with iscsi target transport and >> >> >> cxgb4 driver. >> >> >> >> >> >> cxgbit_cm.c - This file contains code for >> >> >> connection management. >> >> >> >> >> >> cxgbit_target.c - This file contains code >> >> >> for processing iSCSI PDU. >> >> >> >> >> >> cxgbit_ddp.c - This file contains code for >> >> >> Direct Data Placement. >> >> > >> >> > Wait, >> >> > >> >> > You are adding many K's LOCs to handle things like CM (connection >> >> > management), DDP and LRO. But your upstream solution must be using CM >> >> > and DDP (and LRO as well) for the HW offloaded initiator side as well, >> >> > not to mention the iWARP side of things. >> >> > >> >> > There must be some way to refactor things instead of repeating the >> >> > same bits over and over, thoughts? >> >> >> The author haven't responded... where that this stands from your point of >> >> view? >> >> > For an initial merge, I don't have an objection to this series wrt >> > drivers/target/iscsi/* improvements + prerequisites, and new standalone >> > cxgbit iscsit_transport driver. >> > >> > That said, there are areas between cxgbi + cxgbit code that can be made >> > common as you've pointed out. The Cheliso folks have mentioned off-list >> > that cxgbi as-is in mainline does not support LRO, and that the majority >> > of DDP logic is shared between initiator + target. >> > >> > Are there specific pieces of logic in DDP or iWARP for cxgb* that you'd >> > like to see Varun + Co pursue as common code in v4.8+..? >> >> Hi Nic, >> >> As I wrote above, I have good reasons to believe that there are few K >> LOCs of duplication >> between this series to the chelsio hw iscsi initiator or the chelsio >> iwarp driver or both (triple). >> >> Ys, I'd like to see a public response from Varun and Co on this valid >> reviewer comment >> before you proceed with this series, makes sense? There's no point to >> duplicate the same >> code in the kernel again and again. **Even** if there's one >> duplication now, we don't want >> another one. >> >> Or. > > Hey Or and Nicholas, > > I've been staying out of this directly due to my workload. But I will now > take a look at where and how we can refactor to reduce the code duplication. > Here are some thoughts with a quick glance: > > DDP is not used by iWARP. DDP is for direct placement attempt on a TCP > streaming mode connection. RDMA has its own way of doing this that doesn’t > utilize the same HW resources. So I would say that the DDP usage could be > refactored such that there is a set of helper functions to do the common > logic, like managing page pod adapter resources. An example would be > cxgb4i.c:ddp_ppod_write_idata(), and cxgbit_ppod_write_idata() from patch > 13. sounds good > Connection Management (CM) can definitely be refactored to share helper > functions for setting up/tearing down TCP connections. But the iWARP logic > is very different from iscsi once the TCP connection is setup and the > connection moved into RDMA mode. So the sharing of logic would be in > sending various CPL messages to the adapter to connect/accept/close/abort > connections. Ditto > Having said that, the initiator and target wouldn't share a > lot by virtue of the initiator only doing active side connection setup, and > the target only doing passive side connection setup. Well, I am not sure. It's like saying that if you break the IB/RoCE CM to active side and p
Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko
On Tue, May 24, 2016 at 9:40 AM, Nicholas A. Bellinger wrote: > Hi Or & Co, > On Wed, 2016-05-18 at 14:45 +0300, Or Gerlitz wrote: >> On Sat, Apr 30, 2016 at 6:54 PM, Or Gerlitz wrote: >> > On Tue, Apr 19, 2016 at 9:30 PM, Varun Prakash wrote: >> >> cxgbit.h - This file contains data structure >> >> definitions for cxgbit.ko. >> >> >> >> cxgbit_lro.h - This file contains data structure >> >> definitions for LRO support. >> >> >> >> cxgbit_main.c - This file contains code for >> >> registering with iscsi target transport and >> >> cxgb4 driver. >> >> >> >> cxgbit_cm.c - This file contains code for >> >> connection management. >> >> >> >> cxgbit_target.c - This file contains code >> >> for processing iSCSI PDU. >> >> >> >> cxgbit_ddp.c - This file contains code for >> >> Direct Data Placement. >> > >> > Wait, >> > >> > You are adding many K's LOCs to handle things like CM (connection >> > management), DDP and LRO. But your upstream solution must be using CM >> > and DDP (and LRO as well) for the HW offloaded initiator side as well, >> > not to mention the iWARP side of things. >> > >> > There must be some way to refactor things instead of repeating the >> > same bits over and over, thoughts? >> The author haven't responded... where that this stands from your point of >> view? > For an initial merge, I don't have an objection to this series wrt > drivers/target/iscsi/* improvements + prerequisites, and new standalone > cxgbit iscsit_transport driver. > > That said, there are areas between cxgbi + cxgbit code that can be made > common as you've pointed out. The Cheliso folks have mentioned off-list > that cxgbi as-is in mainline does not support LRO, and that the majority > of DDP logic is shared between initiator + target. > > Are there specific pieces of logic in DDP or iWARP for cxgb* that you'd > like to see Varun + Co pursue as common code in v4.8+..? Hi Nic, As I wrote above, I have good reasons to believe that there are few K LOCs of duplication between this series to the chelsio hw iscsi initiator or the chelsio iwarp driver or both (triple). Ys, I'd like to see a public response from Varun and Co on this valid reviewer comment before you proceed with this series, makes sense? There's no point to duplicate the same code in the kernel again and again. **Even** if there's one duplication now, we don't want another one. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko
On Sat, Apr 30, 2016 at 6:54 PM, Or Gerlitz wrote: > On Tue, Apr 19, 2016 at 9:30 PM, Varun Prakash wrote: >> cxgbit.h - This file contains data structure >> definitions for cxgbit.ko. >> >> cxgbit_lro.h - This file contains data structure >> definitions for LRO support. >> >> cxgbit_main.c - This file contains code for >> registering with iscsi target transport and >> cxgb4 driver. >> >> cxgbit_cm.c - This file contains code for >> connection management. >> >> cxgbit_target.c - This file contains code >> for processing iSCSI PDU. >> >> cxgbit_ddp.c - This file contains code for >> Direct Data Placement. > > Wait, > > You are adding many K's LOCs to handle things like CM (connection > management), DDP and LRO. But your upstream solution must be using CM > and DDP (and LRO as well) for the HW offloaded initiator side as well, > not to mention the iWARP side of things. > > There must be some way to refactor things instead of repeating the > same bits over and over, thoughts? Nic, The author haven't responded... where that this stands from your point of view? Or. >> drivers/target/iscsi/Kconfig|2 + >> drivers/target/iscsi/Makefile |1 + >> drivers/target/iscsi/cxgbit/Kconfig |7 + >> drivers/target/iscsi/cxgbit/Makefile|6 + >> drivers/target/iscsi/cxgbit/cxgbit.h| 353 + >> drivers/target/iscsi/cxgbit/cxgbit_cm.c | 2086 >> +++ >> drivers/target/iscsi/cxgbit/cxgbit_ddp.c| 325 + >> drivers/target/iscsi/cxgbit/cxgbit_lro.h| 72 + >> drivers/target/iscsi/cxgbit/cxgbit_main.c | 701 + >> drivers/target/iscsi/cxgbit/cxgbit_target.c | 1561 >> 10 files changed, 5114 insertions(+) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 13/13] cxgbit: add files for cxgbit.ko
On Tue, Apr 19, 2016 at 9:30 PM, Varun Prakash wrote: > cxgbit.h - This file contains data structure > definitions for cxgbit.ko. > > cxgbit_lro.h - This file contains data structure > definitions for LRO support. > > cxgbit_main.c - This file contains code for > registering with iscsi target transport and > cxgb4 driver. > > cxgbit_cm.c - This file contains code for > connection management. > > cxgbit_target.c - This file contains code > for processing iSCSI PDU. > > cxgbit_ddp.c - This file contains code for > Direct Data Placement. Wait, You are adding many K's LOCs to handle things like CM (connection management), DDP and LRO. But your upstream solution must be using CM and DDP (and LRO as well) for the HW offloaded initiator side as well, not to mention the iWARP side of things. There must be some way to refactor things instead of repeating the same bits over and over, thoughts? Or. > drivers/target/iscsi/Kconfig|2 + > drivers/target/iscsi/Makefile |1 + > drivers/target/iscsi/cxgbit/Kconfig |7 + > drivers/target/iscsi/cxgbit/Makefile|6 + > drivers/target/iscsi/cxgbit/cxgbit.h| 353 + > drivers/target/iscsi/cxgbit/cxgbit_cm.c | 2086 > +++ > drivers/target/iscsi/cxgbit/cxgbit_ddp.c| 325 + > drivers/target/iscsi/cxgbit/cxgbit_lro.h| 72 + > drivers/target/iscsi/cxgbit/cxgbit_main.c | 701 + > drivers/target/iscsi/cxgbit/cxgbit_target.c | 1561 > 10 files changed, 5114 insertions(+) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] IB/iser: set max_segment_size
On Tue, Apr 12, 2016 at 5:13 PM, Christoph Hellwig wrote: > So that we don't overflow the number of MR segments allocated because > we have to split on SGL segment into multiple MR segments. > > Signed-off-by: Christoph Hellwig nit, but please fix IB/iser: set [...] --> IB/iser: Set max segment size in the SCSI host template I prefer to avoid names of variables and functions in commit titles for the iser initiator driver. I find the usage of higher language very beneficial for maintenance and such. Sagi, I would appreciate if you avoid acking patches lacking this practice for iser-I, specifically we want people to use capital letter in the 1st word that follows that IB/iser: prefix - ok? Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH for-next 05/10] iser: Have initiator and target to share protocol structures and definitions
On 11/16/2015 6:37 PM, Sagi Grimberg wrote: +/** + * struct iser_hello - iSER Hello header + * + * @opcode: opcode (must be set to ISER_HELLO) + * @max_min_ver: maximum and minimum iser versions + * @iser_ird: iSER IRD + * @rsvd: reserved + */ +struct iser_hello { + u8 opcode; + u8 max_min_ver; + u16 iser_ird; + u8 rsvd[20]; +} __packed; + +/** + * struct iser_hello_rep - iSER Hello reply header + * + * @opcode_rej: opcode (must be set to ISER_HELLORPLY) + *lower bit is reject bit + * @max_cur_ver: maximum and current iser versions + * @iser_ord: iSER ORD + * @rsvd: reserved + */ +struct iser_hello_rep { + u8 opcode_rej; + u8 max_cur_ver; + u16 iser_ord; + u8 rsvd[20]; +} __packed; + I don't see the point to include these two defs, we don't use them and Steve even got iser to work over iwarp without them, so why care? we should only leave +#define ISER_HELLO 0x20 +#define ISER_HELLORPLY 0x30 to allow warnings on them if we get such packets -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] iscsi: fix regression caused by session lock patch
On Mon, Nov 16, 2015 at 7:30 PM, Michael Christie wrote: >> On Nov 15, 2015, at 4:10 AM, Or Gerlitz wrote: >> On Fri, Nov 13, 2015 at 6:51 PM, Mike Christie wrote: >>> On 11/13/2015 09:06 AM, Or Gerlitz wrote: >> After the locking change, adding a task to any of the connection >> mgmtqueue, cmdqueue, or requeue lists is under the session forward lock. >> >> Removing tasks from any of these lists in iscsi_data_xmit is under >> the session forward lock and **before** calling down to the transport >> to handle the task. >> >> The iscsi_complete_task helper was added by Mike's commit >> 3bbaaad95fd38ded "[SCSI] libiscsi: handle cleanup task races" >> and is indeed typically called under the backward lock && has this section >> >> + if (!list_empty(&task->running)) >> + list_del_init(&task->running); >> >> which per my reading of the code never comes into play, can you comment? > I had sent this to Sagi and your mellanox email the other day: > The bug occurs when a target completes a command while we are still > processing it. If we are doing a WRITE and the iscsi_task > is on the cmdqueue because we are handling a R2T. Mike, I failed to find how an iscsi_task can be added again to the cmdqueue list, nor how it can be added to the requeue list without the right locking, nor how can an iscsi_task be on either of these lists when iscsi_complete_task is invoked. Specifically, my reading of the code says that these are the events over time: t1. queuecommand :: we put the task on the cmdqueue list (libiscsi.c:1741) - under fwd lock t2. iscsi_data_xmit :: we remove the task from the cmdqueue list (libiscsi.c:1537) - under fwd lock when the R2T flow happens, we do the following t3. iscsi_tcp_hdr_dissect --> iscsi_tcp_r2t_rsp --> iscsi_requeue_task :: put the task on the requeue list -- the call to iscsi_tcp_r2t_rsp is under the fwd lock t4. iscsi_data_xmit :: we remove the task from the requeue list (libiscsi.c: 1578) - under fwd lock Do you agree to t1...t4 being what's possible for a given task? or I missed something? >> The target shouldn't >> send a Check Condition at this time, but some do. If that happens, then >> iscsi_queuecommand could be adding a new task to the cmdqueue, while the >> recv path is handling the CC for the task with the outsanding R2T. The >> recv path iscsi_complete_task call sees that task it on the cmdqueue and >> deletes it from the list at the same time iscsi_queuecommand is adding a new >> task. >> This should not happen per the iscsi spec. There is some wording about >> waiting to finish the sequence in progress, but targets goof this up. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] iscsi: fix regression caused by session lock patch
On Mon, Nov 16, 2015 at 7:30 PM, Michael Christie wrote: >> On Nov 15, 2015, at 4:10 AM, Or Gerlitz wrote: >> After the locking change, adding a task to any of the connection >> mgmtqueue, cmdqueue, or requeue lists is under the session forward lock. >> >> Removing tasks from any of these lists in iscsi_data_xmit is under >> the session forward lock and **before** calling down to the transport >> to handle the task. >> >> The iscsi_complete_task helper was added by Mike's commit >> 3bbaaad95fd38ded "[SCSI] libiscsi: handle cleanup task races" >> and is indeed typically called under the backward lock && has this section >> >> + if (!list_empty(&task->running)) >> + list_del_init(&task->running); >> which per my reading of the code never comes into play, can you comment? >> The bug occurs when a target completes a command while we are still >> processing it. If we are doing a WRITE and the iscsi_task >> is on the cmdqueue because we are handling a R2T. The target shouldn't >> send a Check Condition at this time, but some do. If that happens, then >> iscsi_queuecommand could be adding a new task to the cmdqueue, while the >> recv path is handling the CC for the task with the outsanding R2T. The >> recv path iscsi_complete_task call sees that task it on the cmdqueue and >> deletes it from the list at the same time iscsi_queuecommand is adding a new >> task. So we're now a bit beyond trivial bug and >> This should not happen per the iscsi spec. There is some wording about >> waiting to finish the sequence in progress, but targets goof this up. we have target/s that violate the spec, this is life, but can explain why it took us 18m to get bug report. Can you provide few point pointers into the relevant code pieces that one need to look at to realize what's going on there? was this code added before or after the patch? Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ib_srp: initialize dma_length in srp_map_idb
On Sun, Nov 15, 2015, Sagi Grimberg wrote: > On 15/11/2015 19:59, Christoph Hellwig wrote: >> Without this sg_dma_len will return 0 on architectures tha have >> the dma_length field. and what wrong with that? Christoph, probably typo here? "tha" needs to be "that" >> Signed-off-by: Christoph Hellwig >> --- >> drivers/infiniband/ulp/srp/ib_srp.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c >> b/drivers/infiniband/ulp/srp/ib_srp.c >> index 32f7962..445c0a6 100644 >> --- a/drivers/infiniband/ulp/srp/ib_srp.c >> +++ b/drivers/infiniband/ulp/srp/ib_srp.c >> @@ -1520,6 +1520,9 @@ static int srp_map_idb(struct srp_rdma_ch *ch, >> struct srp_request *req, >> state.sg_nents = 1; >> sg_set_buf(idb_sg, req->indirect_desc, idb_len); >> idb_sg->dma_address = req->indirect_dma_addr; /* hack! */ >> +#ifdef CONFIG_NEED_SG_DMA_LENGTH >> + idb_sg->dma_length = idb_sg->length; /* hack^2 */ >> +#endif > > > :) > > We should really get this properly map/unmap per IO at some point. > Probably do it in both code paths... Sagi, can you please elaborate a little further on the problem, srpt WA, what do we do in isert and what is the proposed not WA solution? > Having said that, > Looks fine, > Reviewed-by: Sagi Grimberg -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] iscsi: fix regression caused by session lock patch
On Fri, Nov 13, 2015 at 6:51 PM, Mike Christie wrote: > On 11/13/2015 09:06 AM, Or Gerlitz wrote: >>> The patch has caused multiple regressions, did not even compile when >>> > sent to me, and was poorly reviewed and I have not heard from you guys >>> > in a week. Given the issues the patch has had and the current time, I do >>> > not feel comfortable with it anymore. I want to re-review it and fix it >>> > up when there is more time. >> Mike (Hi), >> >> It's a complex patch that touches all the iscsi transports, and yes, >> when it was send to you the 1st time, there was build error on one of >> the offload transports (bad! but happens) and yes, as you pointed, one >> static checker fix + one bug fix for it went upstream after this has >> been merged, happens too. > > A patch should not cause this many issues. > >> What makes you say it was poorly reviewed? > > I just did not do a good job at looking at the patch. I should have > caught all of these issues. > > - The bnx2i cleanup_task bug should have been obvious, especially for me > because I had commented about the back lock and the abort path. > > - This oops, was so basic. Incorrect locking around a linked list being > accessed from 2 threads is really one of those 1st year kernel > programmer things. Mike, Chris After the locking change, adding a task to any of the connection mgmtqueue, cmdqueue, or requeue lists is under the session forward lock. Removing tasks from any of these lists in iscsi_data_xmit is under the session forward lock and **before** calling down to the transport to handle the task. The iscsi_complete_task helper was added by Mike's commit 3bbaaad95fd38ded "[SCSI] libiscsi: handle cleanup task races" and is indeed typically called under the backward lock && has this section + if (!list_empty(&task->running)) + list_del_init(&task->running); which per my reading of the code never comes into play, can you comment? Lets address this area before we move to the others claims made on the patch. Again, the patch is around for ~18 months, since 3.15, and no deep complaints so far, lets not jump to conclusions. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 9/9] IB/iser: Convert to CQ abstraction
On 11/13/2015 3:46 PM, Christoph Hellwig wrote: From: Sagi Grimberg Care to sparse some text here to assist a reviewer and future bisections?! I have asked multiple times to avoid empty change-logs for patches in this driver. Signed-off-by: Sagi Grimberg Signed-off-by: Christoph Hellwig --- drivers/infiniband/ulp/iser/iscsi_iser.h | 68 --- drivers/infiniband/ulp/iser/iser_initiator.c | 142 ++- drivers/infiniband/ulp/iser/iser_memory.c| 21 ++- drivers/infiniband/ulp/iser/iser_verbs.c | 258 ++- 4 files changed, 209 insertions(+), 280 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/9] IB/iser: Use a dedicated descriptor for login
On 11/13/2015 3:46 PM, Christoph Hellwig wrote: From: Sagi Grimberg Makes better sense and we'll need it later with CQ abstraction. iser switch login bufs to void Sagi, few quick comments on this patch, please address for next version.. The 2nd sentence of the change-log needs better phrasing. also multiple checkpatch hits on the patch, please fix CHECK: Please don't use multiple blank lines #26: FILE: drivers/infiniband/ulp/iser/iscsi_iser.h:329: + WARNING: __packed is preferred over __attribute__((packed)) #42: FILE: drivers/infiniband/ulp/iser/iscsi_iser.h:345: +} __attribute__((packed)); CHECK: Please don't use multiple blank lines #44: FILE: drivers/infiniband/ulp/iser/iscsi_iser.h:347: + + CHECK: Alignment should match open parenthesis #161: FILE: drivers/infiniband/ulp/iser/iser_initiator.c:209: + if (ib_dma_mapping_error(device->ib_device, + desc->req_dma)) CHECK: Alignment should match open parenthesis #172: FILE: drivers/infiniband/ulp/iser/iser_initiator.c:220: + if (ib_dma_mapping_error(device->ib_device, + desc->rsp_dma)) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] move blk_iopoll to limit and make it generally available
On Sun, Nov 15, 2015 at 10:48 AM, Sagi Grimberg wrote: > Or is correct, > > I have attempted to convert iser to use blk_iopoll in the past, however > I've seen inconsistent performance and latency skews (comparing to > tasklets iser is using today). This was manifested in IOPs test cases > where I ran multiple threads with higher queue-depth and not in > sanitized pure latency (QD=1) test cases. Unfortunately I didn't have > the time to pick it up since. > > I do have every intention of testing it again with this. If it still > exist we will need to find the root-cause of it before converting > drivers to use it. Good, this way (inconsistent performance and latency skews) or another (all shines up) -- please let us know your findings, best through commenting within V > 0 the cover letter posts of this series -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] move blk_iopoll to limit and make it generally available
On Fri, Nov 13, 2015 at 3:46 PM, Christoph Hellwig wrote: > The new name is irq_poll as iopoll is already taken. Better suggestions > welcome. Sagi (or Christoph if you can address that), @ some pointer over the last 18 months there was a port done at mellanox for iser to use blk-iopoll and AFAIR it didn't work well or didn't work at all. Can you tell now what was the problem and how did you address it at your generalization? Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] iscsi: fix regression caused by session lock patch
On Thu, Nov 12, 2015 at 10:58 PM, Mike Christie wrote: > On 11/12/2015 06:03 AM, Sagi Grimberg wrote: >>> The bug is caused by this patch: >>> >>> 659743b02c411075b26601725947b21df0bb29c8 >>> >>> which allowed the task lists to be manipulated under different locks >>> in the xmit and completion path. >>> To fix the oops this patch just reverts that patch. It also reverts >>> these 2 patches for regressions that were also a result of that patch: >> Whoa now Mike, this is a major change. Can we take a step back and think >> about this for a second? > The issue has been on the open-iscsi list for a week! You are on the > list still right? Or is even ccd on the thread. The email you sent me a week ago also cc-ed open-iscsi hence was routed by a rule in my mailer that made it to land in my open-iscsi subscription folder which is don't visit much nowadays. Only when you posted to linux-scsi we saw that and responded within few hours, to begin with. >> Can you provide a more detailed analysis of why is this list corruption >> triggered? What scenario was not honoring the locking policy? > Basic locking around a linked list bug. iscsi_queuecommand adds it under > the frwd lock and iscsi_complete_task was taking it off the back_lock. >> This code has been running reliably in our labs for a long time now >> (both iser and tcp). > The patch has caused multiple regressions, did not even compile when > sent to me, and was poorly reviewed and I have not heard from you guys > in a week. Given the issues the patch has had and the current time, I do > not feel comfortable with it anymore. I want to re-review it and fix it > up when there is more time. Mike (Hi), It's a complex patch that touches all the iscsi transports, and yes, when it was send to you the 1st time, there was build error on one of the offload transports (bad! but happens) and yes, as you pointed, one static checker fix + one bug fix for it went upstream after this has been merged, happens too. What makes you say it was poorly reviewed? And now after few years in upstream a possibly real bug was found (happens), why rush and revert? lets see if we can fix. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/8] IB/srp: Remove stale connection retry mechanism
On Fri, Sep 19, 2014 at 3:58 PM, Bart Van Assche wrote: > Attempting to connect three times may be insufficient after an > initiator system that was using multiple RDMA channels tries to > relogin. Additionally, this login retry mechanism is a workaround > for particular behavior of the IB/CM. Can you be more specific re the particular behavior of the IB CM? added Sean, the CM maintainer. > Since the srp_daemon retries > a failed login attempt anyway, remove the stale connection retry > mechanism. > > Signed-off-by: Bart Van Assche > --- > drivers/infiniband/ulp/srp/ib_srp.c | 16 +++- > 1 file changed, 3 insertions(+), 13 deletions(-) > > diff --git a/drivers/infiniband/ulp/srp/ib_srp.c > b/drivers/infiniband/ulp/srp/ib_srp.c > index d3c712f..9608e7a 100644 > --- a/drivers/infiniband/ulp/srp/ib_srp.c > +++ b/drivers/infiniband/ulp/srp/ib_srp.c > @@ -904,7 +904,6 @@ static void srp_rport_delete(struct srp_rport *rport) > > static int srp_connect_target(struct srp_target_port *target) > { > - int retries = 3; > int ret; > > WARN_ON_ONCE(target->connected); > @@ -945,19 +944,10 @@ static int srp_connect_target(struct srp_target_port > *target) > break; > > case SRP_STALE_CONN: > - /* Our current CM id was stale, and is now in > timewait. > -* Try to reconnect with a new one. > -*/ > - if (!retries-- || srp_new_cm_id(target)) { > - shost_printk(KERN_ERR, target->scsi_host, PFX > -"giving up on stale > connection\n"); > - target->status = -ECONNRESET; > - return target->status; > - } > - > shost_printk(KERN_ERR, target->scsi_host, PFX > -"retrying stale connection\n"); > - break; > +"giving up on stale connection\n"); > + target->status = -ECONNRESET; > + return target->status; > > default: > return target->status; > -- > 1.8.4.5 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1 1/3] scsi_cmnd: Introduce scsi_transfer_length helper
On Tue, Jun 10, 2014 at 10:02 PM, Martin K. Petersen wrote: >> "Sagi" == Sagi Grimberg writes: > > +static inline unsigned scsi_prot_length(unsigned data_length, > + unsigned sector_size) > +{ > + switch (sector_size) { > + case 512: > + return (data_length >> 9) * 8; > + case 1024: > + return (data_length >> 10) * 8; > + case 2048: > + return (data_length >> 11) * 8; > + case 4096: > + return (data_length >> 12) * 8; > + default: > + return (data_length >> ilog2(sector_size)) * 8; > + } > +} > + > +static inline unsigned scsi_transfer_length(struct scsi_cmnd *cmd) > +{ > + unsigned data_length; > + > + if (cmd->sc_data_direction == DMA_FROM_DEVICE) { > + data_length = scsi_in(cmd)->length; > + if (scsi_get_prot_op(cmd) == SCSI_PROT_NORMAL || > + scsi_get_prot_op(cmd) == SCSI_PROT_READ_INSERT) > + return data_length; > + } else { > + data_length = scsi_out(cmd)->length; > + if (scsi_get_prot_op(cmd) == SCSI_PROT_NORMAL || > + scsi_get_prot_op(cmd) == SCSI_PROT_WRITE_STRIP) > + return data_length; > + } > + > + /* Protection information exists on the wire */ > + return data_length + scsi_prot_length(data_length, > + cmd->device->sector_size); > +} > > Let's do this for 3.16: Just to make sure, by 3.16 you also mean 3.15.y, right? -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Include protection information in iscsi header
On Tue, Jun 3, 2014 at 9:16 AM, Roland Dreier wrote: > On Sun, Jun 1, 2014 at 9:19 AM, Sagi Grimberg wrote: >> Although these patches involve 3 subsystems with different >> maintainers (scsi, iser, target) I would prefer seeing these >> patches included together. > > Why? Because they break wire compatibility? > > I hate to say it but even if they're merged at the same time, you > can't guarantee that targets and initiators will be updated together. Guys, this all deals with code merged in 3.15-rc1, and (thanks god and linus that -rc8 took place this cycle) 3.15 isn't out yet!! -- so we just need to act quickly and this (having the fix in 3.15 or if too late in 3.15.1) would be OK -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH for-3.15] scsi/libiscsi: Fix static checker warning on bh locking
From: Shlomo Pongratz Commit 659743b "[SCSI] libiscsi: Reduce locking contention in fast path" introduced a new smatch warning on libiscsi.c "iscsi_xmit_task() warn: inconsistent returns bottom_half:: locked (1410 [(-61)]) unlocked (1425 [0], 1425 [s32min-(-1),1-s32max])", which we can eliminate by using non bh locking on the nested spin_lock call. Reported-by: Dan Carpenter Signed-off-by: Shlomo Pongratz Signed-off-by: Or Gerlitz --- drivers/scsi/libiscsi.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 5b8605c..5087957 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1411,9 +1411,9 @@ static int iscsi_xmit_task(struct iscsi_conn *conn) conn->task = NULL; } /* regular RX path uses back_lock */ - spin_lock_bh(&conn->session->back_lock); + spin_lock(&conn->session->back_lock); __iscsi_put_task(task); - spin_unlock_bh(&conn->session->back_lock); + spin_unlock(&conn->session->back_lock); return rc; } -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1] Target/iser: Fail SCSI WRITE command if device detected integrity error
On 18/03/2014 14:52, Sagi Grimberg wrote: If during data-transfer a data-integrity error was detected we must fail the command with CHECK_CONDITION and not execute the command. Cnages from v0: - Added reported-by Or gerlitz Signed-off-by: Sagi Grimberg Reported-by: Or Gerlitz --- s/Cnages/Changes and put the history below the --- line, it need not go into the upstream change-log -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 11/13] SCSI/libiscsi: Add check_protection callback for transports
On Mon, Mar 17, 2014, Sagi Grimberg wrote: > On 3/17/2014 6:59 PM, Mike Christie wrote: >> This patch is ok, but happened to the patch for the xmit task path? > Hey Mike, Thanks for your Ack on this. > The xmit_task fix was posted as a separate patch since it is not specific > to this set (relevant also for existing data dma_mapping in xmit_task). > Care to pick it up? http://marc.info/?l=linux-scsi&m=139404160818424&w=2 Hi Roland, provided that Mike Christie acked the libiscsi patch from this series, and I acked the iser patches and you had no other comments on the series -- can you pick this for 3.15? The merge window is coming soon, if for some reason this will not work for you, I'd like Nic to push that through his tree along with the iser target DIF patches, so need your asap reply, please. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] Target/dif: Introduce protection-passthough-only mode
On Mon, Mar 17, 2014 at 12:52 PM, Sagi Grimberg wrote: > Target/dif: Introduce protection-passthough-only mode s/passthough/passthrough/ -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 11/13] SCSI/libiscsi: Add check_protection callback for transports
On Wed, Mar 5, 2014 at 7:43 PM, Sagi Grimberg wrote: > > iSCSI needs to be at least aware that a task involves protection > information. In case it does, after the transaction completed > libiscsi will ask the transport to check the protection status > of the transaction. > > Unlike transport errors, DIF errors should not prevent successful > completion of the transaction from the transport point of view, > but should be escelated to scsi mid-layer when constructing the > scsi result and sense data. > > check_protection routine will return the ascq corresponding to the > DIF error that occured (or 0 if no error happened). > > return ascq: > - 0x1: GUARD_CHECK_FAILED > - 0x2: APPTAG_CHECK_FAILED > - 0x3: REFTAG_CHECK_FAILED Hi Mike, just to remove doubt, this patch is OK with you, right? can you please ack it, so we know we're all good for the 3.15 merge window Or. > > Signed-off-by: Sagi Grimberg > Signed-off-by: Alex Tabachnik > --- > drivers/scsi/libiscsi.c | 32 > include/scsi/libiscsi.h |4 > include/scsi/scsi_transport_iscsi.h |1 + > 3 files changed, 37 insertions(+), 0 deletions(-) > > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index 4046241..3c11acf 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -395,6 +395,10 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task > *task) > if (rc) > return rc; > } > + > + if (scsi_get_prot_op(sc) != SCSI_PROT_NORMAL) > + task->protected = true; > + > if (sc->sc_data_direction == DMA_TO_DEVICE) { > unsigned out_len = scsi_out(sc)->length; > struct iscsi_r2t_info *r2t = &task->unsol_r2t; > @@ -823,6 +827,33 @@ static void iscsi_scsi_cmd_rsp(struct iscsi_conn *conn, > struct iscsi_hdr *hdr, > > sc->result = (DID_OK << 16) | rhdr->cmd_status; > > + if (task->protected) { > + sector_t sector; > + u8 ascq; > + > + /** > +* Transports that didn't implement check_protection > +* callback but still published T10-PI support to scsi-mid > +* deserve this BUG_ON. > +**/ > + BUG_ON(!session->tt->check_protection); > + > + ascq = session->tt->check_protection(task, §or); > + if (ascq) { > + sc->result = DRIVER_SENSE << 24 | > +SAM_STAT_CHECK_CONDITION; > + scsi_build_sense_buffer(1, sc->sense_buffer, > + ILLEGAL_REQUEST, 0x10, ascq); > + sc->sense_buffer[7] = 0xc; /* Additional sense length > */ > + sc->sense_buffer[8] = 0; /* Information desc type */ > + sc->sense_buffer[9] = 0xa; /* Additional desc length > */ > + sc->sense_buffer[10] = 0x80; /* Validity bit */ > + > + put_unaligned_be64(sector, &sc->sense_buffer[12]); > + goto out; > + } > + } > + > if (rhdr->response != ISCSI_STATUS_CMD_COMPLETED) { > sc->result = DID_ERROR << 16; > goto out; > @@ -1567,6 +1598,7 @@ static inline struct iscsi_task > *iscsi_alloc_task(struct iscsi_conn *conn, > task->have_checked_conn = false; > task->last_timeout = jiffies; > task->last_xfer = jiffies; > + task->protected = false; > INIT_LIST_HEAD(&task->running); > return task; > } > diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h > index 309f513..1457c26 100644 > --- a/include/scsi/libiscsi.h > +++ b/include/scsi/libiscsi.h > @@ -133,6 +133,10 @@ struct iscsi_task { > unsigned long last_xfer; > unsigned long last_timeout; > boolhave_checked_conn; > + > + /* T10 protection information */ > + boolprotected; > + > /* state set/tested under session->lock */ > int state; > atomic_trefcount; > diff --git a/include/scsi/scsi_transport_iscsi.h > b/include/scsi/scsi_transport_iscsi.h > index 88640a4..2555ee5 100644 > --- a/include/scsi/scsi_transport_iscsi.h > +++ b/include/scsi/scsi_transport_iscsi.h > @@ -167,6 +167,7 @@ struct iscsi_transport { > struct iscsi_bus_flash_conn *fnode_conn); > int (*logout_flashnode_sid) (struct iscsi_cls_session *cls_sess); > int (*get_host_stats) (struct Scsi_Host *shost, char *buf, int len); > + u8 (*check_protection)(struct iscsi_task *task, sector_t *sector); > }; > > /* > -- > 1.7.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majord...@vger.kernel.org > More maj
Re: [PATCH v2 00/13] T10-PI support for iSER initiator
On Wed, Mar 5, 2014 at 7:43 PM, Sagi Grimberg wrote: > Hey Roland, Nic, Mike and Co > > This patchset adds T10 protection information offload support over > RDMA signature verbs API. This set, along with the iSER target set, > allow end-to-end protection information passthrough and validation. > The patchset was tested against Linux SCSI target with iSER DIF > support applied. > > iSER T10-PI support enablement is currently controlled with > module parameters (similar to lpfc for example) which make them > global. In the next phase we can consider passing these parameters > from iscsid nl messages, which would make them per-connection. > > The approach I took with respect to escalating protection information > errors was minimal iSCSI intervention in protection information affairs. > I added libiscsi a hook asking the transport to check the protection > information status, and construct the proper sense data in case of errors > (the alternative of letting the transport to construct sense data seemed > much less appealing). > > Note that this patchset comes on top of a pending patch for iSER to suppress > fastreg completions (http://marc.info/?l=linux-rdma&m=139047309831997&w=2). > > v0 patches are available in target-pending git repo (branch rdma-dif) and > passed 0-DAY testing. > > Roland, I would like to hear your feedback on this. Acked-By: Or Gerlitz Roland, given that Sagi addressed all the feedback from the reviewers, specifically from Mike Christie, and has my iser maintainer ack on the series, can you please go ahead and apply that for 3.15? > > The set is ordered by the following: > - Preparation patches (non/minor functionality changes). > - Add protection information execution support. > - Add protection information status check facilities. > - Publish T10-DIF support to SCSI mid-layer according to > IB device capabilities. > > Changes from v1: > - Removed extra space (BUG_ON). > - Dropped DID_ABORT from sc result for data integrity errors. > - Fixed failed sector report. > > Changes from v0: > - Fix protection information dma registration for unaligned scatterlists > which may happen when the block layer merges bios. > - Don't fail connections on devices without DIF support - warn and continue > without DIF. > - reword FR -> FastReg > > Alex Tabachnik (2): > IB/iser: Introduce pi_enable, pi_guard module parameters > IB/iser: Initialize T10-PI resources > > Sagi Grimberg (11): > IB/iser: Avoid FRWR notation, use fastreg instead > IB/iser: Push the decision what memory key to use into fast_reg_mr > routine > IB/iser: Move fast_reg_descriptor initialization to a function > IB/iser: Keep IB device attributes under iser_device > IB/iser: Replace fastreg descriptor valid bool with indicators > container > IB/iser: Generalize iser_unmap_task_data and > finalize_rdma_unaligned_sg > IB/iser: Generalize fall_to_bounce_buf routine > IB/iser: Support T10-PI operations > SCSI/libiscsi: Add check_protection callback for transports > IB/iser: Implement check_protection > IB/iser: Publish T10-PI support to SCSI midlayer > > drivers/infiniband/ulp/iser/iscsi_iser.c | 46 +++- > drivers/infiniband/ulp/iser/iscsi_iser.h | 71 - > drivers/infiniband/ulp/iser/iser_initiator.c | 98 +- > drivers/infiniband/ulp/iser/iser_memory.c| 445 > +++--- > drivers/infiniband/ulp/iser/iser_verbs.c | 287 - > drivers/scsi/libiscsi.c | 32 ++ > include/scsi/libiscsi.h |4 + > include/scsi/scsi_transport_iscsi.h |1 + > 8 files changed, 771 insertions(+), 213 deletions(-) > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1 10/13] IB/iser: Support T10-PI operations
On 04/03/2014 16:44, Sagi Grimberg wrote: @@ -1707,10 +1707,17 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc) goto prepd_fault; } } - if (session->tt->xmit_task(task)) { - session->cmdsn--; - reason = FAILURE_SESSION_NOT_READY; - goto prepd_reject; + + reason = session->tt->xmit_task(task); + if (reason) { + if (reason == -ENOMEM || reason == -EAGAIN) { + session->cmdsn--; I am pretty sure this has to be done anyway, no matter why the xmit_task callback failed Even if we abort? this just follows the same logic as iscsi_prep_scsi_cmd_pdu error flow. yes, take a 2nd look on iscsi_prep_scsi_cmd_pdu and you'll see that all the possible error cases take place **before** session->cmdsn is incremented -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1 10/13] IB/iser: Support T10-PI operations
On 04/03/2014 11:59, Sagi Grimberg wrote: On 3/4/2014 11:38 AM, Or Gerlitz wrote: On 03/03/2014 06:44, Mike Christie wrote: The xmit_task callout does handle failures like EINVAL. If the above map calls fail then you would get infinite retries. You would currently want to do the mapping in the init_task callout instead. If it makes it easier on the driver implementation then it is ok to modify the xmit_task callers so that they handle multiple error codes for drivers like iser that have the xmit_task callout called from iscsi_queuecommand. Mike, After looking on the code with Sagi, it seems to us that the correct way to go here, would be to enhance in iscsi_queuecommand the processing of the result returned by session->tt->xmit_task(task) to behave in a similar manner to how the return value of iscsi_prep_scsi_cmd_pdu() is treated. E.g for errors such as ENOMEM and EGAIN take the "reject" flow which would cause the SCSI midlayer to retry the command and for other return values go to the "fault" flow which will cause the ML to abort the command. Or. Yes, we were thinking about the following: --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1707,10 +1707,17 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc) goto prepd_fault; } } - if (session->tt->xmit_task(task)) { - session->cmdsn--; - reason = FAILURE_SESSION_NOT_READY; - goto prepd_reject; + + reason = session->tt->xmit_task(task); + if (reason) { + if (reason == -ENOMEM || reason == -EAGAIN) { + session->cmdsn--; I am pretty sure this has to be done anyway, no matter why the xmit_task callback failed + reason = FAILURE_SESSION_NOT_READY; + goto prepd_reject; + } else { + sc->result = DID_ABORT << 16; + goto prepd_fault; + } } } else { list_add_tail(&task->running, &conn->cmdqueue); -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1 10/13] IB/iser: Support T10-PI operations
On 03/03/2014 06:44, Mike Christie wrote: The xmit_task callout does handle failures like EINVAL. If the above map calls fail then you would get infinite retries. You would currently want to do the mapping in the init_task callout instead. If it makes it easier on the driver implementation then it is ok to modify the xmit_task callers so that they handle multiple error codes for drivers like iser that have the xmit_task callout called from iscsi_queuecommand. Mike, After looking on the code with Sagi, it seems to us that the correct way to go here, would be to enhance in iscsi_queuecommand the processing of the result returned by session->tt->xmit_task(task) to behave in a similar manner to how the return value of iscsi_prep_scsi_cmd_pdu() is treated. E.g for errors such as ENOMEM and EGAIN take the "reject" flow which would cause the SCSI midlayer to retry the command and for other return values go to the "fault" flow which will cause the ML to abort the command. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] iser-target: Fix post_send_buf_count for RDMA READ/WRITE
On 04/03/2014 02:01, Nicholas A. Bellinger wrote: This is necessary because even though IB_SEND_SIGNALED is not set for RDMA WRITEs + READs, during a QP failure event the work requests will be returned with exception status from the TX completion queue. Impossible... for rdma reads we must ask for completing, since we should write the data for the back-end, I assume it's just wrong mentioning of rdma-read here, right? -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux rdma 3.14 merge plans
On Wed, Jan 22, 2014, Sagi Grimberg wrote: > On 1/22/2014 2:43 AM, Roland Dreier wrote: >> On Tue, Jan 21, 2014, Or Gerlitz wrote: >>> Roland, ping! the signature patches were posted > three months ago. We >>> deserve a response from the maintainer that goes beyond "I need to >>> think on that". Responsiveness was stated by Linus to be the #1 requirement >>> from kernel maintainers. > Hi Roland, I'll try to respond here. removing LKML and adding Linux-scsi. Sorry, it seems we will not getting responses unless coping LKML, so lets do that again -- Roland, below is the detailed response Sagi wrote you following your "adds a bunch of complexity to support a feature no one really cares about" comment, can we get you to respond on that? >> Or, I'm not sure what response you're after from me. Linus has also >> said that maintainers should say "no" a lot more >> (http://lwn.net/Articles/571995/) so maybe you want me to say, "No, I >> won't merge this patch set, since it adds a bunch of complexity to >> support a feature no one really cares about." > 1. I disagree about no-one cares about DIF/DIX. We are witnessing growing > interests in this especially for RDMA. > 2. We put a lot of efforts to avoid complexity here and plug-in as simple as > possible. > Application that will choose to use DIF will implement only 3 steps: > a. allocate signature enabled MR. > b. register signature enabled MR with DIF attributes (via post_send) and > then do RDMA. > c. check MR status after transaction is completed (_lightweight_ verb that > can be called from interrupt context). >> Is that it? (And yes I am skeptical about this stuff -- I work at an >> enterprise >> storage company and even here it's hard to find anyone who cares about >> DIF/DIX, especially offload features that stop it from being end-to-end) > 1. RDMA verbs are _NOT_ stopping DIF from being end-to-end. > OS (or SCSI in our specific case) passes LLD 2 scatterlists: data {block1, > block2, block3,...}, and protection {DIF1, DIF2, DIF3}. > LLD is required to verify the data integrity (block guards) and to > interleave over the wire {block1, DIF1, block2, DIF2}. > You must support that in HW, you rather iSER/SRP will use giant copy's to > interleave by itself? or in case OS asked LLD > to INSERT DIF iSER/SRP will compute CRC for each data-block? RDMA storage > ULPs are transports - they should have no business with > data processing. > 2. HW DIF offload also gives you protection across the PCI. the > data-validation is done (hopefully offloaded) also > when data+protection are written to the back-end device. end-to-end is > preserved. > 3. SAS & FC have T10-PI offload. This is just adding RDMA into the game. > With this set of verbs iSER, SRP, FCoE Initiators and targets will be able > to support T10-PI. >> I'm sure you're not expecting me to say, "Sure, I'll merge it without >> understanding the problem it's solving > Problem: T10-PI offload support for RDMA based initiators. Supporting > end-to-end data integrity while sustaining high RDMA performance. >> or how it's doing that," > How it's doing that: > - We introduce a new type of memory region that posses protection attributes > suited for data integrity offload. > - We Introduce a new fast registration method that can bind all the relevant > info for verify/generate of protection information: > * describe if/how to interleave data with protection. > * describe what method of data integrity is used (DIF type X, CRC, XOR...) > and the seeds that HW should start calculation from. > * describe how to verify the data. > - We Introduce a new lightweight check of the data-integrity status to check > if there were any integrity errors and get information on them. > Note: We made MR allocation routine generic enough to lay a framework to > unite all MR allocation > methods (get_dma_mr, alloc_fast_reg_mr, reg_phys, reg_user_mr, fmrs, and > probably more in the future...). > We defined ib_create_mr that can actually get mr_init_attr which can be > easily extended as opposed to the specific calls exists today. > So I would say this even reduces complexity. > Hope this helps, > Sagi. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/11] IB/isert: Initialize T10-PI resources
On 12/01/2014 14:41, Sagi Grimberg wrote: --- a/drivers/infiniband/ulp/isert/ib_isert.h +++ b/drivers/infiniband/ulp/isert/ib_isert.h @@ -48,11 +48,21 @@ struct iser_tx_desc { struct ib_send_wr send_wr; } __packed; +struct pi_context { + struct ib_mr *prot_mr; + bool prot_key_valid; + struct ib_fast_reg_page_list *prot_frpl; + struct ib_mr *sig_mr; + bool sig_key_valid; +}; + struct fast_reg_descriptor { - struct list_headlist; - struct ib_mr*data_mr; - struct ib_fast_reg_page_list*data_frpl; - boolvalid; + struct list_headlist; + struct ib_mr *data_mr; + bool data_key_valid; + struct ib_fast_reg_page_list *data_frpl; + boolprotected; no need for many bools in one structure... each one needs a bit, correct? so embed them in one variable I figured it will be more explicit this way. protected boolean indicates if we should check the data-integrity status, and the other 3 indicates if the relevant MR is valid (no need to execute local invalidation). Do you think I should compact it somehow? usually xxx_valid booleans will align together although not always. I didn't note there are so many booleans there... sure, my personal preference is compaction, e.g have one field names flags and multiple bit locations marking the different flags e.g FAST_REG_DESC_VALID FAST_REG_DESC_DATA_KEY_VALID FAST_REG_DESC_PROTECTED etc -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/11] IB/isert: Accept RDMA_WRITE completions
On Thu, Jan 9, 2014 at 6:40 PM, Sagi Grimberg wrote: > In case of protected transactions, we will need to check the > protection status of the transaction before sending SCSI response. > So be ready for RDMA_WRITE completions. currently we don't ask > for these completions, but for T10-PI we will. > @@ -1721,9 +1735,9 @@ __isert_send_completion(struct iser_tx_desc *tx_desc, > isert_conn, ib_dev); > break; > case ISER_IB_RDMA_WRITE: > - pr_err("isert_send_completion: Got ISER_IB_RDMA_WRITE\n"); > - dump_stack(); > - break; > + pr_debug("isert_send_completion: Got ISER_IB_RDMA_WRITE\n"); > + atomic_dec(&isert_conn->post_send_buf_count); > + isert_completion_rdma_write(tx_desc, isert_cmd); are we doing fall through here? why? > case ISER_IB_RDMA_READ: > pr_debug("isert_send_completion: Got ISER_IB_RDMA_READ:\n"); > > -- > 1.7.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/11] IB/isert: Initialize T10-PI resources
On Thu, Jan 9, 2014 at 6:40 PM, Sagi Grimberg wrote: > @@ -557,8 +629,14 @@ isert_connect_request(struct rdma_cm_id *cma_id, struct > rdma_cm_event *event) > goto out_mr; > } > > + if (pi_support && !device->pi_capable) { > + pr_err("Protection information requested but not > supported\n"); > + ret = -EINVAL; > + goto out_mr; > + } > + > if (device->use_fastreg) { > - ret = isert_conn_create_fastreg_pool(isert_conn); > + ret = isert_conn_create_fastreg_pool(isert_conn, pi_support); just a nit, the pi_support bit can be looked up from the isert_conn struct, isn't it? > if (ret) { > pr_err("Conn: %p failed to create fastreg pool\n", >isert_conn); > @@ -566,7 +644,7 @@ isert_connect_request(struct rdma_cm_id *cma_id, struct > rdma_cm_event *event) > } > } > > - ret = isert_conn_setup_qp(isert_conn, cma_id); > + ret = isert_conn_setup_qp(isert_conn, cma_id, pi_support); > if (ret) > goto out_conn_dev; > > @@ -2193,7 +2271,7 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc, > pagelist_len = isert_map_fr_pagelist(ib_dev, sg_start, sg_nents, > > &fr_desc->data_frpl->page_list[0]); > > - if (!fr_desc->valid) { > + if (!fr_desc->data_key_valid) { > memset(&inv_wr, 0, sizeof(inv_wr)); > inv_wr.opcode = IB_WR_LOCAL_INV; > inv_wr.ex.invalidate_rkey = fr_desc->data_mr->rkey; > @@ -2225,7 +2303,7 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc, > pr_err("fast registration failed, ret:%d\n", ret); > return ret; > } > - fr_desc->valid = false; > + fr_desc->data_key_valid = false; > > ib_sge->lkey = fr_desc->data_mr->lkey; > ib_sge->addr = fr_desc->data_frpl->page_list[0] + page_off; > diff --git a/drivers/infiniband/ulp/isert/ib_isert.h > b/drivers/infiniband/ulp/isert/ib_isert.h > index 708a069..fab8b50 100644 > --- a/drivers/infiniband/ulp/isert/ib_isert.h > +++ b/drivers/infiniband/ulp/isert/ib_isert.h > @@ -48,11 +48,21 @@ struct iser_tx_desc { > struct ib_send_wr send_wr; > } __packed; > > +struct pi_context { > + struct ib_mr *prot_mr; > + boolprot_key_valid; > + struct ib_fast_reg_page_list *prot_frpl; > + struct ib_mr *sig_mr; > + boolsig_key_valid; > +}; > + > struct fast_reg_descriptor { > - struct list_headlist; > - struct ib_mr*data_mr; > - struct ib_fast_reg_page_list*data_frpl; > - boolvalid; > + struct list_headlist; > + struct ib_mr *data_mr; > + booldata_key_valid; > + struct ib_fast_reg_page_list *data_frpl; > + boolprotected; no need for many bools in one structure... each one needs a bit, correct? so embed them in one variable > + struct pi_context *pi_ctx; > }; > > struct isert_rdma_wr { > @@ -140,6 +150,7 @@ struct isert_cq_desc { > > struct isert_device { > int use_fastreg; > + boolpi_capable; this one (and its such) is/are derived from the ib device capabilities, so I would suggest to keep a copy of the caps instead of derived bools > int cqs_used; > int refcount; > int cq_active_qps[ISERT_MAX_CQ]; -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
On 12/11/2013 06:34, Nicholas A. Bellinger wrote: Once iscsi_conn_queue_work() is invoked here to start process context execution of iscsi_xmitworker() -> iscsi_data_xmit() code, AFAICT there is no logic in place within iscsi_data_xmit() to honor the last received MaxCmdSN. Or to put it another way: what is preventing iscsi_data_xmit() from completely draining both conn->cmdqueue + conn->requeue, even when the CmdSN window has potentially been closed again..? Guys, Note that the iser initiator transport uses the pass-through command submission mode of libiscsi, that is iscsi_conn_queue_work isn't called from queuecommand at all. This is b/c we call iscsi_host_allocwith xmit_can_sleep = 0. Hence no workqueue is used for the command processing/submission over the wire, just a call toiscsi_prep_scsi_cmd_pdu and following that to iser's xmit_task callbackwhich isiscsi_iser_task_xmit that calls iser_send_command, etc. Mike, Nic is not using the new locking framework patches for libiscsi, as you know they are not upstream yet... Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] SCSI/libiscsi: Restructure iscsi_tcp r2t response logic
From: Shlomo Pongratz Restructure the iscsi_tcp_r2t_rsp routine in order to avoid allocating r2t from r2tpool.queue and returning it back in case the parameters rhdr->data_length and or rhdr->data_offset prohibit the requing. Since the values of these parameters are known prior to the allocation, we can pre-check and thus avoid futile allocations. Signed-off-by: Shlomo Pongratz Signed-off-by: Or Gerlitz --- drivers/scsi/libiscsi_tcp.c | 43 ++- 1 files changed, 22 insertions(+), 21 deletions(-) diff --git a/drivers/scsi/libiscsi_tcp.c b/drivers/scsi/libiscsi_tcp.c index 1d58d53..7f59073 100644 --- a/drivers/scsi/libiscsi_tcp.c +++ b/drivers/scsi/libiscsi_tcp.c @@ -529,6 +529,8 @@ static int iscsi_tcp_r2t_rsp(struct iscsi_conn *conn, struct iscsi_task *task) struct iscsi_r2t_rsp *rhdr = (struct iscsi_r2t_rsp *)tcp_conn->in.hdr; struct iscsi_r2t_info *r2t; int r2tsn = be32_to_cpu(rhdr->r2tsn); + u32 data_length; + u32 data_offset; int rc; if (tcp_conn->in.datalen) { @@ -554,40 +556,39 @@ static int iscsi_tcp_r2t_rsp(struct iscsi_conn *conn, struct iscsi_task *task) return 0; } - rc = kfifo_out(&tcp_task->r2tpool.queue, (void*)&r2t, sizeof(void*)); - if (!rc) { - iscsi_conn_printk(KERN_ERR, conn, "Could not allocate R2T. " - "Target has sent more R2Ts than it " - "negotiated for or driver has leaked.\n"); - return ISCSI_ERR_PROTO; - } - - r2t->exp_statsn = rhdr->statsn; - r2t->data_length = be32_to_cpu(rhdr->data_length); - if (r2t->data_length == 0) { + data_length = be32_to_cpu(rhdr->data_length); + if (data_length == 0) { iscsi_conn_printk(KERN_ERR, conn, "invalid R2T with zero data len\n"); - kfifo_in(&tcp_task->r2tpool.queue, (void*)&r2t, - sizeof(void*)); return ISCSI_ERR_DATALEN; } - if (r2t->data_length > session->max_burst) + if (data_length > session->max_burst) ISCSI_DBG_TCP(conn, "invalid R2T with data len %u and max " "burst %u. Attempting to execute request.\n", - r2t->data_length, session->max_burst); + data_length, session->max_burst); - r2t->data_offset = be32_to_cpu(rhdr->data_offset); - if (r2t->data_offset + r2t->data_length > scsi_out(task->sc)->length) { + data_offset = be32_to_cpu(rhdr->data_offset); + if (data_offset + data_length > scsi_out(task->sc)->length) { iscsi_conn_printk(KERN_ERR, conn, "invalid R2T with data len %u at offset %u " - "and total length %d\n", r2t->data_length, - r2t->data_offset, scsi_out(task->sc)->length); - kfifo_in(&tcp_task->r2tpool.queue, (void*)&r2t, - sizeof(void*)); + "and total length %d\n", data_length, + data_offset, scsi_out(task->sc)->length); return ISCSI_ERR_DATALEN; } + rc = kfifo_out(&tcp_task->r2tpool.queue, (void*)&r2t, sizeof(void*)); + if (!rc) { + iscsi_conn_printk(KERN_ERR, conn, "Could not allocate R2T. " + "Target has sent more R2Ts than it " + "negotiated for or driver has leaked.\n"); + return ISCSI_ERR_PROTO; + } + + r2t->exp_statsn = rhdr->statsn; + r2t->data_length = data_length; + r2t->data_offset = data_offset; + r2t->ttt = rhdr->ttt; /* no flip */ r2t->datasn = 0; r2t->sent = 0; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] SCSI/libiscsi: Reduce locking contention in fast path
From: Shlomo Pongratz Replace the session lock with two locks, a forward lock and a backwards lock named frwd_lock and back_lock respectively. The forward lock protects resources that change while sending a request to the target, such as cmdsn, queued_cmdsn, and allocating task from the commands' pool with kfifo_out. The backward lock protects resources that change while processing a response or in error path, such as cmdsn_exp, cmdsn_max, and returning tasks to the commands' pool with kfifo_in. Under a steady state fast-path situation, that is when one or more processes/threads submit IO to an iscsi device and a single kernel upcall (e.g softirq) is dealing with processing of responses without errors, this patch eliminates the contention between the queuecommand()/request response/scsi_done() flows associated with iscsi sessions. Between the forward and the backward locks exists a strict locking hierarchy. The mutual exclusion zone protected by the forward lock can enclose the mutual exclusion zone protected by the backward lock but not vice versa. For example, in iscsi_conn_teardown or in iscsi_xmit_data when there is a failure and __iscsi_put_task is called, the backward lock is taken while the forward lock is still taken. On the other hand, if in the RX path a nop is to be sent, for example in iscsi_handle_reject or __iscsi_complete_pdu than the forward lock is released and the backward lock is taken for the duration of iscsi_send_nopout, later the backward lock is released and the forward lock is retaken. libiscsi_tcp uses two kernel fifos the r2t pool and the r2t queue. The insertion and deletion from these queues didn't corespond to the assumption taken by the new forward/backwards session locking paradigm. That is, in iscsi_tcp_clenup_task which belongs to the RX (backwards) path, r2t is taken out from r2t queue and inserted to the r2t pool. In iscsi_tcp_get_curr_r2t which belong to the TX (forward) path, r2t is also inserted to the r2t pool and another r2t is pulled from r2t queue. Only in iscsi_tcp_r2t_rsp which is called in the RX path but can requeue to the TX path, r2t is taken from the r2t pool and inserted to the r2t queue. In order to cope with this situation, two spin locks were added, pool2queue and queue2pool. The former protects extracting from the r2t pool and inserting to the r2t queue, and the later protects the extracing from the r2t queue and inserting to the r2t pool. Signed-off-by: Shlomo Pongratz Signed-off-by: Or Gerlitz --- drivers/scsi/be2iscsi/be_main.c | 26 +++--- drivers/scsi/bnx2i/bnx2i_hwi.c | 46 drivers/scsi/bnx2i/bnx2i_iscsi.c |8 +- drivers/scsi/iscsi_tcp.c | 22 ++-- drivers/scsi/libiscsi.c | 214 + drivers/scsi/libiscsi_tcp.c | 28 +++-- drivers/scsi/qla4xxx/ql4_isr.c |4 +- include/scsi/libiscsi.h | 17 ++- include/scsi/libiscsi_tcp.h |2 + 9 files changed, 206 insertions(+), 161 deletions(-) diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c index 1f37505..1fb16b1 100644 --- a/drivers/scsi/be2iscsi/be_main.c +++ b/drivers/scsi/be2iscsi/be_main.c @@ -232,20 +232,20 @@ static int beiscsi_eh_abort(struct scsi_cmnd *sc) cls_session = starget_to_session(scsi_target(sc->device)); session = cls_session->dd_data; - spin_lock_bh(&session->lock); + spin_lock_bh(&session->frwd_lock); if (!aborted_task || !aborted_task->sc) { /* we raced */ - spin_unlock_bh(&session->lock); + spin_unlock_bh(&session->fwrd_lock); return SUCCESS; } aborted_io_task = aborted_task->dd_data; if (!aborted_io_task->scsi_cmnd) { /* raced or invalid command */ - spin_unlock_bh(&session->lock); + spin_unlock_bh(&session->fwrd_lock); return SUCCESS; } - spin_unlock_bh(&session->lock); + spin_unlock_bh(&session->fwrd_lock); /* Invalidate WRB Posted for this Task */ AMAP_SET_BITS(struct amap_iscsi_wrb, invld, aborted_io_task->pwrb_handle->pwrb, @@ -307,9 +307,9 @@ static int beiscsi_eh_device_reset(struct scsi_cmnd *sc) /* invalidate iocbs */ cls_session = starget_to_session(scsi_target(sc->device)); session = cls_session->dd_data; - spin_lock_bh(&session->lock); + spin_lock_bh(&session->frwd_lock); if (!session->leadconn || session->state != ISCSI_STATE_LOGGED_IN) { - spin_unlock_bh(&session->lock); + spin_unlock_bh(&session->frwd_lock); return FAILED; } conn = session->leadconn; @@ -338,7 +338,7 @@ static int beiscsi_eh_device_reset(struct scsi_cmnd *sc) nu
[PATCH 0/3] SCSI/libiscsi: Reduce locking contention in fast path
Hi James, This series is about reducing locking contention in the IO submission/response processing fast path of libiscsi and the various iscsi transports usage of libiscsi code. We replace the session lock with two locks, a forward lock and a backwards lock named frwd_lock and back_lock respectively. The forward lock protects resources that change while sending a request to the target, such as cmdsn, queued_cmdsn, and allocating task from the commands' pool with kfifo_out. The backward lock protects resources that change while processing a response or in error path, such as cmdsn_exp, cmdsn_max, and returning tasks to the commands' pool with kfifo_in. The 1st patch in the series is a restructuring patch for iscsi_tcp r2t response logic, the 2nd is the main patch and the 3rd is cleanup asked by Mike who reviewed the whole series when we posted in over the open-iscsi mailing list. Under a "steady state" fast-path situation, that is when one or more processes/threads submit IO to an iscsi device and a single kernel upcall (e.g softirq) is dealing with processing of responses without errors, this patch eliminates the contention between the queuecommand()/request response/scsi_done() associated with iscsi sessions. Or and Shlomo. Shlomo Pongratz (3): SCSI/libiscsi: Restructure iscsi_tcp r2t response logic SCSI/libiscsi: Reduce locking contention in fast path SCSI/libiscsi: Remove unneeded code drivers/scsi/be2iscsi/be_main.c | 26 ++-- drivers/scsi/bnx2i/bnx2i_hwi.c | 46 drivers/scsi/bnx2i/bnx2i_iscsi.c |8 +- drivers/scsi/iscsi_tcp.c | 22 ++-- drivers/scsi/libiscsi.c | 226 -- drivers/scsi/libiscsi_tcp.c | 71 +++- drivers/scsi/qla4xxx/ql4_isr.c |4 +- include/scsi/libiscsi.h | 17 ++- include/scsi/libiscsi_tcp.h |2 + 9 files changed, 228 insertions(+), 194 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] SCSI/libiscsi: Remove unneeded code
From: Shlomo Pongratz We never will have a closed window and something on one of those lists. Signed-off-by: Mike Christie Signed-off-by: Shlomo Pongratz --- drivers/scsi/libiscsi.c | 12 1 files changed, 0 insertions(+), 12 deletions(-) diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 9c28c46..dd66d8a 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -108,18 +108,6 @@ static void __iscsi_update_cmdsn(struct iscsi_session *session, if (exp_cmdsn != session->exp_cmdsn && !iscsi_sna_lt(exp_cmdsn, session->exp_cmdsn)) session->exp_cmdsn = exp_cmdsn; - - if (max_cmdsn != session->max_cmdsn && - !iscsi_sna_lt(max_cmdsn, session->max_cmdsn)) { - session->max_cmdsn = max_cmdsn; - /* -* if the window closed with IO queued, then kick the -* xmit thread -*/ - if (!list_empty(&session->leadconn->cmdqueue) || - !list_empty(&session->leadconn->mgmtqueue)) - iscsi_conn_queue_work(session->leadconn); - } } void iscsi_update_cmdsn(struct iscsi_session *session, struct iscsi_nopin *hdr) -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] IB/iser: Add Discovery support
To run discovery over iSER we need to advertize the CAP_TEXT_NEGO capability towards user space. Also need to make sure the login RX buffer is posted when SendTargets TEXT PDUs are sent. For that end, we use a setting of the ISCSI_PARAM_DISCOVERY_SESS iscsi param as an indication that this is discovery session. Signed-off-by: Or Gerlitz --- Hi James, I submit this patch through your tree since it depends on the below iscsi patches (which are acknowledged by Mike) who are targted to 3.12 Or. [PATCH V2 1/4] scsi_transport_iscsi: Exporting new attrs for iscsi session and connection in sysfs http://marc.info/?l=linux-scsi&m=137267405429329&w=2 [PATCH V2 2/4] libiscsi: Exporting new attrs for iscsi session and connection in sysfs http://marc.info/?l=linux-scsi&m=137267405529330&w=2 drivers/infiniband/ulp/iser/iscsi_iser.c |3 ++- drivers/infiniband/ulp/iser/iser_initiator.c | 11 ++- drivers/scsi/libiscsi.c |5 + 3 files changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index 2e84ef8..1ec78bd 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -672,6 +672,7 @@ static umode_t iser_attr_is_visible(int param_type, int param) case ISCSI_PARAM_TGT_RESET_TMO: case ISCSI_PARAM_IFACE_NAME: case ISCSI_PARAM_INITIATOR_NAME: + case ISCSI_PARAM_DISCOVERY_SESS: return S_IRUGO; default: return 0; @@ -701,7 +702,7 @@ static struct scsi_host_template iscsi_iser_sht = { static struct iscsi_transport iscsi_iser_transport = { .owner = THIS_MODULE, .name = "iser", - .caps = CAP_RECOVERY_L0 | CAP_MULTI_R2T, + .caps = CAP_RECOVERY_L0 | CAP_MULTI_R2T | CAP_TEXT_NEGO, /* session management */ .create_session = iscsi_iser_session_create, .destroy_session= iscsi_iser_session_destroy, diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index b6d81a8..b31fa1d 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -234,6 +234,7 @@ void iser_free_rx_descriptors(struct iser_conn *ib_conn) static int iser_post_rx_bufs(struct iscsi_conn *conn, struct iscsi_hdr *req) { struct iscsi_iser_conn *iser_conn = conn->dd_data; + struct iscsi_session *session = conn->session; iser_dbg("req op %x flags %x\n", req->opcode, req->flags); /* check if this is the last login - going to full feature phase */ @@ -248,7 +249,13 @@ static int iser_post_rx_bufs(struct iscsi_conn *conn, struct iscsi_hdr *req) WARN_ON(iser_conn->ib_conn->post_recv_buf_count != 1); WARN_ON(atomic_read(&iser_conn->ib_conn->post_send_buf_count) != 0); - iser_dbg("Initially post: %d\n", ISER_MIN_POSTED_RX); + if (session->discovery_sess) { + iser_info("Discovery session, re-using login RX buffer\n"); + return 0; + } else + iser_info("Normal session, posting batch of RX %d buffers\n", + ISER_MIN_POSTED_RX); + /* Initial post receive buffers */ if (iser_post_recvm(iser_conn->ib_conn, ISER_MIN_POSTED_RX)) return -ENOMEM; @@ -425,6 +432,8 @@ int iser_send_control(struct iscsi_conn *conn, } if (task == conn->login_task) { + iser_dbg("op %x dsl %lx, posting login rx buffer\n", +task->hdr->opcode, data_seg_len); err = iser_post_recvl(iser_conn->ib_conn); if (err) goto send_control_error; diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 86153e0..f17a692 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -3170,6 +3170,7 @@ int iscsi_set_param(struct iscsi_cls_conn *cls_conn, { struct iscsi_conn *conn = cls_conn->dd_data; struct iscsi_session *session = conn->session; + int val; switch(param) { case ISCSI_PARAM_FAST_ABORT: @@ -3264,6 +3265,10 @@ int iscsi_set_param(struct iscsi_cls_conn *cls_conn, case ISCSI_PARAM_DISCOVERY_PARENT_TYPE: return iscsi_switch_str_param(&session->discovery_parent_type, buf); + case ISCSI_PARAM_DISCOVERY_SESS: + sscanf(buf, "%d", &val); + session->discovery_sess = !!val; + break; default: return -ENOSYS; } -- 1.7.1 -- To unsubscribe from this lis
Re: [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11
On 03/07/2013 15:41, Bart Van Assche wrote: [...] Bart, The individual patches in this series are as follows: 0001-IB-srp-Fix-remove_one-crash-due-to-resource-exhausti.patch 0002-IB-srp-Fix-race-between-srp_queuecommand-and-srp_cla.patch 0003-IB-srp-Avoid-that-srp_reset_host-is-skipped-after-a-.patch 0004-IB-srp-Fail-I-O-fast-if-target-offline.patch 0005-IB-srp-Skip-host-settle-delay.patch 0006-IB-srp-Maintain-a-single-connection-per-I_T-nexus.patch 0007-IB-srp-Keep-rport-as-long-as-the-IB-transport-layer.patch 0008-scsi_transport_srp-Add-transport-layer-error-handlin.patch 0009-IB-srp-Add-srp_terminate_io.patch 0010-IB-srp-Use-SRP-transport-layer-error-recovery.patch 0011-IB-srp-Start-timers-if-a-transport-layer-error-occur.patch 0012-IB-srp-Fail-SCSI-commands-silently.patch 0013-IB-srp-Make-HCA-completion-vector-configurable.patch 0014-IB-srp-Make-transport-layer-retry-count-configurable.patch 0015-IB-srp-Bump-driver-version-and-release-date.patch Some of these patches were already picked by Roland (SB), I would suggest that you post V4 and drop the ones which were accepted. e8ca413 IB/srp: Bump driver version and release date 4b5e5f4 IB/srp: Make HCA completion vector configurable 96fc248 IB/srp: Maintain a single connection per I_T nexus 99e1c13 IB/srp: Fail I/O fast if target offline 2742c1d IB/srp: Skip host settle delay 086f44f IB/srp: Avoid skipping srp_reset_host() after a transport error 1fe0cb8 IB/srp: Fix remove_one crash due to resource exhaustion Also, Would help if you use the --cover-letter of git format-patch and the resulted cover letter (patch 0/N) as it has standard content which you can enhance and place your additions. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] target fixes for v3.10-rc2
On Thu, May 16, 2013 at 7:53 AM, Nicholas A. Bellinger wrote: > Yes, that's the patch included in the PULL request here: > http://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?h=queue&id=52c07423a819091b0fe9abbf26977098b996f85b Cool! > > Btw, It still needs to be enabled in userspace, but appending a > rd_nullio=1 to the /control parameters with the following lio-utils > patch should work for your testing purposes atm.. are you merging this into the lio-utils git tree and maybe sparing a word on that in the documentation (where?) we'd like to point people to these bits and guidelines. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] target fixes for v3.10-rc2
On Wed, May 15, 2013 at 5:50 PM, Nicholas A. Bellinger wrote: > A handful of fixes + minor changes this time around, along with one > important >= v3.9 regression fix for IBLOCK backends. The highlights include: Hi Nic, Are you pushing also a patch which would allow to actually test NULL based RAM target (e.g target backend that does no actual IO) Or. > Nicholas Bellinger (5): > iscsi-target: Fix NULL pointer dereference in iscsit_send_reject > target/rd: Add ramdisk bit for NULLIO operation > iscsi-target: Fix typos in RDMAEXTENSIONS macro usage > MAINTAINERS: Update target git tree URL > target/iblock: Fix WCE=1 + DPOFUA=1 backend WRITE regression > > Shlomo Pongratz (1): > iscsi-target: Fix processing of OOO commands -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: initial LIO iSER performance numbers [was: GIT PULL] target updates for v3.10-rc1)
On Thu, May 2, 2013 at 10:31 PM, Nicholas A. Bellinger wrote: >> We used RAMDISK_MCP backend which was patched to act as NULL device, so >> we can test the raw iSER wire performance. > Btw, I'll be including a similar patch to allow for RAMDISK_NULL to be > configured as a NULL device mode. yep, that would be very helpful, so people can do that sort of testing without hacks... Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: initial LIO iSER performance numbers [was: GIT PULL] target updates for v3.10-rc1)
On 30/04/2013 05:59, Nicholas A. Bellinger wrote: Hello Linus! Here are the target pending changes for the v3.10-rc1 merge window. Please go ahead and pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git for-next-merge The highlights this round include: - Add fileio support for WRITE_SAME w/ UNMAP=1 discard (asias) - Add fileio support for UNMAP discard (asias) - Add tcm_vhost hotplug support to work with upstream QEMU vhost-scsi-pci code (asias + mst) - Check for aborted sequence in tcm_fc response path (mdr) - Add initial iscsit_transport support into iscsi-target code (nab) - Refactor iscsi-target RX PDU logic + export request PDU handling (nab) - Refactor iscsi-target TX queue logic + export response PDU creation (nab) - Add new iSCSI Extentions for RDMA (ISER) target driver (Or + nab) The biggest changes revolve around iscsi-target refactoring in order to support the iser-target driver. This includes the conversion of the iscsi-target data-path to use modern se_cmd->cmd_kref counting, and allowing transport independent aspects of RX/TX PDU request/response handling be shared across existing traditional iscsi-target code, and the new iser-target code. Hi Nic, everyone, So LIO iser target code is now merged into Linus tree, and will be in kernel 3.10, exciting! Here's some data on raw performance numbers we were able to get with the LIO iser code. For single initiator and single lun, block sizes varying over the range 1KB,2KB... 128KB doing random read: 1KB 227,870K 2KB 458,099K 4KB 909,761K 8KB 1,679,922K 16KB 3,233,753K 32KB 4,905,139K 64KB 5,294,873K 128KB 5,565,235K When enlarging the number of luns and still with single initiator, for 1KB randomreads we get: 1 LUN = 230k IOPS 2 LUNs = 420k IOPS 4 LUNs = 740k IOPS When enlarging the number of initiators, and each having four lunswe get for 1KB random reads: 1 initiator x 4 LUNs = 740k IOPS 2 initiators x 4 LUNs = 1480k IOPS 3 initiators x 4 LUNs = 1570k IOPS So all in all, things scale pretty nicely, and we observe a some bottleneck in the IOPS rate around 1.6 Million IOPS, so there's where to improve... Here's the fio command line used by the initiators $ fio --cpumask=0xfc --rw=randread --bs=1k --numjobs=2 --iodepth=128 --runtime=62 --time_based --size=1073741824k --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --group_reporting --exitall --name dev-sdb-randread-1k-2thr-libaio-128iodepth-62sec --filename=/dev/sdb And some details on the setup: The nodes are HP ProLiant DL380p Gen8 with the following CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz two NUMA nodes with eight cores each, 32GB RAM, PCI express gen3 8x, the HCA being Mellanox ConnectX3 with firmware 2.11.500 The target node was running upstream kernel and the initiators RHEL 6.3 kernel, all X86_64 We used RAMDISK_MCP backend which was patched to act as NULL device, so we can test the raw iSER wire performance. Or. Thanks to Or Gerlitz + Mellanox for supporting the iser-target development effort! Thank you, --nab Andy Grover (2): target/iscsi: Remove chap_set_random() target/iscsi: Use ISCSI_LOGIN_CURRENT/NEXT_STAGE macros Asias He (10): target/file: Add WRITE_SAME w/ UNMAP=1 emulation support target/file: Add UNMAP emulation support target/file: Add fd_do_unmap() helper target/iblock: Add iblock_do_unmap() helper target: Add sbc_execute_unmap() helper target/file: Set is_nonrot attribute tcm_vhost: Refactor the lock nesting rule tcm_vhost: Add hotplug/hotunplug support tcm_vhost: Add ioctl to get and set events missed flag tcm_vhost: Enable VIRTIO_SCSI_F_HOTPLUG Jörn Engel (2): qla2xxx: Remove unused function target: Change default sense key of NOT_READY Mark Rustad (1): tcm_fc: Check for aborted sequence Nicholas Bellinger (9): target: Add export of target_get_sess_cmd symbol iscsi-target: Add iscsit_transport API template iscsi-target: Initial traditional TCP conversion to iscsit_transport iscsi-target: Add iser-target parameter keys + setup during login iscsi-target: Add per transport iscsi_cmd alloc/free iscsi-target: Refactor RX PDU logic + export request PDU handling iscsi-target: Refactor TX queue logic + export response PDU creation iscsi-target: Add iser network portal attribute iser-target: Add iSCSI Extensions for RDMA (iSER) target driver Wei Yongjun (1): tcm_fc: using kfree_rcu() to simplify the code drivers/infiniband/Kconfig |1 + drivers/infiniband/Makefile|1 + drivers/infiniband/ulp/isert/Kconfig |5 + drivers/infiniband/ulp/isert/Makefile |2 + drivers/infiniband/ulp/isert/ib_isert.c| 2281 drivers/infiniband/ulp/isert/ib_isert.h| 138 ++ drivers/infiniband/ulp/isert/isert_proto.h
Re: [RFC-v4 9/9] iser-target: Add iSCSI Extensions for RDMA (iSER) target driver
On 12/04/2013 23:52, Nicholas A. Bellinger wrote: From: Nicholas Bellinger This patch adds support for iSCSI Extensions for RDMA target mode, and includes CQ pooling per isert_device context distributed across multiple active iser target sessions. Cool, feel free to ad my S.O.B here Signed-off-by: Or Gerlitz -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC-v3 9/9] iser-target: Add iSCSI Extensions for RDMA (iSER) target driver
On 04/04/2013 10:24, Nicholas A. Bellinger wrote: +static int +isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd) +{ + struct isert_cmd *isert_cmd = container_of(cmd, + struct isert_cmd, iscsi_cmd); + struct isert_conn *isert_conn = (struct isert_conn *)conn->context; + struct ib_send_wr *send_wr = &isert_cmd->tx_desc.send_wr; + struct iscsi_scsi_rsp *hdr = (struct iscsi_scsi_rsp *) + &isert_cmd->tx_desc.iscsi_header; + + isert_create_send_desc(isert_conn, isert_cmd, &isert_cmd->tx_desc); + iscsit_build_rsp_pdu(cmd, conn, true, hdr); + isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc); + /* +* Attach SENSE DATA payload to iSCSI Response PDU +*/ + if (cmd->se_cmd.sense_buffer && + ((cmd->se_cmd.se_cmd_flags & SCF_TRANSPORT_TASK_SENSE) || + (cmd->se_cmd.se_cmd_flags & SCF_EMULATED_TASK_SENSE))) { + struct ib_device *ib_dev = isert_conn->conn_cm_id->device; + struct ib_sge *tx_dsg = &isert_cmd->tx_desc.tx_sg[1]; + u32 padding, sense_len; + + put_unaligned_be16(cmd->se_cmd.scsi_sense_length, + cmd->sense_buffer); + cmd->se_cmd.scsi_sense_length += sizeof(__be16); + + padding = -(cmd->se_cmd.scsi_sense_length) & 3; + hton24(hdr->dlength, (u32)cmd->se_cmd.scsi_sense_length); + sense_len = cmd->se_cmd.scsi_sense_length + padding; + + isert_cmd->sense_buf_dma = ib_dma_map_single(ib_dev, + (void *)cmd->sense_buffer, sense_len, + DMA_TO_DEVICE); + + isert_cmd->sense_buf_len = sense_len; + ib_dma_sync_single_for_cpu(ib_dev, isert_cmd->sense_buf_dma, + sense_len, DMA_TO_DEVICE); + ib_dma_sync_single_for_device(ib_dev, isert_cmd->sense_buf_dma, + sense_len, DMA_TO_DEVICE); + you just called dma_map_single, and not going to touch the buffer before posting it to the wire, there's no point to sync it for the cpu and for the device, remove these calls. + tx_dsg->addr = isert_cmd->sense_buf_dma; + tx_dsg->length = sense_len; + tx_dsg->lkey = isert_conn->conn_mr->lkey; + isert_cmd->tx_desc.num_sge = 2; + } + + isert_init_send_wr(isert_cmd, send_wr); + + pr_debug("Posting SCSI Response IB_WR_SEND >>\n"); + + return isert_post_response(isert_conn, isert_cmd); +} -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC-v3 9/9] iser-target: Add iSCSI Extensions for RDMA (iSER) target driver
On 04/04/2013 10:24, Nicholas A. Bellinger wrote: +#define ISER_RECV_DATA_SEG_LEN 8192 +#define ISER_RX_PAYLOAD_SIZE(ISER_HEADERS_LEN + ISER_RECV_DATA_SEG_LEN) [...] +#define ISER_RX_PAD_SIZE (16384 - (ISER_RX_PAYLOAD_SIZE + \ + sizeof(u64) + sizeof(struct ib_sge))) We're eating here too much ram for the pad, you need 8K + something, so the pad can count down from 12K and not 16K which means each such element will consume three pages and not four. +struct iser_rx_desc { + struct iser_hdr iser_header; + struct iscsi_hdr iscsi_header; + chardata[ISER_RECV_DATA_SEG_LEN]; + u64 dma_addr; + struct ib_sge rx_sg; + charpad[ISER_RX_PAD_SIZE]; +} __packed; + +struct isert_rx_desc { + struct isert_conn *desc_conn; + struct work_struct desc_work; + struct iser_rx_desc desc; +} __packed; You have way enough room in the pad field of struct iser_rx_desc to place there the two fields added by struct isert_rx_desc (and you only use struct iser_rx_desc from within isert_rx_desc) --> any reason not to unify them? Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC-v3 9/9] iser-target: Add iSCSI Extensions for RDMA (iSER) target driver
On 04/04/2013 10:24, Nicholas A. Bellinger wrote: + +void isert_cq_tx_callback(struct ib_cq *, void *); +void isert_cq_rx_callback(struct ib_cq *, void *); +void isert_free_rx_descriptors(struct isert_conn *); any reason not to have these as static functions (same for isert_cq_rx_work) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC-v2 11/12] iser-target: Add logic for core
On Sat, Mar 23, 2013 at 1:55 AM, Nicholas A. Bellinger wrote: > +static int > +isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd) > +{ > + struct isert_cmd *isert_cmd = container_of(cmd, > + struct isert_cmd, iscsi_cmd); > + struct isert_conn *isert_conn = (struct isert_conn *)conn->context; > + struct ib_send_wr *send_wr = &isert_cmd->tx_desc.send_wr, *wr_failed; > + struct iscsi_scsi_rsp *hdr = (struct iscsi_scsi_rsp *) > + &isert_cmd->tx_desc.iscsi_header; > + int ret; > + > + isert_create_send_desc(isert_conn, isert_cmd, &isert_cmd->tx_desc); > + iscsit_build_rsp_pdu(cmd, conn, true, hdr); > + isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc); > + /* > +* Attach SENSE DATA payload to iSCSI Response PDU > +*/ > + if (cmd->se_cmd.sense_buffer && > + ((cmd->se_cmd.se_cmd_flags & SCF_TRANSPORT_TASK_SENSE) || > + (cmd->se_cmd.se_cmd_flags & SCF_EMULATED_TASK_SENSE))) { > + struct ib_device *ib_dev = isert_conn->conn_cm_id->device; > + struct ib_sge *tx_dsg = &isert_cmd->tx_desc.tx_sg[1]; > + u32 padding, sense_len; > + > + put_unaligned_be16(cmd->se_cmd.scsi_sense_length, > + cmd->sense_buffer); > + cmd->se_cmd.scsi_sense_length += sizeof(__be16); > + > + padding = -(cmd->se_cmd.scsi_sense_length) & 3; > + hton24(hdr->dlength, (u32)cmd->se_cmd.scsi_sense_length); > + sense_len = cmd->se_cmd.scsi_sense_length + padding; > + > + isert_cmd->sense_buf_dma = ib_dma_map_single(ib_dev, > + (void *)cmd->sense_buffer, sense_len, > + DMA_TO_DEVICE); > + > + isert_cmd->sense_buf_len = sense_len; > + ib_dma_sync_single_for_cpu(ib_dev, isert_cmd->sense_buf_dma, > + sense_len, DMA_TO_DEVICE); > + ib_dma_sync_single_for_device(ib_dev, > isert_cmd->sense_buf_dma, > + sense_len, DMA_TO_DEVICE); > + > + tx_dsg->addr= isert_cmd->sense_buf_dma; > + tx_dsg->length = sense_len; > + tx_dsg->lkey= isert_conn->conn_mr->lkey; > + isert_cmd->tx_desc.num_sge = 2; > + } > + > + isert_cmd->rdma_wr.iser_ib_op = ISER_IB_SEND; [...] > + send_wr->wr_id = (unsigned long)&isert_cmd->tx_desc; > + send_wr->opcode = IB_WR_SEND; > + send_wr->send_flags = IB_SEND_SIGNALED; > + send_wr->sg_list = &isert_cmd->tx_desc.tx_sg[0]; > + send_wr->num_sge = isert_cmd->tx_desc.num_sge; > + send_wr->next = NULL; [...] These seven lines are repeated 3-5 times below, a quick question and suggestion: 1. can't we do it beforehand? 2. we can move to helper function and call it when needed. > +isert_put_tm_rsp(struct iscsi_cmd *cmd, struct iscsi_conn *conn) > +{ > + struct isert_cmd *isert_cmd = container_of(cmd, > + struct isert_cmd, iscsi_cmd); > + struct isert_conn *isert_conn = (struct isert_conn *)conn->context; > + struct ib_send_wr *send_wr = &isert_cmd->tx_desc.send_wr, *wr_failed; > + int ret; > + > + isert_create_send_desc(isert_conn, isert_cmd, &isert_cmd->tx_desc); > + iscsit_build_task_mgt_rsp(cmd, conn, (struct iscsi_tm_rsp *) > + &isert_cmd->tx_desc.iscsi_header); > + isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc); > + > + isert_cmd->rdma_wr.iser_ib_op = ISER_IB_SEND; > + send_wr->wr_id = (unsigned long)&isert_cmd->tx_desc; > + send_wr->opcode = IB_WR_SEND; > + send_wr->send_flags = IB_SEND_SIGNALED; > + send_wr->sg_list = &isert_cmd->tx_desc.tx_sg[0]; > + send_wr->num_sge = isert_cmd->tx_desc.num_sge; > + send_wr->next = NULL; > + > + pr_debug("Posting Task Management Response IB_WR_SEND > >>\n"); > + > + atomic_inc(&isert_conn->post_send_buf_count); > + > + ret = ib_post_send(isert_conn->conn_qp, &isert_cmd->tx_desc.send_wr, > + &wr_failed); > + if (ret) { > + pr_err("isert_put_tm_rsp() failed to post wr: %d\n", ret); > + atomic_dec(&isert_conn->post_send_buf_count); > + return ret; > + } > + return 0; > +} > + > +static int > +isert_build_rdma_wr(struct isert_conn *isert_conn, struct isert_cmd > *isert_cmd, > + struct ib_sge *ib_sge, struct ib_send_wr *send_wr, > + u32 data_left, u32 offset) > +{ > + struct iscsi_cmd *cmd = &isert_cmd->iscsi_cmd; > + struct scatterlist *sg_start, *tmp_sg; > + struct ib_device *ib_dev = isert_conn->conn_cm_id->device; > + u32 sg_off, pag
Re: [RFC-v2 10/12] iser-target: Add logic for verbs
On Sat, Mar 23, 2013 at 1:55 AM, Nicholas A. Bellinger wrote: + device->dev_mr = ib_get_dma_mr(device->dev_pd, + IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_WRITE | + IB_ACCESS_REMOTE_READ); remove IB_ACCESS_REMOTE_yyy access flags, you're not letting anyone do remote rdma to this memory region > +/*** > + * This file contains iSCSI extentions for RDMA (iSER) Verbs > + * > + * (c) Copyright 2013 RisingTide Systems LLC. > + * > + * Nicholas A. Bellinger > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + > / > +#include > +#include > +#include > + > +#include > +#include > +#include > + > +#include "isert_proto.h" > +#include "isert_base.h" > +#include "isert_core.h" > + > +#defineISERT_MAX_CONN 8 > +#define ISER_MAX_RX_CQ_LEN (ISERT_QP_MAX_RECV_DTOS * ISERT_MAX_CONN) > +#define ISER_MAX_TX_CQ_LEN (ISERT_QP_MAX_REQ_DTOS * ISERT_MAX_CONN) > + > +static DEFINE_MUTEX(device_list_mutex); > +static LIST_HEAD(device_list); > + > +static void > +isert_qp_event_callback(struct ib_event *e, void *context) > +{ > + struct isert_conn *isert_conn = (struct isert_conn *)context; > + > + pr_err("isert_qp_event_callback event: %d\n", e->event); > + switch (e->event) { > + case IB_EVENT_COMM_EST: > + rdma_notify(isert_conn->conn_cm_id, IB_EVENT_COMM_EST); > + break; > + default: > + break; > + } > +} > + > +static int > +isert_query_device(struct ib_device *ib_dev, struct ib_device_attr *devattr) > +{ > + int ret; > + > + ret = ib_query_device(ib_dev, devattr); > + if (ret) { > + pr_err("ib_query_device() failed: %d\n", ret); > + return ret; > + } > + pr_debug("devattr->max_sge: %d\n", devattr->max_sge); > + pr_debug("devattr->max_sge_rd: %d\n", devattr->max_sge_rd); > + > + return 0; > +} > + > +static int > +isert_conn_setup_qp(struct isert_conn *isert_conn, struct rdma_cm_id *cma_id) > +{ > + struct isert_device *device = isert_conn->conn_device; > + struct ib_qp_init_attr attr; > + struct ib_device_attr devattr; > + int ret, index, min_index = 0; > + > + memset(&devattr, 0, sizeof(struct ib_device_attr)); > + ret = isert_query_device(cma_id->device, &devattr); > + if (ret) > + return ret; > + > + mutex_lock(&device_list_mutex); > + for (index = 0; index < device->cqs_used; index++) > + if (device->cq_active_qps[index] < > + device->cq_active_qps[min_index]) > + min_index = index; > + device->cq_active_qps[min_index]++; > + pr_debug("isert_conn_setup_qp: Using min_index: %d\n", min_index); > + mutex_unlock(&device_list_mutex); > + > + memset(&attr, 0, sizeof(struct ib_qp_init_attr)); > + attr.event_handler = isert_qp_event_callback; > + attr.qp_context = isert_conn; > + attr.send_cq = device->dev_tx_cq[min_index]; > + attr.recv_cq = device->dev_rx_cq[min_index]; > + attr.cap.max_send_wr = ISERT_QP_MAX_REQ_DTOS; > + attr.cap.max_recv_wr = ISERT_QP_MAX_RECV_DTOS; > + /* > +* FIXME: Use devattr.max_sge - 2 for max_send_sge as > +* work-around for RDMA_READ.. > +*/ > + attr.cap.max_send_sge = devattr.max_sge - 2; > + isert_conn->max_sge = attr.cap.max_send_sge; > + > + attr.cap.max_recv_sge = 1; > + attr.sq_sig_type = IB_SIGNAL_REQ_WR; > + attr.qp_type = IB_QPT_RC; > + > + pr_debug("isert_conn_setup_qp cma_id->device: %p\n", > +cma_id->device); > + pr_debug("isert_conn_setup_qp conn_pd->device: %p\n", > +isert_conn->conn_pd->device); > + > + ret = rdma_create_qp(cma_id, isert_conn->conn_pd, &attr); > + if (ret) { > + pr_err("rdma_create_qp failed for cma_id %d\n", ret); > + return ret; > + } > + isert_conn->conn_qp = cma_id->qp; > + pr_debug("rdma_create_qp() returned success > >.\n"); > + > + return 0; > +} > + > +static void > +isert_cq_event_callback(struct ib_event *e, void *context) > +{ > + pr_debug("isert_cq_event_callback event: %d\n", e->e
Re: [RFC-v2 10/12] iser-target: Add logic for verbs
On Sat, Mar 23, 2013 at 1:55 AM, Nicholas A. Bellinger wrote: [...] > +static void > +isert_qp_event_callback(struct ib_event *e, void *context) > +{ > + struct isert_conn *isert_conn = (struct isert_conn *)context; > + > + pr_err("isert_qp_event_callback event: %d\n", e->event); > + switch (e->event) { > + case IB_EVENT_COMM_EST: > + rdma_notify(isert_conn->conn_cm_id, IB_EVENT_COMM_EST); > + break; > + default: > + break; > + } > +} [...] > +static void > +isert_cq_event_callback(struct ib_event *e, void *context) > +{ > + pr_debug("isert_cq_event_callback event: %d\n", e->event); > + > + switch (e->event) { > + case IB_EVENT_QP_LAST_WQE_REACHED: > + pr_warn("Reached TX IB_EVENT_QP_LAST_WQE_REACHED:\n"); > + break; > + default: > + pr_warn("Unknown e->event; %d\n", e->event); > + break; > + } > +} This is QP not CQ event, move the case for it to QP event hander isert_qp_event_callback > + ib_destroy_cq(device->dev_tx_cq[i]); > + device->dev_rx_cq[i] = NULL; > + device->dev_tx_cq[i] = NULL; > + } > + > + ib_dereg_mr(device->dev_mr); > + ib_dealloc_pd(device->dev_pd); > + kfree(device->cq_desc); > +} > + > +static void > +isert_device_try_release(struct isert_device *device) > +{ > + mutex_lock(&device_list_mutex); > + device->refcount--; > + if (!device->refcount) { > + isert_free_device_ib_res(device); > + list_del(&device->dev_node); > + kfree(device); > + } > + mutex_unlock(&device_list_mutex); > +} > + > +static struct isert_device * > +isert_device_find_by_ib_dev(struct rdma_cm_id *cma_id) > +{ > + struct isert_device *device; > + > + mutex_lock(&device_list_mutex); > + list_for_each_entry(device, &device_list, dev_node) { > + if (device->ib_device->node_guid == > cma_id->device->node_guid) { > + device->refcount++; > + mutex_unlock(&device_list_mutex); > + return device; > + } > + } > + > + device = kzalloc(sizeof(struct isert_device), GFP_KERNEL); > + if (!device) { > + mutex_unlock(&device_list_mutex); > + return NULL; > + } > + > + INIT_LIST_HEAD(&device->dev_node); > + > + device->ib_device = cma_id->device; > + if (isert_create_device_ib_res(device)) { > + kfree(device); > + mutex_unlock(&device_list_mutex); > + return NULL; > + } > + > + device->refcount++; > + list_add_tail(&device->dev_node, &device_list); > + mutex_unlock(&device_list_mutex); > + > + return device; > +} > + > +static int > +isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) > +{ > + struct iscsi_np *np = cma_id->context; > + struct isert_np *isert_np = np->np_context; > + struct isert_conn *isert_conn; > + struct isert_device *device; > + struct ib_device *ib_dev = cma_id->device; > + int ret; > + > + pr_debug("Entering isert_connect_request cma_id: %p, context: %p\n", > +cma_id, cma_id->context); > + > + isert_conn = kzalloc(sizeof(struct isert_conn), GFP_KERNEL); > + if (!isert_conn) { > + pr_err("Unable to allocate isert_conn\n"); > + return -ENOMEM; > + } > + isert_conn->state = ISER_CONN_INIT; > + INIT_LIST_HEAD(&isert_conn->conn_accept_node); > + init_completion(&isert_conn->conn_login_comp); > + init_waitqueue_head(&isert_conn->conn_wait); > + kref_init(&isert_conn->conn_kref); > + kref_get(&isert_conn->conn_kref); > + > + cma_id->context = isert_conn; > + isert_conn->conn_cm_id = cma_id; > + isert_conn->responder_resources = > event->param.conn.responder_resources; > + isert_conn->initiator_depth = event->param.conn.initiator_depth; > + pr_debug("Using responder_resources: %u initiator_depth: %u\n", > +isert_conn->responder_resources, > isert_conn->initiator_depth); > + > + isert_conn->login_buf = kzalloc(ISCSI_DEF_MAX_RECV_SEG_LEN + > + ISER_RX_LOGIN_SIZE, GFP_KERNEL); > + if (!isert_conn->login_buf) { > + pr_err("Unable to allocate isert_conn->login_buf\n"); > + ret = -ENOMEM; > + goto out; > + } > + > + isert_conn->login_req_buf = isert_conn->login_buf; > + isert_conn->login_rsp_buf = isert_conn->login_buf + > + ISCSI_DEF_MAX_RECV_SEG_LEN; > + pr_debug("Set login_buf: %p login_req_buf: %p login_rsp_buf: %p\n", > +isert_conn->login_buf, isert_conn->login_req_buf, > +
Re: [RFC-v2 11/12] iser-target: Add logic for core
On 23/03/2013 01:55, Nicholas A. Bellinger wrote: +++ b/drivers/infiniband/ulp/isert/isert_core.h @@ -0,0 +1,11 @@ +#include +#include +#include +#include +#include + +extern void iser_cq_tx_tasklet(unsigned long); +extern void isert_cq_tx_callback(struct ib_cq *, void *); +extern void iser_cq_rx_tasklet(unsigned long); +extern void isert_cq_rx_callback(struct ib_cq *, void *); +extern void isert_free_rx_descriptors(struct isert_conn *); no need for externs here too, agree? also, any reason for these two header files not to be merged into one or into one of the other header files? diff --git a/drivers/infiniband/ulp/isert/isert_verbs.h b/drivers/infiniband/ulp/isert/isert_verbs.h new file mode 100644 index 000..da7924d --- /dev/null +++ b/drivers/infiniband/ulp/isert/isert_verbs.h @@ -0,0 +1,5 @@ +extern void isert_connect_release(struct isert_conn *); +extern void isert_put_conn(struct isert_conn *); +extern int isert_cma_handler(struct rdma_cm_id *, struct rdma_cm_event *); +extern int isert_post_recv(struct isert_conn *, u32); +extern int isert_post_send(struct isert_conn *, struct iser_tx_desc *); -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC-v2 10/12] iser-target: Add logic for verbs
On 23/03/2013 01:55, Nicholas A. Bellinger wrote: +++ b/drivers/infiniband/ulp/isert/isert_verbs.h @@ -0,0 +1,5 @@ +extern void isert_connect_release(struct isert_conn *); +extern void isert_put_conn(struct isert_conn *); +extern int isert_cma_handler(struct rdma_cm_id *, struct rdma_cm_event *); +extern int isert_post_recv(struct isert_conn *, u32); +extern int isert_post_send(struct isert_conn *, struct iser_tx_desc *); why use extern here? maybe a left over from V1? -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 10/11] iser-target: Add logic for core
On 08/03/2013 03:45, Nicholas A. Bellinger wrote: +void +iser_cq_tx_tasklet(unsigned long data) +{ + struct isert_conn *isert_conn = (struct isert_conn *)data; + struct ib_cq *tx_cq = isert_conn->conn_tx_cq; + struct iser_tx_desc *tx_desc; + struct ib_wc wc; + + while (ib_poll_cq(tx_cq, 1, &wc) == 1) { + tx_desc = (struct iser_tx_desc *)(unsigned long)wc.wr_id; + + if (wc.status == IB_WC_SUCCESS) { + isert_send_completion(tx_desc, isert_conn); + } else { + pr_debug("TX wc.status != IB_WC_SUCCESS >>\n"); + isert_dump_ib_wc(&wc); + atomic_dec(&isert_conn->post_send_buf_count); + isert_cq_comp_err(tx_desc, isert_conn); + } + } + + ib_req_notify_cq(tx_cq, IB_CQ_NEXT_COMP); +} + +void +isert_cq_tx_callback(struct ib_cq *cq, void *context) +{ + struct isert_conn *isert_conn = context; + + tasklet_schedule(&isert_conn->conn_tx_tasklet); +} + +void +iser_cq_rx_tasklet(unsigned long data) +{ + struct isert_conn *isert_conn = (struct isert_conn *)data; + struct ib_cq *rx_cq = isert_conn->conn_rx_cq; + struct iser_rx_desc *rx_desc; + struct ib_wc wc; + unsigned long xfer_len; + + while (ib_poll_cq(rx_cq, 1, &wc) == 1) { + rx_desc = (struct iser_rx_desc *)(unsigned long)wc.wr_id; + + if (wc.status == IB_WC_SUCCESS) { + xfer_len = (unsigned long)wc.byte_len; + isert_rx_completion(rx_desc, isert_conn, xfer_len); + } else { + pr_debug("RX wc.status != IB_WC_SUCCESS >>\n"); + if (wc.status != IB_WC_WR_FLUSH_ERR) + isert_dump_ib_wc(&wc); + + isert_conn->post_recv_buf_count--; + isert_cq_comp_err(NULL, isert_conn); + } + } + + ib_req_notify_cq(rx_cq, IB_CQ_NEXT_COMP); +} We currently have here the following sequence of calls isert_cq_rx_callback --> tasklet_schedule ... --> ... ib_poll_cq --> isert_rx_completion --> isert_rx_queue_desc --> isert_rx_queue_desc --> queue_work (context switch) isert_cq_tx_callback --> tasklet_schedule ... --> ... ib_poll_cq --> isert_send_completion --> isert_completion_rdma_read --> queue_work (context switch) which means we have one context switch from the CQ callback to tasklet and then a PER IO context switch from the tasklet to a kernel thread context which you might need for IO submission into the backing store. This can be optimized by having one context switch from the isert cq callbacks to a kernel thread, with the work item being "do polling" and then from the cq polling code do the submission into the backing store without any further context switches. Or. + +void +isert_cq_rx_callback(struct ib_cq *cq, void *context) +{ + struct isert_conn *isert_conn = context; + + tasklet_schedule(&isert_conn->conn_rx_tasklet); +} -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 09/11] iser-target: Add logic for verbs
On 08/03/2013 03:45, Nicholas A. Bellinger wrote: isert_conn->conn_rx_cq = ib_create_cq(ib_dev, isert_cq_rx_callback, + isert_cq_event_callback, + (void *)isert_conn, + ISER_MAX_RX_CQ_LEN, 0); [...] + + isert_conn->conn_tx_cq = ib_create_cq(ib_dev, isert_cq_tx_callback, + isert_cq_event_callback, + (void *)isert_conn, + ISER_MAX_TX_CQ_LEN, 0); Two comments here: 1. This code always attaches the CQ to vector #0 of the IB device. Typically vectors are associated with IRQs and IRQs are "affiniated" with cores, so processing of completions for all connections will be carried out on single core at a given time. What can be done here is to use different vectors for different CQs. The possible vector values are from 0 to ib_dev->num_comp_vectors - 1 2. A dedicated CQ pair is used per connection which might not scale well. What can be done here, is to use a global context per IB device which holds a pool of CQs (created on different vectors, per the previous item), and for each new connection, attach its QP to a CQ pair from this pool, e.g as done in drivers/infiniband/ulp/iser/iser_verbs.c :: iser_create_device_ib_res() note that FWIW this context can hold also a global PD and DMA_MR objects to be used by all the connections opened on the device. The context can be opened on demand (1st connection that hits this IB device) and closed on non-demand (last connection). Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 09/11] iser-target: Add logic for verbs
On 08/03/2013 03:45, Nicholas A. Bellinger wrote: +++ b/drivers/infiniband/ulp/isert/isert_verbs.c +#define ISERT_ADDR_ROUTE_TIMEOUT 1000 remove this define, its irrelevant and you don't use that anywhere +static void +isert_qp_event_callback(struct ib_event *e, void *context) +{ + pr_err("isert_qp_event_callback event: %d\n", e->event); +} To be on the safe side, when the event is IB_EVENT_COMM_EST (which means that the login request was received by the HCA before the connection was fully established), call rdma_notify (id, IB_EVENT_COMM_EST) with the id being the one pointed from qp->qp_context->isert_conn + +static int +isert_query_device(struct ib_device *ib_dev, struct ib_device_attr *devattr) +{ + int ret; + + ret = ib_query_device(ib_dev, devattr); + if (ret) { + pr_err("ib_query_device() failed: %d\n", ret); + return ret; + } + pr_debug("devattr->max_mr_size: 0x%016Lx\n", devattr->max_mr_size); running user space "ibv_devinfo -v" will give the same effect... no need to dump this here, maybe except for max_sge that you need. + pr_debug("devattr->page_size_cap: 0x%016Lx\n", devattr->page_size_cap); + pr_debug("devattr->max_qp: %d\n", devattr->max_qp); + pr_debug("devattr->max_qp_wr: %d\n", devattr->max_qp_wr); + pr_debug("devattr->device_cap_flags: 0x%08x\n", devattr->device_cap_flags); + pr_debug("devattr->max_sge: %d\n", devattr->max_sge); + pr_debug("devattr->max_sge_rd: %d\n", devattr->max_sge_rd); + pr_debug("devattr->max_cq: %d\n", devattr->max_cq); + pr_debug("devattr->max_cqe: %d\n", devattr->max_cqe); + pr_debug("devattr->max_mr: %d\n", devattr->max_mr); + pr_debug("devattr->max_pd: %d\n", devattr->max_pd); + pr_debug("devattr->max_rdd: %d\n", devattr->max_rdd); + pr_debug("devattr->max_mw: %d\n", devattr->max_mw); + pr_debug("devattr->max_srq: %d\n", devattr->max_srq); + pr_debug("devattr->max_srq_wr: %d\n", devattr->max_srq_wr); + pr_debug("devattr->max_srq_sge: %d\n", devattr->max_srq_sge); + + return 0; +} [...] + + +int +isert_cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) +{ + int ret = 0; + + pr_debug("isert_cma_handler: event %d status %d conn %p id %p\n", + event->event, event->status, cma_id->context, cma_id); + + switch (event->event) { + case RDMA_CM_EVENT_CONNECT_REQUEST: + pr_debug("RDMA_CM_EVENT_CONNECT_REQUEST: >>>\n"); + ret = isert_connect_request(cma_id, event); + break; + case RDMA_CM_EVENT_ESTABLISHED: + pr_debug("RDMA_CM_EVENT_ESTABLISHED >>\n"); + isert_connected_handler(cma_id); + break; + case RDMA_CM_EVENT_DISCONNECTED: + pr_debug("RDMA_CM_EVENT_DISCONNECTED: >>\n"); + isert_disconnected_handler(cma_id); + break; + case RDMA_CM_EVENT_DEVICE_REMOVAL: + case RDMA_CM_EVENT_ADDR_CHANGE: + break; + case RDMA_CM_EVENT_ADDR_ERROR: + case RDMA_CM_EVENT_ROUTE_ERROR: + case RDMA_CM_EVENT_CONNECT_ERROR: + default: + pr_err("Unknown RDMA CMA event: %d\n", event->event); + break; + } + ADDR_ERROR and ROUTE_ERROR you can remove from here, since you don't call rdma_resolve_addr nor rdma_resolve_route, they will not be delivered to you. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 10/11] iser-target: Add logic for core
On 08/03/2013 03:45, Nicholas A. Bellinger wrote: +void +isert_dump_ib_wc(struct ib_wc *wc) +{ + pr_debug("wc->wr_id: %llu\n", wc->wr_id); + pr_debug("wc->status: 0x%08x\n", wc->status); This helper is called for a CQ completion with error, but when this happens all the WC fields except for the wr_id and the status aren't defined, so there's no point in dumping them (can be terribly misleading) + pr_debug("wc->opcode: 0x%08x\n", wc->opcode); + pr_debug("wc->vendor_err: 0x%08x\n", wc->vendor_err); + pr_debug("wc->byte_len: %u\n", wc->byte_len); + pr_debug("wc->qp: %p\n", wc->qp); + pr_debug("wc->src_qp: %u\n", wc->src_qp); + pr_debug("wc->wc_flags: 0x%08x\n", wc->wc_flags); + pr_debug("wc->pkey_index: %hu\n", wc->pkey_index); + pr_debug("wc->slid: %hu\n", wc->slid); + pr_debug("wc->sl: 0x%02x\n", wc->sl); + pr_debug("wc->dlid_path_bits: 0x%02x\n", wc->dlid_path_bits); + pr_debug("wc->port_num: 0x%02x\n", wc->port_num); +} + +void +iser_cq_tx_tasklet(unsigned long data) +{ + struct isert_conn *isert_conn = (struct isert_conn *)data; + struct ib_cq *tx_cq = isert_conn->conn_tx_cq; + struct iser_tx_desc *tx_desc; + struct ib_wc wc; + + while (ib_poll_cq(tx_cq, 1, &wc) == 1) { + tx_desc = (struct iser_tx_desc *)(unsigned long)wc.wr_id; + + if (wc.status == IB_WC_SUCCESS) { + isert_send_completion(tx_desc, isert_conn); + } else { + pr_debug("TX wc.status != IB_WC_SUCCESS >>\n"); + isert_dump_ib_wc(&wc); + atomic_dec(&isert_conn->post_send_buf_count); + isert_cq_comp_err(tx_desc, isert_conn); + } + } + + ib_req_notify_cq(tx_cq, IB_CQ_NEXT_COMP); +} + +void +isert_cq_tx_callback(struct ib_cq *cq, void *context) +{ + struct isert_conn *isert_conn = context; + + tasklet_schedule(&isert_conn->conn_tx_tasklet); +} + +void +iser_cq_rx_tasklet(unsigned long data) +{ + struct isert_conn *isert_conn = (struct isert_conn *)data; + struct ib_cq *rx_cq = isert_conn->conn_rx_cq; + struct iser_rx_desc *rx_desc; + struct ib_wc wc; + unsigned long xfer_len; + + while (ib_poll_cq(rx_cq, 1, &wc) == 1) { + rx_desc = (struct iser_rx_desc *)(unsigned long)wc.wr_id; + + if (wc.status == IB_WC_SUCCESS) { + xfer_len = (unsigned long)wc.byte_len; + isert_rx_completion(rx_desc, isert_conn, xfer_len); + } else { + pr_debug("RX wc.status != IB_WC_SUCCESS >>\n"); + if (wc.status != IB_WC_WR_FLUSH_ERR) + isert_dump_ib_wc(&wc); + + isert_conn->post_recv_buf_count--; + isert_cq_comp_err(NULL, isert_conn); + } + } + + ib_req_notify_cq(rx_cq, IB_CQ_NEXT_COMP); +} -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 00/11] Add support for iSCSI Extentions for RDMA (ISER) target
On 08/03/2013 03:45, Nicholas A. Bellinger wrote: This series is first RFC for iSCSI Extentions for RDMA (ISER) target support with existing iscsi-target TCP based socket code for a future v3.10 merge. This code is available in git here: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git iser_target-rfcv1 Ths includes a basic iscsit_transport API that allows different transports to reside under a single iscsi-target configfs control plane, using an pre-defined network portal attribute to enable a rdma_cm listener on top of existing ipoib portals. Hi Nic, Here are quick few nits (...) which need to be sorted out for the next version, I've picked them from running the patches through checkpatch and sparse / gcc warnings checks target-pending]# git format-patch -o /tmp/lio-iser-rfc 7b745c84a9f4ad62db4b67053fbceb5d706451af.. /tmp/lio-iser-rfc/0001-iscsi-target-Add-iscsit_transport-API-template.patch /tmp/lio-iser-rfc/0002-iscsi-target-Initial-traditional-TCP-conversion-to-i.patch /tmp/lio-iser-rfc/0003-iscsi-target-Add-iser-target-parameter-keys-setup-du.patch /tmp/lio-iser-rfc/0004-iscsi-target-Add-per-transport-iscsi_cmd-alloc-free.patch /tmp/lio-iser-rfc/0005-iscsi-target-Refactor-RX-PDU-logic-export-request-PD.patch /tmp/lio-iser-rfc/0006-iscsi-target-Refactor-TX-queue-logic-export-response.patch /tmp/lio-iser-rfc/0007-iscsi-target-Add-iser-network-portal-attribute.patch /tmp/lio-iser-rfc/0008-iser-target-Add-base-proto-includes.patch /tmp/lio-iser-rfc/0009-iser-target-Add-logic-for-verbs.patch /tmp/lio-iser-rfc/0010-iser-target-Add-logic-for-core.patch /tmp/lio-iser-rfc/0011-iser-target-Add-Makefile-Kconfig.patch target-pending]# ./scripts/checkpatch.pl --strict /tmp/lio-iser-rfc/* | grep total total: 0 errors, 8 warnings, 0 checks, 142 lines checked total: 1 errors, 14 warnings, 6 checks, 1097 lines checked total: 0 errors, 15 warnings, 7 checks, 299 lines checked total: 0 errors, 0 warnings, 0 checks, 106 lines checked total: 0 errors, 13 warnings, 4 checks, 795 lines checked total: 1 errors, 12 warnings, 2 checks, 877 lines checked total: 0 errors, 5 warnings, 0 checks, 83 lines checked total: 7 errors, 12 warnings, 1 checks, 170 lines checked total: 1 errors, 9 warnings, 14 checks, 481 lines checked total: 1 errors, 39 warnings, 43 checks, 1732 lines checked total: 0 errors, 0 warnings, 0 checks, 21 lines checked drivers/infiniband/ulp/isert/isert_core.c:774:1: warning: symbol 'isert_dump_ib_wc' was not declared. Should it be static? drivers/infiniband/ulp/isert/isert_verbs.c:341:1: warning: symbol 'isert_put_conn' was not declared. Should it be static? drivers/infiniband/ulp/isert/isert_verbs.c:375:1: warning: symbol 'isert_cma_handler' was not declared. Should it be static? drivers/infiniband/ulp/isert/isert_verbs.c:416:1: warning: symbol 'isert_post_recv' was not declared. Should it be static? drivers/infiniband/ulp/isert/isert_verbs.c:451:1: warning: symbol 'isert_post_send' was not declared. Should it be static? drivers/infiniband/ulp/isert/isert_core.c: In function ?isert_rx_do_work?: drivers/infiniband/ulp/isert/isert_core.c:481: warning: variable ?rc? set but not used Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 01/11] iscsi-target: Add iscsit_transport API template
On Fri, Mar 8, 2013 at 6:14 AM, Roland Dreier wrote: > Nicholas A. Bellinger wrote: > > +EXPORT_SYMBOL(iscsit_get_transport); > It's not clear to me why this needs to be exported. Who would use it > outside the core iscsi target module? Yep, as Nic noted, we're adding here an iscsi transport concept e.g in the same manner Mike did libiscsi back in 2005/6 when the iser initiator was pushed. This allows for multiple iscsi flavours to use a common code for common functionality. In the initiator area initially there were iscsi tcp and iser, later few iscsi HW offloads were merged too. Same story here. I think that the point is whether or not these APIs are needed, since once we agree on that, we need an header file and exporting of functions. As libiscsi.h resided under include/ it makes sense to me for this include to be located there too. Or. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/11] First pass at merging Bart's HA work
On Thu, Dec 6, 2012 at 4:27 PM, Or Gerlitz wrote: [...] > looking on the current locks in the system, we see that this kworker task > holds four locks, but none of them seems to be mutually held by another task, That was ofcourse a wrong assertion, as a lock can't be mutually held by two tasks... I wanted to say that from the general view of which locks are being held there's no obvious deadlock here, since none of the other locks holders relates to the block/scsi layers... -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/11] First pass at merging Bart's HA work
On 06/12/2012 17:04, Bart Van Assche wrote: On 12/06/12 15:27, Or Gerlitz wrote: The core problem here seems to be that scsi_remove_host simply never ends. Hello Or, The later patches in the srp-ha patch series avoided such behavior by checking whether the connection between SRP initiator and target is unique, and by removing duplicate SCSI hosts for which the transport layer failed. Unfortunately these patches are still under review. Unless someone can come up with a better solution I will post a patch one of the next days that makes ib_srp again fail all commands after host removal started. That will avoid spending a long time doing error recovery. Also, you might have noticed that Hannes Reinecke reported a few days ago that the SCSI error handler may need a lot of time for other transport types - this behavior is not SRP specific. I'm not sure what to you exactly refer by duplicated SCSI hosts in this context or why we have them. Again, at the time we've took the stack traces snapshot from the system none of the SCSI EH threads was active, so I'm not sure either your comment about spending long time in the error recovery flow, as the flow we've run into seems to simply wait forever. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/11] First pass at merging Bart's HA work
On 06/12/2012 16:10, Bart Van Assche wrote: On 12/05/12 22:32, Or Gerlitz wrote: On Wed, Dec 5, 2012 at 8:50 PM, Bart Van Assche wrote: [...] The only way to make I/O work reliably if a failure can occur at the transport layer is to use multipathd on top of ib_srp. If a connection fails for some reason, then the SRP SCSI host will be removed after the SCSI error handler has finished with its error recovery strategy. And once the transport layer is operational again and srp_daemon detects that the initiator is no longer logged in srp_daemon will make ib_srp log in again. multipathd will then cause I/O to continue over the new path. Claim basically understood and agreed however, does this also hold when the link is back again, that is can't SRP login via this single path also when there's no multipath on top? As far as I can remember the behavior of ib_srp has always been to try to reconnect once to the SRP target after the SCSI error handler kicked in. Other SCSI LLDs, e.g. the iSCSI initiator, can be configured to keep trying to reconnect after a transport layer failure. That has the advantage that the SCSI host number remains the same after reconnecting succeeded as before reconnecting started. Bart, The core problem here seems to be that scsi_remove_host simply never ends. Observing all the tasks in the system (e.g using "echo t > /proc/sysrq-trigger"), we've noted that none of the SCSI EH are currently running, that is for all of them their trace is the following scsi_eh_0 S 0 380 2 0x 88042c31be08 0046 88042c31bfd8 00014380 88042c31a010 00014380 00014380 00014380 88042c31bfd8 00014380 88042f5be5c0 88042bb48c40 Call Trace: [] ? scsi_unjam_host+0x1f0/0x1f0 [] schedule+0x29/0x70 [] scsi_error_handler+0x75/0x1c0 [] ? scsi_unjam_host+0x1f0/0x1f0 [] kthread+0xee/0x100 [] ? __init_kthread_worker+0x70/0x70 [] ret_from_fork+0x7c/0xb0 [] ? __init_kthread_worker+0x70/0x70 However, the flow starting in srp_remove_target hangs somewhere in the block layer waiting for something to happen worker/11:1D 0 163 2 0x 88082be6f738 0046 88082be6ffd8 00014380 88082be6e010 00014380 00014380 00014380 88082be6ffd8 00014380 88042f5ba580 88082be6c1c0 Call Trace: [] schedule+0x29/0x70 [] schedule_timeout+0x14f/0x240 [] ? lock_timer_base+0x70/0x70 [] wait_for_common+0x11b/0x170 [] ? try_to_wake_up+0x300/0x300 [] wait_for_completion_timeout+0x13/0x20 [] blk_execute_rq+0x133/0x1c0 [] ? get_request+0x210/0x3d0 [] scsi_execute+0xe8/0x180 [] scsi_execute_req+0xa7/0x110 [] sd_sync_cache+0xd8/0x130 [sd_mod] [] ? __dev_printk+0x3e/0x90 [] ? dev_printk+0x45/0x50 [] sd_shutdown+0xd0/0x150 [sd_mod] [] sd_remove+0x7c/0xc0 [sd_mod] [] __device_release_driver+0x7c/0xe0 [] device_release_driver+0x2f/0x50 [] bus_remove_device+0x126/0x190 [] device_del+0x14b/0x250 [] __scsi_remove_device+0x1b8/0x1d0 [] scsi_forget_host+0xf6/0x110 [] scsi_remove_host+0x108/0x1e0 [] srp_remove_target+0xb8/0x150 [ib_srp] [] srp_remove_work+0x64/0xa0 [ib_srp] [] process_one_work+0x1c2/0x4a0 [] ? process_one_work+0x150/0x4a0 [] ? srp_remove_target+0x150/0x150 [ib_srp] [] worker_thread+0x12e/0x370 [] ? manage_workers+0x180/0x180 [] kthread+0xee/0x100 [] ? __init_kthread_worker+0x70/0x70 [] ret_from_fork+0x7c/0xb0 [] ? __init_kthread_worker+0x70/0x70 looking on the current locks in the system, we see that this kworker task holds four locks, but none of them seems to be mutually held by another task, Showing all locks held in the system: 4 locks held by kworker/11:1/163: #0: (events_long){.+.+.+}, at: [] process_one_work+0x150/0x4a0 #1: ((&target->remove_work)){+.+.+.}, at: [] process_one_work+0x150/0x4a0 #2: (&shost->scan_mutex){+.+.+.}, at: [] scsi_remove_host+0x34/0x1e0 #3: (&__lockdep_no_validate__){..}, at: [] device_release_driver+0x27/0x50 1 lock held by bash/6298: #0: (&tty->atomic_read_lock){+.+...}, at: [] n_tty_read+0x58e/0x960 1 lock held by mingetty/6319: #0: (&tty->atomic_read_lock){+.+...}, at: [] n_tty_read+0x58e/0x960 1 lock held by mingetty/6321: #0: (&tty->atomic_read_lock){+.+...}, at: [] n_tty_read+0x58e/0x960 1 lock held by mingetty/6323: #0: (&tty->atomic_read_lock){+.+...}, at: [] n_tty_read+0x58e/0x960 1 lock held by mingetty/6325: #0: (&tty->atomic_read_lock){+.+...}, at: [] n_tty_read+0x58e/0x960 1 lock held by mingetty/6327: #0: (&tty->atomic_read_lock){+.+...}, at: [] n_tty_read+0x58e/0x960 1 lock held by mingetty/6329: #0: (&tty->atomic_read_lock){+.+...}, at: [] n_tty_read+0x58e/0x960 1 lock held by agetty/6337: #0: (&tty->atomic_read_lock){+.+...}, at: [] n_tty_read+0x58e/0x9
Re: [PATCH 00/11] First pass at merging Bart's HA work
On Wed, Dec 5, 2012 at 8:50 PM, Bart Van Assche wrote: [...] > The only way to make I/O work reliably if a failure can occur at the > transport layer is to use multipathd on top of ib_srp. If a connection fails > for some reason, then the SRP SCSI host will be removed after the SCSI error > handler has finished with its error recovery strategy. And once the > transport layer is operational again and srp_daemon detects that the > initiator is no longer logged in srp_daemon will make ib_srp log in again. > multipathd will then cause I/O to continue over the new path. Claim basically understood and agreed however, does this also hold when the link is back again, that is can't SRP login via this single path also when there's no multipath on top? Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/11] First pass at merging Bart's HA work
On Fri, Nov 30, 2012 at 4:21 AM, David Dillow wrote: [...] > Modulo a few style issues (braces around one line if branches, etc.) and > having three state variables vs one, I can live with everything up to > aabfa852acd27962 at git://github.com/bvanassche/linux.git#srp-ha. Those > two are small things that can be fixed later and are not worth holding > things up any further. > > I'll try to spend some time on the final four patches tomorrow afternoon. Dave, Bart My colleague Alex Turin tried today the bits as they appear in Roland's kernel.org tree / for-next branch up to commit fb57e1dbbd4 and here's some feedback Basically, what he did was connecting to a target, next take down the IB port on the initiator side, and issue some IOs (dd if=/dev/sdb of=/dev/null count=1) Our recollection of events from the logs (below) is the following 1. queued command get completion status 5 2. as part of error handling srp_reset_host() was called, 3. srp_reset_host() calls to srp_reconnect_target() which fails cause port is down. 4. srp_reconnect_target() on failure calls to srp_queue_remove_work() which sets target->status to SRP_TARGET_REMOVED. 5.srp_reset_host() called second time. it calls to srp_reconnect_target() but target->state == SRP_TARGET_REMOVED. srp_reconnect_target() checks if target->state != SRP_TARGET_LIVE and return -EAGAIN. This probably means that even after enabling port it will still fail to reconnect? Or. Dec 5 16:19:13 rsws42 kernel: scsi host7: ib_srp: failed send status 5 Dec 5 16:19:42 rsws42 kernel: scsi host7: SRP abort called Dec 5 16:19:42 rsws42 kernel: scsi host7: SRP reset_device called Dec 5 16:19:42 rsws42 kernel: scsi host7: ib_srp: SRP reset_host called Dec 5 16:19:43 rsws42 kernel: scsi host7: ib_srp: Got failed path rec status -110 Dec 5 16:19:43 rsws42 kernel: scsi host7: ib_srp: Path record query failed Dec 5 16:19:43 rsws42 kernel: scsi host7: ib_srp: reconnect failed (-110), removing target port. Dec 5 16:19:43 rsws42 kernel: sd 7:0:0:11: Device offlined - not ready after error recovery Dec 5 16:19:43 rsws42 kernel: sd 7:0:0:11: [sdb] Synchronizing SCSI cache Dec 5 16:20:45 rsws42 kernel: scsi host7: SRP abort called Dec 5 16:20:50 rsws42 kernel: scsi host7: SRP abort called Dec 5 16:21:05 rsws42 kernel: scsi host7: SRP abort called Dec 5 16:21:10 rsws42 kernel: scsi host7: SRP reset_device called Dec 5 16:21:15 rsws42 kernel: scsi host7: ib_srp: SRP reset_host called Dec 5 16:21:15 rsws42 kernel: sd 7:0:0:11: Device offlined - not ready after error recovery Dec 5 16:21:15 rsws42 kernel: sd 7:0:0:11: Device offlined - not ready after error recovery repeating part: Dec 5 16:22:17 rsws42 kernel: scsi host7: SRP abort called Dec 5 16:22:22 rsws42 kernel: scsi host7: SRP abort called Dec 5 16:22:37 rsws42 kernel: scsi host7: SRP abort called Dec 5 16:22:42 rsws42 kernel: scsi host7: SRP reset_device called Dec 5 16:22:47 rsws42 kernel: scsi host7: ib_srp: SRP reset_host called Dec 5 16:22:47 rsws42 kernel: sd 7:0:0:11: Device offlined - not ready after error recovery Dec 5 16:22:47 rsws42 kernel: sd 7:0:0:11: Device offlined - not ready after error recovery -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/11] First pass at merging Bart's HA work
On 27/11/2012 06:04, David Dillow wrote: We can push it through James's tree if need be, but Bart's code is pretty self-contained, and going through the SCSI tree will introduce merge dependencies. It'd be much easier to push it all through the RDMA tree Yep, this makes sense to me even without taking into account the time left for the merge window to open, the patches have been around for long time and relate directly to the SRP code in the IB subsystem, there's no point in introducing merge dependencies where it can be avoided. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/11] First pass at merging Bart's HA work
On Mon, Nov 26, 2012 at 6:44 AM, David Dillow wrote: > One may also pull this series from github: > git pull git://github.com/dillow/srp-initiator.git ha-merge-v1 Hi Dave, The kernel maintainers file specifies the following tree git://git.kernel.org/pub/scm/linux/kernel/git/dad/srp-initiator.git for you -- I assume you've moved to github during the kernel.org sitedown period, could you go back to kernel.org? its nice to have the same look and feel e.g through git web for the IB, SRP, Networking, SCSI etc trees. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html