[PATCH 3/3 v4b] IB/srp: Reduce number of BUSY conditions

2010-08-10 Thread Bart Van Assche
As proposed by the SRP (draft) standard, ib_srp reserves one ring element for
SRP_TSK_MGMT requests. This patch makes sure that the SCSI mid-layer never
tries to queue more than (SRP request limit) - 1 SCSI commands to ib_srp. This
improves performance for targets whose request limit is less than or equal to
SRP_SQ_REQ_SIZE - SRP_TSK_MGMT_RSV (62) by reducing the number of BUSY
responses reported by ib_srp to the SCSI mid-layer.
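The clamping described above can be sketched as a small stand-alone C model (an illustration of the arithmetic only, not the kernel code; the value SRP_TSK_MGMT_RSV = 1 is taken from the patch series):

```c
#include <assert.h>

/* One transmit-ring element is reserved for SRP_TSK_MGMT requests. */
#define SRP_TSK_MGMT_RSV 1

int min_s32(int a, int b) { return a < b ? a : b; }

/* Model of the can_queue clamping: the SCSI mid-layer may queue at most
 * (request limit) - (reserved slots) commands, and never more than the
 * host's current can_queue. */
int srp_clamp_can_queue(int req_lim, int can_queue)
{
        return min_s32(req_lim - SRP_TSK_MGMT_RSV, can_queue);
}
```

With this clamping, a target advertising a request limit of 32 yields can_queue = 31, so the mid-layer can never submit a command that would consume the reserved task-management slot.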

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 

---
 drivers/infiniband/ulp/srp/ib_srp.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index cc6a713..93d6d0f 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1354,8 +1354,13 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
target->max_ti_iu_len = be32_to_cpu(rsp->max_ti_iu_len);
target->req_lim   = be32_to_cpu(rsp->req_lim_delta);
 
-   target->scsi_host->can_queue = min(target->req_lim,
-                                      target->scsi_host->can_queue);
+   /*
+* Set can_queue such that the SCSI mid-layer
+* never causes __srp_get_tx_iu() to fail.
+*/
+   target->scsi_host->can_queue
+   = min(target->req_lim - SRP_TSK_MGMT_RSV,
+ target->scsi_host->can_queue);
} else {
shost_printk(KERN_WARNING, target->scsi_host,
PFX "Unhandled RSP opcode %#x\n", opcode);
-- 
1.6.4.2
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3 v4] IB/srp: Reduce number of BUSY conditions

2010-08-10 Thread Bart Van Assche
On Tue, Aug 10, 2010 at 8:34 PM, Bart Van Assche  wrote:
> The SRP (draft) standard specifies that an SRP initiator must never queue more

Note: this patch still has the old (incorrect) patch description. Will
post a patch with an updated description soon.

Bart.


[PATCH] libibumad: Increase the limit of UMAD_MAX_DEVICES

2010-08-10 Thread Arputham Benjamin
The ibstat command doesn't show all HCAs when the number of HCAs in one system
exceeds 20. We need to raise this limit. Increase it to 32 to be consistent
with the kernel define IB_UVERBS_MAX_DEVICES (32).

Signed-off-by: Arputham Benjamin 
---
diff -rup a/libibumad-1.3.5/include/infiniband/umad.h b/libibumad-1.3.5/include/infiniband/umad.h
--- a/libibumad-1.3.5/include/infiniband/umad.h 2010-08-10 14:45:01.611516578 -0700
+++ b/libibumad-1.3.5/include/infiniband/umad.h 2010-08-10 14:45:45.427516376 -0700
@@ -44,7 +44,7 @@
 #endif /* __cplusplus */
 
 BEGIN_C_DECLS
-#define UMAD_MAX_DEVICES 20
+#define UMAD_MAX_DEVICES 32
 #define UMAD_ANY_PORT  0
 typedef struct ib_mad_addr {
uint32_t qpn;


Re: [PATCH] rdma cm + XRC

2010-08-10 Thread Jason Gunthorpe
On Tue, Aug 10, 2010 at 04:18:57PM -0700, Hefty, Sean wrote:
> > Well.. the XRC domain needs to be an input to create_ep just like
> > the PD :(
> > 
> > In looking at how this API turned out maybe the PD should have been
> > carried in the rdma_addrinfo? Certainly I would put the XRC domain
> > in there.. Recall my original comments about the PD being used to
> > restrict device selection in rdma_getaddrinfo.
> 
> Personally, if I had it to do over, I don't know that I would have
> exposed the PD through the librdmacm at all.  Does anyone ever
> allocate more than one per device?

That isn't half a bad idea, I guess, and if it was in rdma_addrinfo it
could be 0 = use global PD/XRC and 99% of apps could just do that..

Maybe you should just go ahead and do that? It would not be hard to
do so and even keep ABI (but not API) compatibility. Now would be the
time :)

> I feel similarly about the XRC domain.  Is there any real reason to
> expose it?  What if we just defined a 1:1 relationship between PDs
> and XRC domains, or between XRC domains and XRC TGT QPs?

Near as I can tell it serves the same purpose as the PD, to provide
a form of security within a single process..

I could imagine things like storage apps wanting to use them -
MR's/etc need to be carefully compartmentalized if you do not entirely
trust your peer.

Jason


RE: [PATCH] rdma cm + XRC

2010-08-10 Thread Hefty, Sean
> Well.. the XRC domain needs to be an input to create_ep just like
> the PD :(
> 
> In looking at how this API turned out maybe the PD should have been
> carried in the rdma_addrinfo? Certainly I would put the XRC domain
> in there.. Recall my original comments about the PD being used to
> restrict device selection in rdma_getaddrinfo.

Personally, if I had it to do over, I don't know that I would have exposed the 
PD through the librdmacm at all.  Does anyone ever allocate more than one per 
device?

I feel similarly about the XRC domain.  Is there any real reason to expose it?  
What if we just defined a 1:1 relationship between PDs and XRC domains, or 
between XRC domains and XRC TGT QPs?

- Sean


Re: [PATCH] rdma cm + XRC

2010-08-10 Thread Jason Gunthorpe
On Tue, Aug 10, 2010 at 04:05:42PM -0500, frank zago wrote:

> It seems the new API has too many constraints for XRC. There are a
> couple things that don't fit:

I'll try to take a more careful look at this later, but just want to
say that the new APIs are so new that we could still change them - not
supporting XRC seems like an API design failing that might haunt us
later?

Keep in mind that rdma_getaddrinfo is the scheme to be used for CM
address resolution scalability, so it seems natural that anyone using
RDMA CM and XRC would want to use both together.

> - XRC needs a domain, which must be created before creating the QP,
> but after we know the device to use. In addition it also needs a
> file descriptor. The application may want to use a different fd
> depending on the device. Currently the domain can only be created in
> the middle of rdma_create_ep().

Well.. the XRC domain needs to be an input to create_ep just like
the PD :(

In looking at how this API turned out maybe the PD should have been
carried in the rdma_addrinfo? Certainly I would put the XRC domain
in there.. Recall my original comments about the PD being used to
restrict device selection in rdma_getaddrinfo.

Not sure what the FD is about (been awhile since the libibverbs XRC
patches were posted)?

> - The server side of the connection also needs an SRQ. It's not
> obvious whether it's the application or rdma cm to create that
> SRQ. And that SRQ number must be given to the client side,
> presumably in the private data.

All rdma_create_ep could do is setup the XRC INI QP and XRC TGT QP,
the XRC SRQ is associated with the XRC domain so it has to be managed
by the app.

The private data is passed into rdma_connect, after create_ep, so
there seems to be no problem there, the app can format up the SRQ
number according to the CM private data protocol it is using..
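For example, the client side could carry the SRQ number in the first four bytes of the REQ private data. A sketch of one possible application-defined encoding follows (the struct and field names are purely illustrative; no such format is defined by the CM, and the wire layout is entirely up to the application):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical app-defined private-data layout: the XRC SRQ number,
 * big-endian, in the first four bytes of the CM REQ private data. */
struct xrc_priv_data {
        uint8_t srqn_be[4];
};

/* Encode the SRQ number in network byte order. */
void pack_srqn(struct xrc_priv_data *pd, uint32_t srqn)
{
        pd->srqn_be[0] = (srqn >> 24) & 0xff;
        pd->srqn_be[1] = (srqn >> 16) & 0xff;
        pd->srqn_be[2] = (srqn >> 8) & 0xff;
        pd->srqn_be[3] = srqn & 0xff;
}

/* Decode on the passive side from the connection-request private data. */
uint32_t unpack_srqn(const struct xrc_priv_data *pd)
{
        return ((uint32_t)pd->srqn_be[0] << 24) |
               ((uint32_t)pd->srqn_be[1] << 16) |
               ((uint32_t)pd->srqn_be[2] << 8) |
               (uint32_t)pd->srqn_be[3];
}
```

The active side would place such a structure in rdma_conn_param.private_data before calling rdma_connect(); the passive side would decode it from the private data delivered with the connection request event.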

Jason


Re: [PATCH] rdma cm + XRC

2010-08-10 Thread frank zago
On 08/10/2010 12:14 PM, Jason Gunthorpe wrote:
> On Tue, Aug 10, 2010 at 09:59:50AM -0700, Hefty, Sean wrote:
> 
>>> The general parameters would be the same as for RC. Should we create a new
>>> ai_flag ? or a new port space ?
>>
>> There's a ai_qp_type field available.  I think the RDMA TCP port
>> space would work.
> 
> Not sure the port space matters at all?
> 
> Is there anything additional CM information for XRC other than
> requesting an XRC QP type? (XRCSRQ or something?)
> 
>>> Is it really necessary to support rdma_getaddrinfo, rdma_create_ep and the
>>> new APIs ?
>>
>> I think so, yes.  At least XRC needs to be handled, even if some of
>> the calls just fail as unsupported.
> 
> I'd like to see a strong rationale for leaving any of the new API
> unsupported for XRC - IMHO it should all be doable. The new API is
> supposed to be simplifying, we want people to use it..

It seems the new API has too many constraints for XRC. There are a couple of
things that don't fit:

- XRC needs a domain, which must be created before creating the QP, but after
  we know the device to use. In addition it also needs a file descriptor. The
  application may want to use a different fd depending on the device. Currently
  the domain can only be created in the middle of rdma_create_ep().

- The server side of the connection also needs an SRQ. It's not obvious whether
  it's the application or rdma cm that creates that SRQ. And that SRQ number
  must be given to the client side, presumably in the private data.

Frank.



RE: API for Proposal for adding ib_usa to the Linux Infiniband Subsystem

2010-08-10 Thread Mike Heinz
That thought occurred to me, but I thought it might be easier for the app
developer if the API explicitly broke up the generic concepts of traps and
notices into specific types.

-Original Message-
From: linux-rdma-ow...@vger.kernel.org 
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Hefty, Sean
Sent: Tuesday, August 10, 2010 3:43 PM
To: Mike Heinz; linux-rdma@vger.kernel.org; Jason Gunthorpe
Subject: RE: API for Proposal for adding ib_usa to the Linux Infiniband 
Subsystem

> enum ibv_event_type {
> ...
> +   IBV_EVENT_GID
> ...
> struct ibv_async_event {
> union {
> +   struct ibv_gid_event *gid_event;
> ...
> + int ibv_reg_gid_event(struct ibv_context *context, uint8_t port_num);

We need to get Roland's thoughts on this, since he maintains libibverbs.  As 
just a thought, I think you can still use a more generic API with kernel checks 
to verify that an application has permission to register for certain types of 
events.


RE: API for Proposal for adding ib_usa to the Linux Infiniband Subsystem

2010-08-10 Thread Hefty, Sean
> enum ibv_event_type {
> ...
> +   IBV_EVENT_GID
> ...
> struct ibv_async_event {
> union {
> +   struct ibv_gid_event *gid_event;
> ...
> + int ibv_reg_gid_event(struct ibv_context *context, uint8_t port_num);

We need to get Roland's thoughts on this, since he maintains libibverbs.  As 
just a thought, I think you can still use a more generic API with kernel checks 
to verify that an application has permission to register for certain types of 
events.


RE: API for Proposal for adding ib_usa to the Linux Infiniband Subsystem

2010-08-10 Thread Mike Heinz
Sorry for the massive lag in this conversation - between trying to balance
Linux-RDMA work with the rest of my job, support problems and vacations, this
got pushed to the bottom of my queue while I thought about the best approach
to the issue.

When we left this, we were discussing the most appropriate API, with Jason 
suggesting we go through /dev/*umad* and Sean suggesting leveraging the 
existing async functions in libibverbs:

> Re: Proposal for adding ib_usa to the Linux Infiniband Subsystem
> Jason Gunthorpe
> Fri, 21 May 2010 12:46:22 -0700
>
> My thinking was sort of:
>
> fd = open("/dev/..umad..");
> ioctl(fd, SUBSCRIBE_TRAP, &trap_description)
> read(fd..); // Events arrive
> close(fd);  // Event is unsubscribed
>
> The reason this fits nicely with umad is that you really actually want
> the GMP trap, not some processed version.. So this is like a multicast
> subscribe in IP.

and...

> RE: API for Proposal for adding ib_usa to the Linux Infiniband Subsystem
> Hefty, Sean
> Tue, 08 Jun 2010 14:23:59 -0700
>
> Would something like this work for you?

> enum ibv_event_type {
> ...
> +   IBV_EVENT_SA
> ...
>
> struct ibv_async_event {
> union {
> +   struct ibv_sa_event *sa_event;
> ...
> + int ibv_reg_sa_event(struct ibv_context *context, uint8_t port_num,
> uint16_t trap_number);
>
> This would tie in with the libibverbs async event reporting and provide a 
> simple registration API.  (Deregistering would just be done automatically 
> when 
> closing the ibv_context.)

I do like Sean's idea quite a lot since it's simple and ties into an existing 
async interface rather than requiring a new one - but I've also been wondering 
about security issues and whether it makes sense for most SA/SM traps to be 
exposed to end-user applications.

Here's what I mean: the list of traps includes GID in/out of service, MC group 
traps, Path is no longer valid, security notifications like pkey errors, 
etcetera. However, only 6 of these are valid for the SA/SM even in theory and 
it's not clear to me that anything except GID in/out of service and MC group 
created/deleted are actually implemented in the OpenSM. Plus, I don't think 
unprivileged applications should have access to the other traps in any case.

So, what I'm wondering is should we simplify things so that the application can 
only register for GID in/out notifications? If we did this correctly, it would 
be easy to extend the interface later to add other trap types if the need 
becomes clear. So, my suggestion for the API would be more like this:

enum ibv_event_type {
...
+   IBV_EVENT_GID
...
struct ibv_async_event {
union {
+   struct ibv_gid_event *gid_event;
...
+ int ibv_reg_gid_event(struct ibv_context *context, uint8_t port_num);


[PATCH 3/3 v4] IB/srp: Reduce number of BUSY conditions

2010-08-10 Thread Bart Van Assche
The SRP (draft) standard specifies that an SRP initiator must never queue more
than (SRP request limit) - 1 unanswered SRP_CMD information units. This patch
makes sure that the SCSI mid-layer never tries to queue more than (SRP request
limit) - 1 SCSI commands to ib_srp. This improves performance for targets
whose request limit is less than or equal to
SRP_SQ_REQ_SIZE - SRP_TSK_MGMT_RSV (62) by reducing the number of BUSY
responses reported by ib_srp to the SCSI mid-layer.

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 

---
 drivers/infiniband/ulp/srp/ib_srp.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index cc6a713..93d6d0f 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1354,8 +1354,13 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
target->max_ti_iu_len = be32_to_cpu(rsp->max_ti_iu_len);
target->req_lim   = be32_to_cpu(rsp->req_lim_delta);
 
-   target->scsi_host->can_queue = min(target->req_lim,
-                                      target->scsi_host->can_queue);
+   /*
+* Set can_queue such that the SCSI mid-layer
+* never causes __srp_get_tx_iu() to fail.
+*/
+   target->scsi_host->can_queue
+   = min(target->req_lim - SRP_TSK_MGMT_RSV,
+ target->scsi_host->can_queue);
} else {
shost_printk(KERN_WARNING, target->scsi_host,
PFX "Unhandled RSP opcode %#x\n", opcode);
-- 
1.6.4.2



[PATCH 2/3 v4] IB/srp: Implement SRP_CRED_REQ

2010-08-10 Thread Bart Van Assche
Implements SRP_CRED_REQ, which is an information unit defined in the SRP
(draft) standard and that allows an SRP target to inform an SRP initiator that
more requests may be sent by the initiator. Adds declarations for the
SRP_CRED_REQ and SRP_CRED_RSP information units to include/scsi/srp.h.
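A minimal stand-alone C model of the credit accounting this information unit enables (an illustrative sketch, not the driver code; the field name req_lim follows the patch, the helper names are invented here):

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of initiator-side request-limit accounting. */
struct srp_credits {
        int32_t req_lim;   /* requests the target currently allows us to send */
};

/* Sending an SRP_CMD consumes one credit; fail when none remain,
 * which is the situation the driver reports as BUSY. */
int send_cmd(struct srp_credits *c)
{
        if (c->req_lim <= 0)
                return -1;   /* would exceed the request limit */
        --c->req_lim;
        return 0;
}

/* An SRP_CRED_REQ from the target grants req_lim_delta new credits,
 * allowing the initiator to resume sending requests. */
void handle_cred_req(struct srp_credits *c, int32_t req_lim_delta)
{
        c->req_lim += req_lim_delta;
}
```

This shows the essential flow: once the initiator has consumed all its credits it must stop, and only a credit grant from the target (here, SRP_CRED_REQ) lets it continue.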

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 

---
 drivers/infiniband/ulp/srp/ib_srp.c |  222 +--
 drivers/infiniband/ulp/srp/ib_srp.h |7 +-
 include/scsi/srp.h  |   14 +++
 3 files changed, 177 insertions(+), 66 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 6946012..cc6a713 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -898,6 +898,157 @@ static void srp_process_rsp(struct srp_target_port *target, struct srp_rsp *rsp)
spin_unlock_irqrestore(target->scsi_host->host_lock, flags);
 }
 
+/*
+ * Must be called with target->scsi_host->host_lock held to protect
+ * tx_head.  Lock cannot be dropped between call here and call to
+ * __srp_post_send_iu().
+ *
+ * Note:
+ * An upper limit for the number of allocated information units for each
+ * request type is:
+ * - SRP_TX_IU_REQ_NORMAL: SRP_REQ_SQ_SIZE - SRP_TSK_MGMT_RSV, since the
+ *   SCSI mid-layer never queues more than Scsi_Host.can_queue requests.
+ * - SRP_TX_IU_REQ_TASK_MGMT: SRP_TSK_MGMT_RSV.
+ * - SRP_TX_IU_RSP: 1, since a conforming SRP target never sends more than
+ *   one unanswered SRP request to an initiator.
+ */
+static struct srp_iu *__srp_get_tx_iu(struct srp_target_port *target,
+ enum srp_tx_iu_type tx_iu_type)
+{
+   s32 rsv;
+
+   BUILD_BUG_ON_NOT_POWER_OF_2(SRP_SQ_MASK + 1);
+
+   srp_send_completion(target->send_cq, target);
+
+   rsv = (tx_iu_type == SRP_TX_IU_REQ_NORMAL) ? SRP_TSK_MGMT_RSV : 0;
+
+   if (SRP_SQ_SIZE - (target->tx_head - target->tx_tail) <= rsv)
+   return NULL;
+
+   if (tx_iu_type != SRP_TX_IU_RSP && target->req_lim <= rsv) {
+   ++target->zero_req_lim;
+   return NULL;
+   }
+
+   return target->tx_ring[target->tx_head & SRP_SQ_MASK];
+}
+
+/*
+ * Must be called with target->scsi_host->host_lock held to protect tx_head.
+ */
+static int __srp_post_send_iu(struct srp_target_port *target,
+ struct srp_iu *iu, int len)
+{
+   struct ib_sge list;
+   struct ib_send_wr wr, *bad_wr;
+   int ret = 0;
+
+   list.addr   = iu->dma;
+   list.length = len;
+   list.lkey   = target->srp_host->srp_dev->mr->lkey;
+
+   wr.next   = NULL;
+   wr.wr_id  = target->tx_head & SRP_SQ_MASK;
+   wr.sg_list= &list;
+   wr.num_sge= 1;
+   wr.opcode = IB_WR_SEND;
+   wr.send_flags = IB_SEND_SIGNALED;
+
+   ret = ib_post_send(target->qp, &wr, &bad_wr);
+
+   if (!ret)
+   ++target->tx_head;
+
+   return ret;
+}
+
+/*
+ * Must be called with target->scsi_host->host_lock held to protect req_lim.
+ */
+static int __srp_post_send_req(struct srp_target_port *target,
+  struct srp_iu *iu, int len)
+{
+   int ret;
+
+   ret = __srp_post_send_iu(target, iu, len);
+   if (ret == 0)
+   --target->req_lim;
+   return ret;
+}
+
+/*
+ * Must be called with target->scsi_host->host_lock held.
+ */
+static int __srp_post_send_rsp(struct srp_target_port *target,
+  struct srp_iu *iu, int len)
+{
+   return __srp_post_send_iu(target, iu, len);
+}
+
+/*
+ * Must be called with target->scsi_host->host_lock locked to protect
+ * target->req_lim.
+ */
+static int srp_handle_cred_req(struct srp_target_port *target,
+  struct srp_cred_req *req,
+  struct srp_cred_rsp *rsp)
+{
+   target->req_lim += be32_to_cpu(req->req_lim_delta);
+
+   memset(rsp, 0, sizeof *rsp);
+   rsp->opcode = SRP_CRED_RSP;
+   rsp->tag= req->tag;
+
+   return 0;
+}
+
+static void srp_handle_req(struct srp_target_port *target,
+  struct srp_iu *req_iu)
+{
+   struct ib_device *dev;
+   u8 *req_buf;
+   unsigned long flags;
+   struct srp_iu *rsp_iu;
+   u8 *rsp_buf;
+   int res;
+
+   dev = target->srp_host->srp_dev->dev;
+   req_buf = req_iu->buf;
+
+   spin_lock_irqsave(target->scsi_host->host_lock, flags);
+
+   rsp_iu = __srp_get_tx_iu(target, SRP_TX_IU_RSP);
+   if (!rsp_iu)
+   goto out_unlock;
+
+   rsp_buf = rsp_iu->buf;
+
+   res = -EINVAL;
+
+   switch (req_buf[0]) {
+   case SRP_CRED_REQ:
+   res = srp_handle_cred_req(target,
+ (struct srp_cred_req *)req_buf,
+ (struct srp_cred_rsp *)rsp_buf);
+   break;
+   }
+
+  

[PATCH 1/3 v4] IB/srp: Preparation for transmit ring response allocation

2010-08-10 Thread Bart Van Assche
The information unit transmit ring (srp_target.tx_ring) in ib_srp is currently
only used for allocating requests sent by the initiator to the target. This
patch prepares using that ring buffer for allocation of both requests and
responses. Also, this patch differentiates the uses of SRP_SQ_SIZE,
increases the size of the IB send completion queue by one element and
reserves one transmit ring slot for SRP_TSK_MGMT requests.

Signed-off-by: Bart Van Assche 
Cc: Roland Dreier 
Cc: David Dillow 

---
 drivers/infiniband/ulp/srp/ib_srp.c |   33 -
 drivers/infiniband/ulp/srp/ib_srp.h |   12 +---
 2 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 7f8f16b..6946012 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -291,7 +291,7 @@ static void srp_free_target_ib(struct srp_target_port *target)
 
for (i = 0; i < SRP_RQ_SIZE; ++i)
srp_free_iu(target->srp_host, target->rx_ring[i]);
-   for (i = 0; i < SRP_SQ_SIZE + 1; ++i)
+   for (i = 0; i < SRP_SQ_SIZE; ++i)
srp_free_iu(target->srp_host, target->tx_ring[i]);
 }
 
@@ -820,9 +820,11 @@ static int srp_post_recv(struct srp_target_port *target)
unsigned int next;
int ret;
 
+   BUILD_BUG_ON_NOT_POWER_OF_2(SRP_RQ_MASK + 1);
+
spin_lock_irqsave(target->scsi_host->host_lock, flags);
 
-   next = target->rx_head & (SRP_RQ_SIZE - 1);
+   next = target->rx_head & SRP_RQ_MASK;
wr.wr_id = next;
iu   = target->rx_ring[next];
 
@@ -989,19 +991,23 @@ static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
 static struct srp_iu *__srp_get_tx_iu(struct srp_target_port *target,
enum srp_request_type req_type)
 {
-   s32 min = (req_type == SRP_REQ_TASK_MGMT) ? 1 : 2;
+   s32 rsv;
+
+   BUILD_BUG_ON_NOT_POWER_OF_2(SRP_SQ_MASK + 1);
+
+   rsv = (req_type == SRP_REQ_NORMAL) ? SRP_TSK_MGMT_RSV : 0;
 
srp_send_completion(target->send_cq, target);
 
-   if (target->tx_head - target->tx_tail >= SRP_SQ_SIZE)
+   if (target->tx_head - target->tx_tail >= SRP_SQ_SIZE - rsv)
return NULL;
 
-   if (target->req_lim < min) {
+   if (target->req_lim <= rsv) {
++target->zero_req_lim;
return NULL;
}
 
-   return target->tx_ring[target->tx_head & SRP_SQ_SIZE];
+   return target->tx_ring[target->tx_head & SRP_SQ_MASK];
 }
 
 /*
@@ -1020,7 +1026,7 @@ static int __srp_post_send(struct srp_target_port *target,
list.lkey   = target->srp_host->srp_dev->mr->lkey;
 
wr.next   = NULL;
-   wr.wr_id  = target->tx_head & SRP_SQ_SIZE;
+   wr.wr_id  = target->tx_head & SRP_SQ_MASK;
wr.sg_list= &list;
wr.num_sge= 1;
wr.opcode = IB_WR_SEND;
@@ -1121,7 +1127,7 @@ static int srp_alloc_iu_bufs(struct srp_target_port *target)
goto err;
}
 
-   for (i = 0; i < SRP_SQ_SIZE + 1; ++i) {
+   for (i = 0; i < SRP_SQ_SIZE; ++i) {
target->tx_ring[i] = srp_alloc_iu(target->srp_host,
  srp_max_iu_len,
  GFP_KERNEL, DMA_TO_DEVICE);
@@ -1137,7 +1143,7 @@ err:
target->rx_ring[i] = NULL;
}
 
-   for (i = 0; i < SRP_SQ_SIZE + 1; ++i) {
+   for (i = 0; i < SRP_SQ_SIZE; ++i) {
srp_free_iu(target->srp_host, target->tx_ring[i]);
target->tx_ring[i] = NULL;
}
@@ -1626,9 +1632,9 @@ static struct scsi_host_template srp_template = {
.eh_abort_handler   = srp_abort,
.eh_device_reset_handler= srp_reset_device,
.eh_host_reset_handler  = srp_reset_host,
-   .can_queue  = SRP_SQ_SIZE,
+   .can_queue  = SRP_REQ_SQ_SIZE - SRP_TSK_MGMT_RSV,
.this_id= -1,
-   .cmd_per_lun= SRP_SQ_SIZE,
+   .cmd_per_lun= SRP_REQ_SQ_SIZE - SRP_TSK_MGMT_RSV,
.use_clustering = ENABLE_CLUSTERING,
.shost_attrs= srp_host_attrs
 };
@@ -1813,7 +1819,8 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
printk(KERN_WARNING PFX "bad max cmd_per_lun parameter '%s'\n", p);
goto out;
}
-   target->scsi_host->cmd_per_lun = min(token, SRP_SQ_SIZE);
+   target->scsi_host->cmd_per_lun = min(token,
+                                        SRP_REQ_SQ_SIZE - SRP_TSK_MGMT_RSV);
break;
 
case SRP_OPT_IO_CLASS:
@@ -1891,7 

[PATCH 0/3 v4] IB/srp: Add SRP_CRED_REQ support

2010-08-10 Thread Bart Van Assche
This series of three patches adds SRP_CRED_REQ support in ib_srp, which is a
feature defined in the SRP (draft) standard.

Changes in v4 compared to v3:
- Dropped the fourth patch since it has been merged.
- Introduced the symbolic constant SRP_TSK_MGMT_RSV, which represents the
  number of slots reserved in the tx ring for SRP task management requests.
- One slot of the transmit ring buffer is now reserved for sending
  SRP_TSK_MGMT requests.
- The BUILD_BUG_ON_NOT_POWER_OF_2() macro is now used to verify whether
  SRP_RQ_MASK and SRP_SQ_MASK really are bitmasks.
- Rearranged function order such that no new forward declarations have to be
  introduced in the second patch.
- Eliminated the srp_target_port.tx_req and tx_rsp variables.
- Removed the SRP_OP_RSP flag.
- Added a source code comment that explains why can_queue must be less than
  req_lim.
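The mask-based ring indexing that the BUILD_BUG_ON_NOT_POWER_OF_2() checks protect can be illustrated with a stand-alone C sketch (illustrative constants; only the power-of-two requirement is taken from the patches). Free-running head/tail counters work with a simple AND mask only when the ring size is a power of two, and the fill level head - tail stays correct even across unsigned wraparound:

```c
#include <assert.h>
#include <stdint.h>

#define SQ_SIZE 64u              /* must be a power of two */
#define SQ_MASK (SQ_SIZE - 1u)

/* Fill level of a ring addressed by free-running 32-bit counters.
 * Unsigned subtraction gives the right answer even after head has
 * wrapped past UINT32_MAX while tail has not. */
uint32_t ring_used(uint32_t head, uint32_t tail)
{
        return head - tail;
}

/* Slot index for a free-running counter; the AND is only equivalent
 * to a modulo because SQ_SIZE is a power of two. */
uint32_t ring_slot(uint32_t counter)
{
        return counter & SQ_MASK;
}
```

If SQ_SIZE were not a power of two, `counter & SQ_MASK` would skip slots and alias others, which is exactly the class of bug the compile-time check rules out.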

Changes in v3 compared to v2:
- Corrected completion queue size from SRP_REQ_SQ_SIZE (63) into
  SRP_SQ_SIZE (64).
- Merged the two functions __srp_get_tx_req_iu() and __srp_get_tx_rsp_iu()
  into a single function __srp_get_tx_iu().
- Removed __attribute__((packed)) from the structure definitions added in
  include/scsi/srp.h because it was superfluous.
- Modified descriptions of patches 1/4, 2/4 and 3/4.

Changes in v2 compared to v1:
- Split the big patch into multiple patches, which makes reviewing easier.
- Instead of defining separate queues for transmitting requests and responses,
  a single queue is now used.


Re: i386 allmodconfig, current mainline

2010-08-10 Thread Phillip Lougher

Andrew Morton wrote:


fs/squashfs/xattr.c:37: warning: 'squashfs_xattr_handler' declared inline after being called
fs/squashfs/xattr.c:37: warning: previous declaration of 'squashfs_xattr_handler' was here



The fix for this is in linux-next, and it will be in my imminent 2.6.36 pull
request to Linus.

Phillip



Re: [PATCH] rdma cm + XRC

2010-08-10 Thread frank zago
On 08/10/2010 12:14 PM, Jason Gunthorpe wrote:
> On Tue, Aug 10, 2010 at 09:59:50AM -0700, Hefty, Sean wrote:
> 
>>> The general parameters would be the same as for RC. Should we create a new
>>> ai_flag ? or a new port space ?
>>
>> There's a ai_qp_type field available.  I think the RDMA TCP port
>> space would work.
> 
> Not sure the port space matters at all?
> 
> Is there anything additional CM information for XRC other than
> requesting an XRC QP type? (XRCSRQ or something?)

Creating a send or receive XRC QP uses a different API (ibv_create_qp vs
ibv_create_xrc_rcv_qp), so I used the max_send_wr capability attribute to
discriminate between the two cases. That's the only visible change to the API.
On 7/30, I posted a patch to perftest rdma_bw to show how it's used.

> 
>>> Is it really necessary to support rdma_getaddrinfo, rdma_create_ep and the
>>> new APIs ?
>>
>> I think so, yes.  At least XRC needs to be handled, even if some of
>> the calls just fail as unsupported.
> 
> I'd like to see a strong rationale for leaving any of the new API
> unsupported for XRC - IMHO it should all be doable. The new API is
> supposed to be simplifying, we want people to use it..

No rationale, besides that someone has to write some code :)
 
Regards,
  Frank.


Re: yet again the atomic operations

2010-08-10 Thread Ralph Campbell
On Tue, 2010-08-10 at 04:46 -0700, Rui Machado wrote:
> > There are two kinds supported. QLogic's driver does them in
> > the host driver so they are atomic with respect to all the CPUs
> > in the host.
> 
> I'm just curious about this: how does this work? Is the CPU getting
> interrupted and doing the operation while the Mellanox HCA does
> everything in hardware?

Yes.



RE: [PATCH] rdma cm + XRC

2010-08-10 Thread Hefty, Sean
> > There's a ai_qp_type field available.  I think the RDMA TCP port
> > space would work.
> 
> Not sure the port space matters at all?
> 
> Is there anything additional CM information for XRC other than
> requesting an XRC QP type? (XRCSRQ or something?)

It's nothing huge:

Modifications to Table 99:
* Responder Resources field, Values column: 0 for XRC
* Transport Service Type field, Description column: See
  Section 14.6.3.1 Transport Service Type.
* Retry Count field, Values column: 0 for XRC
* RNR Retry Count field, Values column: 0 for XRC
* SRQ field, Values column: 0 for XRC
* Change (reserved) field at position byte 51, bit 5 to:
  * Field: Extended Transport Type
  * Description: See Section 14.6.3.2 Extended Transport Type
  * Used for Purpose: C
  * Byte [Bit] Offset: 51 [5]
  * Length, Bits: 3

Modifications to Table 103:
* Initiator Depth field, Values column: 0 for XRC
* End-to-End Flow Control field, Values column: 0 for XRC
* SRQ field, Values column: 1 for XRC

But these fields should still be set correctly.

- Sean


Re: [PATCH] rdma cm + XRC

2010-08-10 Thread Jason Gunthorpe
On Tue, Aug 10, 2010 at 09:59:50AM -0700, Hefty, Sean wrote:

> > The general parameters would be the same as for RC. Should we create a new
> > ai_flag ? or a new port space ?
> 
> There's a ai_qp_type field available.  I think the RDMA TCP port
> space would work.

Not sure the port space matters at all?

Is there anything additional CM information for XRC other than
requesting an XRC QP type? (XRCSRQ or something?)

> > Is it really necessary to support rdma_getaddrinfo, rdma_create_ep and the
> > new APIs ?
> 
> I think so, yes.  At least XRC needs to be handled, even if some of
> the calls just fail as unsupported.

I'd like to see a strong rationale for leaving any of the new API
unsupported for XRC - IMHO it should all be doable. The new API is
supposed to be simplifying, we want people to use it..

Jason


RE: [PATCH] rdma cm + XRC

2010-08-10 Thread Hefty, Sean
> > - XRC support upstream (kernel and user space) is still pending.
> >   (I can start a librdmacm branch for XRC support.)
> > - Changes are needed to the kernel rdma_cm.
> >   We could start submitting patches against Roland's xrc branch for
> these.
> > - Please update to the latest librdmacm tree.
> >   More specifically, rdma_getaddrinfo should support XRC as well.
> 
> The general parameters would be the same as for RC. Should we create a new
> ai_flag ? or a new port space ?

There's a ai_qp_type field available.  I think the RDMA TCP port space would 
work.

> Is it really necessary to support rdma_getaddrinfo, rdma_create_ep and the
> new APIs ?

I think so, yes.  At least XRC needs to be handled, even if some of the calls 
just fail as unsupported.

> > In general, I'd like to find a way to add XRC support to the librdmacm
> that makes things as simple for the user as possible.
> 
> Besides the need to correctly set cap.max_send_wr, the user API is
> unchanged.

I'm also concerned that the kernel must format the IB CM messages correctly 
when XRC is in use.  The kernel currently formats the messages assuming that 
the QP is RC.

- Sean


Re: i386 allmodconfig, current mainline

2010-08-10 Thread Geert Uytterhoeven
On Tue, Aug 10, 2010 at 02:15, Stephen Rothwell  wrote:
> On Mon, 9 Aug 2010 16:43:46 -0700 Andrew Morton  
> wrote:
>>
>> Guys.  What's goin' on out there?
>
> I guess we are all so up to date that noone does 32 bit builds any
> more ...  Also noone is bothering to look at the build logs:
>
> linus tree: http://kisskb.ellerman.id.au/kisskb/branch/3/
> linux-next: http://kisskb.ellerman.id.au/kisskb/branch/9/

Yeah, we have build failures migrating from -next to -linus :-(

BTW, still no one has commented on http://lkml.org/lkml/2010/7/13/378?

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


Re: [PATCH] rdma cm + XRC

2010-08-10 Thread frank zago
Hello Sean,

On 08/09/2010 03:53 PM, Hefty, Sean wrote:
>> This allow rdma ucm to establish an XRC connection between two nodes. Most
>> of the changes are related to modify_qp since the API is different
>> whether the QP is on the send or receive side.
>> To create an XRC receive QP, the cap.max_send_wr must be set to 0.
>> Conversely, to create the send XRC QP, that attribute must be non-zero.
> 
> I need to give XRC support to the librdmacm more thought, but here are at 
> least the initial concerns:
> 
> - XRC support upstream (kernel and user space) is still pending.
>   (I can start a librdmacm branch for XRC support.)
> - Changes are needed to the kernel rdma_cm.
>   We could start submitting patches against Roland's xrc branch for these.
> - Please update to the latest librdmacm tree.
>   More specifically, rdma_getaddrinfo should support XRC as well.

The general parameters would be the same as for RC. Should we create a new 
ai_flag ? or a new port space ?
Is it really necessary to support rdma_getaddrinfo, rdma_create_ep and the new 
APIs ?

> In general, I'd like to find a way to add XRC support to the librdmacm that 
> makes things as simple for the user as possible.

Besides the need to correctly set cap.max_send_wr, the user API is unchanged.

New patch attached.
diff --git a/include/rdma/rdma_cma.h b/include/rdma/rdma_cma.h
index d17ef88..d18685b 100644
--- a/include/rdma/rdma_cma.h
+++ b/include/rdma/rdma_cma.h
@@ -125,6 +125,8 @@ struct rdma_cm_id {
 	struct ibv_cq		*send_cq;
 	struct ibv_comp_channel *recv_cq_channel;
 	struct ibv_cq		*recv_cq;
+ 	struct ibv_xrc_domain *xrc_domain;
+ 	uint32_t xrc_rcv_qpn;
 };
 
 enum {
diff --git a/man/rdma_create_qp.3 b/man/rdma_create_qp.3
index 9d2de76..659e033 100644
--- a/man/rdma_create_qp.3
+++ b/man/rdma_create_qp.3
@@ -39,6 +39,10 @@ a send or receive completion queue is not specified, then a CQ will be
 allocated by the rdma_cm for the QP, along with corresponding completion
 channels.  Completion channels and CQ data created by the rdma_cm are
 exposed to the user through the rdma_cm_id structure.
+.P
+To create an XRC receive QP, in addition to specifying the XRC QP type,
+ibv_qp_init_attr.cap.max_send_wr must be set to 0.  Conversely, to
+create the XRC send QP, that attribute must be non-zero.
 .SH "SEE ALSO"
 rdma_bind_addr(3), rdma_resolve_addr(3), rdma_destroy_qp(3), ibv_create_qp(3),
 ibv_modify_qp(3)
diff --git a/src/cma.c b/src/cma.c
index a4fd574..b4eec77 100755
--- a/src/cma.c
+++ b/src/cma.c
@@ -948,12 +948,29 @@ static int rdma_init_qp_attr(struct rdma_cm_id *id, struct ibv_qp_attr *qp_attr,
 	return 0;
 }
 
+static int rdma_modify_qp(struct rdma_cm_id *id, 
+		  struct ibv_qp_attr *qp_attr,
+		  int qp_attr_mask)
+{
+	int ret;
+
+	if (id->qp)
+		ret = ibv_modify_qp(id->qp, qp_attr, qp_attr_mask);
+	else if (id->xrc_domain)
+		ret = ibv_modify_xrc_rcv_qp(id->xrc_domain, id->xrc_rcv_qpn,
+	qp_attr, qp_attr_mask);
+	else 
+		ret = EINVAL;
+
+	return ret;
+}
+
 static int ucma_modify_qp_rtr(struct rdma_cm_id *id, uint8_t resp_res)
 {
 	struct ibv_qp_attr qp_attr;
 	int qp_attr_mask, ret;
 
-	if (!id->qp)
+	if (!id->qp && !id->xrc_domain)
 		return ERR(EINVAL);
 
 	/* Need to update QP attributes from default values. */
@@ -962,7 +979,7 @@ static int ucma_modify_qp_rtr(struct rdma_cm_id *id, uint8_t resp_res)
 	if (ret)
 		return ret;
 
-	ret = ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask);
+	ret = rdma_modify_qp(id, &qp_attr, qp_attr_mask);
 	if (ret)
 		return ERR(ret);
 
@@ -973,7 +990,7 @@ static int ucma_modify_qp_rtr(struct rdma_cm_id *id, uint8_t resp_res)
 
 	if (resp_res != RDMA_MAX_RESP_RES)
 		qp_attr.max_dest_rd_atomic = resp_res;
-	return rdma_seterrno(ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask));
+	return rdma_seterrno(rdma_modify_qp(id, &qp_attr, qp_attr_mask));
 }
 
 static int ucma_modify_qp_rts(struct rdma_cm_id *id, uint8_t init_depth)
@@ -988,29 +1005,29 @@ static int ucma_modify_qp_rts(struct rdma_cm_id *id, uint8_t init_depth)
 
 	if (init_depth != RDMA_MAX_INIT_DEPTH)
 		qp_attr.max_rd_atomic = init_depth;
-	return rdma_seterrno(ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask));
+	return rdma_seterrno(rdma_modify_qp(id, &qp_attr, qp_attr_mask));
 }
 
 static int ucma_modify_qp_sqd(struct rdma_cm_id *id)
 {
 	struct ibv_qp_attr qp_attr;
 
-	if (!id->qp)
+	if (!id->qp && !id->xrc_domain)
 		return 0;
 
 	qp_attr.qp_state = IBV_QPS_SQD;
-	return rdma_seterrno(ibv_modify_qp(id->qp, &qp_attr, IBV_QP_STATE));
+	return rdma_seterrno(rdma_modify_qp(id, &qp_attr, IBV_QP_STATE));
 }
 
 static int ucma_modify_qp_err(struct rdma_cm_id *id)
 {
 	struct ibv_qp_attr qp_attr;
 
-	if (!id->qp)
+	if (!id->qp && !id->xrc_domain)
 		return 0;
 
 	qp_attr.qp_state = IBV_QPS_ERR;
-	return rdma_seterrno(ibv_modify_qp(id->qp, &qp_attr, IBV_QP_STATE));
+	return rdma_seterrno(rdma_modify_qp(id, &qp_attr, IBV_QP_STATE));
 }
 
 static int ucma_find_pkey(struct cma_device *cma_dev, uint8_t port_num,
@@ -1029,7 +1046,7 @@ static int 

Re: [PATCH v2] rdma/ib_pack.h: add new bth opcodes

2010-08-10 Thread Bob Pearson

On 08/09/2010 06:36 PM, Hefty, Sean wrote:
>> Several new opcodes have been added since the last time ib_pack.h was
>> updated.  These changes add them.
>
> Will anything make use of these?
>
>> diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
>> index cbb50f4..df10acc 100644
>> --- a/include/rdma/ib_pack.h
>> +++ b/include/rdma/ib_pack.h
>> @@ -73,8 +73,11 @@ enum {
>>  	IB_OPCODE_UC	= 0x20,
>>  	IB_OPCODE_RD	= 0x40,
>>  	IB_OPCODE_UD	= 0x60,
>> +	IB_OPCODE_CN	= 0x80,
>> +	IB_OPCODE_XRC	= 0xA0,
>
> The XRC and RD values all look correct, but I must have fallen asleep and
> missed something.  What's CN?
>
> - Sean

Or also asked this question. Thanks to both of you.

CN is (backwards) congestion notification. It has a single opcode and
carries no payload. It is described in an annex in IBA Vol 1.
I started adding the opcodes for send_xxx_with_invalidate but figured we
may as well capture all of them.
My interest is in supporting the rxe driver, a software implementation of
the IB transport over Ethernet. We have looked at supporting XRC as an
option, and I spent a little time looking at trying to exploit congestion
notification to see if it would be useful in this context.

Bob Pearson



Re: yet again the atomic operations

2010-08-10 Thread Rui Machado
>
> You can work around this by creating a loopback connection (ie an RC
> connection from the local HCA to itself) and post atomic operations to
> that QP instead of accessing the memory directly with the CPU.
>
Right, but that's really slow, especially if you're implementing some
sort of synchronization scheme that runs very often.
Isn't it possible to do something such as interrupting the CPU and
having it do the operation? For an atomic operation (small data) maybe
the overhead is not that large?!

Cheers,
Rui


Re: yet again the atomic operations

2010-08-10 Thread Rui Machado
> There are two kinds supported. QLogic's driver does them in
> the host driver so they are atomic with respect to all the CPUs
> in the host.

I'm just curious about this: how does this work? Is the CPU getting
interrupted and doing the operation while the Mellanox HCA does
everything in hardware?


Re: [PATCH 2/4] IB/srp: implement SRP_CRED_REQ

2010-08-10 Thread Bart Van Assche
On Tue, Aug 3, 2010 at 5:44 PM, David Dillow  wrote:
>
> On Tue, 2010-08-03 at 17:26 +0200, Bart Van Assche wrote:
> [ ... ]
> > I'm not sure it is a good idea to allow that all transmit buffers get
> > allocated for sending CMD_RSP information units and that none remain
> > for replying to incoming SRP_CRED_REQ / SRP_AER_REQ / SRP_T_LOGOUT
> > requests. Wouldn't that be a protocol violation ?
>
> You mean SRP_CMD? We don't do that, we clamp the outstanding requests we
> generate so we'll always have a buffer to reply to SRP_CRED_REQ et al.
>
> If SRP_SQ_SIZE is 64, we set .can_queue to 63 which will leave a buffer
> available for the reply.
>
> Now, if the target sends us multiple SRP_CRED_REQ without waiting for a
> reply, then we could use up buffers and not be able to send requests,
> but that would be a protocol violation on the target's side.

(resending as plain text)

I have been looking further into this. There is a slight asymmetry in
the current transmit buffer allocation code: one buffer element is
reserved on the target for task management requests but not on the
initiator. So - at least in theory - it is possible that all elements
of the initiator transmit ring are allocated for SRP_CMD requests at
the time an SRP_TSK_MGMT request should be sent. Wouldn't it be more
symmetric to change __srp_get_tx_iu() as follows (diff against the
current for-next branch) ?

--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -989,14 +989,14 @@ static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
 static struct srp_iu *__srp_get_tx_iu(struct srp_target_port *target,
    enum srp_request_type req_type)
 {
-   s32 min = (req_type == SRP_REQ_TASK_MGMT) ? 1 : 2;
+   s32 rsv = (req_type == SRP_REQ_TASK_MGMT) ? 0 : 1;

    srp_send_completion(target->send_cq, target);

-   if (target->tx_head - target->tx_tail >= SRP_SQ_SIZE)
+   if (target->tx_head - target->tx_tail >= SRP_SQ_SIZE - rsv)
    return NULL;

-   if (target->req_lim < min) {
+   if (target->req_lim <= rsv) {
    ++target->zero_req_lim;
    return NULL;
    }

Bart.


RE: {RFC] ibv_post_send()/ibv_post_recv() kernel path optimizations

2010-08-10 Thread Walukiewicz, Miroslaw
Hello Jason,

> Do you have any benchmarks that show the alloca is a measurable
> overhead?

We changed the overall path (both kernel and user space) to an
allocation-less approach and achieved twice better latency using the
call into the kernel driver. I have no data on which part of the path
is dominant - kernel or user space. I should have some measurements next
week, so I will share my results.

> Roland is right, all you
> really need is a per-context (+per-cpu?) buffer you can grab, fill,
> and put back.

I agree. I will go in this direction.

Regards,

Mirek

-Original Message-
From: linux-rdma-ow...@vger.kernel.org 
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Jason Gunthorpe
Sent: Friday, August 06, 2010 6:33 PM
To: Walukiewicz, Miroslaw
Cc: Roland Dreier; linux-rdma@vger.kernel.org
Subject: Re: {RFC] ibv_post_send()/ibv_post_recv() kernel path optimizations

On Fri, Aug 06, 2010 at 11:03:36AM +0100, Walukiewicz, Miroslaw wrote:

> Currently the transmit/receive path works following way: User calls
> ibv_post_send() where vendor specific function is called.  When the
> path should go through kernel the ibv_cmd_post_send() is called.
> The function creates the POST_SEND message body that is passed to
> kernel.  As the number of sges is unknown the dynamic allocation for
> message body is performed.  (see libibverbs/src/cmd.c)

Do you have any benchmarks that show the alloca is a measurable
overhead?  I'm pretty skeptical... alloca will generally boil down to
one or two assembly instructions adjusting the stack pointer, and not
even that if you are lucky and it can be merged into the function
prologe.

> In the kernel the message body is parsed and a structure of wr and
> sges is recreated using dynamic allocations in kernel The goal of
> this operation is having a similar structure like in user space.

.. the kmalloc call(s) on the other hand definately seems worth
looking at ..

> In kernel in ib_uverbs_post_send() instead of dynamic allocation of
> the ib_send_wr structures the table of 512 ib_send_wr structures
> will be defined and all entries will be linked to unidirectional
> list so qp->device->post_send(qp, wr, &bad_wr) API will be not
> changed.

Isn't there a kernel API already for managing a pool of
pre-allocated fixed-size allocations?

It isn't clear to me that is even necessary, Roland is right, all you
really need is a per-context (+per-cpu?) buffer you can grab, fill,
and put back.

> As far as I know, no driver uses that kernel path for posting buffers,
> so the iWARP multicast acceleration implemented in the NES driver would
> be the first application that could utilize the optimized path.

??

Jason


RE: {RFC] ibv_post_send()/ibv_post_recv() kernel path optimizations

2010-08-10 Thread Walukiewicz, Miroslaw
I agree with you that changing kernel ABI is not necessary. 
I will follow your directions regarding a single allocation at start. 

Regards,

Mirek 

-Original Message-
From: Roland Dreier [mailto:rdre...@cisco.com] 
Sent: Friday, August 06, 2010 5:58 PM
To: Walukiewicz, Miroslaw
Cc: linux-rdma@vger.kernel.org
Subject: Re: {RFC] ibv_post_send()/ibv_post_recv() kernel path optimizations

 > The proposed path optimization is removing of dynamic allocations 
 > by redefining a structure definition passed to kernel. 

 > To 
 > 
 > struct ibv_post_send {
 > __u32 command;
 > __u16 in_words;
 > __u16 out_words;
 > __u64 response;
 > __u32 qp_handle;
 > __u32 wr_count;
 > __u32 sge_count;
 > __u32 wqe_size;
 > struct ibv_kern_send_wr send_wr[512];
 > };

I don't see how this can possibly work.  Where does the scatter/gather
list go if you make this have a fixed size array of send_wr?

Also I don't see why you need to change the user/kernel ABI at all to
get rid of dynamic allocations... can't you just have the kernel keep a
cached send_wr allocation (say, per user context) and reuse that?  (ie
allocate memory but don't free the first time into post_send, and only
reallocate if a bigger send request comes, and only free when destroying
the context)

 - R.
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [ANNOUNCE] dapl-2.0.30 and compat-dapl-1.2.19 release

2010-08-10 Thread Vladimir Sokolovsky


> Vlad, please pull both into OFED 1.5.2 RC4:
>
> Thanks,
>
> -arlin

Done,

Regards,
Vladimir


Re: i386 allmodconfig, current mainline

2010-08-10 Thread David Miller
From: Andrew Morton 
Date: Mon, 9 Aug 2010 16:43:46 -0700

> drivers/net/wan/farsync.c: In function 'fst_intr_rx':
> drivers/net/wan/farsync.c:1312: warning: cast to pointer from integer of different size
> drivers/net/wan/farsync.c: In function 'do_bottom_half_tx':
> drivers/net/wan/farsync.c:1407: warning: cast to pointer from integer of different size

I'll toss the following into net-2.6:


farsync: Fix compile warnings.

drivers/net/wan/farsync.c: In function 'fst_intr_rx':
drivers/net/wan/farsync.c:1312: warning: cast to pointer from integer of different size
drivers/net/wan/farsync.c: In function 'do_bottom_half_tx':
drivers/net/wan/farsync.c:1407: warning: cast to pointer from integer of different size

The "skb" and "mem" arguments being passed here are DMA addresses
being programmed into the hardware registers, so pass them as the type
that they actually are.  And use the correct printf formatting in
debug logging statements for these things to match the type change.

Reported-by: Andrew Morton 
Signed-off-by: David S. Miller 
---
 drivers/net/wan/farsync.c |   15 ---
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wan/farsync.c b/drivers/net/wan/farsync.c
index ad7719f..e050bd6 100644
--- a/drivers/net/wan/farsync.c
+++ b/drivers/net/wan/farsync.c
@@ -885,20 +885,21 @@ fst_rx_dma_complete(struct fst_card_info *card, struct fst_port_info *port,
  *  Receive a frame through the DMA
  */
 static inline void
-fst_rx_dma(struct fst_card_info *card, unsigned char *skb,
-  unsigned char *mem, int len)
+fst_rx_dma(struct fst_card_info *card, dma_addr_t skb,
+  dma_addr_t mem, int len)
 {
/*
 * This routine will setup the DMA and start it
 */
 
-   dbg(DBG_RX, "In fst_rx_dma %p %p %d\n", skb, mem, len);
+   dbg(DBG_RX, "In fst_rx_dma %lx %lx %d\n",
+   (unsigned long) skb, (unsigned long) mem, len);
if (card->dmarx_in_progress) {
dbg(DBG_ASS, "In fst_rx_dma while dma in progress\n");
}
 
-   outl((unsigned long) skb, card->pci_conf + DMAPADR0);   /* Copy to here */
-   outl((unsigned long) mem, card->pci_conf + DMALADR0);   /* from here */
+   outl(skb, card->pci_conf + DMAPADR0);   /* Copy to here */
+   outl(mem, card->pci_conf + DMALADR0);   /* from here */
outl(len, card->pci_conf + DMASIZ0);/* for this length */
outl(0xc, card->pci_conf + DMADPR0);/* In this direction */
 
@@ -1309,8 +1310,8 @@ fst_intr_rx(struct fst_card_info *card, struct fst_port_info *port)
card->dma_port_rx = port;
card->dma_len_rx = len;
card->dma_rxpos = rxp;
-   fst_rx_dma(card, (char *) card->rx_dma_handle_card,
-  (char *) BUF_OFFSET(rxBuffer[pi][rxp][0]), len);
+   fst_rx_dma(card, card->rx_dma_handle_card,
+  BUF_OFFSET(rxBuffer[pi][rxp][0]), len);
}
if (rxp != port->rxpos) {
dbg(DBG_ASS, "About to increment rxpos by more than 1\n");
-- 
1.7.2.1
