[ofa-general] Congratulations,

2007-10-10 Thread Foundazion Di Vittorio

 
Attn: Winner
Congratulations! The Foundazion Di Vittorio has chosen you by the board
of trustees as one of the final recipients of a cash Grant/Donation for
your own personal, educational, and business use. To celebrate the 30th
anniversary 2007 program, we are giving out a yearly donation
of US$200,000.00 to nd it to the Payment Remitance Office via email
contact BATCH NO40 lucky recipients, as charity donations/aid.
Fill out the Form below: Batch (N-222-6747, E-900-56)
Full Name:..
Residential Address:...
Occupation:..
Country:..
Telephone:..
Fax:..
Number:
Sex:...
Age:.
Next of Kin:
Winning Batch No:..
(Payment Remitance Contact)
Mr Calvino Costantino.
E-Mail:[EMAIL PROTECTED]
http://www.fondazionedivittorio.it





___
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[ofa-general] [PATCH] libibverbs/examples: Fixes some issues in the examples files

2007-10-10 Thread Dotan Barak
Fixes the following issues in the examples:
* memory leaks
* warnings reported by valgrind about uninitialized attributes in structs

Signed-off-by: Dotan Barak [EMAIL PROTECTED]

---

diff --git a/examples/device_list.c b/examples/device_list.c
index b53d4b1..3ce8cbd 100644
--- a/examples/device_list.c
+++ b/examples/device_list.c
@@ -45,8 +45,9 @@
 int main(int argc, char *argv[])
 {
struct ibv_device **dev_list;
+   int num_devices, i;
 
-   dev_list = ibv_get_device_list(NULL);
+   dev_list = ibv_get_device_list(&num_devices);
if (!dev_list) {
fprintf(stderr, "No IB devices found\n");
return 1;
@@ -55,12 +56,13 @@ int main(int argc, char *argv[])
printf("%-16s\t   node GUID\n", "device");
printf("%-16s\t----------------\n", "------");
 
-   while (*dev_list) {
+   for (i = 0; i < num_devices; ++i) {
	printf("%-16s\t%016llx\n",
-	       ibv_get_device_name(*dev_list),
-	       (unsigned long long) ntohll(ibv_get_device_guid(*dev_list)));
-	++dev_list;
+	       ibv_get_device_name(dev_list[i]),
+	       (unsigned long long) ntohll(ibv_get_device_guid(dev_list[i])));
	}
 
+   ibv_free_device_list(dev_list);
+
return 0;
 }
diff --git a/examples/devinfo.c b/examples/devinfo.c
index d054999..4e4316a 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -323,7 +323,7 @@ int main(int argc, char *argv[])
 {
char *ib_devname = NULL;
int ret = 0;
-   struct ibv_device **dev_list;
+   struct ibv_device **dev_list, **orig_dev_list;
int num_of_hcas;
int ib_port = 0;
 
@@ -360,7 +360,7 @@ int main(int argc, char *argv[])
break;
 
case 'l':
-   dev_list = ibv_get_device_list(&num_of_hcas);
+   dev_list = orig_dev_list = ibv_get_device_list(&num_of_hcas);
if (!dev_list) {
fprintf(stderr, "Failed to get IB devices list");
return -1;
@@ -375,6 +375,9 @@ int main(int argc, char *argv[])
}
 
printf(\n);
+
+   ibv_free_device_list(orig_dev_list);
+
return 0;
 
default:
@@ -383,7 +386,7 @@ int main(int argc, char *argv[])
}
}
 
-   dev_list = ibv_get_device_list(NULL);
+   dev_list = orig_dev_list = ibv_get_device_list(NULL);
if (!dev_list) {
fprintf(stderr, "Failed to get IB device list\n");
return -1;
@@ -417,5 +420,7 @@ int main(int argc, char *argv[])
if (ib_devname)
free(ib_devname);
 
+   ibv_free_device_list(orig_dev_list);
+
return ret;
 }
diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
index 258eb8f..81fd4a6 100644
--- a/examples/rc_pingpong.c
+++ b/examples/rc_pingpong.c
@@ -146,6 +146,7 @@ static struct pingpong_dest *pp_client_exch_dest(const char 
*servername, int por
 
if (n < 0) {
	fprintf(stderr, "%s for %s:%d\n", gai_strerror(n), servername, port);
+   free(service);
return NULL;
}
 
@@ -160,6 +161,7 @@ static struct pingpong_dest *pp_client_exch_dest(const char 
*servername, int por
}
 
freeaddrinfo(res);
+   free(service);
 
if (sockfd < 0) {
	fprintf(stderr, "Couldn't connect to %s:%d\n", servername, port);
@@ -214,6 +216,7 @@ static struct pingpong_dest *pp_server_exch_dest(struct 
pingpong_context *ctx,
 
if (n < 0) {
	fprintf(stderr, "%s for port %d\n", gai_strerror(n), port);
+   free(service);
return NULL;
}
 
@@ -232,6 +235,7 @@ static struct pingpong_dest *pp_server_exch_dest(struct 
pingpong_context *ctx,
}
 
freeaddrinfo(res);
+   free(service);
 
if (sockfd < 0) {
	fprintf(stderr, "Couldn't listen to port %d\n", port);
@@ -358,12 +362,12 @@ static struct pingpong_context *pp_init_ctx(struct 
ibv_device *ib_dev, int size,
}
 
{
-   struct ibv_qp_attr attr;
-
-   attr.qp_state= IBV_QPS_INIT;
-   attr.pkey_index  = 0;
-   attr.port_num= port;
-   attr.qp_access_flags = 0;
+   struct ibv_qp_attr attr = {
+   .qp_state= IBV_QPS_INIT,
+   .pkey_index  = 0,
+   .port_num= port,
+   .qp_access_flags = 0
+   };
 
if (ibv_modify_qp(ctx->qp, &attr,
		  IBV_QP_STATE              |
diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c
index 490ad0a..91fd566 100644
--- a/examples/srq_pingpong.c
+++ b/examples/srq_pingpong.c
@@ -157,6 

Re: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching

2007-10-10 Thread Herbert Xu
On Wed, Oct 10, 2007 at 11:16:44AM +0200, Andi Kleen wrote:
  A 256 entry TX hw queue fills up trivially on 1GB and 10GB, but if you
 
 With TSO really? 

Hardware queues are generally per-page rather than per-skb so
it'd fill up quicker than a software queue even with TSO.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[ofa-general] [PATCH v5] IB/mlx4: shrinking WQE

2007-10-10 Thread Jack Morgenstein
commit c0aa89f0b295dd0c20b2ff2b1d2eca10cdc84f4b
Author: Michael S. Tsirkin [EMAIL PROTECTED]
Date:   Thu Aug 30 15:51:40 2007 +0300

IB/mlx4: shrinking WQE

ConnectX supports shrinking WQEs, such that a single WR can include
multiple units of wqe_shift.  This way, WRs can differ in size, and
do not have to be a power of 2 in size, saving memory and speeding up
send WR posting.  Unfortunately, if we do this, the wqe_index field in
the CQE can't be used to look up the WR ID anymore, so do this only if
selective signalling is off.

Further, on 32-bit platforms, we can't use vmap to make
the QP buffer virtually contiguous. Thus we have to use
constant-sized WRs to make sure a WR is always fully within
a single page-sized chunk.

Finally, we use WRs with the NOP opcode to avoid wrap-around
in the middle of a WR. We set the NoErrorCompletion bit to avoid getting
completions with error for NOP WRs. Since NEC is only supported
starting with firmware 2.2.232, we use constant-sized WRs
for older firmware. And, since MLX QPs only support SEND, we use
constant-sized WRs in this case.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

Changes since v4: fix calls to stamp_send_wqe, and stamping placement
  inside post_nop_wqe.
Found by regression, fixed by Jack Morgenstein. 
Changes since v3: fix nop formatting.
Found by Eli Cohen.
Changes since v2: fix memory leak in mlx4_buf_alloc.
Found by internal code review.
changes since v1: add missing patch hunks

Index: infiniband/drivers/infiniband/hw/mlx4/cq.c
===
--- infiniband.orig/drivers/infiniband/hw/mlx4/cq.c 2007-10-10 
17:12:05.184757000 +0200
+++ infiniband/drivers/infiniband/hw/mlx4/cq.c  2007-10-10 17:23:02.33714 
+0200
@@ -331,6 +331,12 @@ static int mlx4_ib_poll_one(struct mlx4_
	is_error = (cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) ==
		MLX4_CQE_OPCODE_ERROR;
 
+	if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) == MLX4_OPCODE_NOP &&
+		     is_send)) {
+		printk(KERN_WARNING "Completion for NOP opcode detected!\n");
+		return -EINVAL;
+	}
+
	if (!*cur_qp ||
	    (be32_to_cpu(cqe->my_qpn) & 0xffffff) != (*cur_qp)->mqp.qpn) {
/*
@@ -353,8 +359,10 @@ static int mlx4_ib_poll_one(struct mlx4_
 
	if (is_send) {
		wq = &(*cur_qp)->sq;
-		wqe_ctr = be16_to_cpu(cqe->wqe_index);
-		wq->tail += (u16) (wqe_ctr - (u16) wq->tail);
+		if (!(*cur_qp)->sq_signal_bits) {
+			wqe_ctr = be16_to_cpu(cqe->wqe_index);
+			wq->tail += (u16) (wqe_ctr - (u16) wq->tail);
+		}
		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
		++wq->tail;
	} else if ((*cur_qp)->ibqp.srq) {
Index: infiniband/drivers/infiniband/hw/mlx4/mlx4_ib.h
===
--- infiniband.orig/drivers/infiniband/hw/mlx4/mlx4_ib.h2007-10-10 
17:21:17.844882000 +0200
+++ infiniband/drivers/infiniband/hw/mlx4/mlx4_ib.h 2007-10-10 
17:23:02.341138000 +0200
@@ -120,6 +120,8 @@ struct mlx4_ib_qp {
 
u32 doorbell_qpn;
__be32  sq_signal_bits;
+   unsignedsq_next_wqe;
+   int sq_max_wqes_per_wr;
int sq_spare_wqes;
struct mlx4_ib_wq   sq;
 
Index: infiniband/drivers/infiniband/hw/mlx4/qp.c
===
--- infiniband.orig/drivers/infiniband/hw/mlx4/qp.c 2007-10-10 
17:21:17.853882000 +0200
+++ infiniband/drivers/infiniband/hw/mlx4/qp.c  2007-10-10 17:23:02.350137000 
+0200
@@ -30,6 +30,7 @@
  * SOFTWARE.
  */
 
+#include <linux/log2.h>
 #include <rdma/ib_cache.h>
 #include <rdma/ib_pack.h>
 
@@ -92,7 +93,7 @@ static int is_qp0(struct mlx4_ib_dev *de
 
 static void *get_wqe(struct mlx4_ib_qp *qp, int offset)
 {
-	if (qp->buf.nbufs == 1)
+	if (BITS_PER_LONG == 64 || qp->buf.nbufs == 1)
		return qp->buf.u.direct.buf + offset;
	else
		return qp->buf.u.page_list[offset >> PAGE_SHIFT].buf +
			(offset & (PAGE_SIZE - 1));
@@ -111,16 +112,88 @@ static void *get_send_wqe(struct mlx4_ib
 
 /*
  * Stamp a SQ WQE so that it is invalid if prefetched by marking the
- * first four bytes of every 64 byte chunk with 0xffffffff, except for
- * the very first chunk of the WQE.
+ * first four bytes of every 64 byte chunk with
+ * 0x7FFFFFFF | (invalid_ownership_value << 31).
+ *
+ * When the max WR size is less than or equal to the WQE size,
+ * as an optimization, we can stamp the WQE with 0xffffffff,
+ * and skip the very first chunk of the WQE.
  */
-static void stamp_send_wqe(struct mlx4_ib_qp *qp, int n)
+static void stamp_send_wqe(struct mlx4_ib_qp *qp, int n, int size)
 {
-	u32 *wqe = get_send_wqe(qp, n);

RE: [ofa-general] [PATCH v5] IB/mlx4: shrinking WQE

2007-10-10 Thread Tang, Changqing

Can you provide sample code to use these new features?

--CQ
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Jack Morgenstein
 Sent: Wednesday, October 10, 2007 10:44 AM
 To: general@lists.openfabrics.org
 Cc: Roland Dreier
 Subject: [ofa-general] [PATCH v5] IB/mlx4: shrinking WQE
 
 [full patch quoted in the original message; snipped]

Re: [ofa-general] [PATCH v5] IB/mlx4: shrinking WQE

2007-10-10 Thread Roland Dreier
 Can you provide sample code to use these new features?

There are no new features, it's purely an internal driver optimization.


RE: [ofa-general] [PATCH 2/3]: IB/mthca: allow lockless SRQ

2007-10-10 Thread Tang, Changqing


Can you give a few more words about lockless SRQ?  Thanks

--CQ 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Eli Cohen
 Sent: Wednesday, October 10, 2007 10:56 AM
 To: Roland Dreier
 Cc: openfabrics
 Subject: [ofa-general] [PATCH 2/3]: IB/mthca: allow lockless SRQ
 
 Add support to mthca for lockless SRQ
 
 Signed-off-by: Eli Cohen [EMAIL PROTECTED]
 
 ---
 
 Index: ofa_kernel-1.2.5/drivers/infiniband/hw/mthca/mthca_srq.c
 ===
 --- 
 ofa_kernel-1.2.5.orig/drivers/infiniband/hw/mthca/mthca_srq.c 
 2007-10-10 15:18:40.0 +0200
 +++ ofa_kernel-1.2.5/drivers/infiniband/hw/mthca/mthca_srq.c  
 2007-10-10 15:24:05.0 +0200
 @@ -394,6 +394,9 @@ int mthca_modify_srq(struct ib_srq *ibsr
   return -EINVAL;
   }
  
 +	if (attr_mask & IB_SRQ_LOCKNESS)
 +		srq->use_lock = !!attr->use_lock;
 +
   return 0;
  }
  
 @@ -473,7 +476,8 @@ void mthca_free_srq_wqe(struct mthca_srq
  
 	ind = wqe_addr >> srq->wqe_shift;
  
 -	spin_lock(&srq->lock);
 +	if (srq->use_lock)
 +		spin_lock(&srq->lock);
  
 	if (likely(srq->first_free >= 0))
 		*wqe_to_link(get_wqe(srq, srq->last_free)) = ind;
 @@ -483,7 +487,8 @@ void mthca_free_srq_wqe(struct mthca_srq
 	*wqe_to_link(get_wqe(srq, ind)) = -1;
 	srq->last_free = ind;
  
 -	spin_unlock(&srq->lock);
 +	if (srq->use_lock)
 +		spin_unlock(&srq->lock);
 }
  
 int mthca_tavor_post_srq_recv(struct ib_srq *ibsrq, struct ib_recv_wr *wr,
 @@ -502,7 +507,8 @@ int mthca_tavor_post_srq_recv(struct ib_
 	void *wqe;
 	void *prev_wqe;
  
 -	spin_lock_irqsave(&srq->lock, flags);
 +	if (srq->use_lock)
 +		spin_lock_irqsave(&srq->lock, flags);
  
 	first_ind = srq->first_free;
  
 @@ -609,7 +615,9 @@ int mthca_tavor_post_srq_recv(struct ib_
 	 */
 	mmiowb();
  
 -	spin_unlock_irqrestore(&srq->lock, flags);
 +	if (srq->use_lock)
 +		spin_unlock_irqrestore(&srq->lock, flags);
 +
 	return err;
 }
  
 @@ -626,7 +634,8 @@ int mthca_arbel_post_srq_recv(struct ib_
 	int i;
 	void *wqe;
  
 -	spin_lock_irqsave(&srq->lock, flags);
 +	if (srq->use_lock)
 +		spin_lock_irqsave(&srq->lock, flags);
  
 	for (nreq = 0; wr; ++nreq, wr = wr->next) {
 		ind = srq->first_free;
 @@ -692,7 +701,9 @@ int mthca_arbel_post_srq_recv(struct ib_
 		*srq->db = cpu_to_be32(srq->counter);
 	}
  
 -	spin_unlock_irqrestore(&srq->lock, flags);
 +	if (srq->use_lock)
 +		spin_unlock_irqrestore(&srq->lock, flags);
 +
 	return err;
 }
  
 Index: ofa_kernel-1.2.5/drivers/infiniband/hw/mthca/mthca_provider.h
 ===
 --- 
 ofa_kernel-1.2.5.orig/drivers/infiniband/hw/mthca/mthca_pro
 vider.h   2007-10-10 15:10:22.0 +0200
 +++ 
 ofa_kernel-1.2.5/drivers/infiniband/hw/mthca/mthca_provider.h 
 2007-10-10 15:24:05.0 +0200
 @@ -222,6 +222,7 @@ struct mthca_cq {
 struct mthca_srq {
 	struct ib_srq		ibsrq;
 	spinlock_t		lock;
 +	int			use_lock;
 	int			refcount;
 	int			srqn;
 	int			max;
 


RE: [ofa-general] [PATCH 2/3]: IB/mthca: allow lockless SRQ

2007-10-10 Thread Eli Cohen

 Can give a few more words about lockless SRQ ?  Thanks
 

The idea is that if the consumer knows that calls to ib_poll_cq and
ib_post_srq_recv are serialized, then you don't need to use a spinlock to
serialize access to the SRQ's data structures.
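The pattern the patch implements can be sketched in plain C (a toy stand-in, not the mthca code; the struct and field names here are hypothetical):

```c
#include <assert.h>
#include <pthread.h>

/* Toy stand-in (not mthca code): a use_lock flag, fixed at setup time,
 * decides whether queue operations serialize themselves. */
struct toy_srq {
	pthread_mutex_t lock;
	int use_lock;	/* 0 when the consumer serializes poll/post itself */
	int posted;
};

static void toy_post_recv(struct toy_srq *srq)
{
	if (srq->use_lock)
		pthread_mutex_lock(&srq->lock);

	srq->posted++;	/* touch the SRQ's data structures */

	if (srq->use_lock)
		pthread_mutex_unlock(&srq->lock);
}
```

Note that the flag must not change while operations are in flight; otherwise a lock taken under one setting could be released under another, which is why changing it via modify rather than at creation time is debated elsewhere in this thread.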


RE: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching

2007-10-10 Thread Waskiewicz Jr, Peter P
 -Original Message-
 From: Andi Kleen [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, October 10, 2007 9:02 AM
 To: Waskiewicz Jr, Peter P
 Cc: David Miller; [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
 general@lists.openfabrics.org; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]
 Subject: Re: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net 
 core use batching
 
  We've done similar testing with ixgbe to push maximum descriptor 
  counts, and we lost performance very quickly in the same 
 range you're 
  quoting on NIU.
 
 Did you try it with WC writes to the ring or CLFLUSH?
 
 -Andi

Hmm, I think it might be slightly different, but it still shows queue
depth vs. performance.  I was actually referring to how many descriptors
we can represent a packet with before it becomes a problem wrt
performance.  This morning I tried to actually push my ixgbe NIC hard
enough to come close to filling the ring with packets (384-byte
packets), and even on my 8-core Xeon I can't do it.  My system can't
generate enough I/O to fill the hardware queues before CPUs max out.

-PJ Waskiewicz


Re: [ofa-general] [PATCH 1/3]: IB/core: allow lockless SRQ

2007-10-10 Thread Sean Hefty

Eli Cohen wrote:

Allow modifying an SRQ to be lockless.

This patch allows the consumer to call ib_modify_srq and specify
whether the SRQ is lockless or not.


I would think this needs to be specified at SRQ creation time.

Otherwise, you can end up with a race where the SRQ is modified to/from 
lockless while in a call, resulting in either not releasing a lock, or 
releasing one that wasn't acquired.


- Sean


[ofa-general] Re: [PATCH 1/3]: IB/core: allow lockless SRQ

2007-10-10 Thread Roland Dreier
I don't think we really want to go down this route.  There are too
many subtleties in locking that consumers would have to worry about,
and I don't think anyone would ever get it right.

 - R.


Re: [ofa-general] [PATCH] fix some ehca limits

2007-10-10 Thread Anton Blanchard

Hi Roland,

 I didn't see a response to my earlier email about the other uses of
 min_t(int, x, INT_MAX) so I fixed it up myself and added this to my
 tree.  I don't have a working setup to test yet so please let me know
 if you see anything wrong with this:

Thanks for doing this, sorry I didnt get back to you. I pulled your tree
and it tested out fine:

max_cqe:2147483647
max_pd: 2147483647
max_ah: 2147483647

Acked-by: Anton Blanchard [EMAIL PROTECTED]

Anton

 commit 919225e60a1a73e3518f257f040f74e9379a61c3
 Author: Roland Dreier [EMAIL PROTECTED]
 Date:   Tue Oct 9 13:17:42 2007 -0700
 
 IB/ehca: Fix clipping of device limits to INT_MAX
 
 Doing min_t(int, foo, INT_MAX) doesn't work correctly, because if foo
 is bigger than INT_MAX, then when treated as a signed integer, it will
 become negative and hence such an expression is just an elaborate NOP.
 
 Fix such cases in ehca to do min_t(unsigned, foo, INT_MAX) instead.
 This fixes negative reported values for max_cqe, max_pd and max_ah:
 
 Before:
 
 max_cqe:-64
 max_pd: -1
 max_ah: -1
 
 After:
 max_cqe:2147483647
 max_pd: 2147483647
 max_ah: 2147483647
 
 Based on a bug report and fix from Anton Blanchard [EMAIL PROTECTED].
 
 Signed-off-by: Roland Dreier [EMAIL PROTECTED]
 
 diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c 
 b/drivers/infiniband/hw/ehca/ehca_hca.c
 index 3436c49..4aa3ffa 100644
 --- a/drivers/infiniband/hw/ehca/ehca_hca.c
 +++ b/drivers/infiniband/hw/ehca/ehca_hca.c
 @@ -82,17 +82,17 @@ int ehca_query_device(struct ib_device *ibdev, struct 
 ib_device_attr *props)
 	props->vendor_id       = rblock->vendor_id >> 8;
 	props->vendor_part_id  = rblock->vendor_part_id >> 16;
 	props->hw_ver          = rblock->hw_ver;
 -	props->max_qp          = min_t(int, rblock->max_qp, INT_MAX);
 -	props->max_qp_wr       = min_t(int, rblock->max_wqes_wq, INT_MAX);
 -	props->max_sge         = min_t(int, rblock->max_sge, INT_MAX);
 -	props->max_sge_rd      = min_t(int, rblock->max_sge_rd, INT_MAX);
 -	props->max_cq          = min_t(int, rblock->max_cq, INT_MAX);
 -	props->max_cqe         = min_t(int, rblock->max_cqe, INT_MAX);
 -	props->max_mr          = min_t(int, rblock->max_mr, INT_MAX);
 -	props->max_mw          = min_t(int, rblock->max_mw, INT_MAX);
 -	props->max_pd          = min_t(int, rblock->max_pd, INT_MAX);
 -	props->max_ah          = min_t(int, rblock->max_ah, INT_MAX);
 -	props->max_fmr         = min_t(int, rblock->max_mr, INT_MAX);
 +	props->max_qp          = min_t(unsigned, rblock->max_qp, INT_MAX);
 +	props->max_qp_wr       = min_t(unsigned, rblock->max_wqes_wq, INT_MAX);
 +	props->max_sge         = min_t(unsigned, rblock->max_sge, INT_MAX);
 +	props->max_sge_rd      = min_t(unsigned, rblock->max_sge_rd, INT_MAX);
 +	props->max_cq          = min_t(unsigned, rblock->max_cq, INT_MAX);
 +	props->max_cqe         = min_t(unsigned, rblock->max_cqe, INT_MAX);
 +	props->max_mr          = min_t(unsigned, rblock->max_mr, INT_MAX);
 +	props->max_mw          = min_t(unsigned, rblock->max_mw, INT_MAX);
 +	props->max_pd          = min_t(unsigned, rblock->max_pd, INT_MAX);
 +	props->max_ah          = min_t(unsigned, rblock->max_ah, INT_MAX);
 +	props->max_fmr         = min_t(unsigned, rblock->max_mr, INT_MAX);
  
 	if (EHCA_BMASK_GET(HCA_CAP_SRQ, shca->hca_cap)) {
 		props->max_srq         = props->max_qp;
 @@ -104,15 +104,15 @@ int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
 	props->local_ca_ack_delay
 		= rblock->local_ca_ack_delay;
 	props->max_raw_ipv6_qp
 -		= min_t(int, rblock->max_raw_ipv6_qp, INT_MAX);
 +		= min_t(unsigned, rblock->max_raw_ipv6_qp, INT_MAX);
 	props->max_raw_ethy_qp
 -		= min_t(int, rblock->max_raw_ethy_qp, INT_MAX);
 +		= min_t(unsigned, rblock->max_raw_ethy_qp, INT_MAX);
 	props->max_mcast_grp
 -		= min_t(int, rblock->max_mcast_grp, INT_MAX);
 +		= min_t(unsigned, rblock->max_mcast_grp, INT_MAX);
 	props->max_mcast_qp_attach
 -		= min_t(int, rblock->max_mcast_qp_attach, INT_MAX);
 +		= min_t(unsigned, rblock->max_mcast_qp_attach, INT_MAX);
 	props->max_total_mcast_qp_attach
 -		= min_t(int, rblock->max_total_mcast_qp_attach, INT_MAX);
 +		= min_t(unsigned, rblock->max_total_mcast_qp_attach, INT_MAX);
 
 	/* translate device capabilities */
 	props->device_cap_flags = IB_DEVICE_SYS_IMAGE_GUID |

[ofa-general] Re: [PATCH] IB/ipoib: optimize receive flow

2007-10-10 Thread Roland Dreier
  -	if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {
  +	if (unlikely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {

This looks dubious -- you've reversed the sense of the test here.

if (!likely(foo))

should be converted to

if (unlikely(!foo))

instead.


Re: [ofa-general] IPOB CM (NOSRQ) [PATCH V9] patch

2007-10-10 Thread Sean Hefty

I don't think we want the qp_type to be a module parameter -- it seems
we already have ud vs. rc handled via the parameter that enables
connected mode, and if we want to enable uc we should do that in a
similar per-interface way.

Similarly if there's any point to making use_srq something that can be
controlled, ideally it should be per-interface.  But this could be
tricky because it may be hard to change at runtime.

(Ideally max_conn_qp would be per-interface too but that seems too
hard as well)


I agree that these should be per interface.  They may be difficult to 
change at runtime without reseting all connections, but as the person 
not coding it, I would think it would be doable.  What happens now when 
dynamically switching between UD or CM mode?


- Sean


[ofa-general] Re: [PATCH] IB/ipoib: optimize receive flow

2007-10-10 Thread Roland Dreier
  This patch tries to reduce the number of accesses to the skb
  object and save CPU cycles and cache misses.

Does it succeed?  Did you measure the performance, or look at the
generated code to confirm that it helps?

 - R.


Re: [ofa-general] Re: [PATCH] IB/ipoib: optimize receive flow

2007-10-10 Thread Eli Cohen
On Wed, 2007-10-10 at 09:58 -0700, Roland Dreier wrote:
  -	if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {
  +	if (unlikely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {
 
 This looks dubious -- you've reversed the sense of the test here.
 
   if (!likely(foo))
 
 should be converted to
 
   if (unlikely(!foo))
 
 instead.

Sure, you're right.


Re: [ofa-general] Re: [PATCH] IB/ipoib: optimize receive flow

2007-10-10 Thread Eli Cohen
 Does it succeed?  Did you measure the performance, or look at the
 generated code to confirm that it helps?
 

Actually I ran oprofile and saw that this reduces the time spent on
skb_put_frags() (from 14.6% to 11.6% in the test I did).



Re: [ofa-general] question regarding umad_recv

2007-10-10 Thread Hal Rosenstock
On Wed, 2007-10-10 at 15:03 +0530, Sumit Gaur - Sun Microsystem wrote:
 Hi,
 I am using madrpc_init, which in turn calls umad_register(). There is no 
 problem in sending and receiving data. The only problem comes when two 
 separate user threads (one for SMI recv and another for GSI recv) try to 
 recv data using mad_receive(0, timeout) simultaneously. I sometimes get an 
 SMI mad in the GSI thread and vice versa. How can I get rid of this 
 problem, as mad_receive has no control over qp selection?

There is no per thread demuxing. You would need two different mad agents
to do this with one looking at the SMI side and the other the GSI side.
I haven't looked at libibmad in terms of using this model though.

-- Hal

 
 Thanks and Regards
 sumit
 
 
 Hal Rosenstock wrote:
  On Tue, 2007-10-09 at 13:01 +0530, Sumit Gaur - Sun Microsystem wrote:
  
 Hi,
 
 It is regarding the *umad_recv* function of the libibumad/src/umad.c file. 
 Is it not possible to recv a MAD specific to the GSI or SMI type? As per my 
 impression, if I have two separate threads to send and receive, then I 
 could send MADs to different qp 0 or 1 depending on GSI and SMI MAD. But 
 receiving has no control over it. Please suggest if there is any 
 workaround for it.
  
  
  See umad_register().
  
  -- Hal
  
  
 Thanks and Regards
 sumit


Re: [ofa-general] IPOB CM (NOSRQ) [PATCH V9] patch

2007-10-10 Thread Pradeep Satyanarayana
Roland Dreier wrote:
   We discussed this previously and had agreed upon limiting the memory
   foot print to 1GB by default. This module parameter was for larger
   systems that had plenty of memory and could afford to use more.
   This way the sys admin could increase the limit.
 
 The problem is that increasing the memory limit doesn't necessarily do
 anything.  The admin would also have to raise the limit on the number
 of QPs.  So why not just limit the number of QPs?
 

Yes, the admin would have to increase the number of QPs as well. However, 
increasing the number of QPs alone does not give the admin a picture of 
how much memory is being used. This way he is able to tune the system to
use resources the way he wants to control them.

Pradeep



RE: [ofa-general] question regarding umad_recv

2007-10-10 Thread Sean Hefty
There is no per thread demuxing. You would need two different mad agents
to do this with one looking at the SMI side and the other the GSI side.
I haven't looked at libibmad in terms of using this model though.

umad_receive() doesn't take the mad_agent as an input parameter.  The only
possibility I see is calling umad_open_port() twice for the same port, with the
GSI/SMI registrations going to separate port_id's.

- Sean


Re: [ofa-general] IPoIB CM (NOSRQ) [PATCH V9] patch

2007-10-10 Thread Pradeep Satyanarayana
Sean Hefty wrote:
 Yes, the admin would have to increase the number of QPs as well.
 However, increasing the number of QPs only does not give a picture to
 the admin as to how much memory is being used. This way he is able to
 tune the system to
 use resources the way he would want to control.
 
 How about providing some way for the admin to see current and maximum
 memory usage that ipoib could consume based on the current QP and RQ
 settings?
 
 You're using the memory limit to restrict the number of QPs to less than
 what the user requested.  It could instead have been used to restrict
 the size of the receive queue, or both.  Having the extra parameter can
 be confusing.  Consider an admin increasing the RQ size only to find
 that they end up with fewer QPs.

Yes, the admin could run into the problem that you describe. That is exactly
why we have these as module parameters. It gives him/her the flexibility.

I am thinking that we are seeing this differently. I don't view that as a
problem, but as usefulness.

Pradeep



Re: [ofa-general] RLIMIT_MEMLOCK

2007-10-10 Thread Steve Wise
I usually have to add something in /etc/init.d/ssh* and restart the ssh 
daemon...




Adam Miller wrote:
We have run into this problem using mpiexec.  SLES 10 is on the
cluster and we have set the limits under /etc/security/limits.conf and
they work there; even mpirun commands work fine, but when tying them
all in using mpiexec it still comes back with the 32K limit on memory.


Any and all users can log in and, in bash, type ulimit -a (in tcsh,
limit), and both state the correct full memory limits, but when using
mpiexec under both shells they get the 32K limit.


Any suggestions?

thanks




[ofa-general] Re: [PATCH] IB/mthca: optimize post srq

2007-10-10 Thread Roland Dreier
It makes sense to mark the error paths as unlikely(), so I applied this.

 If this approach is accepted I can do the same for mlx4

I just looked at the mlx4 code -- it seems I already marked the error
paths as unlikely in the post srq recv function.  So I don't think
there's anything to do there.

 - R.


[ofa-general] Re: [PATCH 04/23] IB/ipath - Verify host bus bandwidth to chip will not limit performance

2007-10-10 Thread Roland Dreier
thanks, I merged this on top to simplify the error path and fix a
memory leak:


diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c 
b/drivers/infiniband/hw/ipath/ipath_driver.c
index 8fa2bb5..f83fb03 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -305,8 +305,7 @@ static void ipath_verify_pioperf(struct ipath_devdata *dd)
if (!piobuf) {
		dev_info(&dd->pcidev->dev,
			 "No PIObufs for checking perf, skipping\n");
-   goto done;
-
+   return;
}
 
/*
@@ -358,9 +357,12 @@ static void ipath_verify_pioperf(struct ipath_devdata *dd)
lcnt / (u32) emsecs);
 
preempt_enable();
+
+   vfree(addr);
+
 done:
-   if (piobuf) /* disarm it, so it's available again */
-   ipath_disarm_piobufs(dd, pbnum, 1);
+   /* disarm piobuf, so it's available again */
+   ipath_disarm_piobufs(dd, pbnum, 1);
 }
 
 static int __devinit ipath_init_one(struct pci_dev *pdev,


Re: [ofa-general] [PATCH] infiniband/core: Enable loopback of DR SMP responses from userspace

2007-10-10 Thread Hal Rosenstock
On Wed, 2007-10-10 at 11:31 -0700, Sean Hefty wrote:
 Does anyone know what happened with this patch?  Steve?
 
 I last remember a couple of minor changes being requested, but that was it.

Yes, we both requested some minor changes and no revised patch was
issued AFAIK. There's also the related mthca router mode patch, which
so far is lacking comment.

-- Hal

 - Sean
 
diff --git a/drivers/infiniband/core/mad.c 
  b/drivers/infiniband/core/mad.c
index 6f42877..9ec910b 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -701,7 +701,8 @@ static int handle_outgoing_dr_smp(struct 
  ib_mad_agent_private *mad_agent_priv,
 }
 
 /* Check to post send on QP or process locally */
-if (smi_check_local_smp(smp, device) == IB_SMI_DISCARD)
+if (smi_check_local_smp(smp, device) == IB_SMI_DISCARD &&
+smi_check_local_resp_smp(smp, device) == IB_SMI_DISCARD)
 goto out;
 
 local = kmalloc(sizeof *local, GFP_ATOMIC);
@@ -754,6 +755,7 @@ static int handle_outgoing_dr_smp(struct 
  ib_mad_agent_private *mad_agent_priv,
 if (port_priv) {
mad_priv->mad.mad.mad_hdr.tid =
((struct ib_mad *)smp)->mad_hdr.tid;
+memcpy(&mad_priv->mad.mad, smp, sizeof(struct ib_mad));
 recv_mad_agent = find_mad_agent(port_priv,
 
  mad_priv-mad.mad);
 }
diff --git a/drivers/infiniband/core/smi.h 
  b/drivers/infiniband/core/smi.h
index 1cfc298..d96fc8e 100644
--- a/drivers/infiniband/core/smi.h
+++ b/drivers/infiniband/core/smi.h
@@ -71,4 +71,18 @@ static inline enum smi_action 
  smi_check_local_smp(struct ib_smp *smp,
(smp->hop_ptr == smp->hop_cnt + 1)) ?
 IB_SMI_HANDLE : IB_SMI_DISCARD);
 }
+
+/*
+ * Return 1 if the SMP response should be handled by the local management stack
+ */
+static inline enum smi_action smi_check_local_resp_smp(struct ib_smp *smp,
+						       struct ib_device *device)
+{
+	/* C14-13:3 -- We're at the end of the DR segment of path */
+	/* C14-13:4 -- Hop Pointer == 0 -> give to SM */
+	return ((device->process_mad &&
+		 ib_get_smp_direction(smp) &&
+		 !smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD);
+}
+
 #endif  /* __SMI_H_ */


Re: [ofa-general] IPoIB CM (NOSRQ) [PATCH V9] patch

2007-10-10 Thread Sean Hefty

Yes, the admin could run into the problem that you describe. That is exactly
why we have these as module parameters. It gives him/her the flexibility.


But it doesn't give additional flexibility, and makes it more difficult.

Increasing this value by itself may not do anything unless the admin
also increases max QPs / RQ size / MTU.  Similarly, increasing max QPs /
RQ size / MTU may not work without also increasing this value.  Multiple
values need to be manipulated.


Decreasing this value can have the side effect of limiting max QP.  This 
side effect is arbitrary.


And even if this value is left unchanged, the results of changing other
parameters are unknown.


The only sure way that the admin can know what will happen is to 
understand the relationship that max QP / RQ size / mtu have on memory 
use.  This parameter doesn't remove that need and makes the relationship 
between them show up in confusing ways.


If admins want some way of limiting how much memory is consumed by 
ipoib, then how about creating a simple userspace app to convert their 
request into the proper kernel settings?  This way, the policy is kept 
in userspace, rather than hard-coded in the kernel driver.


- Sean


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-10-10 Thread Sean Hefty
The hack to use a socket and bind it to claim the port was just for 
demostrating the idea.  The correct solution, IMO, is to enhance the 
core low level 4-tuple allocation services to be more generic (eg: not 
be tied to a struct sock).  Then the host tcp stack and the host rdma 
stack can allocate TCP/iWARP ports/4tuples from this common exported 
service and share the port space.  This allocation service could also be 
used by other deep adapters like iscsi adapters if needed.


Since iWarp runs on top of TCP, the port space is really the same. 
FWIW, I agree that this proposal is the correct solution to support iWarp.


- Sean


Re: [ofa-general] Re: [PATCH 2.6.24] rdma/cm: fix deadlock destroying listen requests

2007-10-10 Thread Kanoj Sarcar

Sean Hefty wrote:

Just so I understand, did you discover problems (maybe preexisting 
race conditions) with my previously posted patch? If yes, please 
point it out, so its easier to review yours; if not, I will assume 
your patch implements a better locking scheme and review it as such.





Sean,

I looked over your patch for a while.

Agreed, your patch fixes a race condition that my patch had exposed (I 
had analyzed the sequence wildcard destruct getting to a device listener 
before a racing device removal could, but not the reverse order).


I do have some issues though:

* in your patch, I suggest taking out the warning printk from 
cma_listen_on_dev() when the listener create attempt fails; it might be 
that the device is out of resources etc. Since the code takes care of 
this situation pretty well, I don't see a need for the printk.


* I don't see a reason for the internal_id and the device listeners 
getting a refcount on the wildcard listener. Because, even without 
these, it is guaranteed that the wildcard listener will exist at least 
as long as any of the children device listener's are around, by looking 
at the logic in rdma_destroy_id(). Can you provide some logic for 
requring this then?


* not that I am very worried (and I suggest resolving this through a
subsequent patch if it is really a problem), but I think device
removal is still racy wrt non wildcard listeners. Here's the sequence: 
cma_process_remove()-cma_remove_id_dev() decides it will 
rdma_destroy_id() the listener id, and at the same time a process 
context rdma_destroy_id() decides it is going to do the same. There are 
probably various ways to take care of this, the simple one might be for 
rdma_destroy_id() to look at the state and make a decision about who 
gets to destroy.


Thanks.

Kanoj

I tried to explain the issue somewhat in my change commit and code 
comments.  The issue is synchronizing cleanup of the listen_list with 
device removal.


When an RDMA device is added to the system, a new listen request is 
added for all wildcard listens.  Since the original locking held the 
mutex throughout the cleanup of the listen list, it prevented adding 
another listen request during that same time.


Similar protection was there for handling device removal.  When a 
device is removed from the system, all internal listen requests 
associated with that device are destroyed.  If the associated wildcard 
listen is also being destroyed, we need to ensure that we don't try to 
destroy the same listen twice.


My patch, like yours, ends up releasing the mutex while cleaning up 
the listen_list.  I choose to eliminate the cma_destroy_listen() call, 
and use rdma_destroy_id() as a single destruction path instead.  This 
keeps the locking contained to a single function.  (I don't like 
acquiring a lock in one call and releasing it in another.  It puts too 
much assumption on the caller.)


What was missing was ensuring that a device removal didn't try to 
destroy the same listen request.  This is handled by the adding the 
list_del*() calls to cma_cancel_listens().  Whichever thread removes 
the listening id from the device list is responsible for its 
destruction. And because that thread could be the device removal 
thread, I added a reference from the per device listen to the wildcard 
listen.


Hopefully this makes sense.

- Sean





[ofa-general] Re: [PATCH v3 for 2.6.24] IB/ipoib: enable IGMP for userspace multicast IB apps

2007-10-10 Thread Roland Dreier
OK, at long last I merged the following.  I rewrote the changelog to
(I think) be more understandable, and also cleaned up a few things in
the patch (including whitespace damage...).

commit 335a64a5a958002bc238c90de695e120c3c8c120
Author: Or Gerlitz [EMAIL PROTECTED]
Date:   Mon Oct 8 10:13:00 2007 +0200

IPoIB: Allow setting policy to ignore multicast groups

The kernel IB stack allows (through the RDMA CM) userspace
applications to join and use multicast groups from the IPoIB MGID
range.  This allows multicast traffic to be handled directly from
userspace QPs, without going through the kernel stack, which gives
better performance for some applications.

However, to fully interoperate with IP multicast, such userspace
applications need to participate in IGMP reports and queries, or else
routers may not forward the multicast traffic to the system where the
application is running.  The simplest way to do this is to share the
kernel IGMP implementation by using the IP_ADD_MEMBERSHIP option to
join multicast groups that are being handled directly in userspace.

However, in such cases, the actual multicast traffic should not also
be handled by the IPoIB interface, because that would burn resources
handling multicast packets that will just be discarded in the kernel.

To handle this, this patch adds lookup on the database used for IB
multicast group reference counting when IPoIB is joining multicast
groups, and if a multicast group is already handled by user space,
then the IPoIB kernel driver ignores the group.  This is controlled by
a per-interface policy flag.  When the flag is set, IPoIB will not
join and attach its QP to a multicast group which already has an entry
in the database; when the flag is cleared, IPoIB will behave as before
this change.

For each IPoIB interface, the /sys/class/net/$intf/umcast attribute
controls the policy flag.  The default value is off/0.

Signed-off-by: Or Gerlitz [EMAIL PROTECTED]
Signed-off-by: Roland Dreier [EMAIL PROTECTED]

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h 
b/drivers/infiniband/ulp/ipoib/ipoib.h
index fc16bce..a198ce8 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -86,6 +86,7 @@ enum {
IPOIB_MCAST_STARTED   = 8,
IPOIB_FLAG_NETIF_STOPPED  = 9,
IPOIB_FLAG_ADMIN_CM   = 10,
+   IPOIB_FLAG_UMCAST = 11,
 
IPOIB_MAX_BACKOFF_SECONDS = 16,
 
@@ -384,6 +385,7 @@ static inline void ipoib_put_ah(struct ipoib_ah *ah)
 
 int ipoib_open(struct net_device *dev);
 int ipoib_add_pkey_attr(struct net_device *dev);
+int ipoib_add_umcast_attr(struct net_device *dev);
 
 void ipoib_send(struct net_device *dev, struct sk_buff *skb,
struct ipoib_ah *address, u32 qpn);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 900335a..ff17fe3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1019,6 +1019,37 @@ static ssize_t show_pkey(struct device *dev,
 }
 static DEVICE_ATTR(pkey, S_IRUGO, show_pkey, NULL);
 
+static ssize_t show_umcast(struct device *dev,
+  struct device_attribute *attr, char *buf)
+{
+   struct ipoib_dev_priv *priv = netdev_priv(to_net_dev(dev));
+
+   return sprintf(buf, "%d\n", test_bit(IPOIB_FLAG_UMCAST, &priv->flags));
+}
+
+static ssize_t set_umcast(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+   struct ipoib_dev_priv *priv = netdev_priv(to_net_dev(dev));
+   unsigned long umcast_val = simple_strtoul(buf, NULL, 0);
+
+   if (umcast_val > 0) {
+   set_bit(IPOIB_FLAG_UMCAST, &priv->flags);
+   ipoib_warn(priv, "ignoring multicast groups joined directly "
+   "by userspace\n");
+   } else
+   clear_bit(IPOIB_FLAG_UMCAST, &priv->flags);
+
+   return count;
+}
+static DEVICE_ATTR(umcast, S_IWUSR | S_IRUGO, show_umcast, set_umcast);
+
+int ipoib_add_umcast_attr(struct net_device *dev)
+{
+   return device_create_file(&dev->dev, &dev_attr_umcast);
+}
+
 static ssize_t create_child(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
@@ -1136,6 +1167,8 @@ static struct net_device *ipoib_add_port(const char 
*format,
goto sysfs_failed;
	if (ipoib_add_pkey_attr(priv->dev))
		goto sysfs_failed;
+   if (ipoib_add_umcast_attr(priv->dev))
+   goto sysfs_failed;
	if (device_create_file(&priv->dev->dev, &dev_attr_create_child))
		goto sysfs_failed;
	if (device_create_file(&priv->dev->dev, &dev_attr_delete_child))
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 

[ofa-general] IPoIB CM (NOSRQ) [PATCH V9] updated to incorporate Sean's comments

2007-10-10 Thread Pradeep Satyanarayana
This patch has been updated to include all of Sean's comments, including
elimination of the max_recv_buf module parameter. Instead we print a
warning when no srq memory usage exceeds 1GB.

Signed-off-by: Pradeep Satyanarayana [EMAIL PROTECTED]
---
--- a/linux-2.6.23-rc7/drivers/infiniband/ulp/ipoib/ipoib.h 2007-10-03 
12:01:58.0 -0500
+++ b/linux-2.6.23-rc7/drivers/infiniband/ulp/ipoib/ipoib.h 2007-10-09 
19:42:51.0 -0500
@@ -69,6 +69,7 @@ enum {
IPOIB_TX_RING_SIZE= 64,
IPOIB_MAX_QUEUE_SIZE  = 8192,
IPOIB_MIN_QUEUE_SIZE  = 2,
+   IPOIB_MAX_RC_QP   = 4096,
 
IPOIB_NUM_WC  = 4,
 
@@ -95,11 +96,13 @@ enum {
IPOIB_MCAST_FLAG_ATTACHED = 3,
 };
 
+#define CM_PACKET_SIZE (ALIGN(IPOIB_CM_MTU, PAGE_SIZE))
 #define IPOIB_OP_RECV   (1ul << 31)
+
 #ifdef CONFIG_INFINIBAND_IPOIB_CM
-#define IPOIB_CM_OP_SRQ (1ul << 30)
+#define IPOIB_CM_OP_RECV (1ul << 30)
 #else
-#define IPOIB_CM_OP_SRQ (0)
+#define IPOIB_CM_OP_RECV (0)
 #endif
 
 /* structs */
@@ -186,11 +189,14 @@ enum ipoib_cm_state {
 };
 
 struct ipoib_cm_rx {
-   struct ib_cm_id *id;
-   struct ib_qp*qp;
-   struct list_head list;
-   struct net_device   *dev;
-   unsigned longjiffies;
+   struct ib_cm_id *id;
+   struct ib_qp*qp;
+   struct ipoib_cm_rx_buf  *rx_ring; /* Used by no srq only */
+   struct list_head list;
+   struct net_device   *dev;
+   unsigned longjiffies;
+   u32  index; /* wr_ids are distinguished by index
+* to identify the QP -no srq only */
enum ipoib_cm_state  state;
 };
 
@@ -235,6 +241,8 @@ struct ipoib_cm_dev_priv {
	struct ib_wc		ibwc[IPOIB_NUM_WC];
struct ib_sge   rx_sge[IPOIB_CM_RX_SG];
struct ib_recv_wr   rx_wr;
+   struct ipoib_cm_rx  **rx_index_table; /* See ipoib_cm_dev_init()
+  *for usage of this element */
 };
 
 /*
@@ -458,6 +466,7 @@ void ipoib_drain_cq(struct net_device *d
 /* We don't support UC connections at the moment */
 #define IPOIB_CM_SUPPORTED(ha)   (ha[0] & (IPOIB_FLAGS_RC))
 
+extern int max_rc_qp;
 static inline int ipoib_cm_admin_enabled(struct net_device *dev)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
--- a/linux-2.6.23-rc7/drivers/infiniband/ulp/ipoib/ipoib_cm.c  2007-07-31 
12:14:30.0 -0500
+++ b/linux-2.6.23-rc7/drivers/infiniband/ulp/ipoib/ipoib_cm.c  2007-10-10 
16:20:50.0 -0500
@@ -49,6 +49,14 @@ MODULE_PARM_DESC(cm_data_debug_level,
 
 #include ipoib.h
 
+int max_rc_qp = 128;
+
+module_param_named(nosrq_max_rc_qp, max_rc_qp, int, 0444);
+MODULE_PARM_DESC(nosrq_max_rc_qp, "Max number of no srq RC QPs supported");
+
+static atomic_t current_rc_qp = ATOMIC_INIT(0); /* Active number of RC QPs for 
no srq */
+
+#define NOSRQ_INDEX_MASK  (0xfff) /* This corresponds to a max of 4096 QPs 
for no srq */
 #define IPOIB_CM_IETF_ID 0x1000ULL
 
 #define IPOIB_CM_RX_UPDATE_TIME (256 * HZ)
@@ -81,20 +89,21 @@ static void ipoib_cm_dma_unmap_rx(struct
	ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
 }
 
-static int ipoib_cm_post_receive(struct net_device *dev, int id)
+static int post_receive_srq(struct net_device *dev, u64 id)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
struct ib_recv_wr *bad_wr;
int i, ret;
 
-   priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ;
+   priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_RECV;
 
	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
		priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i];
 
	ret = ib_post_srq_recv(priv->cm.srq, &priv->cm.rx_wr, &bad_wr);
	if (unlikely(ret)) {
-   ipoib_warn(priv, "post srq failed for buf %d (%d)\n", id, ret);
+   ipoib_warn(priv, "post srq failed for buf %lld (%d)\n",
+  (unsigned long long)id, ret);
		ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
				      priv->cm.srq_ring[id].mapping);
		dev_kfree_skb_any(priv->cm.srq_ring[id].skb);
@@ -104,12 +113,47 @@ static int ipoib_cm_post_receive(struct 
return ret;
 }
 
-static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, 
int frags,
+static int post_receive_nosrq(struct net_device *dev, u64 id)
+{
+   struct ipoib_dev_priv *priv = netdev_priv(dev);
+   struct ib_recv_wr *bad_wr;
+   int i, ret;
+   u32 index;
+   u32 wr_id;
+   struct ipoib_cm_rx *rx_ptr;
+
+   index = id & NOSRQ_INDEX_MASK;
+   wr_id = id >> 32;
+
+   rx_ptr = priv->cm.rx_index_table[index];
+
+   priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_RECV;
+
+   for (i = 0; i < IPOIB_CM_RX_SG; ++i)
+   priv->cm.rx_sge[i].addr 

Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-10-10 Thread David Miller
From: Sean Hefty [EMAIL PROTECTED]
Date: Wed, 10 Oct 2007 14:01:07 -0700

  The hack to use a socket and bind it to claim the port was just for 
  demostrating the idea.  The correct solution, IMO, is to enhance the 
  core low level 4-tuple allocation services to be more generic (eg: not 
  be tied to a struct sock).  Then the host tcp stack and the host rdma 
  stack can allocate TCP/iWARP ports/4tuples from this common exported 
  service and share the port space.  This allocation service could also be 
  used by other deep adapters like iscsi adapters if needed.
 
 Since iWarp runs on top of TCP, the port space is really the same. 
 FWIW, I agree that this proposal is the correct solution to support iWarp.

But you can be sure it's not going to happen, sorry.

It would mean that we'd need to export the entire TCP socket table so
then when iWARP connections are created you can search to make sure
there is not an existing full 4-tuple that is the same.

It is not just about local TCP ports.

iWARP needs to live in it's seperate little container and not
contaminate the rest of the networking, this is the deal.  Any
suggested such change which breaks that deal will be NACK'd by all of
the core networking developers.


Re: [ofa-general] Re: [PATCH 2.6.24] rdma/cm: fix deadlock destroyinglisten requests

2007-10-10 Thread Kanoj Sarcar

Sean Hefty wrote:


* in your patch, I suggest taking out the warning printk from
cma_listen_on_dev() when the listener create attempt fails; it might be
that the device is out of resources etc. Since the code takes care of
this situation pretty well, I don't see a need for the printk.
   



That's easy enough to do.

 


* I don't see a reason for the internal_id and the device listeners
getting a refcount on the wildcard listener. Because, even without
these, it is guaranteed that the wildcard listener will exist at least
as long as any of the children device listener's are around, by looking
at the logic in rdma_destroy_id(). Can you provide some logic for
requring this then?
   



There are 2 ways to destroy an internal_id: destroying its parent (the wildcard
listen) or removing its device.  When a device is removed, the internal_id is
removed from its parent list to ensure that it is only destroyed once.  If the
parent were to be destroyed at this point, it would destroy any remaining
children, then be freed.  The internal_id still exists however, and could be
generating connection request events, which expect to find the parent.  The
reference ensures that the parent stays around as long as any children remain.
 



Ok, makes sense.

 


* not that I am very worried (and I suggesting resolving this thru
another subsequent patch if it is really a problem), but I think device
removal is still racy wrt non wildcard listeners. Here's the sequence:
cma_process_remove()-cma_remove_id_dev() decides it will
rdma_destroy_id() the listener id, and at the same time a process
context rdma_destroy_id() decides it is going to do the same. There are
probably various ways to take care of this, the simple one might be for
rdma_destroy_id() to look at the state and make a decision about who
gets to destroy.
   



A user cannot both return non-zero from their callback (indicating that the
rdma_cm should destroy the id) and call rdma_destroy_id() on the same id.  This
is equivalent to call rdma_destroy_id() twice.  It's not too difficult for the
user to avoid this.

- Sean

 

I don't understand your response. ucma.c for example can call 
rdma_create_id() and rdma_destroy_id(), correct? What says that when 
ucma.c does a rdma_destroy_id() on a nonwildcard listener, a device 
removal is not attempting to do the same on the listener? If this is 
possible, the code paths I mentioned above can still trigger a double 
destruct on a listener, correct?


Kanoj


RE: [ofa-general] Re: [PATCH 2.6.24] rdma/cm: fix deadlock destroyinglisten requests

2007-10-10 Thread Sean Hefty
I don't understand your response. ucma.c for example can call
rdma_create_id() and rdma_destroy_id(), correct? What says that when
ucma.c does a rdma_destroy_id() on a nonwildcard listener, a device
removal is not attempting to do the same on the listener? If this is
possible, the code paths I mentioned above can still trigger a double
destruct on a listener, correct?

Device removal only automatically destroys internal listens, and a non-wildcard
listen would never generate an internal listen.  Internal listens are used to
map wildcard listens across multiple RDMA devices.  Their creation and
destruction is contained to the cma.  From the viewpoint of the device removal
code, a nonwildcard listen is treated the same as a connected id.

The ucma only destroys id's from an event callback if the id is for a new
connection which it can't handle.

Hope this makes sense.

- Sean


Re: [ofa-general] Re: [PATCH 2.6.24] rdma/cm: fix deadlock destroyinglisten requests

2007-10-10 Thread Sean Hefty
Wait, I see ... cma_remove_id_dev() would return 0 from the 
event_handler, ensuring cma_process_remove() does not invoke 
rdma_destroy_id(), is that it?


yep - the destruction of the id is controlled by the user


Re: [ofa-general] Re: [PATCH 2.6.24] rdma/cm: fix deadlock destroyinglisten requests

2007-10-10 Thread Kanoj Sarcar

Sean Hefty wrote:

Wait, I see ... cma_remove_id_dev() would return 0 from the 
event_handler, ensuring cma_process_remove() does not invoke 
rdma_destroy_id(), is that it?



yep - the destruction of the id is controlled by the user


Ok, one last thing while we are here.

cma_process_remove() - cma_remove_id_dev() generates the event for 
device removal. This is ok to do as long as it can be guaranteed that a 
racing rdma_destroy_id() has not returned back to caller, correct?


IE, the caller must be willing to accept device removal events until its 
rdma_destroy_id() returns.


If so, why is cma_remove_id_dev() trying so hard to not generate the 
event when rdma_destroy_id() has gotten to the point of setting 
CMA_DESTROYING? Could it not just generate the event, happy in the 
knowledge that the refcount bump done by cma_process_remove() will 
prevent the rdma_destroy_id() call from returning?


If it could, that could mean all the cma_exch() code can be deleted from 
cma.c, and the CMA_DESTROYING state can also go away (your patch has 
taken out the only other reason CMA_DESTROYING was needed).


Kanoj


[ofa-general] librdmacm and libmthca question

2007-10-10 Thread Doug Ledford
OK, I ran into an issue with librdmacm and I was curious what the
answers to these issues are.

First, the rdma_connect/rdma_accept functions both require a connection
param struct.  That struct tells librdmacm what you want in terms of
responder_resources and initiator_depth.  Reading the man page, that's
the number of outstanding RMDA reads and RDMA atomic operations.  In
usage, I found that the QP max_recv_wr and max_send_wr are totally
unrelated to this (I at first thought they could be the same).  In fact,
on mthca hardware I found the hard limit to be either 4 or 5 (4 worked,
6 didn't, didn't try 5, assumed 4).  So even with a send queue depth of
128, I couldn't get above a 4 depth on initiator_depth.  I think it
might be of value to document somewhere that the initiator depth and
responder resources are not directly related to the actual work queue
depth, and that without some sort of intervention, are not that high.

However, I spent a *lot* of time tracking this down because the failure
doesn't occur until rdma_accept time.  Passing an impossibly high value
in initiator_depth or responder_resources doesn't fail on rdma_connect.
This leads one to believe that the values are OK, even though they fail
when you use the same values in rdma_accept.  A note to this effect in
the man pages would help.

Second, now that I know that mthca hardware fails with initiator depth
or responder resources > 4, it raises several unanswered questions:

1) Can this limit be adjusted by module parameters, and if so, which
ones?

2) Does this limit represent the limit on outstanding RMDA READ/Atomic
operations in a) progress, b) queue, or c) registration?

3) The answer to #2 implies the answer to this, but I would like a
specific response.  If I attempt to register more IBV_ACCESS_REMOTE_READ
memory regions than responder resources, what happens?  If I attempt to
queue more IBV_WR_RDMA_READ work requests than initiator_depth, what
happens?  If there are more IBV_WR_RDMA_READ requests in queue than
initiator_depth and it hits the initiator_depth + 1 request while still
processing the preceding requests, what happens?

-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband


___
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

[ofa-general] [PATCH V2] infiniband/core: Enable loopback of DR SMP responses from userspace

2007-10-10 Thread swelch
  Sean, Roland,

  The local loopback of an outgoing DR SMP response is limited to those that
  originate at the driver-specific SMA implementation during the driver's
  process_mad() function.  This patch [v2] enables the DR SMP response
  originating in user space (or elsewhere) to be delivered back up the
  stack on the same node.  In this case the driver specific process_mad()
  function does not consume or process the MAD so it must be manually
  copied to the MAD buffer which is to be handed off to a local agent.

  This is version 2 of the patch: the comments are updated, the function is
  renamed to better reflect IB specification terminology, and the setting
  of the TID is removed, since this patch eliminates the need for it.

  Thanks, Steve

Signed-off-by: Steve Welch [EMAIL PROTECTED]
---
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 6f42877..3c26cea 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -701,7 +701,8 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
}
 
/* Check to post send on QP or process locally */
-   if (smi_check_local_smp(smp, device) == IB_SMI_DISCARD)
+   if (smi_check_local_smp(smp, device) == IB_SMI_DISCARD &&
+   smi_check_local_outgoing_smp(smp, device) == IB_SMI_DISCARD)
goto out;
 
local = kmalloc(sizeof *local, GFP_ATOMIC);
@@ -752,8 +753,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
port_priv = ib_get_mad_port(mad_agent_priv->agent.device,
mad_agent_priv->agent.port_num);
if (port_priv) {
-   mad_priv->mad.mad.mad_hdr.tid =
-   ((struct ib_mad *)smp)->mad_hdr.tid;
+   memcpy(&mad_priv->mad.mad, smp, sizeof(struct ib_mad));
recv_mad_agent = find_mad_agent(port_priv,
&mad_priv->mad.mad);
}
diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h
index 1cfc298..53407b1 100644
--- a/drivers/infiniband/core/smi.h
+++ b/drivers/infiniband/core/smi.h
@@ -59,7 +59,8 @@ extern enum smi_action smi_handle_dr_smp_send(struct ib_smp *smp,
  u8 node_type, int port_num);
 
 /*
- * Return 1 if the SMP should be handled by the local SMA/SM via process_mad
+ * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM
+ * via process_mad
  */
 static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
  struct ib_device *device)
@@ -71,4 +72,19 @@ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
(smp->hop_ptr == smp->hop_cnt + 1)) ?
IB_SMI_HANDLE : IB_SMI_DISCARD);
 }
+
+/*
+ * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM
+ * via process_mad
+ */
+static inline enum smi_action smi_check_local_outgoing_smp(struct ib_smp *smp,
+  struct ib_device *device)
+{
+   /* C14-13:3 -- We're at the end of the DR segment of path */
+   /* C14-13:4 -- Hop Pointer == 0 -> give to SM */
+   return ((device->process_mad &&
+   ib_get_smp_direction(smp) &&
+   !smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD);
+}
+
 #endif /* __SMI_H_ */


[ofa-general] [PATCH V3] infiniband/core: Enable loopback of DR SMP responses from userspace

2007-10-10 Thread swelch


  Sean, Roland,

  This patch [v3] replaces the [v2] patch; it includes those changes but renames
  the SMI function that tests for returning SMPs to the name Hal recommends.

  This patch allows userspace DR SMP responses to be looped back and delivered
  to a local mad agent by the management stack.

  Thanks, Steve

Signed-off-by: Steve Welch [EMAIL PROTECTED]
---
 drivers/infiniband/core/mad.c |6 +++---
 drivers/infiniband/core/smi.h |   18 +-
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 6f42877..98148d6 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -701,7 +701,8 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
}
 
/* Check to post send on QP or process locally */
-   if (smi_check_local_smp(smp, device) == IB_SMI_DISCARD)
+   if (smi_check_local_smp(smp, device) == IB_SMI_DISCARD &&
+   smi_check_local_returning_smp(smp, device) == IB_SMI_DISCARD)
goto out;
 
local = kmalloc(sizeof *local, GFP_ATOMIC);
@@ -752,8 +753,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
port_priv = ib_get_mad_port(mad_agent_priv->agent.device,
mad_agent_priv->agent.port_num);
if (port_priv) {
-   mad_priv->mad.mad.mad_hdr.tid =
-   ((struct ib_mad *)smp)->mad_hdr.tid;
+   memcpy(&mad_priv->mad.mad, smp, sizeof(struct ib_mad));
recv_mad_agent = find_mad_agent(port_priv,
&mad_priv->mad.mad);
}
diff --git a/drivers/infiniband/core/smi.h b/drivers/infiniband/core/smi.h
index 1cfc298..aff96ba 100644
--- a/drivers/infiniband/core/smi.h
+++ b/drivers/infiniband/core/smi.h
@@ -59,7 +59,8 @@ extern enum smi_action smi_handle_dr_smp_send(struct ib_smp *smp,
  u8 node_type, int port_num);
 
 /*
- * Return 1 if the SMP should be handled by the local SMA/SM via process_mad
+ * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM
+ * via process_mad
  */
 static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
  struct ib_device *device)
@@ -71,4 +72,19 @@ static inline enum smi_action smi_check_local_smp(struct ib_smp *smp,
(smp->hop_ptr == smp->hop_cnt + 1)) ?
IB_SMI_HANDLE : IB_SMI_DISCARD);
 }
+
+/*
+ * Return IB_SMI_HANDLE if the SMP should be handled by the local SMA/SM
+ * via process_mad
+ */
+static inline enum smi_action smi_check_local_returning_smp(struct ib_smp *smp,
+  struct ib_device *device)
+{
+   /* C14-13:3 -- We're at the end of the DR segment of path */
+   /* C14-13:4 -- Hop Pointer == 0 -> give to SM */
+   return ((device->process_mad &&
+   ib_get_smp_direction(smp) &&
+   !smp->hop_ptr) ? IB_SMI_HANDLE : IB_SMI_DISCARD);
+}
+
 #endif /* __SMI_H_ */


[ofa-general] nightly osm_sim report 2007-10-11:normal completion

2007-10-10 Thread kliteyn
OSM Simulation Regression Summary
 
[Generated mail - please do NOT reply]
 
 
OpenSM binary date = 2007-10-10
OpenSM git rev = Tue_Oct_2_22:28:56_2007 
[d5c34ddc158599abff9f09a6cc6c8cad67745f0b]
ibutils git rev = Tue_Sep_4_17:57:34_2007 
[4bf283f6a0d7c0264c3a1d2de92745e457585fdb]
 
 
Total=520  Pass=520  Fail=0
 
 
Pass:
39 Stability IS1-16.topo
39 Pkey IS1-16.topo
39 OsmTest IS1-16.topo
39 OsmStress IS1-16.topo
39 Multicast IS1-16.topo
39 LidMgr IS1-16.topo
13 Stability IS3-loop.topo
13 Stability IS3-128.topo
13 Pkey IS3-128.topo
13 OsmTest IS3-loop.topo
13 OsmTest IS3-128.topo
13 OsmStress IS3-128.topo
13 Multicast IS3-loop.topo
13 Multicast IS3-128.topo
13 LidMgr IS3-128.topo
13 FatTree merge-roots-4-ary-2-tree.topo
13 FatTree merge-root-4-ary-3-tree.topo
13 FatTree gnu-stallion-64.topo
13 FatTree blend-4-ary-2-tree.topo
13 FatTree RhinoDDR.topo
13 FatTree FullGnu.topo
13 FatTree 4-ary-2-tree.topo
13 FatTree 2-ary-4-tree.topo
13 FatTree 12-node-spaced.topo
13 FTreeFail 4-ary-2-tree-missing-sw-link.topo
13 FTreeFail 4-ary-2-tree-links-at-same-rank-2.topo
13 FTreeFail 4-ary-2-tree-links-at-same-rank-1.topo
13 FTreeFail 4-ary-2-tree-diff-num-pgroups.topo

Failures: