Re: [ofa-general] [PATCH] mlx4_core: increase max number of qp's and of srq's to 128K

2007-11-28 Thread Jack Morgenstein
On Wednesday 21 November 2007 10:09, Or Gerlitz wrote:
 Why do you want to increase the maxima for SRQs as well? A 1:1 ratio 
 between QPs and SRQs means a broken application design, doesn't it?
 
Not really, for the new XRC QP type.  In this case, we will have one XRC
connection per multi-process application per host, with a larger number
of XRC_SRQs (one per process per host).

However, the XRC SRQs act more like RD qps, so we really don't need to increase 
the default max SRQs.

I'll post V2 of the patch now.

- Jack
___
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[ofa-general] [PATCH V2] mlx4_core: increase max number of qp's to 128K

2007-11-28 Thread Jack Morgenstein
mlx4_core: increase max QPs to 128K.

With the advent of large clusters which utilize multicore hosts,
64K QPs are not enough.

We want to increase the default maxima for QPs to 128K.

Signed-off-by: Jack Morgenstein [EMAIL PROTECTED]

Index: ofa_1_3_dev_kernel/drivers/net/mlx4/main.c
===
--- ofa_1_3_dev_kernel.orig/drivers/net/mlx4/main.c	2007-11-21 17:51:56.0 +0200
+++ ofa_1_3_dev_kernel/drivers/net/mlx4/main.c	2007-11-22 10:26:04.0 +0200
@@ -76,7 +76,7 @@ static const char mlx4_version[] __devin
 	DRV_VERSION " (" DRV_RELDATE ")\n";
 
 static struct mlx4_profile default_profile = {
-	.num_qp		= 1 << 16,
+	.num_qp		= 1 << 17,
 	.num_srq	= 1 << 16,
 	.rdmarc_per_qp	= 1 << 4,
 	.num_cq		= 1 << 16,


[ofa-general] ofa_1_3_kernel 20071128-0200 daily build status

2007-11-28 Thread Vladimir Sokolovsky (Mellanox)
This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/ofed_1_3/linux-2.6.git
git_branch: ofed_kernel

Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod 
--with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-mlx4-mod 
--with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod 
--with-nes-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.19
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.13
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.19
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.12
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.15
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.22
Passed on ia64 with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on ia64 with linux-2.6.12
Passed on x86_64 with linux-2.6.18-53.el5
Passed on ia64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.23
Passed on ppc64 with linux-2.6.13
Passed on x86_64 with linux-2.6.15
Passed on ia64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ppc64 with linux-2.6.18-8.el5

Failed:


[ofa-general] Re: [PATCH] librdmacm/man: fix-up man pages

2007-11-28 Thread Or Gerlitz
On 11/27/07, Sean Hefty [EMAIL PROTECTED] wrote:
  These have been committed to master branch.

OK, got it.

Some users have approached me and said that it's unclear from the man
pages what the legal values are for some fields of the connection param
structure. Reviewing this a little, I think we should add the maximum
values for retry_count and rnr_retry_count under the InfiniBand-specific
section of the rdma_connect and rdma_accept pages.

Also, what about pushing all these documentation changes into the
OFED 1.3 release?

Or.


[ofa-general] [PATCH] libmlx4: max_recv_wr must be non-zero for non-SRQ QPs

2007-11-28 Thread Jack Morgenstein
max_recv_wr must also be non-zero for QPs which are not
associated with an SRQ.

Signed-off-by: Jack Morgenstein [EMAIL PROTECTED]

---
Roland,
Without this patch, if the userspace caller specifies max_recv_wr = 0 for a
non-SRQ QP, the creation will be rejected in kernel space in file
infiniband/hw/mlx4/qp.c, procedure set_rq_size:

	} else {
		/* HW requires >= 1 RQ entry with >= 1 gather entry */
==> NOTE:	if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge))
			return -EINVAL;

You've clamped max_recv_sge to at least 1, but not max_recv_wr.

Jack

diff --git a/src/verbs.c b/src/verbs.c
index 4e7beff..ec4c6a5 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -367,8 +367,12 @@ struct ibv_qp *mlx4_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr)
 
 	if (attr->srq)
 		attr->cap.max_recv_wr = qp->rq.wqe_cnt = 0;
-	else if (attr->cap.max_recv_sge < 1)
-		attr->cap.max_recv_sge = 1;
+	else {
+		if (attr->cap.max_recv_sge < 1)
+			attr->cap.max_recv_sge = 1;
+		if (attr->cap.max_recv_wr < 1)
+			attr->cap.max_recv_wr = 1;
+	}
 
 	if (mlx4_alloc_qp_buf(pd, &attr->cap, attr->qp_type, qp))
goto err;


RE: [ofa-general] Re: iWARP peer-to-peer CM proposal

2007-11-28 Thread Kanevsky, Arkady
Agree with initiator/client sending signalled 0B RDMA Read.
This will handle the client side.

Still not 100% clear on the passive/server side.
Two issues bother me.
1. Is a bogus S-tag allowed for incoming RDMA ops?
I do not recall that RDDP requires that length is checked before
S-tag.

2. How does the verbs layer on the server side know that an RDMA Read op
came and was done? Is it some back door to vendor FW?
Will this be kicked for all incoming RDMA Read ops?

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Steve Wise [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, November 27, 2007 7:48 PM
 To: Caitlin Bestler
 Cc: Kanevsky, Arkady; Glenn Grundstrom; Leonid Grossman; 
 [EMAIL PROTECTED]
 Subject: Re: [ofa-general] Re: iWARP peer-to-peer CM proposal
 
 Caitlin Bestler wrote:
  On Nov 27, 2007 3:58 PM, Steve Wise [EMAIL PROTECTED] wrote:

   For the short term, I claim we just implement this as part of linux
   iwarp connection setup (mandating a 0B read be sent from the active
   side).  Your proposal to add meta-data to the private data requires a
   standards change anyway and is, IMO, the 2nd phase of this whole
   enchilada...

   Steve.


  I don't see how you can have any solution here that does not require
  meta-data. For non-peer-to-peer connections neither a zero length RDMA
  Read or Write should be sent. An extraneous RDMA Read is particularly
  onerous for a short lived connection that fits the classic
  active/passive model. So *something* is telling the CMA layer that
  this connection may need an MPA unjam action. If that isn't meta-data,
  what is it?

 I assumed the 0B read would _always_ be sent as part of establishing
 an iWARP connection using linux and the rdma-cm.


  Further, the RDMA Read solution is adequate whenever the RDMA Write
  solution would have been (although at an unnecessary extra cost), but
  as near as I can determine it is not a complete solution. If the
  passive side needs an untagged message completion then *something*
  needs to send it. How can the CM layer (or, I suppose, the ULP itself)
  know that this untagged NOP message must be sent without meta-data?

 I believe at Reno we had the current rnic vendors all saying a SEND or
 0B read will work.  So:  If someone has current iwarp HW that will
 _not_ handle this problem by doing the 0B read hack, please speak up
 now.


  As I see it, if we want to do the minimum that is required, but be
  certain that it is adequate, we need a per-connection setup meta-data
  exchange.

 Are you going to prototype this?


 Steve.
 
 
 


RE: [ofa-general] Re: iWARP peer-to-peer CM proposal

2007-11-28 Thread Kanevsky, Arkady
Any posting to the SQ prior to connection establishment will complete
immediately with the flushed status.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Glenn Grundstrom [mailto:[EMAIL PROTECTED] 
 Sent: Sunday, November 25, 2007 9:00 PM
 To: Steve Wise; Kanevsky, Arkady
 Cc: Leonid Grossman; [EMAIL PROTECTED]
 Subject: RE: [ofa-general] Re: iWARP peer-to-peer CM proposal
 
  
  Kanevsky, Arkady wrote:
   Very good points.
   Thanks Steve.
  
   If we can do an unsignalled 0-size RDMA Read with a bogus S-tag, this
   may work better.
   Yes, it will require IRD to be set to a nonzero value at the Responder.
   Ditto ORD of at least 1 on Responder.
   There is no need to have an extra CQ entry on either side for it.
   It is only needed for the error path.
   So this will only be needed if the Sender posted the full queue of
   sends. But it can not post anything because the CM will not let it
   know that the connection is established.
  
 
  Well, actually, I think the ULP _can_ post before establishing the
  connection.  But I guess we can define the semantics such that
  applications using the rdma-cm interface must adhere to whatever we
  need to make this hack work.

  Q: are there apps using the rdma-cm out there today that pre-post SQ
  WRs before getting an ESTABLISHED event?
  
  Steve.
 
 ULPs are allowed to post prior to establishing the connection, but I
 can't name any that operate this way.  Prohibiting applications that
 use the rdma_cm directly from pre-posting is okay, but what about ULPs
 over other ULPs (e.g. MPI over uDAPL)?  How can/will this be handled?
 
 Glenn.
 
 
   Happy Thanksgiving,
  
   Arkady Kanevsky   email: [EMAIL PROTECTED]
   Network Appliance Inc.   phone: 781-768-5395
   1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
   Waltham, MA 02451   central phone: 781-768-5300

  
 
   -Original Message-
   From: Steve Wise [mailto:[EMAIL PROTECTED]
   Sent: Wednesday, November 21, 2007 1:07 PM
   To: Kanevsky, Arkady
   Cc: Glenn Grundstrom; Leonid Grossman; [EMAIL PROTECTED]
   Subject: [ofa-general] Re: iWARP peer-to-peer CM proposal
  
   Comments in-line below...
  
  
   Kanevsky, Arkady wrote:

    Group,

    Below is a proposal on how to resolve the peer-to-peer iWARP CM issue
    discovered at the interop event.

    The main issue is that the MPA spec (relevant portion of IETF RFC 5044
    is below) requires that the connection initiator send the first
    message over the established connection.
    Multiple MPI implementations and several other apps use a
    peer-to-peer model.
    So rather than forcing all of them to do it on their own, which will
    not help with interop between different implementations, the goal is
    to extend the lower layers to provide it.

    Our first idea was to leave the MPA protocol untouched and try to
    solve this problem in iw_cm. But there are too many complications to
    it. First, in order to adhere to RFC 5044, the initiator must send the
    first FPDU and the responder must process it. But since the connection
    is already established, processing the FPDU involves the ULP on whose
    behalf the connection is created.
    So either the initiator sends a message which generates a completion
    on the responder CQ, and is thus visible to the ULP, or it does not.
    In the latter case, the only op which can do it is an RDMA one, which
    means that the responder somehow provided the initiator with an S-tag
    which it can use. So, this is an extension to MPA, probably using
    private data. And the responder, upon receiving it, destroys this
    S-tag.

    In any case this is an extension of MPA.
  
 
   This stag exchange isn't needed if this RDMA op is a 0B READ.
   The responder waits for that 0B read and only indicates to its ULP
   that the rdma connection is established when it replies to the 0B
   read.  In this scenario, the responder/server side doesn't consume
   any CQ resources.
   But it would require an IRD of at least 1 to be configured on the QP.
   The initiator still requires an SQ entry, and possibly a CQ entry,
   for initiating the 0B read and handling completion.
   But it's perhaps a little less painful than doing a SEND/RECV
   exchange.  The read wr could be unsignaled so that it won't
   generate a CQE.  But it still consumes an SQ WR slot, so the SQ
   would have to be sized to allow this extra WR. And I
 

[ofa-general] [PATCH] IB/ehca: Fix static rate if path faster than link

2007-11-28 Thread Joachim Fenkes
The formula would yield -1 for this, which is wrong in a bad way (max
throttling). Clamp to 0, which is the correct value.

Signed-off-by: Joachim Fenkes [EMAIL PROTECTED]
---

This fixes another regression introduced in rc3.
Please review and apply for 2.6.24-rc4. Thanks!

 drivers/infiniband/hw/ehca/ehca_av.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_av.c b/drivers/infiniband/hw/ehca/ehca_av.c
index 453eb99..f7782c8 100644
--- a/drivers/infiniband/hw/ehca/ehca_av.c
+++ b/drivers/infiniband/hw/ehca/ehca_av.c
@@ -76,8 +76,12 @@ int ehca_calc_ipd(struct ehca_shca *shca, int port,
 
 	link = ib_width_enum_to_int(pa.active_width) * pa.active_speed;
 
-	/* IPD = round((link / path) - 1) */
-	*ipd = ((link + (path >> 1)) / path) - 1;
+	if (path >= link)
+		/* no need to throttle if path faster than link */
+		*ipd = 0;
+	else
+		/* IPD = round((link / path) - 1) */
+		*ipd = ((link + (path >> 1)) / path) - 1;
 
 	return 0;
 }
-- 
1.5.2





RE: [ofa-general] Re: iWARP peer-to-peer CM proposal

2007-11-28 Thread Felix Marti


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:general-
 [EMAIL PROTECTED] On Behalf Of Kanevsky, Arkady
 Sent: Wednesday, November 28, 2007 5:30 AM
 To: Steve Wise; Caitlin Bestler
 Cc: Leonid Grossman; [EMAIL PROTECTED]
 Subject: RE: [ofa-general] Re: iWARP peer-to-peer CM proposal
 
 Agree with initiator/client sending signalled 0B RDMA Read.
 This will handle client side.
 
 Still not 100% clear on passive/server side.
 Two issues which bothers me.
 1. Is bogus S-tag allowed for incomming RDMA ops?
 I do not recall that RDDP requires that length is checked before
 S-tag.
 
 2. How is verb layer on server side knows that RDMA Read op
 came and was done? Is it some back door to vendor FW?
 Will this be kicked for all incoming RDMA Read ops?

As you point out, the server Verbs layer is not aware of an incoming 0B
RDMA Read (or Write for that matter). Hence some kind of magic must
happen in the adapter, where we vendors will have a choice: a) just
'unjam' the SQ in the adapter (which means that the CM layer works as
today and the server can post SQ ops before the 'unjam' is received, but
they won't make it to the wire) or b) send a back-door command to the CM,
which can then move the state machine to established only after the
'unjam' is received.

Whatever is done, it cannot happen for every zero-length RDMA Read (or
Write for that matter). Hence the adapter must be informed that the
next zero-length operation is the 'unjam' message (which also means that
the server side could, in theory, omit sending the RDMA Read Response,
because the RDMA Read Request was really an 'unjam'... not that I would
be pushing for such an 'optimization' to avoid an extra wire message).

 


[ofa-general] DDR vs SDR performance

2007-11-28 Thread Stijn De Smet
Hello,

I have a problem with the DDR performance:

Configuration:
2 servers (IBM x3755, equipped with 4 dual-core Opterons and 16GB RAM)
3 HCAs installed (2 Cisco DDR (Cheetah) and 1 Cisco dual-port SDR
(LionMini), all PCI-e x8), all DDR HCAs at the newest Cisco firmware
v1.2.917 build 3.2.0.149, with label 'HCA.Cheetah-DDR.20'

The DDR HCAs are connected directly with a cable, and s3n1 is running an
SM. The SDR boards are connected through a Cisco SFS-7000D switch; the
DDR performance is about the same when going through this SFS-7000D.

Both servers are running SLES10-SP1 with OFED 1.2.5.


s3n1:~ # ibstatus
Infiniband device 'mthca0' port 1 status:-- DDR board #1, not connected
default gid: fe80::::0005:ad00:000b:cb39
base lid:0x0
sm lid:  0x0
state:   1: DOWN
phys state:  2: Polling
rate:10 Gb/sec (4X)

Infiniband device 'mthca1' port 1 status:  --- DDR board #2, connected with cable
default gid: fe80::::0005:ad00:000b:cb31
base lid:0x16
sm lid:  0x16
state:   4: ACTIVE
phys state:  5: LinkUp
rate:20 Gb/sec (4X DDR)

Infiniband device 'mthca2' port 1 status: --- SDR board, only port 1 connected to the SFS-7000D
default gid: fe80::::0005:ad00:0008:a8d9
base lid:0x3
sm lid:  0x2
state:   4: ACTIVE
phys state:  5: LinkUp
rate:10 Gb/sec (4X)

Infiniband device 'mthca2' port 2 status:
default gid: fe80::::0005:ad00:0008:a8da
base lid:0x0
sm lid:  0x0
state:   1: DOWN
phys state:  2: Polling
rate:10 Gb/sec (4X)


RDMA test results (ib_rdma_bw):
-- SDR:
s3n2:~ # ib_rdma_bw -d mthca2 gpfs3n1
7190: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=0 |
7190: Local address:  LID 0x05, QPN 0x0408, PSN 0xf10f03 RKey 0x003b00 VAddr 0x002ba7b9943000
7190: Remote address: LID 0x03, QPN 0x040a, PSN 0xa9cf5c, RKey 0x003e00 VAddr 0x002adb2f3bb000


7190: Bandwidth peak (#0 to #989): 937.129 MB/sec
7190: Bandwidth average: 937.095 MB/sec
7190: Service Demand peak (#0 to #989): 2709 cycles/KB
7190: Service Demand Avg  : 2709 cycles/KB

-- DDR
s3n2:~ # ib_rdma_bw -d mthca1 gpfs3n1
7191: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=0 |
7191: Local address:  LID 0x10, QPN 0x0405, PSN 0x5e19e RKey 0x002600 VAddr 0x002b76eab2
7191: Remote address: LID 0x16, QPN 0x0405, PSN 0xdd976e, RKey 0x80002900 VAddr 0x002ba8ed10e000


7191: Bandwidth peak (#0 to #990): 1139.32 MB/sec
7191: Bandwidth average: 1139.31 MB/sec
7191: Service Demand peak (#0 to #990): 2228 cycles/KB
7191: Service Demand Avg  : 2228 cycles/KB

So there is only a 200MB/s increase between SDR and DDR.
With comparable hardware (x3655, dual dual-core Opteron, 8GB RAM), I get
slightly better RDMA performance (1395MB/s, so close to the PCI-e x8
limit), but even worse IPoIB and SDP performance with kernels 2.6.22 and
2.6.23.9 and OFED 1.3b.



IPoIB test (iperf), IPoIB in connected mode, MTU 65520:
#ib2 is SDR, ib1 is DDR
#SDR:
s3n2:~ # iperf -c cic-s3n1

Client connecting to cic-s3n1, TCP port 5001
TCP window size: 1.00 MByte (default)

[  3] local 192.168.1.2 port 50598 connected with 192.168.1.1 port 5001
[  3]  0.0-10.0 sec  6.28 GBytes  5.40 Gbits/sec

#DDR:
s3n2:~ # iperf -c cic-s3n1

Client connecting to cic-s3n1, TCP port 5001
TCP window size: 1.00 MByte (default)

[  3] local 192.168.1.2 port 32935 connected with 192.168.1.1 port 5001
[  3]  0.0-10.0 sec  6.91 GBytes  5.93 Gbits/sec


Here the increase is only about 0.5 Gbit/s.

And finally a test with SDP:

DDR:
s3n2:~ # LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=ok iperf -c cic-s3n1

Client connecting to cic-s3n1, TCP port 5001
TCP window size: 3.91 MByte (default)

[  4] local 192.168.1.2 port 58186 connected with 192.168.1.1 port 5001
[  4]  0.0-10.0 sec  7.72 GBytes  6.63 Gbits/sec

#SDR:
s3n2:~ # LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=ok iperf -c cic-s3n1

Client connecting to cic-s3n1, TCP port 5001
TCP window size: 3.91 MByte (default)

[  4] local 192.168.1.2 port 58187 connected with 192.168.1.1 port 5001
[  4]  0.0-10.0 sec  7.70 GBytes  6.61 Gbits/sec

With SDP there is no difference at all between the two boards.


Even when using multiple connections (using 3 servers (s3s2, s3s3, s3s4),
x3655, 2.6.22, all connecting to one (s3s1) over 

Re: [ofa-general] [PATCH ofed-1.3] IB/IPoIB: Restore support for interface statistics

2007-11-28 Thread Jack Morgenstein
On Wednesday 28 November 2007 09:20, Moni Shoua wrote:
 While moving to kernel 2.6.24 in OFED, the function for getting interface 
 statistics got lost. This is a backport patch to re-enable net device 
 statistics for kernels that do not have struct net_device_stats 
 in struct net_device.
 
 This patch fixes bug 790.
 
Thanks Moni, applied.

I actually applied the patch so that it created the various backport files,
then committed all the backport files together in a single commit, with your
authorship and signed-off-by.

(I probably should have added myself as well, below your signed-off -- since
I changed the commit format -- but I forgot to do this; sorry about that).

- Jack


RE: [ofa-general] DDR vs SDR performance

2007-11-28 Thread Gilad Shainer
Is the chipset in your servers HT2000? 

Gilad.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Stijn De
Smet
Sent: Wednesday, November 28, 2007 6:43 AM
To: general@lists.openfabrics.org
Subject: [ofa-general] DDR vs SDR performance


Re: [ofa-general] DDR vs SDR performance

2007-11-28 Thread Stijn De Smet
One ServerWorks HT2100 A PCI Express Bridge, one HT2100 B PCI Express
Bridge, and one ServerWorks HT1000 South Bridge

Regards,
Stijn

Gilad Shainer wrote:
 Is the chipset in your servers HT2000? 

 Gilad.

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Stijn De
 Smet
 Sent: Wednesday, November 28, 2007 6:43 AM
 To: general@lists.openfabrics.org
 Subject: [ofa-general] DDR vs SDR performance

 Hello,

 I have a problem with the DDR performance:

 Configuration:
 2 servers (IBM x3755, equiped with 4 dualcore opteron and 16GB RAM)
 3 HCA's installed (2 Cisco DDR(Cheetah) and 1 Cisco dual SDR(LionMini),
 all PCI-e x8), all DDR HCA's at newest Cisco Firmware v1.2.917 build
 3.2.0.149, with label 'HCA.Cheetah-DDR.20'

 The  DDR's are connected with a cable, and s3n1 is running a SM. The SDR
 boards are connected over a Cisco SFS-7000D, but the DDR performance is
 +- the same over this SFS-7000D

 Both servers are running SLES10-SP1 with Ofed 1.2.5.


 s3n1:~ # ibstatus
 Infiniband device 'mthca0' port 1 status:-- DDR board #1, not
 connected
 default gid: fe80::::0005:ad00:000b:cb39
 base lid:0x0
 sm lid:  0x0
 state:   1: DOWN
 phys state:  2: Polling
 rate:10 Gb/sec (4X)

 Infiniband device 'mthca1' port 1 status:  --- DDR board #2, connected
 with cable
 default gid: fe80::::0005:ad00:000b:cb31
 base lid:0x16
 sm lid:  0x16
 state:   4: ACTIVE
 phys state:  5: LinkUp
 rate:20 Gb/sec (4X DDR)

 Infiniband device 'mthca2' port 1 status: --- SDR board, only port 1
 connected to the SFS-7000D
 default gid: fe80::::0005:ad00:0008:a8d9
 base lid:0x3
 sm lid:  0x2
 state:   4: ACTIVE
 phys state:  5: LinkUp
 rate:10 Gb/sec (4X)

 Infiniband device 'mthca2' port 2 status:
 default gid: fe80::::0005:ad00:0008:a8da
 base lid:0x0
 sm lid:  0x0
 state:   1: DOWN
 phys state:  2: Polling
 rate:10 Gb/sec (4X)


 RDMA test of :
 -- SDR:
 s3n2:~ # ib_rdma_bw -d mthca2 gpfs3n1
 7190: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000
 | duplex=0 | cma=0 |
 7190: Local address:  LID 0x05, QPN 0x0408, PSN 0xf10f03 RKey 0x003b00
 VAddr 0x002ba7b9943000
 7190: Remote address: LID 0x03, QPN 0x040a, PSN 0xa9cf5c, RKey 0x003e00
 VAddr 0x002adb2f3bb000


 7190: Bandwidth peak (#0 to #989): 937.129 MB/sec
 7190: Bandwidth average: 937.095 MB/sec
 7190: Service Demand peak (#0 to #989): 2709 cycles/KB
 7190: Service Demand Avg  : 2709 cycles/KB

 -- DDR
 s3n2:~ # ib_rdma_bw -d mthca1 gpfs3n1
 7191: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=0 |
 7191: Local address:  LID 0x10, QPN 0x0405, PSN 0x5e19e RKey 0x002600 VAddr 0x002b76eab2
 7191: Remote address: LID 0x16, QPN 0x0405, PSN 0xdd976e, RKey 0x80002900 VAddr 0x002ba8ed10e000


 7191: Bandwidth peak (#0 to #990): 1139.32 MB/sec
 7191: Bandwidth average: 1139.31 MB/sec
 7191: Service Demand peak (#0 to #990): 2228 cycles/KB
 7191: Service Demand Avg  : 2228 cycles/KB

 So there is only a 200 MB/s increase between SDR and DDR. With comparable
 hardware (x3655, dual dual-core Opteron, 8GB RAM), I get slightly
 better RDMA performance (1395 MB/s, close to the PCI-e x8 limit), but
 even worse IPoIB and SDP performance with kernels 2.6.22 and
 2.6.23.9 and OFED 1.3b.
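A quick way to put these numbers in context is to compare them with the nominal data rate of each link. The sketch below is a hedged back-of-the-envelope check, not part of the original report: it assumes 4X SDR signals at 10 Gb/s and 4X DDR at 20 Gb/s, both with 8b/10b encoding, and treats 1 MB as 10^6 bytes. The function names are hypothetical.

```c
/* Sketch: compare a measured bandwidth with the nominal InfiniBand data rate.
 * Assumptions (not from the original thread): 4X SDR signals at 10 Gb/s,
 * 4X DDR at 20 Gb/s, both use 8b/10b encoding (80% of the signaling rate
 * carries data), and 1 MB = 10^6 bytes. */

/* Payload data rate in MB/s for a link with the given signaling rate (Gb/s). */
double ib_data_rate_mbs(double signal_gbps)
{
    double data_bits_per_sec = signal_gbps * 1e9 * 8.0 / 10.0; /* strip 8b/10b */
    return data_bits_per_sec / 8.0 / 1e6;                      /* bits -> MB   */
}

/* Fraction of the link data rate achieved by a measured bandwidth (MB/s). */
double link_efficiency(double measured_mbs, double signal_gbps)
{
    return measured_mbs / ib_data_rate_mbs(signal_gbps);
}
```

With the averages above, SDR reaches about 94% of its 1000 MB/s data rate, while DDR reaches only about 57% of 2000 MB/s, which points at a bottleneck outside the InfiniBand link itself. The same arithmetic for a PCIe 1.x x8 slot (assumed 8 lanes at 2.5 GT/s, also 8b/10b, i.e. a 20 Gb/s signaling rate) gives 2000 MB/s per direction before packet overhead.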



 IPoIB test (iperf), IPoIB in connected mode, MTU 65520:
 #ib2 is SDR, ib1 is DDR
 #SDR:
 s3n2:~ # iperf -c cic-s3n1
 
 Client connecting to cic-s3n1, TCP port 5001 TCP window size: 1.00 MByte (default)
 
 [  3] local 192.168.1.2 port 50598 connected with 192.168.1.1 port 5001
 [  3]  0.0-10.0 sec  6.28 GBytes  5.40 Gbits/sec

 #DDR:
 s3n2:~ # iperf -c cic-s3n1
 
 Client connecting to cic-s3n1, TCP port 5001 TCP window size: 1.00 MByte (default)
 
 [  3] local 192.168.1.2 port 32935 connected with 192.168.1.1 port 5001
 [  3]  0.0-10.0 sec  6.91 GBytes  5.93 Gbits/sec


 Now the increase is only about 0.5 Gbit/s.

 And finally a test with SDP:

 DDR:
 s3n2:~ # LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=ok iperf -c cic-s3n1
 
 Client connecting to cic-s3n1, TCP port 5001 TCP window size: 3.91 MByte (default)
 
 [  4] local 192.168.1.2 port 58186 connected with 192.168.1.1 port 5001
 [  4]  0.0-10.0 sec  7.72 GBytes  6.63 Gbits/sec

 #SDR:
 s3n2:~ # LD_PRELOAD=libsdp.so 

[ofa-general] Re: i got kernel oops in ib_umad when executing ULPs tests

2007-11-28 Thread Sasha Khapyorsky
Hi Dotan,

On 11:24 Tue 27 Nov , Dotan Barak wrote:
  Hi.
 
  When executing SDP tests (stress_connect) I got a kernel oops on my machine
  in ib_umad:

Is it reproducible somehow?

 
  Here are the machine props:
  *
  Host Name : sw112/3
  Host Architecture : x86_64
  Linux Distribution: SUSE Linux Enterprise Server 10 (x86_64) VERSION = 10
  Kernel Version: 2.6.16.21-0.8-smp
  GCC Version   : gcc (GCC) 4.1.0 (SUSE Linux)
  Memory size   : 4049452 kB
  Number of CPUs: 4
  cpu MHz   : 3192.308
  MST Version   : 4.4.3
  Driver Version: ofa_1_3_dev-20071126-0855
  HCA ID(s) : mlx4_0
  HCA model(s)  : 25418
  Board(s)  : MT_04A0110002
  *
 
  Here is the dump of the /var/log/messages:
  Nov 27 09:26:32 sw112 OpenSM[24713]: Exiting SM
  Nov 27 09:26:32 sw112 kernel: general protection fault:  [1] SMP
  Nov 27 09:26:32 sw112 kernel: last sysfs file: /class/net/ib0/address
  Nov 27 09:26:32 sw112 kernel: CPU 2
  Nov 27 09:26:32 sw112 kernel: Modules linked in: mst_pciconf mst_pci rdma_ucm rds ib_sdp rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad mlx4_ib mlx4_core ib_mthca ib_mad ib_core memtrack autofs4 ipv6 nfs lockd nfs_acl sunrpc af_packet button battery ac apparmor aamatch_pcre loop dm_mod ide_cd uhci_hcd ehci_hcd cdrom shpchp pci_hotplug hw_random i8xx_tco usbcore e1000 ext3 jbd edd fan thermal processor sg mptspi mptscsih mptbase scsi_transport_spi piix sd_mod scsi_mod ide_disk ide_core
  Nov 27 09:26:32 sw112 kernel: Pid: 24713, comm: opensm Tainted: PFU 2.6.16.21-0.8-smp #1
  Nov 27 09:26:32 sw112 kernel: RIP: 0010:[8837d39f] 8837d39f{:ib_umad:dequeue_send+26}
  Nov 27 09:26:32 sw112 kernel: RSP: 0018:8100c0d9fde8  EFLAGS: 00010046
  Nov 27 09:26:32 sw112 kernel: RAX: 8100c1a95658 RBX: 3f40a6f32b5a2004 RCX: 3f40a6f32b5a2014
  Nov 27 09:26:32 sw112 kernel: RDX: 8100c0d9fe58 RSI: 3f40a6f32b5a2004 RDI: 81007401ac3c
  Nov 27 09:26:32 sw112 kernel: RBP: 3f40a6f32b5a2004 R08: 0206 R09: 07d7
  Nov 27 09:26:32 sw112 kernel: R10:  R11: 0246 R12: 81007401ac00
  Nov 27 09:26:32 sw112 kernel: R13: 81007401a210 R14: 0005 R15: 
  Nov 27 09:26:32 sw112 kernel: FS:  2b13822edef0() GS:81012bd6b340() knlGS:
  Nov 27 09:26:32 sw112 kernel: CS:  0010 DS:  ES:  CR0: 8005003b
  Nov 27 09:26:32 sw112 kernel: CR2: 005d99c0 CR3: 37079000 CR4: 06e0
  Nov 27 09:26:32 sw112 kernel: Process opensm (pid: 24713, threadinfo 8100c0d9e000, task 8100cd8047d0)
  Nov 27 09:26:32 sw112 kernel: Stack: 81012d706b10 8100c0d9fe68 81007401ac00 8837d4b1
  Nov 27 09:26:32 sw112 kernel:0296 8100c0d9fe40 81007401a210 81007401a200
  Nov 27 09:26:32 sw112 kernel:0005 8827261e
  Nov 27 09:26:32 sw112 kernel: Call Trace: 8837d4b1{:ib_umad:send_handler+38}
  Nov 27 09:26:32 sw112 kernel: 8827261e{:ib_mad:ib_unregister_mad_agent+359}
  Nov 27 09:26:32 sw112 kernel: 8837d26b{:ib_umad:ib_umad_unreg_agent+121}
  Nov 27 09:26:32 sw112 kernel: 8837db37{:ib_umad:ib_umad_ioctl+74} 8018b6b9{do_ioctl+33}
  Nov 27 09:26:32 sw112 kernel:8018b94b{vfs_ioctl+584} 801e7e6b{__up_write+33}
  Nov 27 09:26:32 sw112 kernel:8018b9c6{sys_ioctl+98} 8010a7be{system_call+126}
  Nov 27 09:26:32 sw112 kernel:
  Nov 27 09:26:32 sw112 kernel: Code: 48 8b 53 10 48 8b 41 08 48 89 42 08 48 89 10 48 c7 41 08 00
  Nov 27 09:26:32 sw112 kernel: RIP 8837d39f{:ib_umad:dequeue_send+26} RSP 8100c0d9fde8
 
 
 
  Here is the dump of /var/log/opensm.log:
 
  Nov 27 09:26:44 546327 [D6AC7EF0] 0x03 - OpenSM 3.1.7
  Nov 27 09:26:44 546407 [D6AC7EF0] 0x80 - OpenSM 3.1.7
  Nov 27 09:26:44 547422 [D6AC7EF0] 0x02 - osm_vendor_bind: Binding to port 
  0x4025
   ^^
Is this a valid GUID?

  Nov 27 09:26:44 673957 [D6AC7EF0] 0x01 - osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
  Nov 27 09:26:44 674032 [D6AC7EF0] 0x01 - osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
  Nov 27 09:26:44 674057 [D6AC7EF0] 0x01 - osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
  Nov 27 09:26:44 674089 [D6AC7EF0] 0x01 - osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
  Nov 27 09:26:44 675165 [D6AC7EF0] 0x80 - Exiting SM
 
 
  Can you check this issue?

Could you send OpenSM log file too?

Sasha

[ofa-general] [PATCH] return ENOSYS instead of -ENOSYS

2007-11-28 Thread Gleb Natapov
Return ENOSYS instead of -ENOSYS. We are not in the kernel.


diff --git a/src/verbs.c b/src/verbs.c
index 4e7beff..7fa1dbc 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -227,7 +227,7 @@ err:
 int mlx4_resize_cq(struct ibv_cq *ibcq, int cqe)
 {
/* XXX resize CQ not implemented */
-   return -ENOSYS;
+   return ENOSYS;
 }
 
 int mlx4_destroy_cq(struct ibv_cq *cq)
--
Gleb.
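The convention behind this patch: kernel code returns negative errno values, while libibverbs entry points such as ibv_resize_cq() are documented to return 0 on success or a positive errno value on failure. A minimal sketch of a provider stub following that convention; resize_cq_stub and resize_error_text are hypothetical names, and the stub takes a void * instead of struct ibv_cq * so it compiles without libibverbs.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for a userspace provider entry point such as
 * mlx4_resize_cq(). Userspace convention: return 0 on success or a
 * positive errno value on failure (the kernel uses negative -Exxx). */
int resize_cq_stub(void *cq, int cqe)
{
    (void)cq;
    (void)cqe;
    /* XXX resize CQ not implemented */
    return ENOSYS;
}

/* Caller side: a positive errno can be passed straight to strerror(). */
const char *resize_error_text(int ret)
{
    return ret ? strerror(ret) : "success";
}
```

With the old -ENOSYS, a caller passing the return value to strerror() would get a generic "unknown error" style message instead of the real reason.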


RE: [ofa-general] DDR vs SDR performance

2007-11-28 Thread Gilad Shainer
Here are some notes. You can contact me directly for more info.
1. You are not comparing the same HW. Single-port IB HCAs provide
different performance than dual-port devices. If you want to see
the difference between SDR and DDR, you need to use the same IB
configuration as well.
2. That said, with the single-port DDR you should get around 1400 MB/s
with the RDMA tests, but:
- The benchmark you are using has not been supported for a long time
now. You should use the IB send, IB write, etc. tests.
- On Opteron, the HTxx00 chipset configuration is very important (not
just for IB performance).
- Performance differs depending on the location of the memory. If you
run the tests, you will see numbers in the high 1300s and low 1100s
(with your current chipset config).


Gilad.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Stijn De
Smet
Sent: Wednesday, November 28, 2007 7:02 AM
To: Gilad Shainer
Cc: general@lists.openfabrics.org
Subject: Re: [ofa-general] DDR vs SDR performance

One ServerWorks HT2100 A PCI Express Bridge, one HT2100 B PCI Express
Bridge, and one ServerWorks HT1000 South Bridge

Regards,
Stijn

Gilad Shainer wrote:
 Is the chipset in your servers HT2000? 

 Gilad.

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Stijn De 
 Smet
 Sent: Wednesday, November 28, 2007 6:43 AM
 To: general@lists.openfabrics.org
 Subject: [ofa-general] DDR vs SDR performance


Re: [ofa-general] [ANNOUNCE] ibsim-0.4 tarballs release

2007-11-28 Thread Hal Rosenstock
On Wed, 2007-11-28 at 12:50 +0530, Keshetti Mahesh wrote:
  ibutils maintainer is Oren Kladnitsky [EMAIL PROTECTED]
  Not sure if he monitors this list.
 
 Sorry, I actually wanted to know who the developers of the ibadm group
 of utilities are.

ibadm or ibdm ? Your original question was about ibdm. ibdm is under the
ibutils tree. I don't think Mellanox has open sourced ibadm but I might
be wrong. Maybe it's just not part of OpenIB/OpenFabrics code.

   LASH resolves credit loops by using different VLs, I don't think ibdmchk
   takes this into account, but don't know for sure.
 
 Yes, I have verified that ibdmchk considers only one VL while checking
 for credit loops.
 
  I also think ibdmchk needs some support to handle LASH. I don't think it
  is currently supported by it (although that is not documented AFAIK).
 
 
 Is anyone in the OFED community currently working on this part (adding
 support to ibdmchk to handle LASH)?

I seriously doubt it.

-- Hal

 -Mahesh
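For readers following the LASH point: credit loops correspond to cycles in the channel dependency graph, and LASH avoids them by placing routes on different VLs so that no single VL's graph contains a cycle. A checker that looks at only one VL effectively merges all VLs into one graph and can report a loop that LASH has already broken. The toy model below is only a sketch: it assumes each route stays on one VL, and all names are hypothetical, not ibdmchk code.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_CH 16 /* channels (directed links) in this toy model */
#define MAX_VL 8

/* dep[vl][a][b] is true when some route holds channel a and then requests
 * channel b on virtual lane vl. Credit loops are cycles in this graph. */
typedef struct {
    int nch;
    bool dep[MAX_VL][MAX_CH][MAX_CH];
} dep_graph;

/* Depth-first search; color: 0 = unvisited, 1 = on current path, 2 = done. */
static bool dfs(const dep_graph *g, int vl, int u, int *color)
{
    color[u] = 1;
    for (int v = 0; v < g->nch; v++) {
        if (!g->dep[vl][u][v])
            continue;
        if (color[v] == 1)
            return true; /* back edge: cycle on this VL */
        if (color[v] == 0 && dfs(g, vl, v, color))
            return true;
    }
    color[u] = 2;
    return false;
}

bool vl_has_loop(const dep_graph *g, int vl)
{
    int color[MAX_CH] = {0};
    for (int u = 0; u < g->nch; u++)
        if (color[u] == 0 && dfs(g, vl, u, color))
            return true;
    return false;
}

/* VL-aware check: a loop exists only if some single VL has a cycle. */
bool has_credit_loop(const dep_graph *g)
{
    for (int vl = 0; vl < MAX_VL; vl++)
        if (vl_has_loop(g, vl))
            return true;
    return false;
}

/* VL-blind check: merge every VL into one graph first. */
bool has_credit_loop_single_vl(const dep_graph *g)
{
    dep_graph merged;
    memset(&merged, 0, sizeof(merged));
    merged.nch = g->nch;
    for (int vl = 0; vl < MAX_VL; vl++)
        for (int a = 0; a < g->nch; a++)
            for (int b = 0; b < g->nch; b++)
                if (g->dep[vl][a][b])
                    merged.dep[0][a][b] = true;
    return vl_has_loop(&merged, 0);
}
```

For three routes around a ring creating dependencies ch0->ch1, ch1->ch2 and ch2->ch0, moving the third route onto VL1 makes has_credit_loop() false, while has_credit_loop_single_vl() still reports a loop.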


[ofa-general] [PATCH] Bug fix IPOIB CM dereferencing invalid pointer - resend

2007-11-28 Thread Eli Cohen
Bug fix IPOIB CM dereferencing invalid pointer

When ipoib_neigh_free is called, it needs to set its ->cm->neigh member
to NULL so that a completion with error reaching ipoib_cm_handle_tx_wc
will not access an invalid pointer.

Signed-off-by: Eli Cohen [EMAIL PROTECTED]
---

This is what I really meant to send


 drivers/infiniband/ulp/ipoib/ipoib_main.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index a03a65e..0c66723 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -869,6 +869,10 @@ void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh)
}
if (ipoib_cm_get(neigh))
ipoib_cm_destroy_tx(ipoib_cm_get(neigh));
+
+   if (neigh->cm)
+       neigh->cm->neigh = NULL;
+
kfree(neigh);
 }
 
-- 
1.5.3.6
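The pattern this patch description is after, shown in isolation: before freeing an object, clear any back-pointer that a longer-lived object holds to it, so a later completion handler sees NULL instead of freed memory. The structures below are hypothetical, heavily simplified stand-ins for the IPoIB ones, and the sketch deliberately ignores the locking a real driver needs.

```c
#include <stdlib.h>

/* Hypothetical stand-ins: the neighbour points at its connected-mode TX
 * context, and the TX context keeps a back-pointer to the neighbour. */
struct cm_tx;

struct neigh {
    struct cm_tx *cm;
};

struct cm_tx {
    struct neigh *neigh; /* back-pointer read later, e.g. on a TX completion */
};

/* Free a neighbour without leaving a dangling back-pointer behind. */
void neigh_free(struct neigh *n)
{
    if (n->cm)
        n->cm->neigh = NULL; /* the object that outlives n must forget it */
    free(n);
}

/* Completion-path check: only touch the neighbour if it still exists. */
int tx_completion_has_neigh(const struct cm_tx *tx)
{
    return tx->neigh != NULL;
}
```

By contrast, assigning a field of the object being freed (for example n->cm = NULL right before free(n)) only changes memory that is about to be released, so no other object can observe that write.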





[ofa-general] [PATCH] ipoib: Bug fix IPOIB CM dereferencing invalid pointer

2007-11-28 Thread Eli Cohen
Bug fix IPOIB CM dereferencing invalid pointer

When ipoib_neigh_free is called, it needs to set its ->cm member to NULL
so that a completion with error reaching ipoib_cm_handle_tx_wc will not
access an invalid pointer.

Signed-off-by: Eli Cohen [EMAIL PROTECTED]
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index a03a65e..95c7714 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -869,6 +869,8 @@ void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh)
}
if (ipoib_cm_get(neigh))
ipoib_cm_destroy_tx(ipoib_cm_get(neigh));
+
+   neigh->cm = NULL;
kfree(neigh);
 }
 
-- 
1.5.3.6





Re: [ofa-general] MTHCA driver from OFED 1.3a package

2007-11-28 Thread Jack Morgenstein
On Tuesday 27 November 2007 19:17, Lukas Hejtmanek wrote:
 On Tue, Nov 27, 2007 at 06:51:48PM +0200, Tziporet Koren wrote:
  I just found that OFED 1.3a with the 2.6.23 kernel runs at 2/3 the speed of
  the 2.6.23 kernel with its built-in driver. Any reason for this?

  Which benchmark?
 
 ib_rdma_bw
 ib_send_bw
 ibv_uc_pingpong
 
  Which HCA?
 
 Mellanox InfiniBand HCA, HCA.Cheetah-DDR.20.
 
  Is it the same with ofed beta release?
 
 Did you mean 1.3b? I have not tried it.
 
Which userspace libraries did you use with the built-in driver
of the 2.6.23 kernel?

- Jack


Re: [ofa-general] [PATCH] Bug fix IPOIB CM dereferencing invalid pointer - resend

2007-11-28 Thread Eli Cohen
Actually I see that tx->neigh is already set to NULL in
ipoib_cm_destroy_tx, so this fixes nothing, although with this change my
system stopped crashing. I guess I have to dig further. By the way, this
happens when I run netperf UDP and the connection is closed while the
test runs.

On Wed, 2007-11-28 at 18:05 +0200, Eli Cohen wrote:
 Bug fix IPOIB CM dereferencing invalid pointer
 
 When ipoib_neigh_free gets called it needs to set to NULL
 its ->cm->neigh member So that a completion with error reaching
 ipoib_cm_handle_tx_wc will not access an invalid pointer.
 
 Signed-off-by: Eli Cohen [EMAIL PROTECTED]
 ---
 
 This is what I really meant to send
 
 
  drivers/infiniband/ulp/ipoib/ipoib_main.c |4 
  1 files changed, 4 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
 index a03a65e..0c66723 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
 @@ -869,6 +869,10 @@ void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh)
   }
   if (ipoib_cm_get(neigh))
   ipoib_cm_destroy_tx(ipoib_cm_get(neigh));
 +
 + if (neigh->cm)
 +     neigh->cm->neigh = NULL;
 +
   kfree(neigh);
  }
  



RE: [ofa-general] Re: iWARP peer-to-peer CM proposal

2007-11-28 Thread Caitlin Bestler


 -Original Message-
 From: Steve Wise [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, November 27, 2007 4:48 PM
 To: Caitlin Bestler
 Cc: Kanevsky, Arkady; Glenn Grundstrom; Leonid Grossman; openib-
 [EMAIL PROTECTED]
 Subject: Re: [ofa-general] Re: iWARP peer-to-peer CM proposal
 
 Caitlin Bestler wrote:
  On Nov 27, 2007 3:58 PM, Steve Wise [EMAIL PROTECTED] wrote:
 
  For the short term, I claim we just implement this as part of linux
  iwarp connection setup (mandating a 0B read be sent from the active
  side).  Your proposal to add meta-data to the private data requires a
  standards change anyway and is, IMO, the 2nd phase of this whole
  enchilada...
 
  Steve.
 
 
  I don't see how you can have any solution here that does not require
  meta-data. For non-peer-to-peer connections neither a zero length RDMA
  Read or Write should be sent. An extraneous RDMA Read is particularly
  onerous for a short lived connection that fits the classic
  active/passive model. So *something* is telling the CMA layer that
  this connection may need an MPA unjam action. If that isn't meta-data,
  what is it?
 
 I assumed the 0B read would _always_ be sent as part of establishing an
 iWARP connection using linux and the rdma-cm.
 

That is an extra round-trip per connection setup, which is a significant
penalty for a short lived connection. It is trivial for HPC/peer-to-peer
applications, but would be a killer for something like HTTP over RDMA.

Doing something like this for *every* connection makes it effectively
a change to the MPA protocol. OFA is not the forum for such discussions,
the IETF is.

OFA drafting an understanding of how peer-to-peer applications use the
existing protocol, on the other hand, is quite reasonable. But it has
to be something done by peer-to-peer middleware or by the verbs layer
in response to a flag from the peer-to-peer middleware. Otherwise it
is not augmenting a protocol, it is changing it.


Re: [ofa-general] Re: iWARP peer-to-peer CM proposal

2007-11-28 Thread Steve Wise

Caitlin Bestler wrote:
 
  -Original Message-
  From: Steve Wise [mailto:[EMAIL PROTECTED]
  Sent: Tuesday, November 27, 2007 4:48 PM
  To: Caitlin Bestler
  Cc: Kanevsky, Arkady; Glenn Grundstrom; Leonid Grossman; openib-
  [EMAIL PROTECTED]
  Subject: Re: [ofa-general] Re: iWARP peer-to-peer CM proposal
  
  Caitlin Bestler wrote:
   On Nov 27, 2007 3:58 PM, Steve Wise [EMAIL PROTECTED] wrote:
  
   For the short term, I claim we just implement this as part of linux
   iwarp connection setup (mandating a 0B read be sent from the active
   side).  Your proposal to add meta-data to the private data requires a
   standards change anyway and is, IMO, the 2nd phase of this whole
   enchilada...
  
   Steve.
  
  
   I don't see how you can have any solution here that does not require
   meta-data. For non-peer-to-peer connections neither a zero length RDMA
   Read or Write should be sent. An extraneous RDMA Read is particularly
   onerous for a short lived connection that fits the classic
   active/passive model. So *something* is telling the CMA layer that
   this connection may need an MPA unjam action. If that isn't meta-data,
   what is it?
  
  I assumed the 0B read would _always_ be sent as part of establishing an
  iWARP connection using linux and the rdma-cm.
  
 
 That is an extra round-trip per connection setup, which is a significant
 penalty for a short lived connection. It is trivial for HPC/peer-to-peer
 applications, but would be a killer for something like HTTP over RDMA.
 
 Doing something like this for *every* connection makes it effectively
 a change to the MPA protocol. OFA is not the forum for such discussions,
 the IETF is.
 
 OFA drafting an understanding of how peer-to-peer applications use the
 existing protocol, on the other hand, is quite reasonable. But it has
 to be something done by peer-to-peer middleware or by the verbs layer
 in response to a flag from the peer-to-peer middleware. Otherwise it
 is not augmenting a protocol, it is changing it.

Posting a 0B read after the MPA setup isn't changing the MPA protocol.
It's adding a protocol on top of the MPA setup in order to meet the
requirements of the MPA protocol.  Whether you add a private-data
request for this or _assume_ the 0B read will happen doesn't change this.






RE: [ofa-general] Re: iWARP peer-to-peer CM proposal

2007-11-28 Thread Tom Tucker

On Wed, 2007-11-28 at 11:43 -0500, Caitlin Bestler wrote:
 
  -Original Message-
  From: Steve Wise [mailto:[EMAIL PROTECTED]
  Sent: Tuesday, November 27, 2007 4:48 PM
  To: Caitlin Bestler
  Cc: Kanevsky, Arkady; Glenn Grundstrom; Leonid Grossman; openib-
  [EMAIL PROTECTED]
  Subject: Re: [ofa-general] Re: iWARP peer-to-peer CM proposal
  
  Caitlin Bestler wrote:
   On Nov 27, 2007 3:58 PM, Steve Wise [EMAIL PROTECTED] wrote:
  
   For the short term, I claim we just implement this as part of linux
   iwarp connection setup (mandating a 0B read be sent from the active
   side).  Your proposal to add meta-data to the private data requires a
   standards change anyway and is, IMO, the 2nd phase of this whole
   enchilada...
  
   Steve.
  
  
   I don't see how you can have any solution here that does not require
   meta-data. For non-peer-to-peer connections neither a zero length RDMA
   Read or Write should be sent. An extraneous RDMA Read is particularly
   onerous for a short lived connection that fits the classic
   active/passive model. So *something* is telling the CMA layer that
   this connection may need an MPA unjam action. If that isn't meta-data,
   what is it?
  
  I assumed the 0B read would _always_ be sent as part of establishing an
  iWARP connection using linux and the rdma-cm.
  
 
 That is an extra round-trip per connection setup, which is a significant
 penalty for a short lived connection. It is trivial for HPC/peer-to-peer
 applications, but would be a killer for something like HTTP over RDMA.
 

I find it hard to get excited about optimizing short lived connections
for RDMA. I simply don't think it's an interesting use case. And btw,
HTTP long ago got rid of short lived connections because it's painful
even on TCP.

 Doing something like this for *every* connection makes it effectively
 a change to the MPA protocol.

Uh. No, it doesn't. Normalizing the behavior of applications during
connection setup doesn't change the underlying protocol. It adds another
one on top.

  OFA is not the forum for such discussions,
 the IETF is.

My living room, the dinner table, the local bar and this mailing list
are perfectly acceptable forums for discussing a protocol. The IETF is
the forum for standardizing one. Right now, I don't think we're ready to
standardize, because we're still exploring the options; the first of
which is NOT changing MPA.

This group has the unique benefit of actually USING and IMPLEMENTING the
protocol and therefore has some beneficial insights that may and should
be shared. All that said, revving the MPA protocol is way down the road.

 
 OFA drafting an understanding of how peer-to-peer applications use the
 existing protocol, on the other hand, is quite reasonable. 

That's step 1 and the 0B READ is one way to do it.

 But it has
 to be something done by peer-to-peer middleware or by the verbs layer
 in response to a flag from the peer-to-peer middleware. Otherwise it
 is not augmenting a protocol, it is changing it.
 

The flag may be useful, however, I don't see the connection between the
flag and complying with the MPA protocol.



Re: [ofa-general] Re: iWARP peer-to-peer CM proposal

2007-11-28 Thread Steve Wise

Kanevsky, Arkady wrote:
 ULP can post recvs before connection is established but not to send
 queue prior to connection establishment.

I hate quoting specs (and the RDMAC verbs spec isn't really any
standard), but page 25 of draft-hilland-iwarp-verbs-v1.0 indicates it's
OK to post SQ WRs when in Idle:



The QP MUST be in the Idle state following QP creation or when moved to 
this state with Modify QP. In this state, Send or Receive WRs MAY be 
posted but they MUST NOT be processed and CQEs MUST NOT be generated.




Re: [ofa-general] Re: iWARP peer-to-peer CM proposal

2007-11-28 Thread Steve Wise

Kanevsky, Arkady wrote:
 Agree with initiator/client sending signalled 0B RDMA Read.
 This will handle the client side.

 Still not 100% clear on the passive/server side.
 Two issues bother me.
 1. Is a bogus S-tag allowed for incoming RDMA ops?

The STag/TO must not be validated if the incoming read is 0B in length.

http://www.ietf.org/rfc/rfc5040.txt:


*  If the Data Source receives an RDMA Read Request Header with the
  RDMA Read Message Size set to zero, the Data Source RDMAP:

  *  MUST NOT validate the Data Source STag and Data Source Tagged
 Offset contained in the RDMA Read Request Header, and

  *  MUST respond with a zero-length RDMA Read Response Message.
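A sketch of the Data Source side of the rule quoted above: a zero-length RDMA Read Request is answered with a zero-length response before any STag/TO validation, so a bogus STag in the 0B "unjam" read is harmless. The structures are a hypothetical host-order view of the fields the RFC names (not a wire-format definition), and the single-region check is a simplified stand-in for real STag validation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-order view of the RDMA Read Request Header fields. */
struct rdma_read_req {
    uint32_t sink_stag;
    uint64_t sink_to;
    uint32_t msg_size; /* RDMA Read Message Size */
    uint32_t src_stag; /* Data Source STag */
    uint64_t src_to;   /* Data Source Tagged Offset */
};

/* Hypothetical registered region used for STag/TO validation. */
struct region {
    uint32_t stag;
    uint64_t base;
    uint64_t len;
};

static bool region_ok(const struct region *r, uint32_t stag, uint64_t to,
                      uint32_t size)
{
    return stag == r->stag && to >= r->base && to - r->base <= r->len &&
           size <= r->len - (to - r->base);
}

/* Returns the length of the RDMA Read Response to send, or -1 to signal
 * an error. */
long handle_read_request(const struct rdma_read_req *req,
                         const struct region *r)
{
    if (req->msg_size == 0)
        return 0; /* MUST NOT validate; MUST send zero-length response */
    if (!region_ok(r, req->src_stag, req->src_to, req->msg_size))
        return -1;
    return (long)req->msg_size;
}
```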





[ofa-general] RE: [PATCH] librdmacm/man: fix-up man pages

2007-11-28 Thread Sean Hefty
Some users have approached me and said that it's unclear from the man
pages what the legal values are for some fields of the connection param
structure. Reviewing this a little, I think we should add the
maximum values for retry_count and rnr_retry_count under the
InfiniBand-specific section of the rdma_connect and rdma_accept pages.

I can do this.

Also, what about pushing all these documentation changes as a release
to OFED 1.3?

I'm holding off on a release until I'm fairly sure that all of the documentation
changes are in.  I don't foresee a problem getting documentation only changes
into OFED 1.3 though.
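A sketch of the kind of range check the man pages could document. It mirrors only the two rdma_conn_param fields in a hypothetical local struct (the real structure is declared in <rdma/rdma_cma.h>) and assumes the InfiniBand encoding: both counts are 3-bit fields (0..7), and an rnr_retry_count of 7 means retry indefinitely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the two rdma_conn_param fields discussed above. */
struct conn_retry_params {
    uint8_t retry_count;
    uint8_t rnr_retry_count;
};

/* Assumed InfiniBand limits: both values are carried in 3-bit fields. */
#define IB_MAX_RETRY_COUNT     7
#define IB_MAX_RNR_RETRY_COUNT 7 /* 7 = infinite RNR retries */

bool ib_retry_params_valid(const struct conn_retry_params *p)
{
    return p->retry_count <= IB_MAX_RETRY_COUNT &&
           p->rnr_retry_count <= IB_MAX_RNR_RETRY_COUNT;
}
```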

- Sean


RE: [ofa-general] Re: iWARP peer-to-peer CM proposal

2007-11-28 Thread Kanevsky, Arkady
Another small discrepancy between IB and iWARP.
Since RDMA_CM is used by ULPs which are transport
independent, they will follow the stricter rule,
that is, IB. For IB, any posting to the SQ prior to the QP
being in RTS state shall be flushed.

This semantic is actually very useful for ULPs which
use unsignaled completions, because once you see
the completion for the request you posted after connection
failure, you are sure that all previously posted requests on the
same SQ have completed and that you have seen them all.

So while you are correct on the spec, since we are working
in IW_CM we can assume IB semantics on posting.
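The bookkeeping behind the unsignaled-completion argument can be sketched as follows, assuming (as stated above) that work requests on one SQ complete in posting order, so the completion of one signaled WR retires every WR posted before it. All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical send-queue bookkeeping for a ULP that signals only some WRs. */
struct sq_tracker {
    uint64_t posted;    /* WRs posted so far (also the next WR id) */
    uint64_t completed; /* WRs known to be complete */
};

/* Post a WR and return its id; the caller decides whether to signal it. */
uint64_t sq_post(struct sq_tracker *sq)
{
    return sq->posted++;
}

/* Completion for signaled WR `wr_id`: it also retires every earlier WR,
 * signaled or not, because completions arrive in posting order. */
void sq_complete(struct sq_tracker *sq, uint64_t wr_id)
{
    if (wr_id + 1 > sq->completed)
        sq->completed = wr_id + 1;
}

uint64_t sq_outstanding(const struct sq_tracker *sq)
{
    return sq->posted - sq->completed;
}
```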

Thanks,

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Steve Wise [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, November 28, 2007 1:52 PM
 To: Kanevsky, Arkady
 Cc: Glenn Grundstrom; Leonid Grossman; [EMAIL PROTECTED]
 Subject: Re: [ofa-general] Re: iWARP peer-to-peer CM proposal
 
 Kanevsky, Arkady wrote:
  ULP can post recvs before connection is established but not to send 
  queue prior to connection establishment.
  
 
 I hate quoting specs (and the RDMAC verbs spec isn't really any 
 standard), but, page 25 of draft-hilland-iwarp-verbs-v1.0 
 indicates its 
 ok to post SQ WRs when in idle:
 
 
 The QP MUST be in the Idle state following QP creation or 
 when moved to 
 this state with Modify QP. In this state, Send or Receive WRs MAY be 
 posted but they MUST NOT be processed and CQEs MUST NOT be generated.
 
 


[ofa-general] Re: [PATCH] librdmacm/man: fix-up man pages

2007-11-28 Thread Or Gerlitz
On 11/28/07, Sean Hefty [EMAIL PROTECTED] wrote:
 Reviewing this a little, I think we should add the
 maximum values for the retry_count and rnr_retry_count under the
 infiniband specific section of the rdma_connect and rdma_accept pages.

 I can do this.

thanks.

 I'm holding off on a release until I'm fairly sure that all of the
 documentation changes are in.  I don't foresee a problem getting
 documentation only changes into OFED 1.3 though.

indeed, cool.

Or.


Re: [ofa-general] Re: iWARP peer-to-peer CM proposal

2007-11-28 Thread Or Gerlitz
On 11/29/07, Kanevsky, Arkady [EMAIL PROTECTED] wrote:
 So while, you are correct on the spec since we are working
 in IW_CM we can assume IB semantic on posting.

please spend a minute on
http://www.zip.com.au/~akpm/linux/patches/stuff/top-posting.txt

Or.


[ofa-general] OFA server patching

2007-11-28 Thread Jeff Becker
Hi all. In the interest of keeping our server up to date, I applied the
latest Ubuntu patches. Several upgrades were made, including git. If you
have any problems, let me know. Thanks.

-jeff


[ofa-general] [PATCH] opensm: allow multiple scopes in a partition

2007-11-28 Thread Rolf Manderscheid
Hi Sasha,

This patch allows multiple scopes to be configured for a partition.
This allows ipoib interfaces with different scopes to coexist in a
partition.  The partition configuration file can now have multiple
scope=N flags and they all take effect (instead of just the last one).
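A minimal stand-alone sketch of the approach the patch takes: each accepted `scope=N` flag sets bit N in a 16-bit mask, and one multicast group is then created per set bit, with an empty mask falling back to a single group. The function names are hypothetical; the 1..15 range check follows the patch's own validation, and the fallback value 2 is the link-local default documented in the opensm man page.

```c
#include <assert.h>
#include <stdlib.h>

#define DEFAULT_MGRP_SCOPE 2   /* link local, per the opensm man page */

/* Parse one "scope=N" value into the mask. Returns 0 on success, or -1
 * if the value is missing or outside 1..15 (the flag is then skipped). */
static int add_scope_flag(unsigned *scope_mask, const char *val)
{
	unsigned long scope;

	assert(scope_mask != NULL);
	if (!val)
		return -1;
	scope = strtoul(val, NULL, 0);
	if (scope == 0 || scope > 0xF)
		return -1;
	*scope_mask |= 1u << scope;
	return 0;
}

/* Fill 'out' with one scope per multicast group to create and return
 * the count. 'out' must hold at least 16 entries. */
static int scopes_to_create(unsigned scope_mask, unsigned out[16])
{
	unsigned scope;
	int n = 0;

	if (!scope_mask) {
		out[n++] = DEFAULT_MGRP_SCOPE;
		return n;
	}
	for (scope = 0; scope < 16; scope++)
		if (scope_mask & (1u << scope))
			out[n++] = scope;
	return n;
}
```

One difference from the hunk as posted: there, the `scope_mask |=` line appears to run even after the warning is logged, so an invalid or missing value could still set a bit. The sketch sets the bit only when the value passes the check.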

Signed-off-by: Rolf Manderscheid [EMAIL PROTECTED]

--

diff --git a/opensm/man/opensm.8 b/opensm/man/opensm.8
index efd6ff0..c51f386 100644
--- a/opensm/man/opensm.8
+++ b/opensm/man/opensm.8
@@ -366,7 +366,8 @@ Currently recognized flags are:
  sl=val- specifies SL for this IPoIB MC group
(default is 0)
  scope=val - specifies scope for this IPoIB MC group
-   (default is 2 (link local))
+   (default is 2 (link local)).  Multiple scope settings
+   are permitted for a partition.
 
 Note that values for rate, mtu, and scope should be specified as
 defined in the IBTA specification (for example, mtu=4 for 2048).
diff --git a/opensm/opensm/osm_prtn_config.c b/opensm/opensm/osm_prtn_config.c
index 1253031..646bf2a 100644
--- a/opensm/opensm/osm_prtn_config.c
+++ b/opensm/opensm/osm_prtn_config.c
@@ -68,7 +68,7 @@ struct part_conf {
osm_log_t *p_log;
osm_subn_t *p_subn;
osm_prtn_t *p_prtn;
-   unsigned is_ipoib, mtu, rate, sl, scope;
+   unsigned is_ipoib, mtu, rate, sl, scope_mask;
boolean_t full;
 };
 
@@ -89,6 +89,7 @@ static int partition_create(unsigned lineno, struct part_conf *conf,
char *name, char *id, char *flag, char *flag_val)
 {
uint16_t pkey;
+   unsigned int scope;
 
if (!id && name && isdigit(*name)) {
id = name;
@@ -119,12 +120,26 @@ static int partition_create(unsigned lineno, struct part_conf *conf,
}
conf->p_prtn->sl = (uint8_t) conf->sl;
 
-   if (conf->is_ipoib)
+   if (! conf->is_ipoib)
+   return 0;
+
+   if (! conf->scope_mask) {
osm_prtn_add_mcgroup(conf->p_log, conf->p_subn, conf->p_prtn,
 (uint8_t) conf->rate,
 (uint8_t) conf->mtu,
-(uint8_t) conf->scope);
+0);
+   return 0;
+   }
+
+   for (scope = 0; scope < 16; scope++) {
+   if (((1<<scope) & conf->scope_mask) == 0)
+   continue;
 
+   osm_prtn_add_mcgroup(conf->p_log, conf->p_subn, conf->p_prtn,
+(uint8_t) conf->rate,
+(uint8_t) conf->mtu,
+(uint8_t) scope);
+   }
return 0;
 }
 
@@ -147,11 +162,13 @@ static int partition_add_flag(unsigned lineno, struct part_conf *conf,
"flag \'rate\' requires valid value"
 " - skipped\n", lineno);
} else if (!strncmp(flag, "scope", len)) {
-   if (!val || (conf->scope = strtoul(val, NULL, 0)) == 0)
+   unsigned int scope;
+   if (!val || (scope = strtoul(val, NULL, 0)) == 0 || scope > 0xF)
osm_log(conf->p_log, OSM_LOG_VERBOSE,
"PARSE WARN: line %d: "
"flag \'scope\' requires valid value"
 " - skipped\n", lineno);
+   conf->scope_mask |= (1<<scope);
} else if (!strncmp(flag, "sl", len)) {
unsigned sl;
char *end;


RE: [ofa-general] OFA server patching

2007-11-28 Thread Scott Weitzenkamp (sweitzen)
OFA bugzilla seems down, I get:

Software error:
Can't connect to the database.
Error: Access denied for user 'ofabug_user'@'localhost' (using password:
YES)
  Is your database installed and up and running?
  Do you have the correct username and password selected in localconfig?


For help, please send mail to the webmaster ([EMAIL PROTECTED]),
giving this error message and the time and date of the error. 

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems


 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Jeff Becker
 Sent: Wednesday, November 28, 2007 3:33 PM
 To: general@lists.openfabrics.org
 Subject: [ofa-general] OFA server patching
 
 Hi all. In the interest of keeping our server up to date, I applied the
 latest Ubuntu patches. Several upgrades were made, including git. If you
 have any problems, let me know. Thanks.
 
 -jeff
 


Re: [ofa-general] OFA server patching

2007-11-28 Thread Sasha Khapyorsky
On 15:32 Wed 28 Nov , Jeff Becker wrote:
 Hi all. In the interest of keeping our server up to date, I applied the
 latest Ubuntu patches. Several upgrades were made, including git.

git on the server was manually compiled and installed (from
~sashak/files/git-1.5.2). As far as I can see, the same git installation
is still there.

Sasha


[ofa-general] [PATCH] [RFC] rdma/ucm: add support for rdma_migrate_id()

2007-11-28 Thread Sean Hefty
This is based on user feedback from Doug Ledford at RedHat:

Events that occur on an rdma_cm_id are reported to userspace through
an event channel.  Connection request events are reported
on the event channel associated with the listen.  When the
connection is accepted, a new rdma_cm_id is created and automatically
uses the listen event channel.  This is suboptimal when the user
only wants listen events on that channel.

Additionally, it may be desirable to have events related to
connection establishment use a different event channel than those
related to already established connections.

Allow the user to migrate an rdma_cm_id between event channels.
All pending events associated with the rdma_cm_id are moved to the
new event channel.
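The move can be pictured as filtering one FIFO into another while keeping relative order, which is what `ucma_move_events()` in the patch does with `list_for_each_entry_safe()` and `list_move_tail()`. A minimal user-space sketch with a hypothetical singly linked queue (not the kernel's `list_head` API): events belonging to the migrating id are appended, in order, to the tail of the new channel's queue, and all other events stay where they were.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event record and FIFO queue (tail pointer for O(1) append). */
struct event {
	int id;              /* which cm id the event belongs to */
	int seq;             /* arrival order, for illustration */
	struct event *next;
};

struct queue {
	struct event *head;
	struct event **tail; /* points at the last 'next' field */
};

static void queue_init(struct queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

static void queue_append(struct queue *q, struct event *ev)
{
	ev->next = NULL;
	*q->tail = ev;
	q->tail = &ev->next;
}

/* Move every event for 'id' from 'src' to the tail of 'dst', keeping order. */
static void move_events(struct queue *src, struct queue *dst, int id)
{
	struct event **link = &src->head;

	assert(src != dst);
	while (*link) {
		struct event *ev = *link;

		if (ev->id == id) {
			*link = ev->next;      /* unlink from src */
			queue_append(dst, ev); /* append to dst tail */
		} else {
			link = &ev->next;      /* keep it, advance */
		}
	}
	src->tail = link;  /* link now points at src's terminating NULL */
}
```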

Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
I will follow this post with a patch to the librdmacm to make use of this.
I wanted to get feedback on the approach, in particular about the locking
and use of fget().
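The locking in question is `ucma_lock_files()`, which takes the two per-file mutexes in pointer order so that two concurrent migrations in opposite directions cannot deadlock. The same idea in user space with POSIX threads; `lock_pair()`, `unlock_pair()` and the demo worker are hypothetical names. ISO C does not define `<` between pointers to unrelated objects, so the sketch compares them as `uintptr_t`, which is the common workaround.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Lock two mutexes in a global (address) order to avoid ABBA deadlock. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	assert(a != b);  /* same mutex twice would self-deadlock */
	if ((uintptr_t)a < (uintptr_t)b) {
		pthread_mutex_lock(a);
		pthread_mutex_lock(b);
	} else {
		pthread_mutex_lock(b);
		pthread_mutex_lock(a);
	}
}

/* Release in the reverse of the acquisition order. */
static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	if ((uintptr_t)a < (uintptr_t)b) {
		pthread_mutex_unlock(b);
		pthread_mutex_unlock(a);
	} else {
		pthread_mutex_unlock(a);
		pthread_mutex_unlock(b);
	}
}

static pthread_mutex_t file_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t file_b = PTHREAD_MUTEX_INITIALIZER;
static long moves;  /* protected by both mutexes */

/* Each worker "migrates" in one direction; passing the mutexes in opposite
 * orders would risk deadlock without the address ordering above. */
static void *worker(void *arg)
{
	int reverse = *(int *)arg;
	int i;

	for (i = 0; i < 10000; i++) {
		pthread_mutex_t *from = reverse ? &file_b : &file_a;
		pthread_mutex_t *to = reverse ? &file_a : &file_b;

		lock_pair(from, to);
		moves++;
		unlock_pair(from, to);
	}
	return NULL;
}
```

The unlock order does not affect deadlock freedom; the sketch releases in reverse order to match `ucma_unlock_files()`.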

 drivers/infiniband/core/ucma.c |   92 
 include/rdma/rdma_user_cm.h|   13 +-
 2 files changed, 104 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 90d675a..15937eb 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -31,6 +31,7 @@
  */
 
 #include <linux/completion.h>
+#include <linux/file.h>
 #include <linux/mutex.h>
 #include <linux/poll.h>
 #include <linux/idr.h>
@@ -991,6 +992,96 @@ out:
return ret;
 }
 
+static void ucma_lock_files(struct ucma_file *file1, struct ucma_file *file2)
+{
+   /* Acquire mutex's based on pointer comparison to prevent deadlock. */
+   if (file1 < file2) {
+   mutex_lock(&file1->mut);
+   mutex_lock(&file2->mut);
+   } else {
+   mutex_lock(&file2->mut);
+   mutex_lock(&file1->mut);
+   }
+}
+
+static void ucma_unlock_files(struct ucma_file *file1, struct ucma_file *file2)
+{
+   if (file1 < file2) {
+   mutex_unlock(&file2->mut);
+   mutex_unlock(&file1->mut);
+   } else {
+   mutex_unlock(&file1->mut);
+   mutex_unlock(&file2->mut);
+   }
+}
+
+static void ucma_move_events(struct ucma_context *ctx, struct ucma_file *file)
+{
+   struct ucma_event *uevent, *tmp;
+
+   list_for_each_entry_safe(uevent, tmp, &ctx->file->event_list, list)
+   if (uevent->ctx == ctx)
+   list_move_tail(&uevent->list, &file->event_list);
+}
+
+static ssize_t ucma_migrate_id(struct ucma_file *new_file,
+  const char __user *inbuf,
+  int in_len, int out_len)
+{
+   struct rdma_ucm_migrate_id cmd;
+   struct rdma_ucm_migrate_resp resp;
+   struct ucma_context *ctx;
+   struct file *filp;
+   struct ucma_file *cur_file;
+   int ret = 0;
+
+   if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+   return -EFAULT;
+
+   /* Get current fd to protect against it being closed */
+   filp = fget(cmd.fd);
+   if (!filp)
+   return -ENOENT;
+
+   /* Validate current fd and prevent destruction of id. */
+   ctx = ucma_get_ctx(filp->private_data, cmd.id);
+   if (IS_ERR(ctx)) {
+   ret = PTR_ERR(ctx);
+   goto file_put;
+   }
+
+   cur_file = ctx->file;
+   if (cur_file == new_file) {
+   resp.events_reported = ctx->events_reported;
+   goto response;
+   }
+
+   /*
+* Migrate events between fd's, maintaining order, and avoiding new
+* events being added before existing events.
+*/
+   ucma_lock_files(cur_file, new_file);
+   mutex_lock(&mut);
+
+   list_move_tail(&ctx->list, &new_file->ctx_list);
+   ucma_move_events(ctx, new_file);
+   ctx->file = new_file;
+   resp.events_reported = ctx->events_reported;
+
+   mutex_unlock(&mut);
+   ucma_unlock_files(cur_file, new_file);
+
+response:
+   if (copy_to_user((void __user *)(unsigned long)cmd.response,
+&resp, sizeof(resp)))
+   ret = -EFAULT;
+
+   ucma_put_ctx(ctx);
+file_put:
+   fput(filp);
+   return ret;
+}
+
 static ssize_t (*ucma_cmd_table[])(struct ucma_file *file,
   const char __user *inbuf,
   int in_len, int out_len) = {
@@ -1012,6 +1103,7 @@ static ssize_t (*ucma_cmd_table[])(struct ucma_file *file,
[RDMA_USER_CM_CMD_NOTIFY]   = ucma_notify,
[RDMA_USER_CM_CMD_JOIN_MCAST]   = ucma_join_multicast,
[RDMA_USER_CM_CMD_LEAVE_MCAST]  = ucma_leave_multicast,
+   [RDMA_USER_CM_CMD_MIGRATE_ID]   = ucma_migrate_id
 };
 
 static ssize_t ucma_write(struct file *filp, const char __user *buf,
diff --git a/include/rdma/rdma_user_cm.h b/include/rdma/rdma_user_cm.h
index 9749c1b..c557054 100644
--- a/include/rdma/rdma_user_cm.h
+++ b/include/rdma/rdma_user_cm.h

[ofa-general] [PATCH] [RFC] librdmacm: add rdma_migrate_id

2007-11-28 Thread Sean Hefty
This is based on user feedback from Doug Ledford at RedHat:

Events that occur on an rdma_cm_id are reported to userspace through
an event channel.  Connection request events are reported
on the event channel associated with the listen.  When the
connection is accepted, a new rdma_cm_id is created and automatically
uses the listen event channel.  This is suboptimal when the user
only wants listen events on that channel.

Additionally, it may be desirable to have events related to
connection establishment use a different event channel than those
related to already established connections.

Allow the user to migrate an rdma_cm_id between event channels.

Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
I started to provide support for calling rdma_migrate_id() while the
user is polling for events or making other calls on the migrating id, but
while the complexity seemed doable, it just didn't seem justified
based on the expected usage model.  I believe that the kernel interface
allows this support to be added later, if it is needed.  For now, the
documentation simply states that the user can only migrate an id if they
are not processing events on the current event channel and not invoking
another call on that id simultaneously.


 Makefile.am |1 +
 examples/cmatose.c  |   59 +++
 include/rdma/rdma_cma.h |7 +
 include/rdma/rdma_cma_abi.h |   13 +
 man/rdma_migrate_id.3   |   27 
 man/ucmatose.1  |4 +++
 src/cma.c   |   35 ++
 src/librdmacm.map   |1 +
 8 files changed, 140 insertions(+), 7 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 77782da..290cbc3 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -54,6 +54,7 @@ man_MANS = \
man/rdma_join_multicast.3 \
man/rdma_leave_multicast.3 \
man/rdma_listen.3 \
+   man/rdma_migrate_id.3 \
man/rdma_notify.3 \
man/rdma_reject.3 \
man/rdma_resolve_addr.3 \
diff --git a/examples/cmatose.c b/examples/cmatose.c
index dcb6074..2f6e5f6 100644
--- a/examples/cmatose.c
+++ b/examples/cmatose.c
@@ -82,6 +82,7 @@ static int message_size = 100;
 static int message_count = 10;
 static uint8_t set_tos = 0;
 static uint8_t tos;
+static uint8_t migrate = 0;
 static char *dst_addr;
 static char *src_addr;
 
@@ -465,6 +466,35 @@ static int disconnect_events(void)
return ret;
 }
 
+static int migrate_channel(struct rdma_cm_id *listen_id)
+{
+   struct rdma_event_channel *channel;
+   int i, ret;
+
+   printf("migrating to new event channel\n");
+
+   channel = rdma_create_event_channel();
+   if (!channel) {
+   printf("cmatose: failed to create event channel\n");
+   return -1;
+   }
+
+   ret = 0;
+   if (listen_id)
+   ret = rdma_migrate_id(listen_id, channel);
+
+   for (i = 0; i < connections && !ret; i++)
+   ret = rdma_migrate_id(test.nodes[i].cma_id, channel);
+
+   if (!ret) {
+   rdma_destroy_event_channel(test.channel);
+   test.channel = channel;
+   } else
+   printf("cmatose: failure migrating to channel: %d\n", ret);
+
+   return ret;
+}
+
 static int get_addr(char *dst, struct sockaddr_in *addr)
 {
struct addrinfo *res;
@@ -543,6 +573,13 @@ static int run_server(void)
printf("data transfers complete\n");
 
}
+
+   if (migrate) {
+   ret = migrate_channel(listen_id);
+   if (ret)
+   goto out;
+   }
+
printf("cmatose: disconnecting\n");
for (i = 0; i < connections; i++) {
if (!test.nodes[i].connected)
@@ -592,30 +629,36 @@ static int run_client(void)
 
ret = connect_events();
if (ret)
-   goto out;
+   goto disc;
 
if (message_count) {
printf("receiving data transfers\n");
ret = poll_cqs();
if (ret)
-   goto out;
+   goto disc;
 
printf("sending replies\n");
for (i = 0; i < connections; i++) {
ret = post_sends(&test.nodes[i]);
if (ret)
-   goto out;
+   goto disc;
}
 
printf("data transfers complete\n");
}
 
ret = 0;
-out:
+
+   if (migrate) {
+   ret = migrate_channel(NULL);
+   if (ret)
+   goto out;
+   }
+disc:
ret2 = disconnect_events();
if (ret2)
ret = ret2;
-
+out:
return ret;
 }
 
@@ -623,7 +666,7 @@ int main(int argc, char **argv)
 {
int op, ret;
 
-   while ((op = getopt(argc, argv, "s:b:c:C:S:t:")) != -1) {
+   while ((op = getopt(argc, argv, "s:b:c:C:S:t:m")) != -1) {
   

Re: [ofa-general] OFA server patching

2007-11-28 Thread Jeff Becker
Working on it... Thanks.

-jeff

Scott Weitzenkamp (sweitzen) wrote:
 OFA bugzilla seems down, I get:

 Software error:
 Can't connect to the database.
 Error: Access denied for user 'ofabug_user'@'localhost' (using password:
 YES)
   Is your database installed and up and running?
   Do you have the correct username and password selected in localconfig?


 For help, please send mail to the webmaster ([EMAIL PROTECTED]),
 giving this error message and the time and date of the error. 

 Scott Weitzenkamp
 SQA and Release Manager
 Server Virtualization Business Unit
 Cisco Systems


  

   
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Jeff Becker
 Sent: Wednesday, November 28, 2007 3:33 PM
 To: general@lists.openfabrics.org
 Subject: [ofa-general] OFA server patching

 Hi all. In the interest of keeping our server up to date, I applied the
 latest Ubuntu patches. Several upgrades were made, including git. If you
 have any problems, let me know. Thanks.

 -jeff

 



[ofa-general] Re: [PATCH 2.6.25 2/2] RDMA/cxgb3: Support 5.0 firmware.

2007-11-28 Thread Roland Dreier
OK, applied 1 and 2...

  Note: this change requires 5.0 firmware.

I assume the change to the cxgb3 FW versions is pending in a net
driver change for 2.6.25?


[ofa-general] Re: [PATCH] IB/ehca: Fix static rate if path faster than link

2007-11-28 Thread Roland Dreier
thanks, applied


Re: [ofa-general] [PATCH] return ENOSYS instead of -ENOSYS

2007-11-28 Thread Roland Dreier
thanks, applied
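The subject refers to the user-space convention, used for example by libibverbs calls such as `ibv_modify_qp()`, of returning a positive errno value on failure rather than a negated kernel-style code. A minimal sketch of why the sign matters; `provider_op_unsupported()` and `describe()` are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stub for an operation a provider does not implement.
 * User-space convention: report failure as a positive errno value. */
static int provider_op_unsupported(void)
{
	return ENOSYS;      /* not -ENOSYS */
}

/* Typical caller: 0 means success, anything else is an errno value
 * that can be passed straight to strerror(). */
static const char *describe(int ret)
{
	assert(ret >= 0);   /* a negative value here would be the bug */
	return ret ? strerror(ret) : "success";
}
```

With `-ENOSYS`, a caller doing `strerror(ret)` would get an "unknown error" message instead of the intended one.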


[ofa-general] Re: [PATCH] libmlx4: max_recv_wr must be non-zero for non-SRQ QPs

2007-11-28 Thread Roland Dreier
thanks, applied
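The rule in the subject line amounts to a check at QP creation time: a QP that does not use an SRQ owns its receive queue, so a request for zero receive WRs is rejected. A minimal sketch, not the libmlx4 change itself; `struct qp_init_attr` is a hypothetical mirror of the `srq` and `cap.max_recv_wr` fields of libibverbs' `struct ibv_qp_init_attr`, and returning `EINVAL` is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the relevant ibv_qp_init_attr fields. */
struct qp_init_attr {
	void *srq;               /* non-NULL: receives come from a shared RQ */
	struct {
		uint32_t max_send_wr;
		uint32_t max_recv_wr;
	} cap;
};

/* A QP without an SRQ needs room for at least one receive WR. */
static int check_qp_caps(const struct qp_init_attr *attr)
{
	assert(attr != NULL);
	if (!attr->srq && attr->cap.max_recv_wr == 0)
		return EINVAL;
	return 0;
}
```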


[ofa-general] Re: [PATCH 2.6.25 2/2] RDMA/cxgb3: Support 5.0 firmware.

2007-11-28 Thread Steve Wise

Yes.

Roland Dreier wrote:

OK, applied 1 and 2...

  Note: this change requires 5.0 firmware.

I assume the change to the cxgb3 FW versions is pending in a net
driver change for 2.6.25?
  



[ofa-general] nightly osm_sim report 2007-11-29:normal completion

2007-11-28 Thread kliteyn
OSM Simulation Regression Summary
 
[Generated mail - please do NOT reply]
 
 
OpenSM binary date = 2007-11-28
OpenSM git rev = Mon_Nov_26_08:12:10_2007 [b989216e1ae91e0049ec3d4980cb8e2bdad8ed49]
ibutils git rev = Tue_Sep_4_17:57:34_2007 [4bf283f6a0d7c0264c3a1d2de92745e457585fdb]
 
 
Total=480  Pass=480  Fail=0
 
 
Pass:
36 Stability IS1-16.topo
36 Pkey IS1-16.topo
36 OsmTest IS1-16.topo
36 OsmStress IS1-16.topo
36 Multicast IS1-16.topo
36 LidMgr IS1-16.topo
12 Stability IS3-loop.topo
12 Stability IS3-128.topo
12 Pkey IS3-128.topo
12 OsmTest IS3-loop.topo
12 OsmTest IS3-128.topo
12 OsmStress IS3-128.topo
12 Multicast IS3-loop.topo
12 Multicast IS3-128.topo
12 LidMgr IS3-128.topo
12 FatTree merge-roots-4-ary-2-tree.topo
12 FatTree merge-root-4-ary-3-tree.topo
12 FatTree gnu-stallion-64.topo
12 FatTree blend-4-ary-2-tree.topo
12 FatTree RhinoDDR.topo
12 FatTree FullGnu.topo
12 FatTree 4-ary-2-tree.topo
12 FatTree 2-ary-4-tree.topo
12 FatTree 12-node-spaced.topo
12 FTreeFail 4-ary-2-tree-missing-sw-link.topo
12 FTreeFail 4-ary-2-tree-links-at-same-rank-2.topo
12 FTreeFail 4-ary-2-tree-links-at-same-rank-1.topo
12 FTreeFail 4-ary-2-tree-diff-num-pgroups.topo
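As a quick consistency check, the Pass list does add up to the reported total: six IS1-16 tests at 36 runs each (216) plus twenty-two IS3, FatTree and FTreeFail tests at 12 runs each (264) give 480. A small sketch of the same arithmetic; the array only transcribes the counts listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Run counts transcribed from the Pass list, in order. */
static const int pass_counts[] = {
	36, 36, 36, 36, 36, 36,                 /* IS1-16.topo tests */
	12, 12, 12, 12, 12, 12, 12, 12, 12,     /* IS3 tests */
	12, 12, 12, 12, 12, 12, 12, 12, 12,     /* FatTree tests */
	12, 12, 12, 12                          /* FTreeFail tests */
};

/* Sum of all per-test run counts. */
static int total_passes(void)
{
	size_t i;
	int sum = 0;

	for (i = 0; i < sizeof(pass_counts) / sizeof(pass_counts[0]); i++)
		sum += pass_counts[i];
	return sum;
}
```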

Failures: