RE: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans

2008-04-04 Thread Tang, Changqing

What I mean by "claim to support" is getting more people to test with this configuration.

--CQ

 -Original Message-
 From: Or Gerlitz [mailto:[EMAIL PROTECTED]
 Sent: Thursday, April 03, 2008 11:18 PM
 To: Tang, Changqing
 Cc: [EMAIL PROTECTED]; ewg@lists.openfabrics.org
 Subject: Re: [ofa-general] Re: [ewg] OFED March 24 meeting
 summary on OFED 1.4 plans

 On Thu, Apr 3, 2008 at 5:40 PM, Tang, Changqing
 [EMAIL PROTECTED] wrote:

   The problem is that, from the MPI side (and by default), we don't know
   which port is on which fabric, since the subnet prefix is the same. We
   rely on the system admin to configure two different subnet prefixes
   for HP-MPI to work.
 
   No vendor has claimed to support this.

 CQ, not supporting a different subnet prefix per IB subnet goes against
 the nature of IB. I don't think there should be any problem configuring
 a different prefix for each opensm instance, and the Linux host stack
 would work perfectly under this configuration.
 If you are aware of any problem in opensm and/or the host stack,
 please let the community know and the maintainers
 will fix it.
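
To illustrate the subnet prefix point: an IB GID is 128 bits, with the subnet prefix in the upper 64 bits and the port GUID in the lower 64 bits. The sketch below is only an illustration under stated assumptions; `struct port_gid`, `gid_subnet_prefix`, `same_subnet` and `make_gid` are hypothetical names, not libibverbs API. In a real program the GID bytes would come from `ibv_query_gid()`. If both fabrics are left at the default prefix (0xfe80000000000000), a check like this cannot tell them apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type for illustration; not part of libibverbs. */
struct port_gid {
    uint8_t raw[16];   /* bytes 0-7: subnet prefix, bytes 8-15: port GUID */
};

/* Extract the 64-bit subnet prefix (stored big-endian) from a GID. */
static uint64_t gid_subnet_prefix(const struct port_gid *gid)
{
    uint64_t prefix = 0;

    assert(gid != NULL);
    for (int i = 0; i < 8; i++)
        prefix = (prefix << 8) | gid->raw[i];
    return prefix;
}

/* Ports can be assigned to different fabrics only if the prefixes differ. */
static int same_subnet(const struct port_gid *a, const struct port_gid *b)
{
    return gid_subnet_prefix(a) == gid_subnet_prefix(b);
}

/* Build a GID from a prefix and a GUID (illustrative values only). */
static struct port_gid make_gid(uint64_t prefix, uint64_t guid)
{
    struct port_gid g;

    for (int i = 0; i < 8; i++) {
        g.raw[i]     = (uint8_t)(prefix >> (56 - 8 * i));
        g.raw[8 + i] = (uint8_t)(guid   >> (56 - 8 * i));
    }
    return g;
}
```

With one distinct prefix per opensm instance, comparing the prefixes of the local ports is enough to group them by fabric.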

 Or.

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


RE: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans

2008-04-04 Thread Tang, Changqing
   for example, in MPI, process A know the HCA guid on another node.
  After running for  some time, the switch is restarted for
 some reason, and the whole fabric is re-configured.


 CQ,

 If by "the whole fabric is re-configured" you mean a case where a
 subnet prefix changes while a job runs and a process is detached from
 and reattached to the job, then adapting your design to handle it is
 over-engineering. Why do you want to do that?


I am concerned about the port LID changing. It is always best if a process
can figure out the information it needs by itself; an SA query is the right
way to do that and is in the IB spec.

While it is possible to have the processes exchange the information (port
LID) again, there are difficulties: in the middle of a long job run it is
hard to get two processes to coordinate such an exchange, and it requires a
second channel. If the second channel is IPoIB, it is broken as well, and we
need to re-establish it.

I am just asking for the SA functionality. If that is not possible, we have
to use a very complicated way to let HP-MPI survive a network failure.


--CQ



 Or.



RE: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans

2008-04-03 Thread Tang, Changqing

Thanks. When can we expect the SA features: very soon, in a long time, or never?


--CQ

 -Original Message-
 From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
 Sent: Thursday, April 03, 2008 10:02 AM
 To: Tang, Changqing
 Cc: Erez Zilber; Tziporet Koren; ewg@lists.openfabrics.org;
 [EMAIL PROTECTED]
 Subject: RE: [ofa-general] Re: [ewg] OFED March 24 meeting
 summary on OFED 1.4 plans

 On Thu, 2008-04-03 at 14:53 +, Tang, Changqing wrote:
  One other thing I hope to discuss is some fabric query functionality
  for normal users, not just for root. This is at the IB verbs level,
  not the rdma_cm level.

  For example, in MPI, process A knows the HCA GUID on another node.
  After running for some time, the switch is restarted for some reason,
  and the whole fabric is re-configured.

  Now process A wants to know whether the port LID on another node has
  changed. It knows the HCA GUID; is there any function to query this?

  I know that as root we can use the mad/umad library to do this kind of
  query, but I want to do such a query in MPI, which runs as a normal user.

 In the IB architecture, there are SA registrations and queries for the
 specific example you used. However, these are not directly exposed to
 Linux user space for the normal user, as opposed to the MAD user (at
 least not yet, AFAIK); note that there are some difficulties in making
 this available to the normal user. While these are not direct fabric
 queries (they are really SA queries), they serve the same function in a
 different way.

 -- Hal

  --CQ Tang, HP-MPI
 
 
 
   -Original Message-
   From: [EMAIL PROTECTED]
   [mailto:[EMAIL PROTECTED] On Behalf Of Erez
   Zilber
   Sent: Thursday, April 03, 2008 8:51 AM
   To: Tziporet Koren
   Cc: ewg@lists.openfabrics.org; [EMAIL PROTECTED]
   Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on
   OFED 1.4 plans
  
   
 *OFED 1.4:*
 1. Kernel base: since we target the 1.4 release for Sep, we target the
    kernel base to be 2.6.27.
    This is a good target, but we may need to stay with 2.6.26 if the
    kernel progress is not aligned.

 2. Suggestions for new features:

    * NFS-RDMA
    * Verbs: Reliable Multicast (to be presented at Sonoma)
    * SDP - Zero copy (There was a question on IPv6 support - seems no
      one is interested for now)
    * IPoIB - continue with performance enhancements
    * Xsigo new virtual NIC
    * New vendor HW support - none was reported so far (IBM and Chelsio
      - do you have something?)
    * OpenSM:
      o Incremental routing
      o Temporary SA DB - to answer queries and a heavy sweep is done
      o APM - disjoint paths (?)
      o MKey manager (?)
      o Sasha to send more management features
    * MPI:
      o Open MPI 1.3
      o APM support in MPI
      o mvapich ???
    * uDAPL
      o Extensions for new APIs (like XRC) - ?
      o uDAPL provider for interop between Windows & Linux
      o 1.2 and 2.0 will stay

  
   As I wrote in an earlier discussion (~2 months ago), we plan to add
   tgt (SCSI target) with iSCSI over iSER (and TCP of course) support.
   The git tree for tgt already exists on the ofa server.
  
   Erez
  
   ___
   general mailing list
   [EMAIL PROTECTED]
   http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
  
   To unsubscribe, please visit
   http://openib.org/mailman/listinfo/openib-general
  




RE: [ewg] Re: [ofa-general] OFED Jan 28 meeting summary on RC3 readiness

2008-01-30 Thread Tang, Changqing

When will you package the official RC3? Thanks.


--CQ

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of
 Tziporet Koren
 Sent: Wednesday, January 30, 2008 10:40 AM
 To: Doug Ledford
 Cc: ewg@lists.openfabrics.org; [EMAIL PROTECTED]
 Subject: Re: [ewg] Re: [ofa-general] OFED Jan 28 meeting
 summary on RC3 readiness

 Doug Ledford wrote:
 
  Hmmm...I'd like to put my $.02 in here.  I don't have any visibility
  into what drives the OFED schedule, so I have no clue as to why people
  don't want to slip the schedule for this change.  I'm sure you guys
  have your reasons.  However, I also happen to be a consumer of this
  code, and I know for a fact that no one has gotten my input on this
  issue.  So, the deal is that I'm currently integrating OFED 1.3 into
  what will be RHEL5.2.  The RHEL5.2 freeze date has already passed, but
  in order to keep what finally goes out from being too stale, I'm being
  allowed to submit the OFED-1.3-rc1 code prior to freeze, and then
  update to OFED-1.3 final during our beta test process.  What this
  means is that anything you punt from 1.3 to 1.3.1, you are also
  punting out of RHEL5.2 and RHEL4.7.  So, that being said, there's a
  whole trickle down effect with various groups that would really like
  to be able to use 5.2 out of the box that may prefer a slip in 1.3 so
  that this can be part of it instead of punting to 1.3.1.  I'm not
  saying this will change your mind, but I'm sure it wasn't part of the
  decision process before, so I'm bringing it up.
 
 Thanks for the input (BTW, you are welcome to join our weekly meetings
 and give us feedback online). I think it is important to make sure the
 new RH versions include the best OFED release.

 So my suggestion is:

 * Delay the 1.3 release by a week
 * Do RC4 next week - Feb 6
 * Add RC5 on Feb 18 - this will be the GOLD version
 * GA release on Feb 25


 All - please reply if this is acceptable
 
 
  760 major   [EMAIL PROTECTED]  UDP performance on Rx is lower than Tx  - for 1.3.1
  761 major   [EMAIL PROTECTED]  Poor and jittery UDP performance at small messages  - for 1.3.1


  Ditto for requesting these two be in 1.3.  We've already had customers
  bring up the UDP performance issue in our previous releases.


 We will push some fixes for these to RC4 if the above plan is accepted

 Tziporet



[ewg] RE: [ofa-general] [ANNOUCE] dapl 2.0.5 release

2008-01-29 Thread Tang, Changqing
Arlin:
I have not had a chance to look at uDAPL 2.0. Can you give a brief summary of
the changes from 1.2 to 2.0? I am interested in the application's perspective
and don't care about the internal details.

Thanks.

--CQ


From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Arlin Davis
Sent: Tuesday, January 29, 2008 6:58 PM
To: OpenFabrics General; EWG
Cc: Lentini, James; 'Vladimir Sokolovsky'
Subject: [ofa-general] [ANNOUCE] dapl 2.0.5 release

There is a new release of dapl 2.0 available on the OFA download page and in
my git tree.

It includes changes to allow both v1 and v2 development packages to be
installed on the same system.
v2 libdat.so has been renamed to libdat2.so.

md5sum: 010459e421a5c194438d58b1ccf1c6d0   dapl-2.0.5.tar.gz

Vlad, please pull new v2 release into OFED 1.3 RC3 and install the following 
packages:

Note: please make sure dapl-1.2.4-devel is added to list.

dapl-1.2.4-1
dapl-devel-1.2.4-1
dapl-2.0.5-1
dapl-utils-2.0.5-1
dapl-devel-2.0.5-1
dapl-debuginfo-2.0.5-1

See http://www.openfabrics.org/downloads/dapl/README.html for details.

-arlin

[ewg] RE: [ofa-general] OFED 1.3 Beta release is available

2007-12-17 Thread Tang, Changqing

I remember someone else suggested using:

struct ibv_context {
 struct ibv_device  *device;
 struct ibv_context_ops  ops;
 int cmd_fd;
 int async_fd;
 int num_comp_vectors;
 pthread_mutex_t mutex;
 void   *abi_compat;
 struct ibv_context_extra_ops  extra_ops;
};

Here extra_ops is not a pointer, and any future changes are added to
'extra_ops'. So why not do it this way?


Thanks.
--CQ

 -Original Message-
 From: Jack Morgenstein [mailto:[EMAIL PROTECTED]
 Sent: Sunday, December 16, 2007 11:01 AM
 To: [EMAIL PROTECTED]
 Cc: Tang, Changqing; Roland Dreier;
 ewg@lists.openfabrics.org; [EMAIL PROTECTED]
 Subject: Re: [ofa-general] OFED 1.3 Beta release is available

 On Wednesday 05 December 2007 17:45, Tang, Changqing wrote:
   I think the only alternative we have to preserve backwards
   compatibility is to leave struct ibv_context_ops alone and change
   the structure to:
  
   struct ibv_context {
   struct ibv_device  *device;
   struct ibv_context_ops  ops;
   int cmd_fd;
   int async_fd;
   int num_comp_vectors;
   pthread_mutex_t mutex;
   void   *abi_compat;
   struct ibv_xrc_op  *xrc_ops;
   };
  
   with xrc_ops added at the end.  It's my fault for not making the ops
   member a pointer I guess.
  
   Tziporet/Jack/whoever -- please fix up the libibverbs you ship for
   OFED 1.3 to resolve this.
  
   We can clean this up for libibverbs 1.2 when the ABI can change,
   if/when we have something worth breaking the ABI for.
 

 We need to have all userspace libraries set their private
 context object to 0 at allocation time (the private context
 object includes the ibv_context structure, which must now be
 NULL-ed out).

 The other userspace driver libraries (e.g., libmthca) do not
 zero-out their internal userspace context structures (e.g.,
 mthca_context) which include the ibv_context structure as the
 first element.
 Up to now, we depended on the ibv_context assign to set
 unavailable verb implementations to NULL.
 (and every userspace driver assigned the ops structure, with
 unimplemented operations set to NULL by the compiler).
 This is no longer true.

 Thus, anyone installing OFED will have a compatible set of
 userspace drivers for XRC applications (drivers which do not
 implement XRC will return errors for XRC-verbs).
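
 The zero-at-allocation rule above can be sketched as follows. This is a minimal illustration with mock types, not the libibverbs or libmthca code; `mock_context`, `mock_extra_ops`, `mock_alloc_context` and `mock_close_xrc_domain` are hypothetical names, and returning ENOSYS for a missing op is an assumption about what "return errors" might look like. It relies on an all-zero byte pattern being a NULL function pointer, which holds on the platforms OFED targets.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Mock types for illustration only; the real ones live in libibverbs. */
struct mock_extra_ops {
    int (*close_xrc_domain)(void *domain);
};

struct mock_context {
    int cmd_fd;
    int async_fd;
    struct mock_extra_ops extra_ops;   /* appended after the old fields */
};

/* Driver side: calloc() zeroes the whole private context, so every op
 * the driver does not explicitly assign stays NULL. */
static struct mock_context *mock_alloc_context(void)
{
    return calloc(1, sizeof(struct mock_context));
}

/* Library side: a NULL op means the driver does not implement the verb.
 * ENOSYS is an assumed error code for this sketch. */
static int mock_close_xrc_domain(struct mock_context *ctx, void *domain)
{
    assert(ctx != NULL);
    if (!ctx->extra_ops.close_xrc_domain)
        return ENOSYS;
    return ctx->extra_ops.close_xrc_domain(domain);
}

/* A driver that does implement the op (always succeeds here). */
static int mock_driver_close(void *domain)
{
    (void)domain;
    return 0;
}
```

 If the driver used malloc() instead and left the new member untouched, the pointer would hold garbage and the NULL check would not protect the caller.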

 Applications which were compiled with previous libraries will
 still work (since they do not use XRC).

 - Jack



RE: [ewg] Re: [ofa-general] OFED 1.3 Beta release is available

2007-12-05 Thread Tang, Changqing

There are some other input structure changes, such as ibv_qp_init_attr. If the
qp_type is not IBV_QPT_XRC, the xrc_domain field is not touched, right?

The same goes for the xrc_remote_srq_num field of struct ibv_send_wr.


--CQ Tang


 -Original Message-
 From: Jack Morgenstein [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, December 05, 2007 12:34 PM
 To: ewg@lists.openfabrics.org
 Cc: Roland Dreier; Tang, Changqing;
 [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: Re: [ewg] Re: [ofa-general] OFED 1.3 Beta release is
 available

 On Wednesday 05 December 2007 07:24, Roland Dreier wrote:
 
  I think the only alternative we have to preserve backwards
  compatibility is to leave struct ibv_context_ops alone and change the
  structure to:
 
  struct ibv_context {
  struct ibv_device  *device;
  struct ibv_context_ops  ops;
  int cmd_fd;
  int async_fd;
  int num_comp_vectors;
  pthread_mutex_t mutex;
  void   *abi_compat;
  struct ibv_xrc_op  *xrc_ops;
  };
 
  with xrc_ops added at the end.  It's my fault for not making the ops
  member a pointer I guess.
 

 We don't need to have this as a pointer, really (I'd like to save the
 extra malloc and associated bookkeeping). If we have the ibv_xrc_op
 struct at the end of ibv_context, this is sufficient for backwards
 binary compatibility (libmlx4 itself allocates the ibv_context
 structure for libibverbs; if the actual structure is a bit bigger, who
 cares -- we just need to preserve the current offsets of the structure
 fields for binary compatibility).
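
 The offset argument above can be checked with a small sketch. The structs are mocks that only mirror the shape of ibv_context; `mock_ops`, `mock_xrc_ops`, `ctx_old`, `ctx_new` and `check_layout` are hypothetical names, and the array sizes are arbitrary stand-ins for the real op tables.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct mock_ops     { void *op[4]; };   /* stands in for ibv_context_ops */
struct mock_xrc_ops { void *op[3]; };   /* stands in for the new XRC ops */

/* Layout an old application was compiled against. */
struct ctx_old {
    void *device;
    struct mock_ops ops;
    int cmd_fd;
    int async_fd;
    int num_comp_vectors;
    pthread_mutex_t mutex;
    void *abi_compat;
};

/* New layout: same members in the same order, new ops appended by value. */
struct ctx_new {
    void *device;
    struct mock_ops ops;
    int cmd_fd;
    int async_fd;
    int num_comp_vectors;
    pthread_mutex_t mutex;
    void *abi_compat;
    struct mock_xrc_ops xrc_ops;
};

/* Every field an old binary touches sits at the same offset in both. */
static void check_layout(void)
{
    assert(offsetof(struct ctx_old, cmd_fd) ==
           offsetof(struct ctx_new, cmd_fd));
    assert(offsetof(struct ctx_old, async_fd) ==
           offsetof(struct ctx_new, async_fd));
    assert(offsetof(struct ctx_old, abi_compat) ==
           offsetof(struct ctx_new, abi_compat));
}
```

 The new structure is larger, but since the driver allocates it, an old application never needs to know its size.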

 If you want to be a bit more generic, we could do this as an extra_ops
 structure and add new ops as needed.
 (If future changes are messier than just adding a new op, we can then
 increment the API version):

 struct ibv_context_extra_ops {
         struct ibv_srq *(*create_xrc_srq)(struct ibv_pd *pd,
                                           struct ibv_xrc_domain *xrc_domain,
                                           struct ibv_cq *xrc_cq,
                                           struct ibv_srq_init_attr *srq_init_attr);
         struct ibv_xrc_domain *(*open_xrc_domain)(struct ibv_context *context,
                                                   int fd, int oflag);
         int (*close_xrc_domain)(struct ibv_xrc_domain *d);
 };

 struct ibv_context {
         struct ibv_device  *device;
         struct ibv_context_ops  ops;
         int cmd_fd;
         int async_fd;
         int num_comp_vectors;
         pthread_mutex_t mutex;
         void   *abi_compat;
         struct ibv_context_extra_ops  extra_ops;
 };




[ewg] RE: [ofa-general] OFED 1.3 Beta release is available

2007-12-05 Thread Tang, Changqing

Roland:
I think we will have more such changes in the future. Why don't we take the
pain now, make ops a pointer, and mark it as verbs 1.2?


--CQ


 -Original Message-
 From: Roland Dreier [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, December 04, 2007 11:25 PM
 To: Tang, Changqing
 Cc: Tziporet Koren; ewg@lists.openfabrics.org;
 [EMAIL PROTECTED]
 Subject: Re: [ofa-general] OFED 1.3 Beta release is available

   I think the problem is that sizeof struct ibv_context_ops has
   changed, so the new driver returns a big struct ibv_context, app
   compiled with older header file has a smaller struct ibv_context
   and use the old offset to find fields after ops.

 Oh crud, you're obviously right.  For some reason I kept
 missing that when I looked over the code.

 I think the only alternative we have to preserve backwards
 compatibility is to leave struct ibv_context_ops alone and
 change the structure to:

 struct ibv_context {
 struct ibv_device  *device;
 struct ibv_context_ops  ops;
 int cmd_fd;
 int async_fd;
 int num_comp_vectors;
 pthread_mutex_t mutex;
 void   *abi_compat;
 struct ibv_xrc_op  *xrc_ops;
 };

 with xrc_ops added at the end.  It's my fault for not making
 the ops member a pointer I guess.

 Tziporet/Jack/whoever -- please fix up the libibverbs you
 ship for OFED 1.3 to resolve this.

 We can clean this up for libibverbs 1.2 when the ABI can
 change, if/when we have something worth breaking the ABI for.

  - R.



[ewg] RE: [ofa-general] OFED 1.3 Beta release is available

2007-12-04 Thread Tang, Changqing
Here is an issue we have:

struct ibv_context {
struct ibv_device  *device;
struct ibv_context_ops  ops;
int cmd_fd;
int async_fd;
int num_comp_vectors;
pthread_mutex_t mutex;
void   *abi_compat;
};

The binary is compiled with OFED 1.2 header files. When it tries to set
async_fd to non-blocking, I get the error "Bad file descriptor". If I compile
the binary with OFED-1.3-beta header files (with XRC changes), it works fine.


Is this the expected behavior, or will there be a fix?
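
For reference, setting a descriptor to non-blocking is normally done with fcntl(), roughly as in the sketch below. The helper name `set_nonblocking` is hypothetical; in an application the argument would be the context's async_fd. When the value passed in is not an open descriptor, for example because it was read from the wrong place in the structure, fcntl() fails with EBADF, which is reported as "Bad file descriptor".

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Put fd into non-blocking mode.
 * Returns 0 on success, or the errno value on failure. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return errno;              /* EBADF if fd is not an open descriptor */
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return errno;
    return 0;
}
```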


Thanks.
--CQ Tang



From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tziporet Koren
Sent: Thursday, November 22, 2007 9:46 AM
To: ewg@lists.openfabrics.org
Cc: [EMAIL PROTECTED]
Subject: [ofa-general] OFED 1.3 Beta release is available


Hi,

OFED 1.3 Beta release is available on
http://www.openfabrics.org/downloads/OFED/ofed-1.3/OFED-1.3-beta2.tgz
To get BUILD_ID run ofed_info

Please report any issues in bugzilla https://bugs.openfabrics.org/

The RC1 release is expected on December 5

Tziporet & Vlad



Release information:

OS support:
Novell:
- SLES10
- SLES10 SP1 and up1
Redhat:
- Redhat EL4 up4 and up5
- Redhat EL5 and up1
kernel.org:
- 2.6.23 and 2.6.24-rc2

Systems:
* x86_64
* x86
* ia64
* ppc64*

Main Changes from OFED 1.3-alpha


 *   Kernel code based on 2.6.24-rc2
 *   New packages:
*   SRP target
*   qperf test from Qlogic
*   ibsim package
*   uDAPL 2.0 library (1.0 & 2.0 coexist)
 *   New OSes Support:
*   RHEL 5 up1
*   SLES10 SP1 up1
 *   Compilation issues resolved:
*   Open MPI compilation on SLES10 SP1
*   ibutils compiles on SLES10 PPC64 (64 bits)
*   Apply patches that fix warning of backport patches
*   Prefix is now supported properly
 *   RDS implementation for API version 2 was updated from 1.2.5 branch
 *   Fix binary compatibility of libibverbs caused by XRC implementation
 *   Uninstall is now working properly
 *   ib-bonding update to release 19
 *   MPI packages update:
*   mvapich-1.0.0-1625.src.rpm
*   mvapich2-1.0.1-1.src.rpm
*   openmpi-1.2.4-1.src.rpm

Mlx4 driver specific changes:

 *   Enable changing the default of HCA resource limits with module parameters
 *   Default number of maximum QPs is now 128K (was 64K)
 *   Fixing max_cqe's (not adding an extra cqe)
 *   Fix state check in mlx4_qp_modify
 *   Sanity check userspace send queue sizes
 *   Several bug fixes in XRC


Tasks that should be completed for the beta release:

1. 32-bit libraries to be supported on SLES10 SP1 Update1.
2. Fix SDP stability issues
3. IPoIB performance improvements for small messages
4. Fix bugs

[ewg] RE: [ofa-general] OFED 1.3 Beta release is available

2007-12-04 Thread Tang, Changqing

I think the problem is that sizeof(struct ibv_context_ops) has changed, so
the new driver returns a bigger struct ibv_context, while an app compiled with
the older header file has a smaller struct ibv_context and uses the old
offsets to find the fields after ops.
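
A minimal sketch of this failure mode, using mock structs rather than the real libibverbs definitions: `ops_v1`, `ops_v2`, `ctx_v1`, `ctx_v2` and `read_async_fd_as_old_binary` are hypothetical names, and growing the table from 4 to 7 entries is an arbitrary stand-in for the XRC additions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ops_v1 { void *op[4]; };   /* ops table in the old headers */
struct ops_v2 { void *op[7]; };   /* same table after new ops were added */

struct ctx_v1 { void *device; struct ops_v1 ops; int cmd_fd; int async_fd; };
struct ctx_v2 { void *device; struct ops_v2 ops; int cmd_fd; int async_fd; };

/* Read async_fd the way an old binary would: at the v1 offset, from a
 * context that the new driver laid out as v2. */
static int read_async_fd_as_old_binary(const struct ctx_v2 *ctx)
{
    const char *base = (const char *)ctx;
    int value;

    assert(ctx != NULL);
    /* The v1 offset now lands inside the enlarged ops table. */
    memcpy(&value, base + offsetof(struct ctx_v1, async_fd), sizeof value);
    return value;
}
```

The old binary ends up reading part of a function pointer slot instead of the descriptor, so what it passes to fcntl() is not a valid fd.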

--CQ


 -Original Message-
 From: Roland Dreier [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, December 04, 2007 6:18 PM
 To: Tang, Changqing
 Cc: Tziporet Koren; ewg@lists.openfabrics.org;
 [EMAIL PROTECTED]
 Subject: Re: [ofa-general] OFED 1.3 Beta release is available

   Here is an issue we have:
  
   struct ibv_context {
   struct ibv_device  *device;
   struct ibv_context_ops  ops;
   int cmd_fd;
   int async_fd;
   int num_comp_vectors;
   pthread_mutex_t mutex;
   void   *abi_compat;
   };
  
   The binary is compiled with OFED 1.2 header files,  it tries to set
   async_fd to non-blocking, I get error: Bad file descriptor.   If I
   compile the binary with OFED-1.3-beta header files (with XRC changes),
   it works fine.
  
   Is this the expected behavior, or there will be a fix ?

 Unfortunately the XRC patches were put into OFED 1.3 before
 they went into the upstream libibverbs tree, so I have not
 reviewed them in detail.  If XRC support requires an ABI
 change, then we'll have to create a new ABI and provide
 versioned symbols for backwards compatibility.

 However your problem seems quite strange: I don't see any
 change to struct ibv_context caused by the XRC patches.  So I
 don't understand exactly what is causing the problem you see.
  Can you debug further to see which structure layout change
 is the real issue?

  - R.



[ewg] Post receive WR to QP in error state

2007-08-15 Thread Tang, Changqing

Roland:

I read the IB specification. It does say that posting a send WR to a QP
in the error state is OK, and that the WR will be completed with a FLUSH
error, but it does not say the same for posting a receive WR. I want to
check with you whether posting a receive WR to a QP in the error state
returns SUCCESS and the WR is completed with a FLUSH error.  Thanks.


--CQ



[ewg] RE: RFCv2: SRC API

2007-08-06 Thread Tang, Changqing
 

  OK, I was wrong before; here is my question.
  
  If remote node n has j2, j3, and j4, and j2 is the job that creates qp2
  and makes the connection with qp1 in j1, then if j2 finishes before j3
  and j4, we cannot let j2 destroy qp2, because j3 and j4 are still
  communicating with j1. Since j2 owns qp2, j2 needs to be the last job
  to clean up.
  
  Am I right?
 
 Correct. Is this clear from the text, or is some kind of additional
 clarification necessary?

It is not clear on first read, so please add one sentence to clarify it.

If j2 is the last job to clean up, how can it know that all other jobs on the
same node have called ibv_close_src_domain(), and that it is time for it to
clean up?

Is this something that is up to the application?

--CQ


 
 --
 MST
 


[ewg] RE: RFCv2: SRC API

2007-08-06 Thread Tang, Changqing
 
 Cleanup:
 When job j1 does not need to communicate to any jobs on node 
 n, it disconnects qp1 from qp2, and asks j2 to destroy qp2.
 +
 +Note: both qp1 and qp2 must exist for the communication to 
 take place.
 +Thus, j2 should not destroy qp2 (and in particular, should not exit) 
 +until j1 has completed communication with node n and has asked j2 to 
 +disconnect.
 
Thanks. 

Another question: if a node n has 8 jobs, say j2-j9, usually the first job j2
is the one that creates the SRC domain (the other jobs just attach and share),
and it makes sense to let j2 create all the receiving QPs for all the remote
jobs and make all the connections. (We could do this round-robin, but that is
more work.)

Is there any performance concern in letting j2 (the first job on a node) do
all the work?

What is the latency of SRC+SRQ?

--CQ


[ewg] RE: [ofa-general] Scalable reliable connection

2007-07-31 Thread Tang, Changqing

A send queue can only serve at most J jobs within a node. Is it possible to
make a single send queue serve all jobs on all nodes?

--CQ 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Michael S. Tsirkin
 Sent: Monday, July 30, 2007 7:51 AM
 To: Gleb Natapov
 Cc: Pavel Shamis; ewg@lists.openfabrics.org; Michael S. 
 Tsirkin; [EMAIL PROTECTED]; Ishai Rabinovitz
 Subject: [ofa-general] Scalable reliable connection
 
 
 Here's some background on what SRC is.  This is basically 
 slide 6 in Dror's talk, for those that missed the talk.
 
  * * *
 
 SRC is an extension supported by recent Mellanox hardware 
 which is geared toward reducing the number of QPs required 
 for all-to-all communication on systems with a high number of 
 jobs per node.
 
 ===
 Motivation:
 ===
 Given N nodes with J jobs per node, number of QPs required 
 for all-to-all communication is:
 
 With RC:
   O((N * J) ^ 2)
 
   Since each job out of O(N * J) jobs must create a single QP
   to communicate with each one of O(N * J) other jobs.
 
 With SRC:
   O(N ^ 2 * J)
 
   This is achieved by using a single send queue (per job, out of
   O(N * J) jobs) to send data to all J jobs running on a specific node
   (out of O(N) nodes).
   Hardware uses a new SRQ number field in the packet header to
   multiplex receive WRs and WCs to the private memory of each job.
 
 This is a similar idea to IB RD.
 Q: Why not use RD then?
 A: Because no hardware supports it.
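
 To put numbers on the two bounds above, here is a small sketch that evaluates the leading terms only. The function names `rc_qp_count` and `src_qp_count` are hypothetical, and the model ignores constant factors such as a job not connecting to itself.

```c
#include <assert.h>
#include <stdint.h>

/* RC: each of the N*J jobs creates a QP for each of the N*J jobs. */
static uint64_t rc_qp_count(uint64_t nodes, uint64_t jobs_per_node)
{
    uint64_t jobs = nodes * jobs_per_node;

    assert(nodes > 0 && jobs_per_node > 0);
    return jobs * jobs;
}

/* SRC: each of the N*J jobs needs one send queue per node. */
static uint64_t src_qp_count(uint64_t nodes, uint64_t jobs_per_node)
{
    assert(nodes > 0 && jobs_per_node > 0);
    return nodes * jobs_per_node * nodes;
}
```

 For example, with N = 100 nodes and J = 8 jobs per node the RC term is 640000 and the SRC term is 80000, a reduction by a factor of J.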
 
 Details:
 
 ===
 Verbs extension:
 ===
 
 - There is a new transport/QP type SRC.
 - There is a new object type SRC domain
 - Each SRQ gets new (optional) attributes:
     SRC domain
     SRC SRQ number
     SRC CQ
   SRQ must have either all 3 of these or none of these attributes
 
 - QPs of type SRC have all the same attributes as regular RC QPs
   connected to SRQ, except that:
   A. Each SRC QP has a new required attribute SRC domain
   B. SRC QPs do *not* have SRQ attribute
   (do not have a specific SRQ associated with them)
 
 ===
 Protocol extension:
 ===
 SRC QP behaviour: Requestor
 - Post send WR for this QP type is extended with SRQ number field
   This number is sent as part of packet header
 - SRC Packets follow rules for RC packets on the wire, exactly
   What is different is their handling at the responder side
 
 SRC QP behaviour: Responder
 Each incoming packet passes transport checks with respect to 
 the SRC QP, following RC rules, exactly.
 
 After this, SRQ number in packet header is used to look up a 
 specific SRQ. SRC domain of the resulting SRQ must be equal 
 to SRC domain of the QP, otherwise a NAK is sent, and QP 
 moves to error state.
 
 If the SRC domains match, receive WR and receive WC 
 processing are as follows:
 
 - RC Send
   - Rather than using SRQ to which the QP is attached,
 SRQ is looked up by SRQ number in the packet.
 Receive WR is taken from this SRQ.
   - Completions are generated on the CQ specified in the SRQ
 
 - RDMA/Atomic
   - Rather than using PD to which the QP is attached,
 SRQ is looked up by SRQ number in the packet.
 PD of this SRQ is used for protection checks.
 ===
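
 The responder-side rules above can be modelled in a few lines. This is a software sketch of the described behaviour only, not driver or hardware code; all types and names (`mock_srq`, `mock_src_qp`, `lookup_srq`, `src_responder`) are hypothetical. Treating an unknown SRQ number the same as a domain mismatch is an assumption, since the description above only covers the mismatch case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum src_result { SRC_DELIVER, SRC_NAK };

/* Mock objects for illustration only. */
struct mock_srq    { uint32_t srq_num; uint32_t domain; int cq_id; };
struct mock_src_qp { uint32_t domain; int in_error; };

/* Find the SRQ named by the SRQ number carried in the packet header. */
static struct mock_srq *lookup_srq(struct mock_srq *table, size_t n,
                                   uint32_t srq_num)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].srq_num == srq_num)
            return &table[i];
    return NULL;
}

/* Called after the packet has passed the normal RC transport checks. */
static enum src_result src_responder(struct mock_src_qp *qp,
                                     struct mock_srq *table, size_t n,
                                     uint32_t srq_num, int *cq_out)
{
    struct mock_srq *srq;

    assert(qp != NULL && cq_out != NULL);
    srq = lookup_srq(table, n, srq_num);
    /* Assumption: an unknown SRQ number is handled like a mismatch. */
    if (srq == NULL || srq->domain != qp->domain) {
        qp->in_error = 1;          /* NAK is sent, QP moves to error */
        return SRC_NAK;
    }
    *cq_out = srq->cq_id;          /* completion goes to the SRQ's CQ */
    return SRC_DELIVER;
}
```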
  
 --
 MST
 


[ewg] RE: Scalable reliable connection

2007-07-31 Thread Tang, Changqing
 

In this way, only one send queue is needed for each job (process), and we
don't need to track the location of every other job (which job is on which
node). From a job's point of view there is only itself and the others, and
all the others are equal...

--CQ

 -Original Message-
 From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, July 31, 2007 11:16 AM
 To: Tang, Changqing
 Cc: Michael S. Tsirkin; Gleb Natapov; Pavel Shamis; 
 ewg@lists.openfabrics.org; [EMAIL PROTECTED]; 
 Ishai Rabinovitz
 Subject: Re: Scalable reliable connection
 
  Quoting Tang, Changqing [EMAIL PROTECTED]:
  Subject: RE: Scalable reliable connection
  
  
  A send queue can only serve max J jobs within a node. Is it possible
  to make a single send queue to serve all jobs on all nodes ?
 
 How do you propose to do this?
 
 --
 MST
 