[openib-general] [PATCH] uDAPL - include dapltest and dtest in build

2007-02-23 Thread Arlin Davis
This uDAPL patch adds both dapltest and dtest utilities, including manual pages, to the DAPL project build. The dapltest required some modifications to build on x86_64. James, please review. Signed-off by: Arlin Davis [EMAIL PROTECTED] diff --git a/Makefile.am b/Makefile.am index 1190f20

Re: [openib-general] Fork issues with simple MPI program

2007-02-20 Thread Arlin Davis
Arlin Davis wrote: Any insight would be greatly appreciated. It was our assumption that the parent process can continue to use IB resources after the fixes went into 2.6.16 and OFED 1.1. Is this true? As was discussed over this list in few occasions: in contrast to popular thought

Re: [openib-general] Fork issues with simple MPI program

2007-02-20 Thread Arlin Davis
Or you could try setting the IBV_FORK_SAFE environment variable before running your application. I guess for MPI jobs you need to make sure that environment variable is propagated to every process. Ahh! That's what I was looking for. Thanks! This information is scattered around in various
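The propagation point in the reply above can be illustrated with a small sketch. It only shows that a variable set in a launcher's environment is inherited by child processes; it does not call libibverbs. The helper names and the value "1" are assumptions for illustration (the thread only says the variable should be set), and how a given MPI launcher forwards environment variables to remote ranks is not covered here.

```python
import os
import subprocess
import sys


def fork_safe_env(base=None):
    """Return a copy of the environment with IBV_FORK_SAFE set.

    The value "1" is an assumption; the thread only says to set the variable.
    """
    env = dict(os.environ if base is None else base)
    env["IBV_FORK_SAFE"] = "1"
    return env


def child_sees_variable(env):
    """Spawn a child interpreter and return the IBV_FORK_SAFE value it inherits."""
    result = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ.get('IBV_FORK_SAFE', ''))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The child process reports the value it inherited from the launcher.
    print(child_sees_variable(fork_safe_env()))
```

For an MPI job the same idea applies per rank: the variable has to be present in the environment of every process, not only in the shell that starts the launcher.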

[openib-general] Fork issues with simple MPI program

2007-02-19 Thread Arlin Davis
We are seeing some fork issues with a simple MPI program (attached) running on 2.6.16+ kernels and OFED 1.1. We have tried both Intel MPI and mvapich2 with the same results: t_fork mpiexec -n 2 t_system_fork parent process [0]

Re: [openib-general] dapltest?

2007-02-07 Thread Arlin Davis
Steve Wise wrote: Hey Arlin, Shouldn't dapl/test be shipped with OFED? It appears not to be... Yes, I will try to get to this by next week at the latest. Can you add a bugzilla report to track against? -arlin ___ openib-general mailing list

Re: [openib-general] dapl broken for iWARP

2007-02-07 Thread Arlin Davis
Steve Wise wrote: On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote: Arlin, The OFED dapl code is assuming the responder_resources and initiator_depth passed up on a connection request event are from the remote peer. This doesn't happen for iWARP. In the current iWARP specifications, its

Re: [openib-general] OFED 1.2 release - to be reviewed in the meeting today

2007-01-31 Thread Arlin Davis
*Sources developed in OFA:* 1. Each git owner will open a branch with the name ofed_1_2. This branch should be opened on 31-Jan (based on code readiness we will review today). ofed_1_2 branch created for dapl.git -arlin

Re: [openib-general] librdmacm and udapl: Which git branch to use in ofed_1_2 build

2007-01-22 Thread Arlin Davis
~ardavis/dapl.git: rdma_ucm* master Can you reply which branch to use in our daily ofed 1.2 builds. for dapl use rdma_ucm branch

Re: [openib-general] librdmacm and udapl: Which git branch to use in ofed_1_2 build

2007-01-22 Thread Arlin Davis
OK, we can switch to master. Then DAPL would need to be updated, right? Arlin? I think DAPL stays with rdma_ucm, but Arlin can confirm. I created a 1.2 uDAPL branch to use with ofed_1_2 builds.

[openib-general] [PATCH] uDAPL - rdma_ucm branch: add changes to support rr/init exchange

2007-01-19 Thread Arlin Davis
Some uDAPL changes to support exchanging and validation of the device responder_resources and the initiator_depth during connection establishment. Signed-off by: Arlin Davis [EMAIL PROTECTED] diff --git a/dapl/openib_cma/dapl_ib_cm.c b/dapl/openib_cma/dapl_ib_cm.c old mode 100644 new mode

Re: [openib-general] building dapl for ofed

2007-01-16 Thread Arlin Davis
-Original Message- From: Steve Wise [mailto:[EMAIL PROTECTED] Sent: Tuesday, January 16, 2007 8:56 AM To: Davis, Arlin R; Hefty, Sean Cc: openib-general Subject: building dapl for ofed I'm having problems building dapl for ofed 1.2. I'm using the dapl rdma_ucm branch and still getting

Re: [openib-general] [PATCH 3/3] uDAPL cma: add support for address and route retries, call disconnect when recving dreq

2006-11-07 Thread Arlin Davis
All 3 patches committed in OFA (r10074) and SourceForge (r1414)

[openib-general] [PATCH 1/3] uDAPL cma: add support for new client register event

2006-11-06 Thread Arlin Davis
New series of patches with -x -up Added support for new ib verbs client register event. No extra processing required at the uDAPL level. Shows up if opensm bounces. Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/openib_cma/dapl_ib_util.c

[openib-general] [PATCH 2/3] uDAPL cma: fix issues with creating qp without rcv resources

2006-11-06 Thread Arlin Davis
Fix some issues supporting create qp without recv cq handle or recv qp resources. IB verbs assume a recv_cq handle and uDAPL dapl_ep_create assumes there are always recv_sge resources specified. Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/common/dapl_ep_create.c

[openib-general] [PATCH 3/3] uDAPL cma: add support for address and route retries, call disconnect when recving dreq

2006-11-06 Thread Arlin Davis
DAPL_CM_ROUTE_TIMEOUT_MS 4000 DAPL_CM_ROUTE_RETRY_COUNT 15 Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/openib_cma/dapl_ib_cm.c === --- dapl/openib_cma/dapl_ib_cm.c (revision 10032) +++ dapl/openib_cma/dapl_ib_cm.c

Re: [openib-general] scaling issues, was: uDAPL cma: add support for address and route retries, call disconnect when recving dreq

2006-11-02 Thread Arlin Davis
Sean Hefty wrote: One option is having the SA (or ib_umad?) return a busy status in response to a MAD, but we'd still have to be able to send this response as quickly as requests are being received. We could then limit the number of requests that would be queued in the kernel for a user.

Re: [openib-general] scaling issues, was: uDAPL cma: add support for address and route retries, call disconnect when recving dreq

2006-11-02 Thread Arlin Davis
Michael S. Tsirkin wrote: Another great option would be to have path record caching. Unfortunately OFED 1.1 did not include ib_local_sa in the release. This won't help you much. With 256 nodes all to all already gives you 65000 requests which is the same order of magnitude as the
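The "65000 requests" figure quoted above can be checked with a few lines of arithmetic. This sketch assumes one path record query per ordered (source, destination) pair, which is an interpretation of "all to all" rather than something the truncated message states; the function name is hypothetical.

```python
def all_to_all_requests(nodes, include_self=False):
    """Count ordered (source, destination) pairs among `nodes` endpoints."""
    if include_self:
        return nodes * nodes
    return nodes * (nodes - 1)


if __name__ == "__main__":
    print(all_to_all_requests(256))        # 65280
    print(all_to_all_requests(256, True))  # 65536
```

Either way of counting lands at roughly 65000 for 256 nodes, and the total grows quadratically with the node count.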

Re: [openib-general] Verbs QP create with RQ=0?

2006-11-01 Thread Arlin Davis
Roland Dreier wrote: As was already suggested, you should be able to use the same CQ for receives and for sends. If you never post any receives on the QP, you don't have to allocate any extra space on your send CQ. And it should work to have 0 receive work queue entries. Have you tried it?

[openib-general] [PATCH 2/3] uDAPL cma: fix issues with creating qp without rcv resources

2006-11-01 Thread Arlin Davis
Fix some issues supporting create qp without recv cq handle or recv qp resources. IB verbs assume a recv_cq handle and uDAPL dapl_ep_create assumes there are always recv_sge resources specified. Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/common/dapl_ep_create.c

[openib-general] [PATCH 3/3] uDAPL cma: add support for address and route retries, call disconnect when recving dreq

2006-11-01 Thread Arlin Davis
to the remote side. Here are the new options (environment variables) with the default setting DAPL_CM_ARP_TIMEOUT_MS 4000 DAPL_CM_ARP_RETRY_COUNT 15 DAPL_CM_ROUTE_TIMEOUT_MS 4000 DAPL_CM_ROUTE_RETRY_COUNT 15 Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/openib_cma/dapl_ib_cm.c
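As a rough illustration of the environment-variable options listed above, the sketch below reads the four names with the defaults given in the patch description. It is not the provider's code (that lives in dapl/openib_cma/dapl_ib_cm.c and is written in C); the function name and the fall-back-on-invalid-input behaviour are assumptions made for this example.

```python
import os

# Names and default values as listed in the patch description.
DEFAULTS = {
    "DAPL_CM_ARP_TIMEOUT_MS": 4000,
    "DAPL_CM_ARP_RETRY_COUNT": 15,
    "DAPL_CM_ROUTE_TIMEOUT_MS": 4000,
    "DAPL_CM_ROUTE_RETRY_COUNT": 15,
}


def read_cm_settings(environ=None):
    """Return the CM settings, taking overrides from the environment.

    Falling back to the default on a non-integer value is an assumption
    of this sketch, not documented provider behaviour.
    """
    environ = os.environ if environ is None else environ
    settings = {}
    for name, default in DEFAULTS.items():
        raw = environ.get(name)
        try:
            settings[name] = int(raw) if raw is not None else default
        except ValueError:
            settings[name] = default
    return settings


if __name__ == "__main__":
    for name, value in read_cm_settings().items():
        print(name, value)
```

Passing a plain dictionary instead of the real environment makes the override behaviour easy to try out, for example `read_cm_settings({"DAPL_CM_ARP_RETRY_COUNT": "3"})`.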

Re: [openib-general] uDAPL problem

2006-10-17 Thread Arlin Davis
Stephen Smaldone wrote: Arlin Davis wrote: Steve Smaldone wrote: Hi, Sorry for replying to myself, but I loaded rdma_ucm and the rdma_cm device appears. However, it now fails with the following: $ ./dapltest -T S -D IB1 ... DAT Registry: dat_ia_openv (IB1,1:2,0) called DAT

Re: [openib-general] OFED 1.1 release schedule

2006-10-16 Thread Arlin Davis
Tziporet Koren wrote: This is the plan to do the 1.1 release this week: We will publish 1.1-pre1 package tomorrow (Tue. 17-Oct) Only blocker issues from RC7 will be updated: 1. SRP fix for Cisco FC gateway 2. Small updates for the install 3. Fix in diagnet to support SM on a

Re: [openib-general] uDAPL problem

2006-10-16 Thread Arlin Davis
Steve Smaldone wrote: Hi, Sorry for replying to myself, but I loaded rdma_ucm and the rdma_cm device appears. However, it now fails with the following: $ ./dapltest -T S -D IB1 ... DAT Registry: dat_ia_openv (IB1,1:2,0) called DAT Registry: IA IB1, trying to load library

[openib-general] [PATCH] remove scm provider from uDAPL build.

2006-10-12 Thread Arlin Davis
Here is a patch to remove uDAPL scm provider from the build since it is no longer needed nor supported. This provider was merely a stop gap until uCMA was pushed into kernel. Tziporet, can you get this change into OFED 1.1? Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: doc/dat.conf

Re: [openib-general] OFED 1.1 RC7

2006-10-10 Thread Arlin Davis
Aviram Gutman wrote: OFED-1.1-rc7 is available on https://openib.org/svn/gen2/branches/1.1/ofed/releases/ File: OFED-1.1-rc7.tgz Please report any issues in bugzilla http://openib.org/bugzilla/ Aviram, Can you verify that the sean_cm_drep_on_not_found.patch is actually applied in RC7? Our

Re: [openib-general] [openfabrics-ewg] OFED Status

2006-10-02 Thread Arlin Davis
Woodruff, Robert J wrote: Aviram wrote, Pending that IPoIB HA is solved would like to issue RC7 that suppose to be final. Is everyone OK with this approach? Aviram Sounds good, What is the target date for RC7 ? Do we have a new target date?

Re: [openib-general] DAPL setup/config help

2006-09-26 Thread Arlin Davis
Troy Telford wrote: I've never set up dapl before, however I now have a reason to try... The problem is, I can't seem to find any documentation on how to set it up. I've tried the sample /etc/dat.conf (modified for the IPoIB address on the system), but I'm not sure I've been successful. I've: *

Re: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ

2006-09-25 Thread Arlin Davis
Arlin Davis wrote: Sean Hefty wrote: Currently a DREP is only sent in response to a DREQ if a connection has been found matching the DREQ, and it is in the proper state. Once a DREP is sent, the local connection moves into timewait. Duplicate DREQs received while in this state result in re

Re: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ

2006-09-22 Thread Arlin Davis
Sean Hefty wrote: Currently a DREP is only sent in response to a DREQ if a connection has been found matching the DREQ, and it is in the proper state. Once a DREP is sent, the local connection moves into timewait. Duplicate DREQs received while in this state result in re-sending the DREP.

Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Arlin Davis
Robert, Here is a slightly modified patch for your attributes issue. Can you give it a try? Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/openib/dapl_ib_util.c === --- dapl/openib/dapl_ib_util.c (revision 9106

Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Arlin Davis
Robert Walsh wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Arlin Davis wrote: Robert, Here is a slightly modified patch for your attributes issue. Can you give it a try? I'll give it a spin this afternoon: it looks quite a bit more comprehensive than the small patch I did

Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Arlin Davis
Oddly enough, I'm back to the same problem with your new patch as I saw with the unpatched version: Hmmm. We ran this with OFED 1.1 RC3 and MPI 3.0b on an EM64T server with your adapter and it worked. Did you ever pick up the Intel MPI 3.0 beta? $ mpiexec -n 2 ./a.out I_MPI: [1]

Re: [openib-general] [openfabrics-ewg] Rollup patch for ipath and OFED

2006-08-21 Thread Arlin Davis
Michael S. Tsirkin wrote: Quoting r. Woodruff, Robert J [EMAIL PROTECTED]: We should have one git tree somewhere that has all the latest code that we can pull from I just don't think latest code is a well defined entity in a distributed development environment. kernel.org is well

[openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-16 Thread Arlin Davis
We are running into connection reject issues (IB_CM_REJ_STALE_CONN) with our application under heavy load and lots of connections. We occasionally get a reject based on the QP being in timewait state leftover from a prior connection. It appears that the CM keeps track of the QP's in timewait

Re: [openib-general] DAPL and local_iov in RDMA RR/RW mode

2006-08-15 Thread Arlin Davis
Ryszard Jurga wrote: Hi Arlin, Thank you for your quick reply. Both dat_ep_post_rdma_read and dat_ep_post_rdma_write return DAT_SUCCESS. When I read a field 'transfered_length' from DAT_DTO_COMPLETION_EVENT_DATA after calling a post function I receive the correct value which equals

Re: [openib-general] DAPL and local_iov in RDMA RR/RW mode

2006-08-11 Thread Arlin Davis
Ryszard Jurga wrote: Hi everybody, I have one question about a number of segments in local_iov when using RDMA Write and Read mode. Is it possible to have num_segments>1? I am asking, because when I try to set up num_segments to a value >1, then I can still only read/write one segment,

Re: [openib-general] [openfabrics-ewg] OFED 1.1 planning meeting - summary

2006-08-10 Thread Arlin Davis
Tziporet Koren wrote: You are correct - we forgot about it. Will be fixed in rc2 Can you open a bug in bugzilla for the installer package so we will not miss it this time? Done. Bug 195.

Re: [openib-general] [openfabrics-ewg] OFED 1.1 planning meeting - summary

2006-08-09 Thread Arlin Davis
Can we include librdmacm and dapl in the basic installation option? Also, it would be nice to have rdma_ucm and rdma_cm load on boot by default. Thanks, -arlin This is a small change in the OFED scripts. I suggest that if we go for this change we will do it for the HPC install and

Re: [openib-general] [openfabrics-ewg] OFED 1.1 planning meeting - summary

2006-07-28 Thread Arlin Davis
Tziporet Koren wrote: Hi all, This is the outcome of the meeting we had today regarding OFED 1.1 schedule and features. Can we include librdmacm and dapl in the basic installation option? Also, it would be nice to have rdma_ucm and rdma_cm load on boot by default. Thanks, -arlin

Re: [openib-general] [PATCH] uDAPL - OpenIB-cma: added consumer wakeup mechanism for cq wait objects

2006-07-19 Thread Arlin Davis
Arlin Davis wrote: Fix for Bug 158. Add support for dat_evd_set_unwaitable on a DTO EVD. Committed revision 8592.

[openib-general] [PATCH] uDAPL cma provider, errno reporting on create thread during open

2006-07-17 Thread Arlin Davis
Added errno reporting (message and return codes) during open to help diagnose create thread issues. Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: openib_cma/dapl_ib_util.c === --- openib_cma/dapl_ib_util.c (revision 8559

Re: [openib-general] [PATCH] uDAPL cma provider, errno reporting on create thread during open

2006-07-17 Thread Arlin Davis
Arlin Davis wrote: Added errno reporting (message and return codes) during open to help diagnose create thread issues. Committed revision 8565.

Re: [openib-general] OFED 1.1 release - schedule and features

2006-07-12 Thread Arlin Davis
Tziporet Koren wrote:
• Core:
  – Set options in CMA & uCMA (needed for Intel MPI)
  – HCA fatal - full flow support
  – Huge pages support
• uDAPL:
  – Scalability features needed for Intel MPI – take from trunk
• Arlin & James – please reply if there are more features needed.
The latest

Re: [openib-general] OFED 1.1 release - schedule and features

2006-07-12 Thread Arlin Davis
Michael S. Tsirkin wrote: Quoting r. Arlin Davis [EMAIL PROTECTED]: The latest uDAPL from the trunk and uCMA set option support is sufficient. Which options do you set? Retry/timeout or path as well? Just retry/timeout.

Re: [openib-general] [Bug 146] OFED-1.0 DAPL fails to build on SLES10 on IA64 with IA64_FETCHADD error

2006-07-12 Thread Arlin Davis
John Partridge wrote: I installed the dapl rpm. I do have libdat.so.1 but I also expect a symlink to libdat.so which does not exist (Intel MPI appears to need it) I also noticed that the dat.conf points to /usr/local/ofed/lib/libdaplcma.so but there is no symlink in the /usr/local/ofed/lib

[openib-general] uCMA kernel slab corruption and oops

2006-06-22 Thread Arlin Davis
Sean, I am running a couple of iMPI/uDAPL benchmarks at the same time and ran into this: (2.6.17 kernel and svn8112) Jun 22 10:46:51 localhost kernel: Slab corruption: start=8100202458f8, len=512 Jun 22 10:46:51 localhost kernel: Redzone: 0x5a2cf071/0x5a2cf071. Jun 22 10:46:51 localhost

Re: [openib-general] [PATCH] uDAPL dapl_evd_connection_callback does not support TIMED_OUT event

2006-06-22 Thread Arlin Davis
James, Added support for active side TIMED_OUT event from a provider. Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/common/dapl_evd_connection_callb.c === --- dapl/common/dapl_evd_connection_callb.c (revision 8166

[openib-general] [PATCH] uDAPL cma: lower debug level on consumer rejects

2006-06-22 Thread Arlin Davis
James, Lower the reject debug message level so we don't see warnings when consumers reject. Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/openib_cma/dapl_ib_cm.c === --- dapl/openib_cma/dapl_ib_cm.c(revision

[openib-general] [PATCH] uDAPL cma - event processing bug

2006-06-21 Thread Arlin Davis
James, Fix bug in dapls_ib_get_dat_event() call after adding new unreachable event. -arlin Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/openib_cma/dapl_ib_cm.c === --- dapl/openib_cma/dapl_ib_cm.c(revision

Re: [openib-general] dapltest gets segfaulted in librdmacm init

2006-06-19 Thread Arlin Davis
Or Gerlitz wrote: After fixing the ucma/port space issue with the calls to rdma_create_id i am now trying to run $ ./Target/dapltest -T S -D OpenIB-cma and getting an immediate segfault with the below trace, any idea? Hmm, no idea. I just updated to 8112 and everything runs fine for

Re: [openib-general] Processes not exiting on SVN7946

2006-06-15 Thread Arlin Davis
Woodruff, Robert J wrote: It appears that processes are not exiting cleanly on SVN7946 trunk backported to 2.6.9-34 EL. They seem to be stuck in a state of DL and I cannot even attach to them with gdb or kill them with a kill -9. [EMAIL PROTECTED] core]# ps -uax | grep IMB woody 4087

Re: [openib-general] Processes not exiting on SVN7946

2006-06-15 Thread Arlin Davis
Roland Dreier wrote: Roland Hmm, any further clue where in ibv_destroy_cq() it's Roland stuck? Is it doing down_write() or something? Can you send me full sysrq-t output when it gets stuck? Thanks... I just added ibv_destroy_cq() to ibv_rc_pingpong test. Here's the output

Re: [openib-general] Processes not exiting on SVN7946

2006-06-15 Thread Arlin Davis
Roland Dreier wrote: OK, just a dumb oversight on my part. The change below (already checked in) fixes it for me: --- infiniband/core/uverbs_cmd.c (revision 8055) +++ infiniband/core/uverbs_cmd.c (working copy) @@ -1123,6 +1123,12 @@ ssize_t ib_uverbs_create_qp(struct ib_uv

Re: [openib-general] [PATCH] uDAPL openib-cma provider - add support for IB_CM_REQ_OPTIONS

2006-06-13 Thread Arlin Davis
Tziporet Koren wrote: Jack put the bug fix to OFED 1.0. Tziporet Great. Did the CMA module (SVN 7742) changes also get in? If not, uDAPL is out of sync with CMA and will not work. -arlin

Re: [openib-general] OFED 1.0 release schedule

2006-06-13 Thread Arlin Davis
Woodruff, Robert J wrote: Tziporet wrote, We upload OFED-1.0-pre1.tgz to https://openib.org/svn/gen2/branches/1.0/ofed/releases/ I tried the new tar ball and the pathscale driver now compiles (on Redhat EL4 - U3) and IPoIB and OpenSM appear to work OK, but Intel MPI/uDAPL and

[openib-general] [PATCH] uDAPL cma provider - add missing ia_attributes for the ia_query

2006-06-13 Thread Arlin Davis
James, Here are some changes to include some missing IA attributes during a query. -arlin Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/openib_cma/dapl_ib_util.c === --- dapl/openib_cma/dapl_ib_util.c (revision

Re: [openib-general] [Bug 126] RDMA_CM and UCM not loaded on boot

2006-06-12 Thread Arlin Davis
[EMAIL PROTECTED] wrote: http://openib.org/bugzilla/show_bug.cgi?id=126 [EMAIL PROTECTED] changed:
What    |Removed |Added
Status  |NEW     |RESOLVED

Re: [openib-general] IB MTU tunable for uDAPL and/or Intel MPI?

2006-06-12 Thread Arlin Davis
Scott Weitzenkamp (sweitzen) wrote: This didn't help. Osu_bibw.c still reports max bi bandwidth in the 1600s, should be in the 1900s. I looked back at my notes, and OFED 1.0 rc4 had desired max bi bandwidth with OFED 1.0 rc4, did the uDAPL IB MTU change? uDAPL does not have any control over

Re: [openib-general] IB MTU tunable for uDAPL and/or Intel MPI?

2006-06-09 Thread Arlin Davis
Scott Weitzenkamp (sweitzen) wrote: While we're talking about MTUs, is the IB MTU tunable in uDAPL and/or Intel MPI via env var or config file? Looks like Intel MPI 2.0.1 uses 2K for IB MTU like MVAPICH does in OFED 1.0 rc4 and rc6, I'd like to try 1K with Intel MPI. Scott There is no

Re: [openib-general] [PATCH] uDAPL openib-cma provider - add support for IB_CM_REQ_OPTIONS

2006-06-09 Thread Arlin Davis
James Lentini wrote: On Thu, 8 Jun 2006, Jack Morgenstein wrote: On Wednesday 07 June 2006 18:26, James Lentini wrote: On Wed, 7 Jun 2006, Jack Morgenstein wrote: This (bug fix) can still be included in next-week's release, if you think it is important (I have extracted it from

[openib-general] [PATCH] uDAPL openib_cma, cleanup reported CM error events, add TIMEOUT

2006-06-09 Thread Arlin Davis
James, I cleaned up the connection error events to report the proper events during address resolution errors and timeouts. It was returning incorrect DAT event codes. -arlin Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl_ib_cm.c

Re: [openib-general] [PATCH] uDAPL openib-cma provider - add support for IB_CM_REQ_OPTIONS

2006-06-07 Thread Arlin Davis
Scott Weitzenkamp (sweitzen) wrote: Yes, the modules were loaded. Each of the 32 hosts had 3 IB ports up. Does Intel MPI or uDAPL use multiple ports and/or multiple HCAs? I shut down all but one port on each host, and now Pallas is running better on the 32 nodes using Intel MPI 2.0.1. HP

Re: [openib-general] [PATCH] uDAPL openib-cma provider - add support for IB_CM_REQ_OPTIONS

2006-06-06 Thread Arlin Davis
Scott Weitzenkamp (sweitzen) wrote: Arlin, I'm having trouble running Intel MPI 2.0.1 and OFED 1.0 rc5 with Intel MPI Benchmark 2.3 on a 32-node PCI-X RHEL4 U3 i686 cluster. This thread caught my eye, can you look at my output and tell me if this is the same issue? If not, are there other

Re: [openib-general] Re: which dapl/udapl changes in trunk should be imported into OFED branch? (patch enclosed)

2006-05-25 Thread Arlin Davis
Jack Morgenstein wrote: I'm not familiar with what is needed by Intel MPI. My understanding is that Intel MPI works with revision 7141, but you should confirm that with Arlin. Arlin, could you please indicate which is the earliest revision that Intel MPI works with? Yes, 7141

[openib-general] [PATCH] uDAPL: fix uCMA provider event types and dapl_ep_create segv bug

2006-05-17 Thread Arlin Davis
James, Fix for uCMA provider to return the correct event as a result of rejects. Also, ran into a segv bug with dapl_ep_create when creating without a conn_evd. Thanks, -arlin Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/common/dapl_ep_create.c

[openib-general] RE: [PATCH2] uDAPL: fix uCMA provider event types and dapl_ep_create segv bug

2006-05-17 Thread Arlin Davis
-Original Message- From: Arlin Davis [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 17, 2006 12:17 PM To: 'James Lentini' Cc: openib-general Subject: [PATCH] uDAPL: fix uCMA provider event types and dapl_ep_create segv bug James, Fix for uCMA provider to return the correct event

Re: [openib-general] [PATCH 15 of 53] ipath - make some maximum values more sane

2006-05-16 Thread Arlin Davis
Bryan O'Sullivan wrote: Increase the limits on some maximum values. I noticed a rdma/message max size limitation of 4096 the last time I ran some dapl tests. Are there plans to increase or did I miss it somewhere in all the patches? Here are the max values returned from the ipath

Re: [openib-general] Quick RDMA Write with Immediate Data Item question

2006-05-12 Thread Arlin Davis
Roland Dreier wrote: Steven With an RDMA Write with Immediate Data Item transfer, in Steven the CQE at the destination (the thing that has the Steven Immediate Data it), does the CQE also contain the memory Steven location where the message just got written too? i.e. does Steven

[openib-general] [PATCH] update uDAPL openib_cma provider to work with new uCMA event channels

2006-05-05 Thread Arlin Davis
James, Update the uDAPL openib_cma provider to work with the new uCMA event channel interface. I ran a full set of Intel-MPI test suites with these latest changes and it looks fine. Sync up with Sean on commits. Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/openib_cma

[openib-general] RE: [PATCH2] uDAPL openib_cma: fixed address bindings, getaddrinfo, and added debug messages for rejects

2006-04-27 Thread Arlin Davis
James, Here is a new patch with your recommended changes. -arlin Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/openib_cma/dapl_ib_util.c === --- dapl/openib_cma/dapl_ib_util.c (revision 6672) +++ dapl/openib_cma

[openib-general] [PATCH] uDAPL openib_cma: fixed address bindings, getaddrinfo, and added debug messages for rejects

2006-04-26 Thread Arlin Davis
by: Arlin Davis [EMAIL PROTECTED] Index: dapl/openib_cma/dapl_ib_util.c === --- dapl/openib_cma/dapl_ib_util.c (revision 6672) +++ dapl/openib_cma/dapl_ib_util.c (working copy) @@ -121,11 +121,12 @@ static int getipaddr(char

Re: [openib-general] Re: [uDAPL] dat.conf generator

2006-04-19 Thread Arlin Davis
James Lentini wrote: On Tue, 18 Apr 2006, Dotan Barak wrote: On Monday 17 April 2006 23:46, James Lentini wrote: On Sun, 16 Apr 2006, Dotan Barak wrote: On Wednesday 12 April 2006 17:50, James Lentini wrote: OpenIB-cma u1.2 nonthreadsafe default

Re: [openib-general] Re: [uDAPL] dtest server never ends when using the dapl provider OpenIB-scm1

2006-04-19 Thread Arlin Davis
Dotan Barak wrote: Can you attach to the server process with gdb and get me a back trace from each of the threads? What does driver IBED-1.0-rc3 consist of? Thanks, -arlin Here is a back trace of the hanged process: (gdb) bt #0 0x2b31c86a in

Re: [openib-general] Re: [uDAPL] dtest server never ends when using the dapl provider OpenIB-scm1

2006-04-13 Thread Arlin Davis
Dotan Barak wrote: Hi. thanks for the quick response. I executed the dtest with the -v parameter and here is the output of both sides. I added the test the '-l' parameter to be able to change to dapl provider in command line (if you wish i can post you a patch). full server output:

Re: [openib-general] Re: [uDAPL] dtest server never ends when using the dapl provider OpenIB-scm1

2006-04-11 Thread Arlin Davis
James Lentini wrote: It sounds like the disconnect is being lost. Let me see if I can reproduce this. Arlin, have you ever seen this? No. it runs fine on my systems. It looks like the ping pong test on the server side did not finish. Can Dotan add a -v switch to the dtest to help

Re: [openib-general] Re: [PATCH] uDAPL cma; dat_ep_free can return without freeing cm_id

2006-04-10 Thread Arlin Davis
Steve Wise wrote: Steve, can you test this version and see if it works for your iWARP device. I think the patch is good. I ran dapltest/regress.sh over the chelsio iwarp device using this new patch instead of my original patch, and things seem as stable as they were before (i'm

Re: [openib-general] [uDAPLl] question about dapl_ib_cq_resize

2006-04-10 Thread Arlin Davis
Dotan Barak wrote: Hi. I looked at the file: src/userspace/dapl/dapl/openib/dapl_ib_cq.c, function: dapl_ib_cq_resize: In this function, when one wants to resize a CQ, the dapl destroys the old CQ and creates a new one instead of calling to the resize CQ verb (which was added ~3 months

[openib-general] Re: [PATCH] [RFC] - dapl - dat_ep_free() can return without

2006-04-06 Thread Arlin Davis
James Lentini wrote: On Tue, 4 Apr 2006, Steve Wise wrote: What happens if a consumer attempts to free the EP from a callback? There are no direct consumer callbacks in usermode are there? consumers call dat_evd_wait() or whatever and get scheduled. Not like kernel mode... Or

[openib-general] [PATCH] uDAPL cma; dat_ep_free can return without freeing cm_id

2006-04-06 Thread Arlin Davis
if it works for your iWARP device. Thanks, -arlin Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/openib_cma/dapl_ib_cm.c === --- dapl/openib_cma/dapl_ib_cm.c (revision 6305) +++ dapl/openib_cma/dapl_ib_cm.c

Re: [openib-general] how to execute the dtest?

2006-04-05 Thread Arlin Davis
Dotan Barak wrote: Some more info: when i changed the dat.conf to be: OpenIB-cma u1.2 nonthreadsafe default /usr/local/lib/libdaplcma.so mv_dapl.1.2 ib0 0 OpenIB-cma-ip u1.2 nonthreadsafe default /usr/local/lib/libdaplcma.so mv_dapl.1.2 192.168.0.22 0 OpenIB-cma-name u1.2 nonthreadsafe
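The dat.conf entries quoted above follow a whitespace-separated layout, which the sketch below splits into named fields. Two things are assumptions here: the field names, which follow the commonly documented dat.conf layout rather than anything stated in this message, and the quoting of the last two fields ("ib0 0" and an empty string), which the archive appears to have stripped from the snippet.

```python
import shlex

# Field names are an assumption based on the usual dat.conf layout.
FIELDS = (
    "ia_name",
    "api_version",
    "threadsafety",
    "default",
    "lib_path",
    "provider_version",
    "ia_params",
    "platform_params",
)


def parse_dat_conf_line(line):
    """Split one dat.conf entry into named fields.

    Returns None for blank lines and comments; raises ValueError when
    the number of fields does not match the expected layout.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    tokens = shlex.split(stripped)  # honours the quoted parameter fields
    if len(tokens) != len(FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(FIELDS), len(tokens)))
    return dict(zip(FIELDS, tokens))


if __name__ == "__main__":
    # Quotes restored by assumption; the archived snippet shows none.
    entry = parse_dat_conf_line(
        'OpenIB-cma u1.2 nonthreadsafe default '
        '/usr/local/lib/libdaplcma.so mv_dapl.1.2 "ib0 0" ""'
    )
    print(entry["ia_name"], entry["lib_path"], entry["ia_params"])
```

A check like this can catch a missing or misquoted field before the DAT registry tries to load the provider library.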

Re: [openib-general] Re: SPAM: [PATCH] [RFC] - dapl - dat_ep_free() can return without freeing the endpoint

2006-04-05 Thread Arlin Davis
Sean Hefty wrote: James Lentini wrote: void dapli_destroy_conn(struct dapl_cm_id *conn) { int in_callback; + struct rdma_cm_id *cm_id; dapl_dbg_log(DAPL_DBG_TYPE_CM, " destroy_conn: conn %p id %d\n", conn, conn->cm_id); - dapl_os_lock(&conn->lock);

[openib-general] Re: [DAPL] Provider initialialization

2006-04-04 Thread Arlin Davis
James Lentini wrote: Arlin, As part of the uDAPL autotools patch, we changed the mechanism by which the uDAPL provider library's init and fini functions were specified. I've seen (and received reports) of systems on which the init and fini functions are not being called. I'd like to move

Re: [openib-general] Re: [PATCH3] uDAPL/uDAT autotools - Package for udat, udaplcma, udaplscm

2006-03-21 Thread Arlin Davis
James Lentini wrote: On Fri, 17 Mar 2006, Davis, Arlin R wrote: Here is a patch with all the latest changes, including an updated dat.conf. Committed in the trunk and 1.0 branch in revision 5880. James, The spec file name is wrong; libdat.spec should be /libdat.spec.in -arlin

Re: [openib-general] RE: [PATCH2] uDAPL/uDAT autotools - Package for udat, udaplcma, udaplscm

2006-03-17 Thread Arlin Davis
James Lentini wrote: On Thu, 16 Mar 2006, Arlin Davis wrote: This looks excellent Arlin. I think it is essentially ready to go. I have a few minor questions/comments: I'll add an empty directory called config to the root of the dapl tree. perfect. Index: AUTHORS

[openib-general] [PATCH] uDAPL cma, missing cma event ack

2006-03-16 Thread Arlin Davis
James, Fixes a corner case where a CMA event was not acknowledged during disconnect processing. -arlin Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/openib_cma/dapl_ib_cm.c === --- dapl/openib_cma/dapl_ib_cm.c

Re: [openib-general] Re: [PATCH] uDAPL/uDAT autotools - Package for udat, udaplcma, udaplscm

2006-03-16 Thread Arlin Davis
Bryan O'Sullivan wrote: Thanks, Arlin. I think we'll use this stuff instead of Moni's patch, as it's smaller and cleaner. I do have a hack regarding the disti switch (OSVENDOR=REDHAT_EL4) required for the uDAPL build. We need to clean this up with a real disti check. I wasn't sure how to

[openib-general] [PATCH 1.0] uDAPL - QP destroy and HCA close problems fixed

2006-02-27 Thread Arlin Davis
require another small patch. Thanks, -arlin Signed-off by: Arlin Davis [EMAIL PROTECTED] Index: dapl/openib_cma/dapl_ib_util.c === --- dapl/openib_cma/dapl_ib_util.c (revision 5489) +++ dapl/openib_cma/dapl_ib_util.c (working

Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal

2006-02-09 Thread Arlin Davis
Roland Dreier wrote: Hmm. Can you put a number on how much better RDMA write with immediate is on current HCA hardware? How does using the underlying OpenIB verbs ability to post a list of work requests compare (i.e., posting an RDMA write followed by a send in one verbs call)? Maybe post

Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal

2006-02-09 Thread Arlin Davis
Michael Krause wrote: RDMA Write with Immediate is part of the IB Extended Transport Header. It is a fixed-sized quantity and not one subject to change, i.e. increasing its size. Your argument above reinforces that the particular application need is IB-specific and thus should not be part

Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal

2006-02-08 Thread Arlin Davis
Roland Dreier wrote: Michael> So, here we have a long discussion on attempting to perpetuate a concept that is not universal across transports and was deemed to have minimal value that most wanted to see removed from the architecture. But this discussion is

Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal

2006-02-07 Thread Arlin Davis
Sean Hefty wrote: The requirement is to provide an API that supports RDMA writes with immediate data. A send that follows an RDMA write is not immediate data, and the API should not be constructed around trying to make it so. To be clear, I believe that write with immediate should be

RE: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal

2006-01-26 Thread Arlin Davis
But this penalizes users, who need to deal with two ways of handling post calls and completions. I do not think we are too far from consensus. A transport-independent app will allocate 4 extra bytes for buffers that can match immediate data. Completion data will return where the immediate data is

Re: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal

2006-01-24 Thread Arlin Davis
OK, maybe we should back up and start over. This is exactly why immediate data was initially proposed as an extension instead of a general API. We start to penalize native IB features based on the requirements of other RDMA interfaces that have to emulate the feature anyway. What prevents

Re: [openib-general] RE: [RFC] DAT 2.0 extension proposal

2006-01-17 Thread Arlin Davis
Kanevsky, Arkady wrote: Arlin, 1. Does it mean that existing DAT providers will have to be modified so they report DAT_NOT_IMPLEMENTED for each extension? No. During the open, a DAT library built to support extensions makes a query call to verify that the provider supports extensions

Re: [openib-general] RE: [RFC] DAT 2.0 extension proposal

2006-01-17 Thread Arlin Davis
Arlin Davis wrote: Kanevsky, Arkady wrote: 5. Memory protection extension for atomic operations 6. error returns for extensions? yes and yes; I will work these into the next patch and update the proposal. For error returns I am thinking about carving up the return type, adding

Re: [openib-general] RE: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's

2006-01-10 Thread Arlin Davis
Kanevsky, Arkady wrote: comments inline. As mentioned on the con-call, there are two separate items to consider while looking at the proposal. The first is the ability to extend DAT for specific provider value-add and the second is to validate the need for general atomic and immediate

Re: [openib-general] RE: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's

2006-01-05 Thread Arlin Davis
Kanevsky, Arkady wrote: Arlin, nice proposal, thanks. I have one high level question and a few specific technical ones. 1. Why do you want to provide this functionality via extension instead of part of new DAT spec, say 2.0? This will allow Consumers to use all events, operations, and

[openib-general] [PATCH] uDAPL openib_cma disconnect processing fix

2006-01-04 Thread Arlin Davis
James, Here is a patch to fix up the disconnect event processing and a change to dtest to validate. Tested with dtest and dapltest. -arlin Signed-off-by: Arlin Davis [EMAIL PROTECTED] Index: test/dtest/dtest.c === --- test

Re: [openib-general] Re: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's

2006-01-03 Thread Arlin Davis
James Lentini wrote: On Fri, 23 Dec 2005, Arlin Davis wrote: arlin> A single entry point is still there with this patch, I just defined it a little different with a function definition for better DAT API mappings. The idea was to replace the existing pvoid extension

Re: [openib-general] uDAPL disconnect events

2006-01-03 Thread Arlin Davis
Jimmy Hill wrote: I'm running with the latest OpenIB Gen2 uDAPL code (CMA version) and have encountered a problem with disconnect events. The basic problem is that both sides have to call dat_ep_disconnect in order to break down a connection cleanly. It should be possible for just one side
