Bob,

> I have tested what is on RedHat EL4.0 U3 with Intel MPI and it worked
> OK, so RedHat EL4.0 U3 has all of the userspace libraries needed to run
> MVAPICH.  I have not tried MVAPICH itself, but I suspect it will work.
> There is one issue that I ran into with the stock RedHat EL4 U3 release:
> with the new Mellanox DDR card I had some problems with RDMA using
> uDAPL, and I suspect you would see the same issues with MVAPICH on
> those cards.
> The SDR cards seem to work fine with the code that is on the RedHat CD.

We are running RHEL4 U3 and the MVAPICH version from the OpenIB gen2 trunk. We were able to run the OSU benchmark tests (osu_bw, osu_bibw, and osu_latency) successfully with the Mellanox SDR cards, but when we swapped those cards for DDR cards, we ran into problems. We can run some MPI jobs, like the simple "calculate pi" job (cpi.c) and an MPING application, but when we try to run the benchmark tests, we get the following:

[koa] (ib) ib> mpirun_rsh -np 2 koa jatoba /home/ib/mpi/tests/osu/osu_bw
# OSU MPI Bandwidth Test (Version 2.1)
# Size          Bandwidth (MB/s)
[0] Abort: [koa.az05.bull.com:0] Got completion with error, code=1
 at line 2148 in file viacheck.c
mpirun_rsh: Abort signaled from [0]
done.

Looking at the viacheck.c file, it seems this error is generated when a completion queue entry comes back with a bad status. From the "code=1", it may be some sort of "length error". This could be coming from the driver or the card, I suppose? That's as far as I have gotten.
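
For what it's worth, here is a minimal standalone sketch (not MVAPICH code; the helper name is made up) of the kind of check I think viacheck.c is doing, assuming the "code" it prints is the raw ibv_wc_status value from libibverbs. If that assumption holds, code=1 corresponds to IBV_WC_LOC_LEN_ERR, a local length error. It builds with "gcc -o wc_status wc_status.c -libverbs", provided your libibverbs has ibv_wc_status_str().

#include <stdio.h>
#include <infiniband/verbs.h>

/* Hypothetical helper, not MVAPICH code: poll one completion queue entry
 * and report a non-success status by number and name, the way the check
 * in viacheck.c appears to.  Assumes the "code" in the abort message is
 * the raw ibv_wc_status value; in that enum 0 is IBV_WC_SUCCESS and 1 is
 * IBV_WC_LOC_LEN_ERR (local length error). */
int check_one_completion(struct ibv_cq *cq)
{
        struct ibv_wc wc;
        int n = ibv_poll_cq(cq, 1, &wc);

        if (n < 0)
                return -1;              /* polling itself failed */
        if (n == 0)
                return 0;               /* nothing has completed yet */
        if (wc.status != IBV_WC_SUCCESS) {
                fprintf(stderr, "completion error: code=%d (%s)\n",
                        wc.status, ibv_wc_status_str(wc.status));
                return -1;              /* the case MVAPICH aborts on */
        }
        return 1;                       /* one good completion */
}

int main(void)
{
        /* No CQ to poll in a standalone program; just print the name
         * of the status code from the abort message. */
        printf("code=1 -> %s\n", ibv_wc_status_str(IBV_WC_LOC_LEN_ERR));
        return 0;
}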

Does this sound like any of the "issues" you referred to above relative to RHEL4 U3 and the DDR cards? If so, is there a fix?

-Don Albert-
Bull HN Info Systems