I've attached a simple test program that should demonstrate the
limitations I'm seeing when joining multiple multicast groups; the idea
is to let others reproduce the weirdness I'm seeing and make some
progress.
An MPI implementation is needed to compile/run the test. No arguments
are needed; the test
I was finally able to have the SM switched over from the Cisco SM on the
switch to OpenSM on a node. Responses inline below.
Sean Hefty wrote:
Now the more interesting part: I'm now able to run on a 128-node
machine using OpenSM running on a node (before, I was running on an
8-node machine which
Hal Rosenstock wrote:
I'm not quite parsing what is the same and what is different in the
results (and I presume the only variable is the SM).
Yes, this is confusing; I'll try to summarize the various behaviors I'm
getting.
First, there are two machines. One has 8 nodes and runs a Topspin
Andrew Friedley wrote:
where to start
looking on my own, so even some hints on where the problem(s) might lie
would be useful.
Andrew
Andrew Friedley wrote:
I've run into a problem where it appears that I cannot join more than 14
multicast groups from a single HCA. I'm using the RDMA CM UD/multicast
interface from an OFED v1.2 nightly build, and using a '0' address when
joining to have the SM allocate an unused address. The first 14
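For illustration, here is a minimal sketch of that join pattern (this is
not the attached test program; the local-address argument, the group
count, and the error handling are my own assumptions):

/*
 * mcast_join_sketch.c -- sketch of joining multicast groups through the
 * RDMA CM with an all-zero address, so the SM allocates an unused MGID
 * for each group.  Build roughly as:
 *   gcc mcast_join_sketch.c -lrdmacm -libverbs
 */
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <rdma/rdma_cma.h>

#define NGROUPS 20  /* enough to run past the 14-group limit I'm seeing */

int main(int argc, char **argv)
{
    struct rdma_event_channel *ch;
    struct rdma_cm_id *id[NGROUPS];
    struct rdma_cm_event *event;
    struct sockaddr_in bind_addr, mc_addr;
    int i;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <local IPoIB address>\n", argv[0]);
        return 1;
    }

    ch = rdma_create_event_channel();
    if (!ch) {
        perror("rdma_create_event_channel");
        return 1;
    }

    /* Local address, used only to select the HCA port. */
    memset(&bind_addr, 0, sizeof bind_addr);
    bind_addr.sin_family = AF_INET;
    inet_pton(AF_INET, argv[1], &bind_addr.sin_addr);

    /* All-zero multicast address: ask the SM to allocate an unused MGID. */
    memset(&mc_addr, 0, sizeof mc_addr);
    mc_addr.sin_family = AF_INET;

    for (i = 0; i < NGROUPS; i++) {
        if (rdma_create_id(ch, &id[i], NULL, RDMA_PS_UDP) ||
            rdma_bind_addr(id[i], (struct sockaddr *) &bind_addr) ||
            rdma_join_multicast(id[i], (struct sockaddr *) &mc_addr, NULL)) {
            perror("join setup");
            return 1;
        }
        if (rdma_get_cm_event(ch, &event)) {
            perror("rdma_get_cm_event");
            return 1;
        }
        if (event->event == RDMA_CM_EVENT_MULTICAST_JOIN)
            printf("join %d succeeded\n", i);
        else
            fprintf(stderr, "join %d failed (event %d, status %d)\n",
                    i, event->event, event->status);
        rdma_ack_cm_event(event);
    }
    return 0;
}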
Dotan Barak wrote:
In the following URL you can find a very simple example on how to use
multicast:
https://svn.openfabrics.org/svn/openib/trunk/contrib/mellanox/ibtp/gen2/userspace/useraccess/multicast_test/multicast_test.c
I seem to be missing v1.h on my OFED v1.2 nightly install; where can
You say that fixes the problem; does it work even when running more than
one MPI process per node? (That is the case the hack fixes.) Simply
doing an mpirun with a -np parameter higher than the number of nodes you
have set up should trigger this case, and making sure to use '-mca btl
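For example, on a four-node allocation something like
`mpirun -np 8 -mca btl udapl,self ./mpi_test` (the btl list and program
name here are just placeholders) would put two processes on each node,
which is enough to exercise that code path.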
Therefore, the only truly safe thing for an iWARP btl to do (or a
udapl btl, since that is also an iWARP btl) is to have the active
side send an MPI-layer NOP of some kind immediately after
establishing the connection, if there is nothing else to send.
This is fine for an
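As a rough sketch of the idea (not the actual btl code; the helper name
and the use of a zero-length SEND are my own assumptions), the active
side would post something like this right after the connection-established
event:

#include <string.h>
#include <infiniband/verbs.h>

/* Sketch only: post a zero-length SEND as an MPI-layer NOP so that the
 * first message on a freshly established iWARP connection always comes
 * from the active (connecting) side.  QP/CQ setup and completion
 * handling are assumed to exist elsewhere in the btl. */
int post_connection_nop(struct ibv_qp *qp)
{
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof wr);
    wr.wr_id      = 0;                 /* tag however the btl tracks WRs */
    wr.opcode     = IBV_WR_SEND;
    wr.send_flags = IBV_SEND_SIGNALED; /* reap the completion later */
    wr.sg_list    = NULL;              /* zero-length message */
    wr.num_sge    = 0;

    return ibv_post_send(qp, &wr, &bad_wr);
}

Whether a zero-length SEND is the right kind of NOP is a separate
question; the point is only that the first message on the new connection
comes from the connecting side.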
Steve Wise wrote:
On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote:
Steve Wise wrote:
There has been a series of discussions on the OFA general list about
this issue, and the conclusion to date is that it cannot be resolved in
the rdma-cm or iwarp-cm code of the Linux RDMA stack.
Steve Wise wrote:
Well, I've tried OMPI on ofed-1.2 udapl today and it doesn't work. I'm
debugging now.
Here's part of the problem (from ompi/btl/udapl/btl_udapl.c):
/* TODO - big bad evil hack! */
/* uDAPL doesn't ever seem to keep track of ports with addresses. This
becomes
Moni Shoua wrote:
Andrew Friedley wrote:
Moni Shoua wrote:
Andrew Friedley wrote:
The Chelsio build errors from yesterday appear to be gone, though now
I'm seeing errors building the IB bonding code with the 3/2 alpha
tarball -- error below. I'm wondering, is there a way to selectively
avoid
Steve Wise wrote:
Vlad,
This patch fixes the compile problems with Chelsio cxgb3 on RHEL4 U2/U3.
Looks like I have the problem this fix addresses on U4 as well; could
this be applied there too?
Andrew