What IP interfaces are configured on the cluster? In particular, are there
any IPoIB interfaces configured? If you use the dynamic connection method
but restrict either the number or type of IP interfaces to be used via
oob_tcp_if_{include,exclude}, do you still see the problem?
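For example, to restrict the oob to a single interface, something like this
(ib0 here is just a placeholder for whatever interface exists on your nodes):

mpirun -np 64 -machinefile voltairenodes -mca oob_tcp_if_include ib0 imb/src/IMB-MPI1 gather -npmin 64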
--brad
Using the flag --mca mpi_preconnect_mpi seems to solve the issue with the
oob connection manager.
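Combined with the IMB run from my original report, that is something like the
following (a reconstruction, not a verbatim copy of the exact command I ran):

mpirun -np 64 -machinefile voltairenodes -mca mpi_preconnect_mpi 1 -mca btl sm,self,openib imb/src/IMB-MPI1 gather -npmin 64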
This solution is not scalable but it looks more and more like a connection
establishment problem.
I'm still trying to figure out the root cause of this and how to solve it.
Any ideas will be appreciated.
On 01/18/2011 07:48 AM, Jeff Squyres wrote:
> IBCM is broken and disabled (has been for a long time).
>
> Did you mean RDMACM?
>
>
No, I think I meant the OMPI oob.
sorry,
--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance
+1. I'm afraid I don't know offhand why there would be such differences. I'm
thinking that you'll need to dive a little deeper to figure it out; sorry. :-(
On Jan 16, 2011, at 10:54 AM, Shamis, Pavel wrote:
> Well, then I would suspect rdmacm vs oob QP configuration. They're supposed
> to be the same ...
IBCM is broken and disabled (has been for a long time).
Did you mean RDMACM?
On Jan 18, 2011, at 6:22 AM, Terry Dontje wrote:
> Could the issue have anything to do with how OMPI implements lazy
> connections with IBCM? Does setting the mca parameter mpi_preconnect_all to
> 1 change things?
Could the issue have anything to do with how OMPI implements lazy
connections with IBCM? Does setting the mca parameter
mpi_preconnect_all to 1 change things?
--td
On 01/16/2011 04:12 AM, Doron Shoham wrote:
Hi,
The gather hangs only in the linear_sync algorithm but works with the
basic_linear and binomial algorithms.
Well, then I would suspect the rdmacm vs oob QP configuration. They're
supposed to be the same, but there is probably some bug there; if the rdmacm
QP tuning somehow differs from oob, that is a potential cause of the
performance differences that you see.
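One way to compare the two is to force each CPC explicitly and rerun the
benchmark, e.g. (parameter name from memory, please double-check it with
ompi_info):

mpirun -np 64 -machinefile voltairenodes -mca btl sm,self,openib -mca btl_openib_cpc_include oob imb/src/IMB-MPI1 gather -npmin 64
mpirun -np 64 -machinefile voltairenodes -mca btl sm,self,openib -mca btl_openib_cpc_include rdmacm imb/src/IMB-MPI1 gather -npmin 64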
Regards,
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
Hi,
The gather hangs only in the linear_sync algorithm but works with the
basic_linear and binomial algorithms.
The gather algorithm is chosen dynamically depending on block size and
communicator size.
So, in the beginning, the binomial algorithm is chosen (communicator size
is larger than 60).
When ...
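To take the dynamic selection out of the picture, the gather algorithm can be
pinned via the tuned component's parameters, e.g. (the algorithm numbering is
version-dependent; check ompi_info for the exact values):

mpirun -np 64 -machinefile voltairenodes -mca btl sm,self,openib -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_gather_algorithm 2 imb/src/IMB-MPI1 gather -npmin 64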
RDMACM creates the same QPs with the same tunings as OOB, so I don't see how
the CPC could affect performance.
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Jan 13, 2011, at 2:15 PM, Jeff Squyres wrote:
> +1
+1 on what Pasha said -- if using rdmacm fixes the problem, then there's
something else nefarious going on...
You might want to check your hangs with padb to see where all the processes
are stuck and whether anything obvious jumps out. I'd be surprised if there's
a bug in the oob cpc; it's been ...
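A typical padb invocation for grabbing stack traces from every process of a
running job is something like the following (flags from memory; check the padb
docs for your version):

padb --all --stack-trace --tree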
Try manually specifying the collective component: "-mca coll tuned".
You seem to be using the "sync" collective component; are there any stale mca
param files lying around?
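Alternatively, you can rule out the sync component alone with the "^"
exclusion syntax, e.g.:

mpirun -np 64 -machinefile voltairenodes -mca coll ^sync imb/src/IMB-MPI1 gather -npmin 64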
--Nysal
On Tue, Jan 11, 2011 at 6:28 PM, Doron Shoham wrote:
> Hi
>
> All machines in the setup are iDataPlex ...
RDMACM or OOB cannot affect the performance of this benchmark, since they are
not involved in communication. So I'm not sure that the performance changes
that you see are related to connection manager changes.
About oob - I'm not aware of any hang issue there; the code is very, very old,
we did ...
Hi,
For the first problem, I can see that when using rdmacm instead of oob as the
openib connection manager I get much better performance results (and no hangs!).
mpirun -display-map -np 64 -machinefile voltairenodes -mca btl
sm,self,openib -mca btl_openib_connect_rdmacm_priority 100
imb/src/IMB-MPI1 gather -npmin 64
#bytes ...
Hi
All machines in the setup are iDataPlex with Nehalem, 12 cores per node and
24GB memory.
* Problem 1 - OMPI 1.4.3 hangs in gather:
I'm trying to run IMB's gather operation with OMPI 1.4.3 (vanilla).
It happens when np >= 64 and the message size exceeds 4k:
mpirun -np 64 ...