Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-02-03 Thread Brad Benton
What IP interfaces are configured on the cluster? In particular, are there IPoIB interfaces that are configured? If you use the dynamic connection method but restrict either the number or type of IP interfaces to be used via oob_tcp_if_{include,exclude}, do you still see the problem? --brad
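
A minimal sketch of the experiment suggested above, assuming the machinefile and benchmark path from the original report and a hypothetical interface name (`eth0`); substitute an interface that actually exists on every node. The helper only prints the command (a dry run), so it can be inspected before launching. No connection-manager priority is set, so the openib BTL stays on its default connection method, and only the out-of-band TCP interfaces are restricted.

```shell
# Dry-run helper: prints the mpirun command instead of launching it.
# The interface name passed in is an assumption, not taken from the thread.
build_cmd() {
    iface="$1"
    echo "mpirun -np 64 -machinefile voltairenodes" \
         "-mca btl sm,self,openib" \
         "-mca oob_tcp_if_include ${iface}" \
         "imb/src/IMB-MPI1 gather -npmin 64"
}

build_cmd eth0
```

To test the exclude variant instead, the same sketch could pass `-mca oob_tcp_if_exclude` with the IPoIB interface name in place of the include option.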

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-26 Thread Doron Shoham
Using the flag --mca mpi_preconnect_mpi seems to have solved the issue with the oob connection manager. This solution is not scalable, but it looks more and more like a connection establishment problem. I'm still trying to figure out what the root cause is and how to solve it. Any ideas will be

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-18 Thread Terry Dontje
On 01/18/2011 07:48 AM, Jeff Squyres wrote: > IBCM is broken and disabled (has been for a long time). > > Did you mean RDMACM? > > No, I think I meant OMPI oob. Sorry, -- Oracle Terry D. Dontje | Principal Software Engineer Developer Tools Engineering | +1.781.442.2631 Oracle *- Performance

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-18 Thread Jeff Squyres
+1. I'm afraid I don't know offhand why there would be such differences. I'm thinking that you'll need to dive a little deeper to figure it out; sorry. :-( On Jan 16, 2011, at 10:54 AM, Shamis, Pavel wrote: > Well, then I would suspect rdmacm vs oob QP configuration. They are supposed to > be

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-18 Thread Jeff Squyres
IBCM is broken and disabled (has been for a long time). Did you mean RDMACM? On Jan 18, 2011, at 6:22 AM, Terry Dontje wrote: > Could the issue have anything to do with how OMPI implements lazy > connections with IBCM? Does setting the MCA parameter mpi_preconnect_all to > 1 change

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-18 Thread Terry Dontje
Could the issue have anything to do with how OMPI implements lazy connections with IBCM? Does setting the MCA parameter mpi_preconnect_all to 1 change things? --td On 01/16/2011 04:12 AM, Doron Shoham wrote: Hi, The gather hangs only in the linear_sync algorithm but works with basic_linear

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-16 Thread Shamis, Pavel
Well, then I would suspect the rdmacm vs oob QP configuration. They are supposed to be the same, but there is probably a bug there; if the rdmacm QP tuning somehow differs from oob, that is a potential cause of the performance differences that you see. Regards, Pavel (Pasha) Shamis --- Application

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-16 Thread Doron Shoham
Hi, The gather hangs only in the linear_sync algorithm but works with the basic_linear and binomial algorithms. The gather algorithm is chosen dynamically depending on block size and communicator size. So, in the beginning, the binomial algorithm is chosen (communicator size is larger than 60). When

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-13 Thread Shamis, Pavel
RDMACM creates the same QPs with the same tunings as OOB, so I don't see how the CPC could affect performance. Pavel (Pasha) Shamis --- Application Performance Tools Group Computer Science and Math Division Oak Ridge National Laboratory On Jan 13, 2011, at 2:15 PM, Jeff Squyres wrote: > +1

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-13 Thread Jeff Squyres
+1 on what Pasha said -- if using rdmacm fixes the problem, then there's something else nefarious going on... You might want to check padb with your hangs to see where all the processes are hung to see if anything obvious jumps out. I'd be surprised if there's a bug in the oob cpc; it's been

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-13 Thread Nysal Jan
Try manually specifying the collective component with "-mca coll tuned". You seem to be using the "sync" collective component; are there any stale MCA param files lying around? --Nysal On Tue, Jan 11, 2011 at 6:28 PM, Doron Shoham wrote: > Hi > > All machines on the setup are IDataPlex

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-12 Thread Shamis, Pavel
RDMACM or OOB cannot affect the performance of this benchmark, since they are not involved in communication. So I'm not sure that the performance changes that you see are related to connection manager changes. About oob: I'm not aware of any hang issues there; the code is very, very old, we did

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-12 Thread Doron Shoham
Hi, For the first problem, I can see that when using rdmacm as openib oob I get much better performance results (and no hangs!). mpirun -display-map -np 64 -machinefile voltairenodes -mca btl sm,self,openib -mca btl_openib_connect_rdmacm_priority 100 imb/src/IMB-MPI1 gather -npmin 64 #bytes

[OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-11 Thread Doron Shoham
Hi, All machines on the setup are IDataPlex with Nehalem 12 cores per node, 24GB memory. Problem 1 – OMPI 1.4.3 hangs in gather: I’m trying to run the IMB gather operation with OMPI 1.4.3 (vanilla). The hang happens when np >= 64 and the message size exceeds 4k: mpirun -np 64