Re: [OMPI users] Big jump from OFED 1.5.4.1 -> recent (stable). Any suggestions?

2016-06-14 Thread Mehmet Belgin
Hi Llolsten, We are trying to keep up with the latest Open MPI as best we can and keep a few old versions around (oldest is 1.6), with RHEL 6.5. The OFED upgrade will complement planned OS upgrades to the latest stable RHEL 6.x (probably 6.7, not sure if we will go with 6.8). Did you have to

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Jason Maldonis
Thanks Ralph for all the help. I will do that until it gets fixed. Nathan, I am very very interested in this working because we are developing some new cool code for research in materials science. This is the last piece of the puzzle for us I believe. I can use TCP for now though of course. While

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Ralph Castain
You don’t want to always use those options as your performance will take a hit - TCP vs Infiniband isn’t a good option. Sadly, this is something we need someone like Nathan to address as it is a bug in the code base, and in an area I’m not familiar with. For now, just use TCP so you can move

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Jason Maldonis
Ralph, The problem *does* go away if I add "-mca btl tcp,sm,self" to the mpiexec cmd line. (By the way, I am using mpiexec rather than mpirun; do you recommend one over the other?) Will you tell me what this means for me? For example, should I always append these arguments to mpiexec for my

Re: [OMPI users] Client-Server Shared Memory Transport

2016-06-14 Thread Ralph Castain
Nope - we don’t currently support cross-job shared memory operations. Nathan has talked about doing so for vader, but not at this time. > On Jun 14, 2016, at 12:38 PM, Louis Williams > wrote: > > Hi, > > I am attempting to use the sm and vader BTLs between a

[OMPI users] Client-Server Shared Memory Transport

2016-06-14 Thread Louis Williams
Hi, I am attempting to use the sm and vader BTLs between a client and server process, but it seems impossible to use fast transports (i.e. not TCP) between two independent groups started with two separate mpirun invocations. Am I correct, or is there a way to communicate using shared memory

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Nathan Hjelm
That message is coming from udcm in the openib btl. It indicates some sort of failure in the connection mechanism. It can happen if the listening thread no longer exists or is taking too long to process messages. -Nathan On Jun 14, 2016, at 12:20 PM, Ralph Castain wrote:

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Ralph Castain
Hmm…I’m unable to replicate a problem on my machines. What fabric are you using? Does the problem go away if you add “-mca btl tcp,sm,self” to the mpirun cmd line? > On Jun 14, 2016, at 11:15 AM, Jason Maldonis wrote: > > Hi Ralph, et al., > > Great, thank you for the

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Jason Maldonis
Hi Ralph, et al., Great, thank you for the help. I downloaded the MPI loop spawn test directly from what I think is the master repo on GitHub: https://github.com/open-mpi/ompi/blob/master/orte/test/mpi/loop_spawn.c I am still using the MPI code from 1.10.2, however. Is that test updated with the

Re: [OMPI users] Big jump from OFED 1.5.4.1 -> recent (stable). Any suggestions?

2016-06-14 Thread Llolsten Kaonga
Hello Grigory, I am not sure what Red Hat does exactly, but when you install the OS, there is always an InfiniBand Support module during the installation process. We never check/install that module when we do OS installations because it is usually several versions of OFED behind (almost obsolete).

Re: [OMPI users] Big jump from OFED 1.5.4.1 -> recent (stable). Any suggestions?

2016-06-14 Thread Grigory Shamov
On 2016-06-14, 3:42 AM, "users on behalf of Peter Kjellström" wrote: >On Mon, 13 Jun 2016 19:04:59 -0400 >Mehmet Belgin wrote: > >> Greetings! >> >> We have not upgraded our OFED stack for a very long

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Ralph Castain
I dug into this a bit (with some help from others) and found that the spawn code appears to be working correctly - it is the test in orte/test that is wrong. The test has been correctly updated in the 2.x and master repos, but we failed to backport it to the 1.10 series. I have done so this

[OMPI users] MPI_File_read+MPI_BOTTOM crash on NFS ?

2016-06-14 Thread Nicolas Joly
Hi, At work, I have some MPI codes that make use of custom datatypes to call MPI_File_read with MPI_BOTTOM ... It mostly works, except when the underlying filesystem is NFS, where it crashes with SIGSEGV. The attached sample (code + data) works just fine with 1.10.1 on my NetBSD/amd64

Re: [OMPI users] scatter/gather, tcp, 3 nodes, homogeneous, # RAM

2016-06-14 Thread Gilles Gouaillardet
Note that if your program is synchronous, it will run at the speed of the slowest task (e.g., tasks on node 2, with 1 GB per task, will wait for the other tasks, which have 2 GB per task). You can use MPI_Comm_split_type in order to create intra-node communicators. Then you can find how much memory is available per
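
A minimal sketch of one way to act on the per-task memory figures mentioned in this reply, assuming the aim is to give each rank a share of the N inputs proportional to its memory. The helper names (proportional_counts, displacements) and the largest-remainder rounding are illustrative assumptions, not something taken from the thread. The resulting counts and offsets are the kind of arrays that MPI_Scatterv takes.

```python
def proportional_counts(n_items, weights):
    """Split n_items across ranks in proportion to integer weights.

    Hypothetical helper: weights could be memory per task in GB or MB.
    Uses largest-remainder rounding so the counts always sum to n_items.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Floor of each rank's exact share, plus the remainder left over.
    counts = [n_items * w // total for w in weights]
    remainders = [n_items * w % total for w in weights]

    # Hand the undistributed items to the ranks with the largest
    # remainders; ties go to the lower rank.
    leftover = n_items - sum(counts)
    order = sorted(range(len(weights)), key=lambda i: (-remainders[i], i))
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def displacements(counts):
    """Starting offset of each rank's chunk in the send buffer."""
    offsets, position = [], 0
    for count in counts:
        offsets.append(position)
        position += count
    return offsets


if __name__ == "__main__":
    # Memory per task in GB for ranks 0..9, derived from the layout in
    # the original question: 4 GB / 2 ranks, 4 GB / 4 ranks, 8 GB / 4 ranks.
    memory_per_task = [2, 2, 1, 1, 1, 1, 2, 2, 2, 2]
    counts = proportional_counts(100, memory_per_task)
    print(counts)
    print(displacements(counts))
```

Integer weights keep the arithmetic exact, so no input is lost or duplicated to floating-point rounding. Whether memory is the right weight, rather than core speed, depends on the model being run.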

[OMPI users] scatter/gather, tcp, 3 nodes, homogeneous, # RAM

2016-06-14 Thread MM
Hello, I have the following 3 1-socket nodes:
node1: 4GB RAM 2-core: rank 0 rank 1
node2: 4GB RAM 4-core: rank 2 rank 3 rank 4 rank 5
node3: 8GB RAM 4-core: rank 6 rank 7 rank 8 rank 9
I have a model that takes an input and produces an output, and I want to run this model for N possible

Re: [OMPI users] Big jump from OFED 1.5.4.1 -> recent (stable). Any suggestions?

2016-06-14 Thread Gilles Gouaillardet
Hi, Unless you provide Open MPI static libraries, you might not be required to rebuild your apps. You will likely have to / should rebuild Open MPI though. Cheers, Gilles Peter Kjellström wrote: >On Mon, 13 Jun 2016 19:04:59 -0400 >Mehmet Belgin

Re: [OMPI users] Big jump from OFED 1.5.4.1 -> recent (stable). Any suggestions?

2016-06-14 Thread Peter Kjellström
On Mon, 13 Jun 2016 19:04:59 -0400 Mehmet Belgin wrote: > Greetings! > > We have not upgraded our OFED stack for a very long time, and are still > running on an ancient version (1.5.4.1, yeah we know). We are now > considering a big jump from this version to a tested