Re: [OMPI users] Isend, Recv and Test

2016-05-06 Thread Zhen Wang
Jeff, The hardware limitation doesn't allow me to use anything other than TCP... I think I have a good understanding of what's going on, and may have a solution. I'll test it out. Thanks to you all. Best regards, Zhen On Fri, May 6, 2016 at 7:13 AM, Jeff Squyres (jsquyres)

[OMPI users] No core dump in some cases

2016-05-06 Thread dpchoudh .
Hello all, I run MPI jobs (for test purposes only) on two different 'clusters'. Both 'clusters' have only two nodes, connected back-to-back. The two are very similar, but not identical, both software- and hardware-wise. Both have ulimit -c set to unlimited. However, only one of the two creates core
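
A quick way to take MPI out of the picture is to check whether a node writes core files for a plain crashing program at all. A minimal sketch (file and program names are illustrative, not from the thread):

    /* coretest.c - crash on purpose to see whether this node writes a core file.
     * Build: cc -g coretest.c -o coretest
     * Check "ulimit -c" in the same shell first; on Linux the kernel.core_pattern
     * sysctl also decides where the core ends up (it may be handed to abrt or
     * systemd-coredump instead of landing in the working directory). */
    #include <stdio.h>

    int main(void)
    {
        volatile int *p = NULL;
        printf("about to dereference NULL...\n");
        fflush(stdout);
        *p = 42;    /* SIGSEGV here should leave a core file */
        return 0;
    }

If the two nodes behave differently even for this, the difference is in the OS core-dump configuration rather than in MPI.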

Re: [OMPI users] Error building openmpi-dev-4010-g6c9d65c on Linux with Sun C

2016-05-06 Thread Gilles Gouaillardet
Siegmar, at first glance this looks like a crash of the compiler, so I guess the root cause is not openmpi (that being said, a workaround could be implemented in openmpi). Cheers, Gilles On Saturday, May 7, 2016, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote: > Hi, > > today I

[OMPI users] Error building openmpi-dev-4010-g6c9d65c on Linux with Sun C

2016-05-06 Thread Siegmar Gross
Hi, today I tried to build openmpi-dev-4010-g6c9d65c on my machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-5.1.0 and Sun C 5.13. I was successful on most machines, but I got the following error on my Linux machine with the Sun C compiler. tyr

Re: [OMPI users] SLOAVx alltoallv

2016-05-06 Thread Joshua Ladd
It did not make it upstream. Josh On Fri, May 6, 2016 at 9:28 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote: > Dave, > > I briefly read the papers and it suggests the SLOAVx algorithm is > implemented by the ml collective module > this module had some issues and was judged not

Re: [OMPI users] SLOAVx alltoallv

2016-05-06 Thread Gilles Gouaillardet
Dave, I briefly read the papers, and they suggest the SLOAVx algorithm is implemented by the ml collective module. That module had some issues and was judged not good for production: it is disabled by default in the v1.10 series, and has been simply removed from the v2.x branch. You can either use
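
As a side note, one way to check whether a given build still contains the ml component at all is to list the MCA parameters Open MPI exposes through the MPI_T tools interface and look for coll_ml entries (ompi_info is another way to cross-check). A minimal sketch, assuming Open MPI's usual framework_component_parameter naming for MPI_T control variables:

    /* list_coll_ml.c - print MPI_T control variables whose names start with
     * "coll_ml"; if none show up, the ml component's parameters are not
     * registered in this build. Run as a normal MPI program. */
    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        int provided, ncvar, i;
        MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
        MPI_Init(&argc, &argv);     /* ensures the coll components are opened */

        MPI_T_cvar_get_num(&ncvar);
        for (i = 0; i < ncvar; i++) {
            char name[256], desc[1024];
            int name_len = sizeof(name), desc_len = sizeof(desc);
            int verbosity, bind, scope;
            MPI_Datatype dtype;
            MPI_T_enum enumtype;

            if (MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dtype,
                                    &enumtype, desc, &desc_len, &bind,
                                    &scope) != MPI_SUCCESS)
                continue;
            if (strncmp(name, "coll_ml", 7) == 0)
                printf("%s : %s\n", name, desc);
        }

        MPI_Finalize();
        MPI_T_finalize();
        return 0;
    }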

Re: [OMPI users] [open-mpi/ompi] COMM_SPAWN broken on Solaris/v1.10 (#1569)

2016-05-06 Thread Siegmar Gross
Hi Gilles, today I'm building all current versions with both compilers on my machines. Unfortunately it takes some hours, mainly because my Solaris Sparc machine is old and slow. Yesterday I had problems using two Sparc machines and nothing else. Tonight the new versions will be copied to

Re: [OMPI users] barrier algorithm 5

2016-05-06 Thread Dave Love
Gilles Gouaillardet writes: > Dave, > > > i made PR #1644 to abort with a user friendly error message > > https://github.com/open-mpi/ompi/pull/1644 Thanks. Could there be similar cases that might be worth a change?

[OMPI users] SLOAVx alltoallv

2016-05-06 Thread Dave Love
At the risk of banging on too much about collectives: I came across a writeup of the "SLOAVx" algorithm for alltoallv. It was implemented in OMPI with apparently good results, but I can't find any code. I wonder if anyone knows the story on
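
For context, the collective being discussed is MPI_Alltoallv, the variable-count all-to-all exchange that SLOAVx is described as optimizing. A minimal sketch of the call itself (the traffic pattern is arbitrary, chosen only to show how the count and displacement arrays pair up):

    /* alltoallv_demo.c - each rank sends (myrank + 1) ints to every peer,
     * so rank j receives (i + 1) ints from rank i. Purely illustrative. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size, i;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int *sendcounts = malloc(size * sizeof(int));
        int *recvcounts = malloc(size * sizeof(int));
        int *sdispls    = malloc(size * sizeof(int));
        int *rdispls    = malloc(size * sizeof(int));

        int stotal = 0, rtotal = 0;
        for (i = 0; i < size; i++) {
            sendcounts[i] = rank + 1;   /* same count to every destination */
            recvcounts[i] = i + 1;      /* rank i sends i+1 elements        */
            sdispls[i] = stotal;  stotal += sendcounts[i];
            rdispls[i] = rtotal;  rtotal += recvcounts[i];
        }

        int *sendbuf = malloc(stotal * sizeof(int));
        int *recvbuf = malloc(rtotal * sizeof(int));
        for (i = 0; i < stotal; i++) sendbuf[i] = rank;

        MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
                      recvbuf, recvcounts, rdispls, MPI_INT, MPI_COMM_WORLD);

        printf("rank %d received %d ints in total\n", rank, rtotal);

        free(sendcounts); free(recvcounts); free(sdispls); free(rdispls);
        free(sendbuf); free(recvbuf);
        MPI_Finalize();
        return 0;
    }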

Re: [OMPI users] Isend, Recv and Test

2016-05-06 Thread Jeff Squyres (jsquyres)
On May 5, 2016, at 10:09 PM, Zhen Wang wrote: > > It's taking so long because you are sleeping for .1 second between calling > MPI_Test(). > > The TCP transport is only sending a few fragments of your message during each > iteration through MPI_Test (because, by definition,
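
The progress point is easy to demonstrate: without an asynchronous progress thread, the outstanding TCP message only advances while the application is inside an MPI call, so a tight MPI_Test loop (or simply MPI_Wait) finishes far sooner than a loop that sleeps between tests. A minimal sketch of the tight-loop version (message size, ranks and file names are arbitrary):

    /* isend_progress.c - rank 0 isends a large buffer to rank 1 and drives
     * progress by calling MPI_Test in a tight loop instead of sleeping.
     * Run with something like: mpirun -np 2 ./isend_progress */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NBYTES (64 * 1024 * 1024)

    int main(int argc, char **argv)
    {
        int rank;
        char *buf = malloc(NBYTES);

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Request req;
            int done = 0;
            MPI_Isend(buf, NBYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &req);
            while (!done) {
                /* every MPI_Test call lets the library push more TCP
                 * fragments; do useful work here rather than sleeping */
                MPI_Test(&req, &done, MPI_STATUS_IGNORE);
            }
        } else if (rank == 1) {
            MPI_Recv(buf, NBYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        free(buf);
        return 0;
    }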

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-06 Thread Jeff Squyres (jsquyres)
Ok, good. I asked that question because when we see errors like this, it is usually either a busted compiler installation or inadvertently mixing the run-times of multiple different compilers in some kind of incompatible way. Specifically, the mpifort (aka mpif90) application is a

Re: [OMPI users] [open-mpi/ompi] COMM_SPAWN broken on Solaris/v1.10 (#1569)

2016-05-06 Thread Gilles Gouaillardet
Siegmar, I was unable to reproduce the issue with one Solaris 11 x86_64 VM and one Linux x86_64 VM. What is the minimal configuration you need to reproduce the issue? Are you able to reproduce the issue with only x86_64 nodes? I was under the impression that Solaris vs. Linux is the issue,
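
For reference, a self-contained spawn test along these lines can be as small as the sketch below; this is a generic reproducer (not Siegmar's original program) in which the binary relaunches itself and tells parent from child via MPI_Comm_get_parent:

    /* spawn_self.c - parent ranks spawn 2 copies of this executable; the
     * children detect the parent via MPI_Comm_get_parent. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Comm parent, inter;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_get_parent(&parent);

        if (parent == MPI_COMM_NULL) {
            /* original launch: spawn two more copies of this executable */
            MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                           MPI_COMM_WORLD, &inter, MPI_ERRCODES_IGNORE);
            printf("parent rank %d spawned children\n", rank);
            MPI_Comm_disconnect(&inter);
        } else {
            printf("child rank %d sees its parent intercommunicator\n", rank);
            MPI_Comm_disconnect(&parent);
        }

        MPI_Finalize();
        return 0;
    }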

Re: [OMPI users] Multiple Non-blocking Send/Recv calls with MPI_Waitall fails when CUDA IPC is in use

2016-05-06 Thread Jiri Kraus
Hi Iman, How are you handling GPU affinity? Are you using CUDA_VISIBLE_DEVICES for that? If yes, can you try using cudaSetDevice in your application instead? Also, when multiple processes are assigned to a single GPU, are you using MPS, and what GPUs are you running this on? Hope this helps
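
A common way to do what is suggested here is to derive the device from the node-local rank and call cudaSetDevice before any GPU buffers are handed to MPI. A minimal sketch (the round-robin mapping and program name are assumptions, adjust to the real node topology; link against the CUDA runtime):

    /* set_gpu_by_local_rank.c - select a CUDA device per MPI process using
     * the node-local rank; roughly the cudaSetDevice approach suggested. */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int world_rank, local_rank, ndev;
        MPI_Comm local_comm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* ranks that share a node end up in the same "shared" communicator */
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &local_comm);
        MPI_Comm_rank(local_comm, &local_rank);

        cudaGetDeviceCount(&ndev);
        int dev = (ndev > 0) ? local_rank % ndev : -1;   /* round-robin */
        if (dev >= 0)
            cudaSetDevice(dev);

        printf("world rank %d (local %d) -> device %d of %d\n",
               world_rank, local_rank, dev, ndev);

        MPI_Comm_free(&local_comm);
        MPI_Finalize();
        return 0;
    }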

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-06 Thread Giacomo Rossi
Yes, I've tried three simple "Hello world" programs in Fortran, C and C++, and they compile and run with Intel 16.0.3. The problem is with the Open MPI compiled from source. Giacomo Rossi Ph.D., Space Engineer Research Fellow at Dept. of Mechanical and Aerospace Engineering, "Sapienza" University of

Re: [OMPI users] barrier algorithm 5

2016-05-06 Thread Gilles Gouaillardet
Dave, I made PR #1644 to abort with a user-friendly error message: https://github.com/open-mpi/ompi/pull/1644 Cheers, Gilles On 5/5/2016 2:05 AM, Dave Love wrote: Gilles Gouaillardet writes: Dave, yes, this is for two MPI tasks only. the MPI subroutine
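
For anyone following along, the forced-algorithm case that PR #1644 now guards against can be exercised with a trivial barrier program; the run line below assumes the thread refers to the tuned component's coll_tuned_barrier_algorithm parameter (names are worth double-checking with ompi_info):

    /* barrier_test.c - minimal barrier program; forcing a particular algorithm
     * happens on the mpirun command line, e.g. (parameter names assumed):
     *   mpirun -np 4 --mca coll_tuned_use_dynamic_rules true \
     *                --mca coll_tuned_barrier_algorithm 5 ./barrier_test
     * With the fix in PR #1644 this should abort with a clear message when the
     * chosen algorithm only supports two tasks, instead of failing obscurely. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0)
            printf("barrier completed\n");
        MPI_Finalize();
        return 0;
    }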