Re: [OMPI users] large memory usage and hangs when preconnecting beyond 1000 cpus

2014-10-20 Thread Marshall Ward
Thanks, it's at least good to know that the behaviour isn't normal! Could it be some sort of memory leak in the call? The code in ompi/runtime/ompi_mpi_preconnect.c looks reasonably safe, though maybe doing thousands of isend/irecv pairs is causing problems with the buffer used in ptp
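As a rough illustration of why preconnect cost grows with job size, here is a minimal Python sketch of a pairwise preconnect schedule of the kind ompi_mpi_preconnect.c implements (each rank exchanges a tiny nonblocking send/receive pair with every other rank, one "distance" at a time). The function name and the exact pairing rule are assumptions for illustration, not the actual Open MPI code; the point is only the scaling.

```python
# Hypothetical model of a pairwise preconnect schedule: for each
# "distance" d, rank r sends to (r + d) % size and receives from
# (r - d) % size, so every rank posts one isend/irecv pair per peer.

def preconnect_schedule(size):
    """Return, for each rank, the ordered (send_to, recv_from) pairs
    it would post during preconnect."""
    schedule = {}
    for rank in range(size):
        pairs = []
        for dist in range(1, size):
            send_to = (rank + dist) % size
            recv_from = (rank - dist + size) % size
            pairs.append((send_to, recv_from))
        schedule[rank] = pairs
    return schedule

if __name__ == "__main__":
    size = 1000
    sched = preconnect_schedule(size)
    # Each of the 1000 ranks posts 999 send/recv pairs ...
    print(len(sched[0]))          # 999
    # ... so the job as a whole sets up ~size**2 connections,
    # which is where memory pressure during preconnect comes from.
    print(size * (size - 1))      # 999000
```

With 1000 processes that is nearly a million point-to-point connections being established at once, so even a small per-connection buffer adds up quickly.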

Re: [OMPI users] CuEventCreate Failed...

2014-10-20 Thread Steven Eliuk
Hi Sir, We are using the CUDA 6.0 release and the 331.89 driver… A little background: the master does not init CUDA. We have tried this method too, having all five processes init CUDA, but it seems to cause the problem more easily. Yes, the example below was on one machine, but we have seen it even

Re: [OMPI users] CuEventCreate Failed...

2014-10-20 Thread Rolf vandeVaart
Hi: I just tried running a program similar to yours with CUDA 6.5 and Open MPI, and I could not reproduce the issue. Just to make sure I am doing things correctly: is your example below running with np=5 and on a single node? Which version of CUDA are you using? Can you also send the output from

Re: [OMPI users] CuEventCreate Failed...

2014-10-20 Thread Steven Eliuk
Thanks for your quick response, 1) mpiexec --allow-run-as-root --mca btl_openib_want_cuda_gdr 1 --mca btl_openib_cuda_rdma_limit 6 --mca mpi_common_cuda_event_max 1000 -n 5 test/RunTests 2) Yes, CUDA-aware support using Mellanox IB, 3) Yes, we have the ability to use several versions of
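For readers following along, it can help to first confirm that the Open MPI build actually has CUDA-aware support before tuning the GDR-related MCA parameters. A sketch, assuming a standard Open MPI installation (the `ompi_info` check is the documented way to query this; the mpiexec line simply restates the invocation above with the same parameter values):

```shell
# Check whether this Open MPI build was compiled with CUDA-aware
# support; the value field prints "true" when it was.
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value

# The invocation from the message above, spread over lines for
# readability (parameter values unchanged):
mpiexec --allow-run-as-root \
        --mca btl_openib_want_cuda_gdr 1 \
        --mca btl_openib_cuda_rdma_limit 6 \
        --mca mpi_common_cuda_event_max 1000 \
        -n 5 test/RunTests
```

`btl_openib_want_cuda_gdr 1` enables GPUDirect RDMA on the openib BTL, and `mpi_common_cuda_event_max` bounds the pool of CUDA events, which is relevant to the CuEventCreate failures discussed in this thread.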