Re: [OMPI users] performance issue with OpenMPI 1.10.1

2015-12-16 Thread Gilles Gouaillardet
Binding is somehow involved in this, and I do not believe vader or openib is involved here. Could you please run again with the two OMPI versions but in the *same* job? And before invoking mpirun, could you do env | grep SLURM? Per your SLURM request, you are running 64 tasks on 4 nodes, wi…

Re: [OMPI users] OpenMPI non blocking I_Allreduce segfaults when using custom function..

2015-12-16 Thread Nathan Hjelm
I think this is fixed in the 1.10 series. We will not be making any more updates to the 1.8 series, so you will need to update to 1.10 to get the fix. -Nathan On Wed, Dec 16, 2015 at 02:48:45PM -0500, Udayanga Wickramasinghe wrote: > Hi all, > I have a custom MPI_Op function which I use wit…

Re: [OMPI users] performance issue with OpenMPI 1.10.1

2015-12-16 Thread Jingchao Zhang
Those jobs were launched with mpirun. Please see the attached files for the binding report with OMPI_MCA_hwloc_base_report_bindings=1. Here is a snapshot for v-1.10.1: [c2613.tusker.hcc.unl.edu:12049] MCW rank 0 is not bound (or bound to all available processors) [c2613.tusker.hcc.unl.edu:1204…

Re: [OMPI users] performance issue with OpenMPI 1.10.1

2015-12-16 Thread Ralph Castain
When I see such issues, I immediately start to think about binding patterns. How are these jobs being launched - with mpirun or srun? What do you see if you set OMPI_MCA_hwloc_base_report_bindings=1 in your environment? > On Dec 16, 2015, at 11:15 AM, Jingchao Zhang wrote: > Hi Gilles, …

[OMPI users] OpenMPI non blocking I_Allreduce segfaults when using custom function..

2015-12-16 Thread Udayanga Wickramasinghe
Hi all, I have a custom MPI_Op function which I use within a non-blocking version of all_reduce(). When executing the MPI program I am seeing a segfault thrown from libNBC. It seems like this is a known issue in Open MPI, at least [1]. Is this somehow fixed in a later release version of Open MPI? I am…
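
For context, a minimal sketch of the pattern being described (not the poster's actual code): a user-defined MPI_Op created with MPI_Op_create and passed to the non-blocking MPI_Iallreduce, which Open MPI services through libNBC. The reduction function, variable names, and the element-wise maximum operation below are illustrative assumptions only.

/* iallreduce_custom_op.c - sketch of a custom MPI_Op used with MPI_Iallreduce.
 * Build with: mpicc iallreduce_custom_op.c -o iallreduce_custom_op */
#include <mpi.h>
#include <stdio.h>

/* User-defined reduction: element-wise maximum of ints (hypothetical example). */
static void my_max(void *invec, void *inoutvec, int *len, MPI_Datatype *dtype)
{
    (void)dtype;  /* unused in this simple example */
    int *in = (int *)invec;
    int *inout = (int *)inoutvec;
    for (int i = 0; i < *len; i++)
        if (in[i] > inout[i])
            inout[i] = in[i];
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Op op;
    MPI_Op_create(my_max, 1 /* commutative */, &op);

    int sendval = rank, recvval = 0;
    MPI_Request req;

    /* Non-blocking allreduce with the custom op; this is the call path
     * that goes through libNBC in the Open MPI releases discussed here. */
    MPI_Iallreduce(&sendval, &recvval, 1, MPI_INT, op, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    if (rank == 0)
        printf("max rank = %d\n", recvval);

    MPI_Op_free(&op);
    MPI_Finalize();
    return 0;
}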

Re: [OMPI users] performance issue with OpenMPI 1.10.1

2015-12-16 Thread Jingchao Zhang
Hi Gilles, The LAMMPS jobs for both versions are pure MPI. In the SLURM script, 64 cores are requested from 4 nodes, so it's 64 MPI tasks, not necessarily evenly distributed across the nodes. (Each node is equipped with 64 cores.) I can reproduce the performance issue using the LAMMPS…