Binding is somehow involved in this, and I do not believe vader or openib
is involved here.
Could you please run again with the two OMPI versions, but in the *same*
job?
And before invoking mpirun, could you do
env | grep SLURM
Per your SLURM request, you are running 64 tasks on 4 nodes.
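For reference, the output should include variables along these lines; the
values shown are only illustrative for a 64-task, 4-node allocation:

SLURM_JOB_NUM_NODES=4
SLURM_NTASKS=64
SLURM_TASKS_PER_NODE=16(x4)
SLURM_JOB_CPUS_PER_NODE=16(x4)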
I think this is fixed in the 1.10 series. We will not be making any more
updates to the 1.8 series, so you will need to update to 1.10 to get the
fix.
-Nathan
On Wed, Dec 16, 2015 at 02:48:45PM -0500, Udayanga Wickramasinghe wrote:
>Hi all,
>I have a custom MPI_Op function which I use within a non-blocking version
>of all_reduce().
Those jobs were launched with mpirun. Please see the attached files for the
binding report with OMPI_MCA_hwloc_base_report_bindings=1.
Here is a snapshot for v1.10.1:
[c2613.tusker.hcc.unl.edu:12049] MCW rank 0 is not bound (or bound to all
available processors)
[c2613.tusker.hcc.unl.edu:1204
When I see such issues, I immediately start to think about binding patterns.
How are these jobs being launched - with mpirun or srun? What do you see if you
set OMPI_MCA_hwloc_base_report_bindings=1 in your environment?
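A quick way to check, assuming the jobs are launched with mpirun (the LAMMPS
binary name below is only a placeholder):

export OMPI_MCA_hwloc_base_report_bindings=1
mpirun -np 64 ./lmp_mpi

The same report can also be requested with mpirun's --report-bindings option.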
> On Dec 16, 2015, at 11:15 AM, Jingchao Zhang wrote:
>
> Hi Gilles,
Hi all,
I have a custom MPI_Op function which I use within a non-blocking version
of all_reduce(). When executing the MPI program I am seeing a segfault
thrown from libNBC. It seems this is a known issue in Open MPI, at least
per [1]. Is this fixed in a later release of Open MPI?
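A minimal sketch of the pattern described above, i.e. a user-defined MPI_Op
passed to MPI_Iallreduce (the call serviced by the libNBC-based non-blocking
collective component in Open MPI); the element-wise max is only an
illustrative stand-in for the custom operation:

#include <stdio.h>
#include <mpi.h>

/* Illustrative user-defined reduction: element-wise max of ints. */
static void my_max(void *in, void *inout, int *len, MPI_Datatype *dtype)
{
    int *a = (int *)in;
    int *b = (int *)inout;
    for (int i = 0; i < *len; i++)
        if (a[i] > b[i]) b[i] = a[i];
}

int main(int argc, char **argv)
{
    int rank, result = 0;
    MPI_Op op;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Second argument 1 => the operation is commutative. */
    MPI_Op_create(my_max, 1, &op);

    /* Non-blocking allreduce using the custom op. */
    MPI_Iallreduce(&rank, &result, 1, MPI_INT, op, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    if (rank == 0)
        printf("max rank = %d\n", result);

    MPI_Op_free(&op);
    MPI_Finalize();
    return 0;
}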
Hi Gilles,
The LAMMPS jobs for both versions are pure MPI. In the SLURM script, 64 cores
are requested from 4 nodes, so it is 64 MPI tasks, not necessarily evenly
distributed across the nodes. (Each node is equipped with 64 cores.)
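A request along those lines might look like this in the batch script; this is
only a sketch, and the binary name is a placeholder:

#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks=64

mpirun ./lmp_mpi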
I can reproduce the performance issue using the LAMMPS