Sorry for the late reply. I caught COVID and the flu at the same time and am
just now getting back on my feet.

The file system is Lustre, mounted and shared across all the nodes.
The storage underlying Lustre is ZFS.


Is this an MPI issue or an IOR issue?

thanks
ben

________________________________________
From: Gabriel, Edgar <egabr...@central.uh.edu>
Sent: Tuesday, November 2, 2021 5:16 PM
To: Open MPI Users
Cc: bend linux4ms.net
Subject: RE: mca_sharedfp_lockfile issues

What file system are you running your code on? And is the same directory
shared across all nodes? I have seen this error when users try to use a
non-shared directory for MPI I/O operations (e.g. /tmp, which is a different
drive/folder on each node).
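
One rough way to check this is to write a uniquely named marker file from one
node and then look for it from every other node. The sketch below is only an
illustration: the helper names write_marker and check_marker and the
.shared_check.* file name are made up for this example and are not part of
Open MPI or IOR.

```shell
#!/bin/sh
# Hypothetical helpers for checking that a directory is visible on every node.

# write_marker DIR TOKEN: create a marker file named after TOKEN inside DIR.
write_marker() {
    printf '%s\n' "$2" > "$1/.shared_check.$2"
}

# check_marker DIR TOKEN: report whether this host can see the marker file.
# Returns non-zero when the marker is missing on this host.
check_marker() {
    if [ -f "$1/.shared_check.$2" ]; then
        echo "$(uname -n): visible"
    else
        echo "$(uname -n): MISSING"
        return 1
    fi
}
```

Run write_marker once on a single node against the directory IOR writes to,
then run check_marker (or simply ls on the marker file) on every node, for
example by launching it through mpirun with the same machinefile. Any host
that reports MISSING is not seeing the same directory.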

Thanks
Edgar

-----Original Message-----
From: users <users-boun...@lists.open-mpi.org> On Behalf Of bend linux4ms.net 
via users
Sent: Tuesday, November 2, 2021 3:33 PM
To: Open MPI Open MPI <users@lists.open-mpi.org>
Cc: bend linux4ms.net <b...@linux4ms.net>
Subject: [OMPI users] mca_sharedfp_lockfile issues

OK, I have more issues. Maybe someone on the list can help me:

Open MPI version: 4.1.1, downloaded from the GitHub source.
Compiled on CentOS 8.4 using GCC 8.4.1. Configured with:

./configure --enable-shared --enable-static \
   --without-tm \
   --enable-mpi-cxx \
   --enable-wrapper-runpath \
   --enable-mpirun-prefix-by-default \
   --enable-mpi-thread-multiple \
   --enable-mpi-fortran=yes \
   --prefix=/p/app/compilers/mpi/openmpi/4.1.1 2>&1 | tee config.log

Intel HPC system, 850 nodes, trying to launch the IOR benchmark.

Top portion of the mpi command:
-------------------------------------------------------------------------------------

export OMPI_MCA_btl_openib_allow_ib=1
export OMPI_MCA_btl_openib_if_include="mlx5_0:1"

mpirun -machinefile ${hostlist} \
   --mca opal_common_ucx_opal_mem_hooks 1 \
    \
   -np ${NP} \
   --map-by node \
   -N ${rpn} \
   -vv \
---------------------------------------------------------------------------------------------------------
On all the nodes I am getting the message:
<node name:pid> [##] mca_sharedfp_lockedfile_file_open : Error during file open

I've tried it with and without --mca sharedfp lockedfile; I still get the
errors.

What have I done wrong?

Thanks ..

Ben Duncan -


