Sorry for the late reply... Caught COVID and the flu at the same time. Just now getting back on my feet.
The file system is Lustre, mounted across all the nodes and shared. Underlying Lustre is ZFS.

Is this an MPI issue or an IOR issue?

Thanks,
Ben

________________________________________
From: Gabriel, Edgar <egabr...@central.uh.edu>
Sent: Tuesday, November 2, 2021 5:16 PM
To: Open MPI Users
Cc: bend linux4ms.net
Subject: RE: mca_sharedfp_lockfile issues

What file system are you running your code on? And is the same directory shared across all nodes? I have seen this error when users try to use a non-shared directory for MPI I/O operations (e.g. /tmp, which is a different drive/folder on each node).

Thanks
Edgar

-----Original Message-----
From: users <users-boun...@lists.open-mpi.org> On Behalf Of bend linux4ms.net via users
Sent: Tuesday, November 2, 2021 3:33 PM
To: Open MPI Open MPI <users@lists.open-mpi.org>
Cc: bend linux4ms.net <b...@linux4ms.net>
Subject: [OMPI users] mca_sharedfp_lockfile issues

Ok, I got more issues. Maybe someone on the list can help me:

Open MPI version: 4.1.1, downloaded from the GitHub source
Compiled on CentOS 8.4 using GCC 8.4.1

Configured with:

./configure --enable-shared --enable-static \
    --without-tm \
    --enable-mpi-cxx \
    --enable-wrapper-runpath \
    --enable-mpirun-prefix-by-default \
    --enable-mpi-thread-multiple \
    --enable-mpi-fortran=yes \
    --prefix=/p/app/compilers/mpi/openmpi/4.1.1 2>&1 \
    | tee config.log

Intel HPC system, 850 nodes, trying to launch the IOR benchmark.

Top portion of the mpi command:
-------------------------------------------------------------------------------------
export OMPI_MCA_btl_openib_allow_ib=1
export OMPI_MCA_btl_openib_if_include="mlx5_0:1"

mpirun -machinefile ${hostlist} \
    --mca opal_common_ucx_opal_mem_hooks 1 \
    \
    -np ${NP} \
    --map-by node \
    -N ${rpn} \
    -vv \
-------------------------------------------------------------------------------------

I am getting the message

<node name:pid> [##] mca_sharedfp_lockedfile_file_open : Error during file open

on all the nodes.
I've tried it with and without --mca sharedfp lockedfile, and I still get the errors.

What have I done wrong?

Thanks,
Ben Duncan -
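
Edgar's question about whether the same directory is visible on every node can be checked without involving MPI I/O at all. The following is only a minimal sketch, not something posted in the thread: the function name check_marker, the script name check_shared.sh, the marker file name, and the TESTDIR variable are all hypothetical. It reports whether the current host can see a marker file in a given directory.

```shell
#!/bin/sh
# check_shared.sh (hypothetical name): report whether this host can see
# a marker file inside a directory that is supposed to be shared.

# check_marker DIR MARKER
# Prints "<nodename> OK" and returns 0 if DIR/MARKER exists on this host,
# otherwise prints "<nodename> MISSING" and returns 1.
check_marker() {
    dir=$1
    marker=$2
    if [ -f "$dir/$marker" ]; then
        echo "$(uname -n) OK"
    else
        echo "$(uname -n) MISSING"
        return 1
    fi
}

# When invoked as a script with two arguments, run the check directly.
if [ "$#" -eq 2 ]; then
    check_marker "$1" "$2"
fi
```

One possible way to use it, assuming the script itself lives on a path every node can read and TESTDIR points at the IOR output directory: create the marker once with touch "$TESTDIR/.shared_check", then launch the script on each node with the same mpirun -machinefile ${hostlist} --map-by node options, passing "$TESTDIR" and .shared_check as arguments. Any node that prints MISSING is not seeing the same directory. If every node prints OK, the directory is shared and the cause of the sharedfp error has to be looked for elsewhere.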