Thanks for all these suggestions. I'll try to create a small test reproducing this behavior and try the different parameters. I do not use MPI I/O directly but parallel HDF5, which relies on MPI I/O. NFS is the easiest way to share storage between nodes on a small cluster; BeeGFS or Lustre require bigger (additional) infrastructure.
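As a starting point, something along these lines should exercise the same code path (a minimal sketch in C rather than Fortran, for brevity; the file name, data sizes, and the h5pcc build command are my assumptions, nothing tested here yet):

/* Minimal parallel HDF5 write test: each rank writes its own
 * contiguous slab of a 1-D dataset through MPI-IO.
 * Build e.g.: h5pcc -o repro repro.c  (h5pcc from a parallel HDF5 install) */
#include <mpi.h>
#include <hdf5.h>
#include <stdio.h>
#include <stdlib.h>

#define N_PER_RANK 1048576  /* doubles per rank, ~8 MiB each (arbitrary) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Open the file through MPI-IO (requires an MPI-enabled HDF5 build) */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("repro.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* One global 1-D dataset of doubles, N_PER_RANK elements per rank */
    hsize_t gdim = (hsize_t)size * N_PER_RANK;
    hid_t fspace = H5Screate_simple(1, &gdim, NULL);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, fspace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Each rank selects its own contiguous slab of the file dataspace */
    hsize_t start = (hsize_t)rank * N_PER_RANK, count = N_PER_RANK;
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &start, NULL, &count, NULL);
    hid_t mspace = H5Screate_simple(1, &count, NULL);

    double *buf = malloc(N_PER_RANK * sizeof(double));
    for (hsize_t i = 0; i < N_PER_RANK; i++)
        buf[i] = (double)rank;

    /* Collective write, timed to compare io components / lock settings */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    double t0 = MPI_Wtime();
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, buf);
    H5Fflush(file, H5F_SCOPE_GLOBAL);
    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("wrote %.1f MiB in %.3f s\n",
               gdim * sizeof(double) / 1048576.0, t1 - t0);

    free(buf);
    H5Pclose(dxpl); H5Sclose(mspace); H5Sclose(fspace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}

Running this on the NFS mount with the different settings should show whether the slowdown comes from the locking policy.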
Patrick

On 03/12/2020 at 15:38, Gabriel, Edgar via users wrote:
> the reasons for potential performance issues on NFS are very different from
> Lustre. Basically, depending on your use case and the NFS configuration, you
> have to enforce a different locking policy to ensure correct output files. The
> default value chosen for ompio is the most conservative setting, since
> this was the only setting we found that produced a correct output
> file for all of our tests. You can change the setting to see whether other
> options work for you.
>
> The parameter that you need to work with is fs_ufs_lock_algorithm. Setting it
> to 1 will completely disable locking (and most likely lead to the best
> performance); setting it to 3 is a middle ground (lock specific ranges) and
> similar to what ROMIO does. So e.g.
>
> mpiexec -n 16 --mca fs_ufs_lock_algorithm 1 ./mytests
>
> That being said, if you google NFS + MPI I/O, you will find a ton of
> documents and reasons for potential problems, so using MPI I/O on top of NFS
> (whether OMPIO or ROMIO) is always at your own risk.
>
> Thanks
> Edgar
>
> -----Original Message-----
> From: users <users-boun...@lists.open-mpi.org> On Behalf Of Gilles Gouaillardet via users
> Sent: Thursday, December 3, 2020 4:46 AM
> To: Open MPI Users <users@lists.open-mpi.org>
> Cc: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> Subject: Re: [OMPI users] Parallel HDF5 low performance
>
> Patrick,
>
> glad to hear you will upgrade Open MPI thanks to this workaround!
>
> ompio has known performance issues on Lustre (this is why ROMIO is still the
> default on that filesystem), but I do not remember such performance issues
> being reported on an NFS filesystem.
>
> Sharing a reproducer would be very much appreciated in order to improve ompio.
>
> Cheers,
>
> Gilles
>
> On Thu, Dec 3, 2020 at 6:05 PM Patrick Bégou via users
> <users@lists.open-mpi.org> wrote:
>> Thanks Gilles,
>>
>> this is the solution.
>> I will set OMPI_MCA_io=^ompio automatically when loading the parallel
>> hdf5 module on the cluster.
>>
>> I was tracking this problem for several weeks but was not looking in the
>> right direction (testing NFS server I/O, network bandwidth...).
>>
>> I think we will now move definitively to modern OpenMPI implementations.
>>
>> Patrick
>>
>> On 03/12/2020 at 09:06, Gilles Gouaillardet via users wrote:
>>> Patrick,
>>>
>>> In recent Open MPI releases, the default component for MPI-IO is
>>> ompio (and no longer romio), unless the file is on a Lustre filesystem.
>>>
>>> You can force romio with
>>>
>>> mpirun --mca io ^ompio ...
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 12/3/2020 4:20 PM, Patrick Bégou via users wrote:
>>>> Hi,
>>>>
>>>> I'm using an old (but required by the codes) version of hdf5
>>>> (1.8.12) in parallel mode in two Fortran applications. It relies on
>>>> MPI-IO. The storage is NFS-mounted on the nodes of a small cluster.
>>>>
>>>> With OpenMPI 1.7 it runs fine, but with modern OpenMPI 3.1 or 4.0.5
>>>> the I/O is 10x to 100x slower. Are there fundamental changes in
>>>> MPI-IO in these new releases of OpenMPI, and is there a way to get
>>>> back the I/O performance with this parallel HDF5 release?
>>>>
>>>> Thanks for your advice,
>>>>
>>>> Patrick
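P.S. For anyone reading this thread in the archives, the settings discussed above boil down to the following (./repro is the hypothetical reproducer binary from the sketch earlier in this mail):

# default: ompio is selected on non-Lustre filesystems,
# with the most conservative NFS locking
mpirun -n 16 ./repro

# workaround: fall back to ROMIO
mpirun -n 16 --mca io ^ompio ./repro
# or equivalently, via the environment:
export OMPI_MCA_io=^ompio

# alternative: keep ompio but relax its locking
# (1 = no locking, 3 = lock specific ranges, as ROMIO does)
mpirun -n 16 --mca fs_ufs_lock_algorithm 1 ./repro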