Thanks for all these suggestions. I'll try to create a small test
reproducing this behavior and try the different parameters.
I do not use MPI I/O directly but parallel HDF5, which relies on MPI I/O.
NFS is the easiest way to share storage between nodes on a small
cluster. BeeGFS or Lustre would require a larger (additional)
infrastructure.
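
For the record, a minimal sketch of what my test could look like (in C
for brevity, although my real codes are Fortran, and the file name is
just a placeholder), showing how parallel HDF5 sits on top of MPI I/O
through the file access property list:

    /* compile with: h5pcc test_phdf5.c -o test_phdf5 */
    #include <mpi.h>
    #include <hdf5.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Select the MPI-IO virtual file driver: from here on,
         * HDF5 file operations are issued as MPI I/O requests. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

        /* Collective call: every rank opens the same file. */
        hid_t file = H5Fcreate("test.h5", H5F_ACC_TRUNC,
                               H5P_DEFAULT, fapl);

        H5Fclose(file);
        H5Pclose(fapl);
        MPI_Finalize();
        return 0;
    }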

Patrick

On 03/12/2020 at 15:38, Gabriel, Edgar via users wrote:
> the reason for potential performance issues on NFS is very different from
> Lustre. Basically, depending on your use case and the NFS configuration, you
> have to enforce a different locking policy to ensure correct output files. The
> default value chosen for ompio is the most conservative setting, since
> this was the only setting that we found that would result in a correct output
> file for all of our tests. You can change settings to see whether other
> options would work for you.
>
> The parameter that you need to work with is fs_ufs_lock_algorithm. Setting it
> to 1 will completely disable locking (and most likely lead to the best
> performance); setting it to 3 is a middle ground (lock specific ranges) and
> similar to what ROMIO does. So, e.g.,
>
> mpiexec -n 16 --mca fs_ufs_lock_algorithm 1 ./mytests
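>
> The same value can also be set through the standard MCA environment
> variable convention instead of the command line, e.g.
>
> export OMPI_MCA_fs_ufs_lock_algorithm=1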
>
> That being said, if you google NFS + MPI I/O, you will find a ton of
> documents and reasons for potential problems, so using MPI I/O on top of NFS
> (whether OMPIO or ROMIO) is always at your own risk.
> Thanks
>
> Edgar
>
> -----Original Message-----
> From: users <users-boun...@lists.open-mpi.org> On Behalf Of Gilles 
> Gouaillardet via users
> Sent: Thursday, December 3, 2020 4:46 AM
> To: Open MPI Users <users@lists.open-mpi.org>
> Cc: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> Subject: Re: [OMPI users] Parallel HDF5 low performance
>
> Patrick,
>
> glad to hear you will upgrade Open MPI thanks to this workaround!
>
> ompio has known performance issues on Lustre (this is why ROMIO is still the
> default on that filesystem), but I do not remember such performance issues
> having been reported on an NFS filesystem.
>
> Sharing a reproducer would be very much appreciated in order to improve ompio.
>
> Cheers,
>
> Gilles
>
> On Thu, Dec 3, 2020 at 6:05 PM Patrick Bégou via users 
> <users@lists.open-mpi.org> wrote:
>> Thanks Gilles,
>>
>> this is the solution.
>> I will set OMPI_MCA_io=^ompio automatically when loading the parallel
>> hdf5 module on the cluster.
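>>
>> As a sketch, assuming a Tcl environment-modules modulefile (the exact
>> layout of our module files may differ), the line would be something like:
>>
>> setenv OMPI_MCA_io ^ompio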
>>
>> I was tracking this problem for several weeks but not looking in the
>> right direction (testing NFS server I/O, network bandwidth, ...).
>>
>> I think we will now move definitively to modern Open MPI releases.
>>
>> Patrick
>>
>> On 03/12/2020 at 09:06, Gilles Gouaillardet via users wrote:
>>> Patrick,
>>>
>>>
>>> In recent Open MPI releases, the default component for MPI-IO is
>>> ompio (and no longer romio),
>>>
>>> unless the file is on a Lustre filesystem.
>>>
>>>
>>> You can force romio with
>>>
>>> mpirun --mca io ^ompio ...
>>>
>>>
>>> Cheers,
>>>
>>>
>>> Gilles
>>>
>>> On 12/3/2020 4:20 PM, Patrick Bégou via users wrote:
>>>> Hi,
>>>>
>>>> I'm using an old (but required by the codes) version of HDF5
>>>> (1.8.12) in parallel mode in two Fortran applications. It relies on
>>>> MPI I/O. The storage is NFS-mounted on the nodes of a small cluster.
>>>>
>>>> With Open MPI 1.7 it runs fine, but with modern Open MPI 3.1 or 4.0.5
>>>> the I/Os are 10x to 100x slower. Are there fundamental changes in
>>>> MPI I/O in these new releases of Open MPI, and is there a solution to
>>>> get back to the I/O performance with this parallel HDF5 release?
>>>>
>>>> Thanks for your advice.
>>>>
>>>> Patrick
>>>>
