Re: [OMPI users] file/process write speed is not scalable

2020-04-14 Thread Patrick Bégou via users
Hi David, could you specify which version of OpenMPI you are using? I also have some parallel I/O trouble with one code but have not investigated it yet. Thanks, Patrick. On 13/04/2020 at 17:11, Dong-In Kang via users wrote: > Thank you for your suggestion. > I am more concerned about the poor

Re: [OMPI users] Can't start jobs with srun.

2020-04-26 Thread Patrick Bégou via users
I also have this problem on servers I'm benchmarking at DELL's lab with OpenMPI-4.0.3. I've tried a new build of OpenMPI with "--with-pmi2". No change. Finally my workaround in the slurm script was to launch my code with mpirun. As mpirun was only finding one slot per node I have used
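
A minimal sketch of that kind of workaround, assuming a generic Slurm batch script and a hypothetical executable name (my_app); the exact options the poster used are cut off above:

    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=32

    # Launch with mpirun instead of srun; Open MPI's mpirun detects the
    # Slurm allocation, and -np can be taken from SLURM_NTASKS.
    mpirun -np ${SLURM_NTASKS} ./my_app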

[OMPI users] opal_path_nfs freeze

2020-04-21 Thread Patrick Bégou via users
Hi OpenMPI maintainers, I have temporary access to servers with AMD Epyc processors running RHEL7. I'm trying to deploy OpenMPI with several setups, but each time "make check" fails on *opal_path_nfs*. This test freezes forever, consuming no CPU resources. After nearly one hour I killed the

Re: [OMPI users] opal_path_nfs freeze

2020-04-23 Thread Patrick Bégou via users
of saying: make sure that you have no other Open MPI installation findable in your PATH / LD_LIBRARY_PATH and then try running `make check` again. >> On Apr 21, 2020, at 2:37 PM, Patrick Bégou via users <users@lists.open-mpi.org> wrote: >> Hi Ope
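
A sketch of the check being suggested, before re-running the test suite (paths are whatever your environment actually sets):

    # Look for other Open MPI installations that could shadow the new build
    which -a mpirun ompi_info
    echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -i -e openmpi -e ompi

    # Then try the test suite again from the build directory
    make check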

[OMPI users] OpenMPI 4.3 without ucx

2020-05-10 Thread Patrick Bégou via users
Hi all, I've built OpenMPI 4.3.0 with GCC 9.3.0, but on the server ucx was not available when I set --with-ucx. I removed this option and it compiles fine without ucx. However I see a strange behavior: when using mpirun I must explicitly remove ucx to avoid an error. In my module file I have to
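
The exact line from the module file is cut off above; a hypothetical way to keep the UCX components out of the way at run time (an assumption, not necessarily what the poster did) is through MCA parameters, either on the mpirun command line or as exported environment variables:

    # Disable the UCX-based pml and osc components for one run
    mpirun --mca pml ^ucx --mca osc ^ucx -np 4 ./my_app

    # Or export the equivalent variables, e.g. from a job script
    export OMPI_MCA_pml=^ucx
    export OMPI_MCA_osc=^ucx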

Re: [OMPI users] can't open /dev/ipath, network down (err=26)

2020-05-09 Thread Patrick Bégou via users
On 08/05/2020 at 21:56, Prentice Bisbal via users wrote: > We often get the following errors when more than one job runs on the same compute node. We are using Slurm with OpenMPI. The IB cards are QLogic using PSM: > 10698ipath_userinit: assign_context command failed: Network is down >

Re: [OMPI users] Can't start jobs with srun.

2020-05-07 Thread Patrick Bégou via users
so the problem was not critical for my future work. Patrick > On Sun, 26 Apr 2020 at 18:09, Patrick Bégou via users <users@lists.open-mpi.org> wrote: > I also have this problem on servers I'm benchmarking at DELL's lab with OpenMPI-4.0.3. I've tr

Re: [OMPI users] can't open /dev/ipath, network down (err=26)

2020-05-09 Thread Patrick Bégou via users
-and-i-o/fabric-products/OFED_Host_Software_UserGuide_G91902_06.pdf#page72> > TS is the same hardware as the old QLogic QDR HCAs so the manual might be helpful to you in the future. > Sent from my iPad >> On May 9, 2020, at 9:52 AM, Patrick Bégou via users wro

Re: [OMPI users] mpirun only work for 1 processor

2020-06-04 Thread Patrick Bégou via users
Hi Ha Chi, do you use a batch scheduler with Rocks Cluster, or do you log in to the nodes with ssh? If ssh, can you check that you can ssh from one node to the other without a password? Ping just says the network is alive, not that you can connect. Patrick. On 04/06/2020 at 09:06, Hà Chi Nguyễn Nhật
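
A minimal sketch of those checks, with hypothetical node names (node1, node2):

    # Password-less ssh must work between the compute nodes, in both directions
    ssh node1 'ssh node2 hostname'
    ssh node2 'ssh node1 hostname'

    # A quick multi-node launch test that does not depend on the application
    mpirun -np 2 --host node1,node2 hostname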

Re: [OMPI users] mpirun only work for 1 processor

2020-06-04 Thread Patrick Bégou via users
> mpirun --version is identical on all 3 nodes (openmpi 2.1.1), and in the same place when testing with "whereis mpirun". > So is there any problem with mpirun causing it not to launch on other nodes? > Regards > HaChi > On Thu, 4 Jun 2020 at 14:35, Patric
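
One hypothetical way to confirm that every node resolves the same mpirun (node names are placeholders):

    for h in node1 node2 node3; do
        echo "== $h =="
        ssh "$h" 'which mpirun && mpirun --version | head -n 1'
    done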

Re: [OMPI users] MPI_type_free question

2020-12-07 Thread Patrick Bégou via users
but deeper in the code I think. Patrick. On 04/12/2020 at 19:20, George Bosilca wrote: > On Fri, Dec 4, 2020 at 2:33 AM Patrick Bégou via users <users@lists.open-mpi.org> wrote: > Hi George and Gilles, > Thanks George for your suggestion. Is it valua

Re: [OMPI users] MPI_type_free question

2020-12-07 Thread Patrick Bégou via users
refore be used to convert back into a valid datatype pointer, until OMPI completely releases the datatype. Look into the ompi_datatype_f_to_c_table table to see the datatypes that exist and get their pointers, and then use these pointers as arguments to ompi_datatype

[OMPI users] MPI_type_free question

2020-12-03 Thread Patrick Bégou via users
Hi, I'm trying to solve a memory leak that appeared with my new implementation of communications based on MPI_Alltoallw and MPI_Type_create_subarray calls. Arrays of subarray types are created/destroyed at each time step and used for communications. On my laptop the code runs fine (running for 15000
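
The reproducer itself is not shown here; a hedged sketch of how such a per-time-step leak can be watched from the shell (tool choices are assumptions, not something prescribed in the thread, and my_app is a placeholder):

    # Watch the resident set size of each rank grow across time steps
    mpirun -np 4 ./my_app &
    while pgrep -f my_app > /dev/null; do
        ps -C my_app -o pid,rss,comm
        sleep 10
    done

    # Or run a small case under valgrind to see what is never freed
    mpirun -np 2 valgrind --leak-check=full --log-file=vg.%p.log ./my_app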

Re: [OMPI users] Parallel HDF5 low performance

2020-12-03 Thread Patrick Bégou via users
; > Sharing a reproducer will be very much appreciated in order to improve ompio. > Cheers, > Gilles > On Thu, Dec 3, 2020 at 6:05 PM Patrick Bégou via users wrote: >> Thanks Gilles, this is the solution. I will set OMPI_MCA_io=^ompio aut
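
Both forms mentioned in this thread for steering MPI-IO away from ompio, plus a way to see which io components a given build provides (my_hdf5_app is a placeholder):

    # Per run: exclude ompio so MPI-IO falls back to romio
    mpirun --mca io ^ompio -np 8 ./my_hdf5_app

    # Or set it once in the environment (job script, module file, ...)
    export OMPI_MCA_io=^ompio

    # List the io components available in this Open MPI build
    ompi_info | grep "MCA io"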

Re: [OMPI users] MPI_type_free question

2020-12-03 Thread Patrick Bégou via users
until OMPI completely releases the datatype. Look into the ompi_datatype_f_to_c_table table to see the datatypes that exist and get their pointers, and then use these pointers as arguments to ompi_datatype_dump() to see if any of these existing datatypes are the ones you d

Re: [OMPI users] MPI_type_free question

2020-12-10 Thread Patrick Bégou via users
efore be used to convert back into a valid datatype pointer, until OMPI completely releases the datatype. Look into the ompi_datatype_f_to_c_table table to see the datatypes that exist and get their pointers, and then use these pointers as arguments to ompi_da

Re: [OMPI users] MPI_type_free question

2020-12-04 Thread Patrick Bégou via users
; are used? > mpirun --mca pml_base_verbose 10 --mca mtl_base_verbose 10 --mca btl_base_verbose 10 ... will point you to the component(s) used. > The output is pretty verbose, so feel free to compress and post it if you cannot decipher it. > Cheers,
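
The command quoted above, written out with a hypothetical application and the output captured for posting:

    # Ask the pml/mtl/btl frameworks to report which components they select
    mpirun --mca pml_base_verbose 10 --mca mtl_base_verbose 10 \
           --mca btl_base_verbose 10 -np 4 ./my_app 2>&1 | tee selection.log

    # The interesting lines typically mention the selected component
    grep -i select selection.log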

[OMPI users] Parallel HDF5 low performance

2020-12-02 Thread Patrick Bégou via users
Hi, I'm using an old (but required by the codes) version of hdf5 (1.8.12) in parallel mode in two Fortran applications. It relies on MPI-IO. The storage is NFS mounted on the nodes of a small cluster. With OpenMPI 1.7 it runs fine, but with modern OpenMPI 3.1 or 4.0.5 the I/Os are 10x to 100x

Re: [OMPI users] Parallel HDF5 low performance

2020-12-03 Thread Patrick Bégou via users
. > You can force romio with: mpirun --mca io ^ompio ... > Cheers, > Gilles > On 12/3/2020 4:20 PM, Patrick Bégou via users wrote: >> Hi, I'm using an old (but required by the codes) version of hdf5 (1.8.12) in

Re: [OMPI users] MPI_type_free question

2020-12-14 Thread Patrick Bégou via users
a try? > Cheers, > Gilles > On 12/7/2020 6:15 PM, Patrick Bégou via users wrote: >> Hi, I've written a small piece of code to show the problem. Based on my application, but 2D and using integer arrays for testing.

Re: [OMPI users] MPI_type_free question

2020-12-14 Thread Patrick Bégou via users
und. > At first glance, I did not spot any issue in the current code. It turned out that the memory leak disappeared when doing things differently. > Cheers, > Gilles > On Mon, Dec 14, 2020 at 7:11 PM Patrick Bégou via users <users@lists.open-mpi.org

Re: [OMPI users] MPI_type_free question

2020-12-15 Thread Patrick Bégou via users
Issue #8290 reported. Thanks all for your help and the workaround provided. Patrick. On 14/12/2020 at 17:40, Jeff Squyres (jsquyres) wrote: > Yes, opening an issue would be great -- thanks! >> On Dec 14, 2020, at 11:32 AM, Patrick Bégou via users <users@lists