Re: [OMPI users] Segfault with OpenMPI 4 and dynamic window

2019-02-17 Thread Gilles Gouaillardet

Thanks Bart,


I opened https://github.com/open-mpi/ompi/issues/6394 to track this 
issue; we should follow up there from now on.



FWIW, I added a more minimal example, and a possible fix.
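For anyone skimming the archive who wants to see what "dynamic window" refers to 
here, below is a minimal, hypothetical sketch of MPI-3 dynamic-window usage in C. 
It is not the reproducer attached to the GitHub issue; the ring-style address 
exchange and the names are illustrative only.

/* dyn_win_sketch.c - hypothetical minimal example of MPI-3 dynamic windows.
 * Each rank attaches a local int, exchanges addresses around a ring, and
 * reads its left neighbour's value with MPI_Get. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Create a window with no memory attached yet. */
    MPI_Win win;
    MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Attach a locally allocated buffer to the window. */
    int local = rank;
    MPI_Win_attach(win, &local, sizeof(int));

    /* With dynamic windows, the target displacement is the address of the
     * remote memory, so the addresses have to be exchanged explicitly. */
    MPI_Aint local_addr, remote_addr;
    MPI_Get_address(&local, &local_addr);
    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;
    MPI_Sendrecv(&local_addr, 1, MPI_AINT, right, 0,
                 &remote_addr, 1, MPI_AINT, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Read the left neighbour's value through the dynamic window. */
    int result = -1;
    MPI_Win_lock(MPI_LOCK_SHARED, left, 0, win);
    MPI_Get(&result, 1, MPI_INT, left, remote_addr, 1, MPI_INT, win);
    MPI_Win_unlock(left, win);

    printf("rank %d read %d from rank %d\n", rank, result, left);

    /* Make sure nobody is still accessing our memory before detaching. */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Win_detach(win, &local);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

The key point is that with MPI_Win_create_dynamic the memory is attached after 
the window is created, and the target displacement passed to MPI_Get is the 
remote address itself.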


Cheers,


Gilles

On 2/18/2019 12:43 AM, Bart Janssens wrote:
I just tried on master (commit 
91d05f91e28d3614d8b5da707df2505d8564ecd3); the same crash still 
happens there.
On 16 Feb 2019, 17:15 +0100, Open MPI Users wrote:


Probably not. I think this is now fixed. Might be worth trying master 
to verify.


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-17 Thread Gabriel, Edgar
I will also run our test suite and the HDF5 test suite on GPFS; I recently got 
access to a GPFS file system. I will report back on that, but it will 
take a few days.

Thanks
Edgar
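
As an aside for anyone reproducing this: one of the suggestions quoted below is to 
export OMPI_MCA_io=^ompio and to make sure the variable is actually passed by srun 
to the MPI tasks. A hypothetical way to verify that (not part of any test suite; 
compile with mpicc and launch it exactly the way the HDF5 tests are launched) is a 
trivial program that prints what each rank sees:

/* check_mca_env.c - hypothetical sanity check: does OMPI_MCA_io survive the
 * srun launch?  Every rank prints the value it sees in its environment. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const char *io = getenv("OMPI_MCA_io");
    printf("rank %d: OMPI_MCA_io=%s\n", rank, io ? io : "(not set)");

    MPI_Finalize();
    return 0;
}

If any rank reports "(not set)", the workaround never reached that task.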

> -----Original Message-----
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ryan Novosielski
> Sent: Sunday, February 17, 2019 2:37 AM
> To: users@lists.open-mpi.org
> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> This is on GPFS. I'll try it on XFS to see if it makes any difference.
> 
> On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:
> > Ryan,
> >
> > What filesystem are you running on ?
> >
> > Open MPI defaults to the ompio component, except on Lustre filesystems,
> > where ROMIO is used. (If the issue is related to ROMIO, that could
> > explain why you did not see any difference; in that case, you might
> > want to try another filesystem, such as a local filesystem or NFS.)
> >
> >
> > Cheers,
> >
> > Gilles
> >
> > On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski wrote:
> >>
> >> I verified that it makes it through to a bash prompt, but I’m a
> >> little less confident that something 'make test' does doesn’t clear it.
> >> Any recommendation for a way to verify?
> >>
> >> In any case, no change, unfortunately.
> >>
> >> Sent from my iPhone
> >>
> >>> On Feb 16, 2019, at 08:13, Gabriel, Edgar wrote:
> >>>
> >>> What file system are you running on?
> >>>
> >>> I will look into this, but it might be later next week. I just
> >>> wanted to emphasize that we are regularly running the parallel
> >>> hdf5 tests with ompio, and I am not aware of any outstanding items
> >>> that do not work (and are supposed to work). That being said, I run
> >>> the tests manually, and not the 'make test'
> >>> commands. Will have to check which tests are being run by that.
> >>>
> >>> Edgar
> >>>
>  -----Original Message-----
>  From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
>  Sent: Saturday, February 16, 2019 1:49 AM
>  To: Open MPI Users
>  Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3
> 
>  Ryan,
> 
>  Can you
> 
>  export OMPI_MCA_io=^ompio
> 
>  and try again after you made sure this environment variable is
>  passed by srun to the MPI tasks ?
> 
>  We have identified and fixed several issues specific to the
>  (default) ompio component, so that could be a valid workaround
>  until the next release.
> 
>  Cheers,
> 
>  Gilles
> 
>  Ryan Novosielski  wrote:
> > Hi there,
> >
> > Honestly don’t know which piece of this puzzle to look at or how to
> > get more information for troubleshooting. I successfully built HDF5
> > 1.10.4 with the RHEL system GCC 4.8.5 and OpenMPI 3.1.3. Running the
> > “make check” in HDF5 fails at the point below; I am using
> > RUNPARALLEL='srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1' and have a
> > Slurm setup that’s otherwise properly configured.
> >
> > Thanks for any help you can provide.
> >
> > make[4]: Entering directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> > Testing  t_mpi
> > t_mpi  Test Log
> > srun: job 84126610 queued and waiting for resources
> > srun: job 84126610 has been allocated resources
> > srun: error: slepner023: tasks 0-5: Alarm clock
> > 0.01user 0.00system 20:03.95elapsed 0%CPU (0avgtext+0avgdata 5152maxresident)k
> > 0inputs+0outputs (0major+1529minor)pagefaults 0swaps
> > make[4]: *** [t_mpi.chkexe_] Error 1
> > make[4]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> > make[3]: *** [build-check-p] Error 1
> > make[3]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> > make[2]: *** [test] Error 2
> > make[2]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> > make[1]: *** [check-am] Error 2
> > make[1]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> > make: *** [check-recursive] Error 1
> >
> > --
> > Ryan Novosielski - novos...@rutgers.edu
> > Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> > Office of Advanced Research Computing - MSB C630, Newark
> > Rutgers, the State University of NJ

Re: [OMPI users] Segfault with OpenMPI 4 and dynamic window

2019-02-17 Thread Bart Janssens
I just tried on master (commit 91d05f91e28d3614d8b5da707df2505d8564ecd3); the 
same crash still happens there.
On 16 Feb 2019, 17:15 +0100, Open MPI Users wrote:
>
> Probably not. I think this is now fixed. Might be worth trying master to 
> verify.
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-17 Thread Ryan Novosielski
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This is on GPFS. I'll try it on XFS to see if it makes any difference.
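
Since the suspicion in this thread is the (default) ompio component on GPFS, one 
way to take HDF5 out of the picture is a stand-alone MPI-IO round trip on the same 
mount. The sketch below is hypothetical (the file name and layout are illustrative 
only) and uses only standard MPI-IO calls:

/* mpiio_smoke.c - hypothetical stand-alone MPI-IO check, independent of HDF5.
 * Each rank writes its rank number at offset rank*sizeof(int), reads it back
 * and reports whether the round trip succeeded. Run it on the GPFS mount. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const char *path = (argc > 1) ? argv[1] : "mpiio_smoke_test.dat";

    MPI_File fh;
    int rc = MPI_File_open(MPI_COMM_WORLD, path,
                           MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
    if (rc != MPI_SUCCESS) {
        fprintf(stderr, "rank %d: MPI_File_open(%s) failed\n", rank, path);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Collective write: one int per rank at a rank-specific offset. */
    MPI_Offset off = (MPI_Offset)rank * sizeof(int);
    MPI_File_write_at_all(fh, off, &rank, 1, MPI_INT, MPI_STATUS_IGNORE);

    /* Flush and synchronize before reading back. */
    MPI_File_sync(fh);
    MPI_Barrier(MPI_COMM_WORLD);

    int check = -1;
    MPI_File_read_at_all(fh, off, &check, 1, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    printf("rank %d: wrote %d, read back %d (%s)\n",
           rank, rank, check, (check == rank) ? "OK" : "MISMATCH");

    MPI_Finalize();
    return 0;
}

If this simple round trip already fails or hangs on GPFS but works on a local 
filesystem, that points at the MPI-IO layer rather than at HDF5.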

On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:
> Ryan,
> 
> What filesystem are you running on ?
> 
> Open MPI defaults to the ompio component, except on Lustre
> filesystems, where ROMIO is used. (If the issue is related to ROMIO,
> that could explain why you did not see any difference; in that case,
> you might want to try another filesystem, such as a local filesystem
> or NFS.)
> 
> 
> Cheers,
> 
> Gilles
> 
> On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski wrote:
>> 
>> I verified that it makes it through to a bash prompt, but I’m a
>> little less confident that something 'make test' does doesn’t clear
>> it. Any recommendation for a way to verify?
>> 
>> In any case, no change, unfortunately.
>> 
>> Sent from my iPhone
>> 
>>> On Feb 16, 2019, at 08:13, Gabriel, Edgar wrote:
>>> 
>>> What file system are you running on?
>>> 
>>> I will look into this, but it might be later next week. I just
>>> wanted to emphasize that we are regularly running the parallel
>>> hdf5 tests with ompio, and I am not aware of any outstanding
>>> items that do not work (and are supposed to work). That being
>>> said, I run the tests manually, and not the 'make test'
>>> commands. Will have to check which tests are being run by
>>> that.
>>> 
>>> Edgar
>>> 
 -----Original Message-----
 From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
 Sent: Saturday, February 16, 2019 1:49 AM
 To: Open MPI Users
 Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3
 
 Ryan,
 
 Can you
 
 export OMPI_MCA_io=^ompio
 
 and try again after you made sure this environment variable
 is passed by srun to the MPI tasks ?
 
 We have identified and fixed several issues specific to the
 (default) ompio component, so that could be a valid
 workaround until the next release.
 
 Cheers,
 
 Gilles
 
 Ryan Novosielski  wrote:
> Hi there,
> 
> Honestly don’t know which piece of this puzzle to look at or how to
> get more information for troubleshooting. I successfully built HDF5
> 1.10.4 with the RHEL system GCC 4.8.5 and OpenMPI 3.1.3. Running the
> “make check” in HDF5 fails at the point below; I am using
> RUNPARALLEL='srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1' and have a
> Slurm setup that’s otherwise properly configured.
> 
> Thanks for any help you can provide.
> 
> make[4]: Entering directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> Testing  t_mpi
> t_mpi  Test Log
> srun: job 84126610 queued and waiting for resources
> srun: job 84126610 has been allocated resources
> srun: error: slepner023: tasks 0-5: Alarm clock
> 0.01user 0.00system 20:03.95elapsed 0%CPU (0avgtext+0avgdata 5152maxresident)k
> 0inputs+0outputs (0major+1529minor)pagefaults 0swaps
> make[4]: *** [t_mpi.chkexe_] Error 1
> make[4]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> make[3]: *** [build-check-p] Error 1
> make[3]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> make[2]: *** [test] Error 2
> make[2]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> make[1]: *** [check-am] Error 2
> make[1]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> make: *** [check-recursive] Error 1
> 
> --
> Ryan Novosielski - novos...@rutgers.edu
> Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> Office of Advanced Research Computing - MSB C630, Newark
> Rutgers, the State University of NJ

- --
Ryan Novosielski - novos...@rutgers.edu