Re: [UPDATE] devel/openmpi 4.0.1 -> 4.0.2
Hello Martin,

It is odd that I already had "PMIX_MCA_gds=hash" set and still got the
problem on a Beowulf cluster with multiple boxes connected by ethernet.

Since the lack of appropriate "#ifdefs" seems like an oversight on the
part of the Open MPI developers, I think it would be appropriate to push
the fix upstream. Are you prepared to do that? I could try, but it would
take me a while to educate myself on this.

Best,
Dave

On 12/19/19, Martin Reindl wrote:
> [moved to ports@]
>
> On Tue, Dec 17, 2019 at 04:16:25PM -0700, Raymond, David wrote:
>> Martin,
>>
>> I have been using openmpi 4.0.2 on my computer system and I found a
>> bug that is provoked by running a job (a Go program interfaced to the
>> Clang MPI package) on multiple machines connected by ethernet. This
>> crashes the program with the following output:
> [...]
>>
>> I traced this to the fact that OpenBSD's version of pthreads doesn't
>> have "pthread_mutexattr_setpshared". It turns out that the
>> configuration file undefines a flag if this is so, but the actual code
>> doesn't pay any attention to this. I fixed the problem by putting
>> appropriate ifdefs around the code generating the error, which itself
>> is simple error checking code. This seems to work. I have attached
>> two patches for the 4.0.2 source.
>
> Hello Dave,
>
> Thanks for your input, I've updated the 4.0.2 diff.
>
> We were already aware of the problem with 4.0.1 back in June and worked
> around it by setting PMIX_MCA_gds=hash before execution to avoid
> GDS/ds21 and GDS/ds12.
>
> Your diff is of course a much better way; do you want to try to push it
> upstream?
Re: [UPDATE] devel/openmpi 4.0.1 -> 4.0.2
[moved to ports@]

On Tue, Dec 17, 2019 at 04:16:25PM -0700, Raymond, David wrote:
> Martin,
>
> I have been using openmpi 4.0.2 on my computer system and I found a
> bug that is provoked by running a job (a Go program interfaced to the
> Clang MPI package) on multiple machines connected by ethernet. This
> crashes the program with the following output:
[...]
>
> I traced this to the fact that OpenBSD's version of pthreads doesn't
> have "pthread_mutexattr_setpshared". It turns out that the
> configuration file undefines a flag if this is so, but the actual code
> doesn't pay any attention to this. I fixed the problem by putting
> appropriate ifdefs around the code generating the error, which itself
> is simple error checking code. This seems to work. I have attached
> two patches for the 4.0.2 source.

Hello Dave,

Thanks for your input, I've updated the 4.0.2 diff.

We were already aware of the problem with 4.0.1 back in June and worked
around it by setting PMIX_MCA_gds=hash before execution to avoid
GDS/ds21 and GDS/ds12.

Your diff is of course a much better way; do you want to try to push it
upstream?
-m

Index: Makefile
===
RCS file: /cvs/ports/devel/openmpi/Makefile,v
retrieving revision 1.28
diff -u -p -u -p -r1.28 Makefile
--- Makefile 28 Jun 2019 11:05:11 - 1.28
+++ Makefile 19 Dec 2019 07:18:30 -
@@ -2,9 +2,8 @@
 
 COMMENT = open source MPI-3.1 implementation
 
-V = 4.0.1
+V = 4.0.2
 DISTNAME = openmpi-$V
-REVISION = 0
 
 SHARED_LIBS += mca_common_dstore 0.0 # 1.0
 SHARED_LIBS += mca_common_monitoring 0.0 # 60.0
Index: distinfo
===
RCS file: /cvs/ports/devel/openmpi/distinfo,v
retrieving revision 1.4
diff -u -p -u -p -r1.4 distinfo
--- distinfo 27 Jun 2019 13:52:00 - 1.4
+++ distinfo 19 Dec 2019 07:18:30 -
@@ -1,2 +1,2 @@
-SHA256 (openmpi-4.0.1.tar.gz) = 5V4hP+CaIUq58scirP2L97ObvBgA5LekZNON8V5wf1k=
-SIZE (openmpi-4.0.1.tar.gz) = 17513706
+SHA256 (openmpi-4.0.2.tar.gz) = ZigFhw6GoUceWXObDDTG+QBODHoi2waFYtU4jsRCGQQ=
+SIZE (openmpi-4.0.2.tar.gz) = 17373487
Index: patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds12_gds_ds12_lock_pthread_c
===
RCS file: patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds12_gds_ds12_lock_pthread_c
diff -N patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds12_gds_ds12_lock_pthread_c
--- /dev/null 1 Jan 1970 00:00:00 -
+++ patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds12_gds_ds12_lock_pthread_c 19 Dec 2019 07:18:30 -
@@ -0,0 +1,20 @@
+$OpenBSD$
+
+Index: opal/mca/pmix/pmix3x/pmix/src/mca/gds/ds12/gds_ds12_lock_pthread.c
+--- opal/mca/pmix/pmix3x/pmix/src/mca/gds/ds12/gds_ds12_lock_pthread.c.orig
++++ opal/mca/pmix/pmix3x/pmix/src/mca/gds/ds12/gds_ds12_lock_pthread.c
+@@ -132,12 +132,14 @@ pmix_status_t pmix_gds_ds12_lock_init(pmix_common_dsto
+ PMIX_ERROR_LOG(rc);
+ goto error;
+ }
++#ifdef HAVE_PTHREAD_SHARED
+ if (0 != pthread_rwlockattr_setpshared(, PTHREAD_PROCESS_SHARED)) {
+ pthread_rwlockattr_destroy();
+ rc = PMIX_ERR_INIT;
+ PMIX_ERROR_LOG(rc);
+ goto error;
+ }
++#endif
+ #ifdef HAVE_PTHREAD_SETKIND
+ if (0 != pthread_rwlockattr_setkind_np(,
+ PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP)) {
Index: patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds21_gds_ds21_lock_pthread_c
===
RCS file: patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds21_gds_ds21_lock_pthread_c
diff -N patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds21_gds_ds21_lock_pthread_c
--- /dev/null 1 Jan 1970 00:00:00 -
+++ patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds21_gds_ds21_lock_pthread_c 19 Dec 2019 07:18:30 -
@@ -0,0 +1,21 @@
+$OpenBSD$
+
+Index: opal/mca/pmix/pmix3x/pmix/src/mca/gds/ds21/gds_ds21_lock_pthread.c
+--- opal/mca/pmix/pmix3x/pmix/src/mca/gds/ds21/gds_ds21_lock_pthread.c.orig
++++ opal/mca/pmix/pmix3x/pmix/src/mca/gds/ds21/gds_ds21_lock_pthread.c
+@@ -182,12 +182,15 @@ pmix_status_t pmix_gds_ds21_lock_init(pmix_common_dsto
+ PMIX_ERROR_LOG(rc);
+ goto error;
+ }
++
++#ifdef HAVE_PTHREAD_MUTEXATTR_SETPSHARED
+ if (0 != pthread_mutexattr_setpshared(, PTHREAD_PROCESS_SHARED)) {
+ pthread_mutexattr_destroy();
+ rc = PMIX_ERR_INIT;
+ PMIX_ERROR_LOG(rc);
+ goto error;
+ }
++#endif
+
+ segment_hdr_t *seg_hdr = (segment_hdr_t*)lock_item->seg_desc->seg_info.seg_base_addr;
+ seg_hdr->num_locks = local_size;
Index: pkg/PLIST
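Both patches follow a common autoconf pattern: configure leaves a HAVE_* macro undefined when the platform lacks a function, and the code has to test that macro before calling it. The following is a minimal, hypothetical C sketch of that pattern, not the actual PMIx code. The helper names init_shared_mutex() and init_shared_rwlock() are made up for illustration; the macros HAVE_PTHREAD_MUTEXATTR_SETPSHARED and HAVE_PTHREAD_SHARED are the ones used in the ds21 and ds12 patches and are assumed to come from a configure-generated header. Note that when a guarded call is compiled out, the lock keeps the POSIX default PTHREAD_PROCESS_PRIVATE attribute, so it is not guaranteed to synchronize between separate processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/*
 * Hypothetical helpers sketching the guard added by the two patches.
 * HAVE_PTHREAD_MUTEXATTR_SETPSHARED (ds21 patch) and HAVE_PTHREAD_SHARED
 * (ds12 patch) are assumed to be defined by a configure-generated header
 * only when the platform provides the corresponding *_setpshared() call.
 */

/* Initialise a mutex, requesting PTHREAD_PROCESS_SHARED only if supported. */
int init_shared_mutex(pthread_mutex_t *mtx)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

#ifdef HAVE_PTHREAD_MUTEXATTR_SETPSHARED
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc != 0) {
        pthread_mutexattr_destroy(&attr);
        return rc;
    }
#endif
    /* Without the macro, attr keeps the default PTHREAD_PROCESS_PRIVATE. */

    rc = pthread_mutex_init(mtx, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Same pattern for a reader/writer lock, as in the ds12 patch. */
int init_shared_rwlock(pthread_rwlock_t *lock)
{
    pthread_rwlockattr_t attr;
    int rc = pthread_rwlockattr_init(&attr);
    if (rc != 0)
        return rc;

#ifdef HAVE_PTHREAD_SHARED
    rc = pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc != 0) {
        pthread_rwlockattr_destroy(&attr);
        return rc;
    }
#endif

    rc = pthread_rwlock_init(lock, &attr);
    pthread_rwlockattr_destroy(&attr);
    return rc;
}
```

Where the macros are defined, both helpers request process-shared locks; where they are not, they fall back to process-private locks instead of failing with an error, which mirrors what the patches do with PMIX_ERR_INIT.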
Re: [UPDATE] devel/openmpi 4.0.1 -> 4.0.2
On 31.10.19 at 15:59, Martin Reindl wrote:
> Hello,
>
> The attached diff updates devel/openmpi to the current stable release 4.0.2.
> Changelog can be found here:
> https://raw.githubusercontent.com/open-mpi/ompi/v4.0.x/NEWS
> Tested on amd64 and arm64.
>
> Most notably, this update fixes problems reported by David Raymond off-list.

Ping.
[UPDATE] devel/openmpi 4.0.1 -> 4.0.2
Hello,

The attached diff updates devel/openmpi to the current stable release 4.0.2.
Changelog can be found here:
https://raw.githubusercontent.com/open-mpi/ompi/v4.0.x/NEWS

Tested on amd64 and arm64.

Most notably, this update fixes problems reported by David Raymond off-list.

-m

Index: Makefile
===
RCS file: /cvs/ports/devel/openmpi/Makefile,v
retrieving revision 1.28
diff -u -p -u -p -r1.28 Makefile
--- Makefile 28 Jun 2019 11:05:11 - 1.28
+++ Makefile 30 Oct 2019 21:44:23 -
@@ -2,9 +2,9 @@
 
 COMMENT = open source MPI-3.1 implementation
 
-V = 4.0.1
+V = 4.0.2
 DISTNAME = openmpi-$V
-REVISION = 0
+#REVISION = 0
 
 SHARED_LIBS += mca_common_dstore 0.0 # 1.0
 SHARED_LIBS += mca_common_monitoring 0.0 # 60.0
Index: distinfo
===
RCS file: /cvs/ports/devel/openmpi/distinfo,v
retrieving revision 1.4
diff -u -p -u -p -r1.4 distinfo
--- distinfo 27 Jun 2019 13:52:00 - 1.4
+++ distinfo 30 Oct 2019 21:44:23 -
@@ -1,2 +1,2 @@
-SHA256 (openmpi-4.0.1.tar.gz) = 5V4hP+CaIUq58scirP2L97ObvBgA5LekZNON8V5wf1k=
-SIZE (openmpi-4.0.1.tar.gz) = 17513706
+SHA256 (openmpi-4.0.2.tar.gz) = ZigFhw6GoUceWXObDDTG+QBODHoi2waFYtU4jsRCGQQ=
+SIZE (openmpi-4.0.2.tar.gz) = 17373487
Index: pkg/PLIST
===
RCS file: /cvs/ports/devel/openmpi/pkg/PLIST,v
retrieving revision 1.5
diff -u -p -u -p -r1.5 PLIST
--- pkg/PLIST 27 Jun 2019 13:52:00 - 1.5
+++ pkg/PLIST 30 Oct 2019 21:44:23 -
@@ -143,15 +143,6 @@ lib/openmpi/mca_compress_gzip.so
 lib/openmpi/mca_crs_none.a
 lib/openmpi/mca_crs_none.la
 lib/openmpi/mca_crs_none.so
-lib/openmpi/mca_dfs_app.a
-lib/openmpi/mca_dfs_app.la
-lib/openmpi/mca_dfs_app.so
-lib/openmpi/mca_dfs_orted.a
-lib/openmpi/mca_dfs_orted.la
-lib/openmpi/mca_dfs_orted.so
-lib/openmpi/mca_dfs_test.a
-lib/openmpi/mca_dfs_test.la
-lib/openmpi/mca_dfs_test.so
 lib/openmpi/mca_errmgr_default_app.a
 lib/openmpi/mca_errmgr_default_app.la
 lib/openmpi/mca_errmgr_default_app.so
@@ -221,9 +212,6 @@ lib/openmpi/mca_iof_tool.so
 lib/openmpi/mca_mpool_hugepage.a
 lib/openmpi/mca_mpool_hugepage.la
 lib/openmpi/mca_mpool_hugepage.so
-lib/openmpi/mca_notifier_syslog.a
-lib/openmpi/mca_notifier_syslog.la
-lib/openmpi/mca_notifier_syslog.so
 lib/openmpi/mca_odls_default.a
 lib/openmpi/mca_odls_default.la
 lib/openmpi/mca_odls_default.so
@@ -288,6 +276,9 @@ lib/openmpi/mca_reachable_weighted.so
 lib/openmpi/mca_regx_fwd.a
 lib/openmpi/mca_regx_fwd.la
 lib/openmpi/mca_regx_fwd.so
+lib/openmpi/mca_regx_naive.a
+lib/openmpi/mca_regx_naive.la
+lib/openmpi/mca_regx_naive.so
 lib/openmpi/mca_regx_reverse.a
 lib/openmpi/mca_regx_reverse.la
 lib/openmpi/mca_regx_reverse.so
@@ -315,9 +306,6 @@ lib/openmpi/mca_rml_oob.so
 lib/openmpi/mca_routed_binomial.a
 lib/openmpi/mca_routed_binomial.la
 lib/openmpi/mca_routed_binomial.so
-lib/openmpi/mca_routed_debruijn.a
-lib/openmpi/mca_routed_debruijn.la
-lib/openmpi/mca_routed_debruijn.so
 lib/openmpi/mca_routed_direct.a
 lib/openmpi/mca_routed_direct.la
 lib/openmpi/mca_routed_direct.so