[OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-20 Thread Raymond Muno
We are doing a test build of a new cluster. We are re-using our Myrinet 10G gear from a previous cluster. I have built OpenMPI 1.4.2 with PGI 10.4. We use this regularly on our InfiniBand-based cluster and all the install elements were readily available. With a few go-arounds with the

Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-20 Thread Raymond Muno
On 10/20/2010 7:59 PM, Ralph Castain wrote: The error message seems to imply that mpirun itself didn't segfault, but that something else did. Is that segfault pid from mpirun? This kind of problem is usually caused by mismatched builds - i.e., you compile against your new build, but you pick
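A quick way to rule out a mismatched build is to confirm that mpirun, the compiler wrappers, and the runtime libraries all come from the same installation. A minimal sketch (the grep patterns and prefix layout are assumptions about a typical install):

    which mpirun mpicc                 # confirm which installation is first in PATH
    ompi_info | head -5                # reports the Open MPI version actually in use
    ldd $(which mpirun) | grep -i mpi  # libraries mpirun was linked against
    echo "$PATH"; echo "$LD_LIBRARY_PATH"   # check no old prefix is listed first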

Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-20 Thread Raymond Muno
On 10/20/2010 8:30 PM, Scott Atchley wrote: On Oct 20, 2010, at 9:22 PM, Raymond Muno wrote: On 10/20/2010 7:59 PM, Ralph Castain wrote: The error message seems to imply that mpirun itself didn't segfault, but that something else did. Is that segfault pid from mpirun? This kind of problem

Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-20 Thread Raymond Muno
On 10/20/2010 8:30 PM, Scott Atchley wrote: Are you building OMPI with support for both MX and IB? If not and you only want MX support, try configuring OMPI using --disable-memory-manager (check configure for the exact option). We have fixed this bug in the most recent 1.4.x and 1.5.x
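A minimal configure sketch along the lines Scott suggests (the prefix and MX install path are placeholders, and as he notes, the exact memory-manager flag spelling should be confirmed with ./configure --help for your release):

    ./configure --prefix=/opt/openmpi-1.4.x \
        --with-mx=/opt/mx \
        --without-memory-manager   # flag name varies by release; check configure --help
    make -j4 && make install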

Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-21 Thread Raymond Muno
On 10/20/2010 8:30 PM, Scott Atchley wrote: We have fixed this bug in the most recent 1.4.x and 1.5.x releases. Scott OK, a few more tests. I was using PGI 10.4 as the compiler. I have now tried OpenMPI 1.4.3 with PGI 10.8 and Intel 11.1. I get the same results in each case, mpirun seg
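When mpirun segfaults identically across compilers, a core-file backtrace usually identifies the faulting library. A hedged sketch (the core file name depends on your system's core_pattern, and ./a.out stands in for any MPI test program):

    ulimit -c unlimited        # allow core dumps in this shell
    mpirun -np 2 ./a.out       # reproduce the crash
    gdb $(which mpirun) core   # load mpirun plus the resulting core file
    # then at the (gdb) prompt, run: bt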

[OMPI users] Problem building OpenMPI with SunStudio compilers

2008-10-04 Thread Raymond Muno
We are implementing a new cluster that is InfiniBand based. I am working on getting OpenMPI built for our various compile environments. So far it is working for PGI 7.2 and PathScale 3.1. I found some workarounds for issues with the Pathscale compilers (seg faults) in the OpenMPI FAQ. When
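For Sun Studio builds, the usual approach in the Open MPI FAQ of that era is to name the Sun compilers explicitly at configure time. A sketch, with the prefix as a placeholder:

    ./configure --prefix=/opt/openmpi-sun CC=cc CXX=CC F77=f77 FC=f90
    make all install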

Re: [OMPI users] Problem building OpenMPI with SunStudio compilers

2008-10-04 Thread Raymond Muno
Raymond Muno wrote: We are implementing a new cluster that is InfiniBand based. I am working on getting OpenMPI built for our various compile environments. So far it is working for PGI 7.2 and PathScale 3.1. I found some workarounds for issues with the Pathscale compilers (seg faults

Re: [OMPI users] Problem building OpenMPI with SunStudio compilers

2008-10-04 Thread Raymond Muno
Raymond Muno wrote: Raymond Muno wrote: We are implementing a new cluster that is InfiniBand based. I am working on getting OpenMPI built for our various compile environments. So far it is working for PGI 7.2 and PathScale 3.1. I found some workarounds for issues with the Pathscale

[OMPI users] Building OpenMPI with Lustre support using PGI fails

2018-11-13 Thread Raymond Muno
I am trying to build OpenMPI with Lustre support using PGI 18.7 on CentOS 7.5 (1804). It builds successfully with Intel compilers, but fails to find the necessary Lustre components with the PGI compiler. I have tried building OpenMPI 4.0.0, 3.1.3 and 2.1.5. I can build OpenMPI, but
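A sketch of the configure invocation in question, assuming Lustre headers under /usr (the prefix, compiler names, and Lustre path are placeholders):

    ./configure --prefix=/opt/openmpi-pgi CC=pgcc CXX=pgc++ FC=pgfortran --with-lustre=/usr
    make -j8 && make install

When configure finds Lustre under Intel but not PGI, comparing config.log from the two builds usually shows which test compile failed and why.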

Re: [OMPI users] UCX errors after upgrade

2019-10-02 Thread Raymond Muno via users
Thanks Raymond; I have filed an issue for this on Github and tagged the relevant Mellanox people: https://github.com/open-mpi/ompi/issues/7009 On Sep 25, 2019, at 3:09 PM, Raymond Muno via users <users@lists.open-mpi.org> wrote: We are running against 4.0.2R

[OMPI users] Parameters at run time

2019-10-19 Thread Raymond Muno via users
Is there a way to determine, at run time, which transports OpenMPI chose? We want to verify we are running UCX over InfiniBand. I have two users, executing identical code with the same mpirun options, getting vastly different execution
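One way to see which transport was selected is to raise the verbosity of the relevant MCA frameworks at launch, or to force the UCX PML so the job fails loudly if UCX cannot be used. A sketch (./a.out and the process count are placeholders):

    # report PML/BTL selection decisions at startup
    mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 -np 4 ./a.out

    # or insist on UCX; the run aborts if the ucx PML is unavailable
    mpirun --mca pml ucx -np 4 ./a.out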

[OMPI users] UCX errors after upgrade

2019-09-25 Thread Raymond Muno via users
We are primarily using OpenMPI 3.1.4 but also have 4.0.1 installed. On our cluster, we were running CentOS 7.5 with updates, alongside MLNX_OFED 4.5.x. OpenMPI was compiled with GCC, Intel, PGI and AOCC compilers. We could run with no issues. To accommodate updates needed to get our IB
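After an MLNX_OFED upgrade, Open MPI generally needs to be rebuilt against the UCX shipped with the new OFED. A sketch for checking versions and pointing configure at the new UCX (paths are placeholders):

    ucx_info -v               # UCX version now installed on the node
    ompi_info | grep -i ucx   # whether the current Open MPI build has the ucx components
    ./configure --prefix=/opt/openmpi-3.1.4 --with-ucx=/usr && make -j8 install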

Re: [OMPI users] UCX errors after upgrade

2019-09-25 Thread Raymond Muno via users
in there since v4.0.1. On Sep 25, 2019, at 2:12 PM, Raymond Muno via users <users@lists.open-mpi.org> wrote: We are primarily using OpenMPI 3.1.4 but also have 4.0.1 installed. On our cluster, we were running CentOS 7.5 with updates, alongside MLNX_OFED 4.5.x. OpenMPI was co

Re: [OMPI users] UCX errors after upgrade

2019-09-25 Thread Raymond Muno via users
As a test, I rebooted a set of nodes. The user could run on 480 cores, on 5 nodes. Before that, we could not run beyond two nodes. We still get the VM_UNMAP warning, however. On 9/25/19 2:09 PM, Raymond Muno via users wrote: We are running against 4.0.2RC2 now. This is using current

[OMPI users] OpenMPI 4.0.2 with PGI 19.10, will not build with hcoll

2020-01-24 Thread Raymond Muno via users
I am having issues building OpenMPI 4.0.2 using the PGI 19.10 compilers. OS is CentOS 7.7, MLNX_OFED 4.7.3. It dies at:

    PGC/x86-64 Linux 19.10-0: compilation completed with warnings
      CCLD     mca_coll_hcoll.la
    pgcc-Error-Unknown switch: -pthread
    make[2]: *** [mca_coll_hcoll.la] Error 1
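A commonly cited community workaround for PGI rejecting -pthread is to teach the compiler driver the switch via its siterc file. A sketch only: the install path is an assumption for 19.10, and the rc syntax is the widely shared workaround, not an official PGI feature:

    # append a -pthread translation to the PGI driver's siterc (path is an assumption)
    echo 'switch -pthread is replace(-lpthread) positional(linker);' \
        >> /opt/pgi/linux86-64/19.10/bin/siterc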

Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location

2020-01-08 Thread Raymond Muno via users
We are running EPYC 7451 and 7702 nodes. I do not recall that CentOS 6 was able to support these. We moved on to CentOS 7.6 at first and are now running 7.7 to support the EPYC2/Rome nodes. The kernel in earlier releases did not support x2APIC and could not handle 256 threads. Not an issue

Re: [OMPI users] [External] Re: AMD EPYC 7281: does NOT, support binding memory to the process location

2020-01-08 Thread Raymond Muno via users
AMD lists the minimum supported kernel for EPYC/Naples as RHEL/CentOS kernel 3.10-862, which is RHEL/CentOS 7.5 or later. Upgraded kernels can be used in 7.4. http://developer.amd.com/wp-content/resources/56420.pdf -Ray Muno On 1/8/20 7:37 PM, Raymond Muno wrote: We are running EPYC 7451
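To confirm a node meets that floor, a quick per-node check:

    uname -r   # want 3.10.0-862 (RHEL/CentOS 7.5) or newer for EPYC/Naples per AMD's guide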

Re: [OMPI users] OpenMPI 4.1.1, CentOS 7.9, NVIDIA HPC-SDK, build hints?

2021-09-30 Thread Raymond Muno via users
Added --enable-mca-no-build=op-avx to the configure line. Still dies in the same place:

      CCLD     mca_op_avx.la
    ./.libs/liblocal_ops_avx512.a(liblocal_ops_avx512_la-op_avx_functions.o):(.data+0x0): multiple definition of `ompi_op_avx_functions_avx2'
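When an --enable-mca-no-build option appears not to take effect, stale state from the earlier failed build is a common cause. A hedged sketch of a clean retry (build.log is just an illustrative name):

    make distclean                             # clear state from the previous failed build
    ./configure --enable-mca-no-build=op-avx   # plus your original configure options
    make -j8 2>&1 | tee build.log
    grep op_avx build.log                      # verify the component was actually skipped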