Re: [OMPI users] Sockets half-broken in Open MPI 2.0.2?

2018-06-06 Thread Jeff Squyres (jsquyres) via users
Alexander -- I don't know offhand if 2.0.2 was faulty in this area. We usually ask users to upgrade to at least the latest release in a given series (e.g., 2.0.4) because various bug fixes are included in each sub-release. It wouldn't be much use to go through all the effort to make a proper

Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-08 Thread Jeff Squyres (jsquyres) via users
Hmm. I'm confused -- can we clarify? I just tried configuring Open MPI v3.1.0 on a RHEL 7.4 system with the RHEL hwloc RPM installed, but *not* the hwloc-devel RPM. Hence, no hwloc.h (for example). When specifying an external hwloc, configure did fail, as expected: - $ ./configure

Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-08 Thread Jeff Squyres (jsquyres) via users
On Jun 8, 2018, at 11:38 AM, Bennet Fauber wrote: > > Hmm. Maybe I had insufficient error checking in our installation process. > > Can you make and make install after the configure fails? I somehow got an > installation, despite the configure status, perhaps? If it's a fresh tarball

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Jeff Squyres (jsquyres) via users
Charles -- It may have gotten lost in the middle of this thread, but the vendor-recommended way of running on InfiniBand these days is with UCX. I.e., install OpenUCX and use one of the UCX transports in Open MPI. Unless you have special requirements, you should likely give this a try and
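
A rough sketch of what "give this a try" looks like in practice (the UCX install prefix, process count, and app name are illustrative; --with-ucx and "--mca pml ucx" are the relevant Open MPI options):

    # Build/install UCX first, then point Open MPI's configure at it:
    ./configure --with-ucx=/opt/ucx ...rest of your configure params...
    make all install

    # Then ask for the UCX PML explicitly at run time:
    mpirun --mca pml ucx -np 16 ./my_mpi_app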

Re: [OMPI users] A couple of general questions

2018-06-15 Thread Jeff Squyres (jsquyres) via users
> I will help with the OFI part. > > Thanks, > _MAC > > -Original Message- > From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Jeff > Squyres (jsquyres) via users > Sent: Thursday, June 14, 2018 12:50 PM > To: Open MPI User's List > C

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Jeff Squyres (jsquyres) via users
feeling more than a little ignorant these days. :) > > Thanks to all for the responses. It has been a huge help. > > Charlie > >> On Jun 14, 2018, at 1:18 PM, Jeff Squyres (jsquyres) via users >> wrote: >> >> Charles -- >> >> It may have gott


Re: [OMPI users] error building openmpi-master-201806060243-64a5baa on Linux with Sun C

2018-06-06 Thread Jeff Squyres (jsquyres) via users
Siegmar -- I asked some Fortran gurus, and they don't think that there is any restriction on having ASYNCHRONOUS and INTENT on the same line. Indeed, Open MPI's definition of MPI_ACCUMULATE seems to agree with what is in MPI-3.1. Is this a new version of a Fortran compiler that you're using,

Re: [OMPI users] Open MPI: undefined reference to pthread_atfork

2018-06-22 Thread Jeff Squyres (jsquyres) via users
You already asked this question on the devel list, and I've asked you for more information. Please don't just re-post your question over here on the user list and expect to get a different answer. Thanks! > On Jun 22, 2018, at 4:55 PM, lille stor wrote: > > Hi, > > > When compiling a

Re: [OMPI users] Disable network interface selection

2018-06-22 Thread Jeff Squyres (jsquyres) via users
On Jun 22, 2018, at 7:36 PM, carlos aguni wrote: > > I'm trying to run a code on 2 machines that has at least 2 network interfaces > in it. > So I have them as described below: > > compute01 > compute02 > ens3 > 192.168.100.104/24 > 10.0.0.227/24 > ens8 > 10.0.0.228/24 > 172.21.1.128/24 > ens9
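
For multi-homed nodes like the ones quoted above, the usual knobs are the btl_tcp_if_include/btl_tcp_if_exclude (and oob_tcp_if_include) MCA parameters. A hedged sketch, reusing the interface names from the quoted setup:

    # Restrict MPI and out-of-band traffic to one interface:
    mpirun --mca btl_tcp_if_include ens3 \
           --mca oob_tcp_if_include ens3 -np 2 -host compute01,compute02 ./app

    # ...or name the interfaces to avoid instead:
    mpirun --mca btl_tcp_if_exclude ens8,ens9 -np 2 ./app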

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 3.1.0 Lock Up on POWER9 w/ CUDA9.2

2018-07-02 Thread Jeff Squyres (jsquyres) via users
Simon -- You don't currently have another Open MPI installation in your PATH / LD_LIBRARY_PATH, do you? I have seen dependency library loads cause "make check" to get confused, and instead of loading the libraries from the build tree, actually load some -- but not all -- of the required

Re: [OMPI users] Question about undefined routines when using mpi_f08

2018-08-02 Thread Jeff Squyres (jsquyres) via users
On Aug 2, 2018, at 4:40 PM, Grove, John W wrote: > > I am compiling an application using openmpi 3.1.1. The application is mixed > Fortran/C/C++. I am using the intel compiler on a mac pro running OS 10.13.6. > When I try to use the mpi_f08 interface I get unresolved symbols at load > time,

Re: [OMPI users] Trouble writing code for simple 2 node client-server CheckLatency using openMPI

2018-07-30 Thread Jeff Squyres (jsquyres) via users
Do you need to check it with Java, or will any MPI application do? If any language will do, you might want to check out the OSU MPI benchmarks: http://mvapich.cse.ohio-state.edu/benchmarks/ > On Jul 27, 2018, at 10:54 AM, John Bauman wrote: > > Hello everyone, > > I just want to start

Re: [OMPI users] know which CPU has the maximum value

2018-08-10 Thread Jeff Squyres (jsquyres) via users
two cents from a pedestrian MPI user, > who thinks minloc and maxloc are great, > knows nothing about the MPI Forum protocols and activities, > but hopes the Forum pays attention to users' needs. > > Gus Correa > > PS - Jeff S.: Please, bring Diego's request to the Forum! Add

Re: [OMPI users] know which CPU has the maximum value

2018-08-10 Thread Jeff Squyres (jsquyres) via users
It is unlikely that MPI_MINLOC and MPI_MAXLOC will go away any time soon. As far as I know, Nathan hasn't advanced a proposal to kill them in MPI-4, meaning that they'll likely continue to be in MPI for at least another 10 years. :-) (And even if they did get killed in MPI-4, implementations

Re: [OMPI users] MPI group and stuck in communication

2018-08-10 Thread Jeff Squyres (jsquyres) via users
I'm not quite clear what the problem is that you're running in to -- you just said that there is "some problem with MPI_barrier". What problem, exactly, is happening with your code? Be as precise and specific as possible. It's kinda hard to tell what is happening in the code snippet below

Re: [OMPI users] know which CPU has the maximum value

2018-08-10 Thread Jeff Squyres (jsquyres) via users
to deprecate MPI_{MIN,MAX}LOC, they should start that >> discussion on https://github.com/mpi-forum/mpi-issues/issues or >> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-coll. >> Jeff >> On Fri, Aug 10, 2018 at 10:27 AM, Jeff Squyres (jsquyres) via users >> ma

Re: [OMPI users] MPI group and stuck in communication

2018-08-11 Thread Jeff Squyres (jsquyres) via users
On Aug 10, 2018, at 6:27 PM, Diego Avesani wrote: > > The question is: > Is it possible to have a barrier for all CPUs despite they belong to > different group? > If the answer is yes I will go in more details. By "CPUs", I assume you mean "MPI processes", right? (i.e., not threads inside an

Re: [OMPI users] MPI group and stuck in communication

2018-08-13 Thread Jeff Squyres (jsquyres) via users
On Aug 12, 2018, at 2:18 PM, Diego Avesani wrote: > > Dear all, Dear Jeff, > I have three communicators: > > the standard one: > MPI_COMM_WORLD > > and other two: > MPI_LOCAL_COMM > MPI_MASTER_COMM > > a sort of two-level MPI. > > Suppose I have 8 threads, > I use 4 threads to run the same
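
A minimal C sketch of the two-level layout being described, assuming 8 ranks split into two groups of 4 (the layout mirrors Diego's description, but the code is illustrative, not his):

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Two 4-rank "local" groups out of 8 ranks total. */
    MPI_Comm local_comm;
    MPI_Comm_split(MPI_COMM_WORLD, world_rank / 4, world_rank, &local_comm);

    /* One communicator joining the local rank-0 "masters";
     * all other ranks get MPI_COMM_NULL. */
    int local_rank;
    MPI_Comm_rank(local_comm, &local_rank);
    MPI_Comm master_comm;
    MPI_Comm_split(MPI_COMM_WORLD, local_rank == 0 ? 0 : MPI_UNDEFINED,
                   world_rank, &master_comm);

    /* A barrier over *all* processes ignores the sub-communicators: */
    MPI_Barrier(MPI_COMM_WORLD);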

Re: [OMPI users] mpirun hangs

2018-08-15 Thread Jeff Squyres (jsquyres) via users
There can be lots of reasons that this happens. Can you send all the information listed here? https://www.open-mpi.org/community/help/ > On Aug 15, 2018, at 10:55 AM, Mota, Thyago wrote: > > Hello. > > I have openmpi 2.0.4 installed on a Cent OS 7. When I try to run "mpirun" it >

Re: [OMPI users] [WARNING: UNSCANNABLE EXTRACTION FAILED]Re: mpirun hangs

2018-08-15 Thread Jeff Squyres (jsquyres) via users
config.log and the ompi_info > > Thanks. > > On Wed, Aug 15, 2018 at 11:46 AM, Jeff Squyres (jsquyres) via users > wrote: > There can be lots of reasons that this happens. Can you send all the > information listed here? > > https://www.open-mpi.org/community/he

Re: [OMPI users] Unable to open a shared object libsmartio-rdmav17.so

2018-08-24 Thread Jeff Squyres (jsquyres) via users
I'm afraid the error message you're getting is from libibverbs; it's trying to load a plugin named libsmartio-rdmav17.so. That's not part of Open MPI, sorry. That likely means that some dependency of libsmartio-rdmav17.so wasn't found, and the run-time loading of the plugin failed (vs. not

[OMPI users] lists.open-mpi.org appears to be back

2018-08-28 Thread Jeff Squyres (jsquyres) via users
The lists.open-mpi.org server went offline due to an outage at our hosting provider sometime in the evening on Aug 22 / early morning Aug 23 (US Eastern time). The list server now appears to be back online; I've seen at least a few backlogged emails finally come through. If you sent a mail in

[OMPI users] lists.open-mpi.org appears to be back

2018-08-28 Thread Jeff Squyres (jsquyres) via users
The lists.open-mpi.org server went offline due to an outage at our hosting provider sometime in the evening on Aug 22 / early morning Aug 23 (US Eastern time). As of yesterday morning (Saturday, Aug 25), the list server now appears to be back online; I've seen at least a few backlogged emails

Re: [OMPI users] MPI advantages over PBS

2018-08-28 Thread Jeff Squyres (jsquyres) via users
On Aug 22, 2018, at 11:49 AM, Diego Avesani wrote: > > I have a philosophical question. > > I am reading a lot of papers where people use Portable Batch System or job > scheduler in order to parallelize their code. > > What are the advantages in using MPI instead? It depends on the code in

Re: [OMPI users] MPI_MAXLOC problems

2018-08-28 Thread Jeff Squyres (jsquyres) via users
I think Gilles is right: remember that datatypes like MPI_2DOUBLE_PRECISION are actually 2 values. So if you want to send 1 pair of double precision values with MPI_2DOUBLE_PRECISION, then your count is actually 1. > On Aug 22, 2018, at 8:02 AM, Gilles Gouaillardet > wrote: > > Diego, > >
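
The same pairs-not-elements rule in C, where the MAXLOC pair type is MPI_DOUBLE_INT (a hedged sketch; my_value and my_rank are placeholders):

    struct { double value; int rank; } in, out;
    in.value = my_value;
    in.rank  = my_rank;
    /* One (value, rank) PAIR, so count = 1 -- not 2: */
    MPI_Reduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MAXLOC, 0, MPI_COMM_WORLD);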

Re: [OMPI users] lists.open-mpi.org appears to be back

2018-08-28 Thread Jeff Squyres (jsquyres) via users
I originally sent this mail on Saturday, but it looks like lists.open-mpi.org was *not* actually back at this time. I'm finally starting to see all the backlogged messages on Tuesday, around 5pm US Eastern time. So I think lists.open-mpi.org is finally back in service. Sorry for the

Re: [OMPI users] Are MPI datatypes guaranteed to be compile-time constants?

2018-09-04 Thread Jeff Squyres (jsquyres) via users
On Sep 4, 2018, at 5:22 PM, Benjamin Brock wrote: > > Are MPI datatypes like MPI_INT and MPI_CHAR guaranteed to be compile-time > constants? No. They are guaranteed to be link-time constants. > Is this defined by the MPI standard, or in the Open MPI implementation? MPI standard:
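
What "link-time constant" buys you in practice, as a hedged C sketch:

    /* Portable: MPI guarantees handles are usable in initialization
     * expressions (link-time constants). */
    static MPI_Datatype t = MPI_INT;

    /* NOT portable: a case label needs a compile-time integer constant.
     * This happens to compile with implementations whose handles are
     * integer literals, but not with Open MPI, where MPI_INT is the
     * address of a global object:
     *
     *     switch (d) { case MPI_INT: ... }   // may not compile
     */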

Re: [OMPI users] openmpi-3.1.2 libgfortran conflict

2018-09-05 Thread Jeff Squyres (jsquyres) via users
Glad you figured it out. Just for some additional color: https://www.open-mpi.org/faq/?category=building#install-overwrite > On Sep 3, 2018, at 4:17 AM, Patrick Begou > wrote: > > Solved. > Strange conflict (not explained) after several compilation test of OpenMPI > with gcc7. Solved

Re: [OMPI users] 3.1.1 Bindings Change

2018-07-04 Thread Jeff Squyres (jsquyres) via users
Greetings Matt. https://github.com/open-mpi/ompi/commit/4d126c16fa82c64a9a4184bc77e967a502684f02 is the specific commit where the fixes came in. Here's a little creative grepping that shows the APIs affected (there's also some callback function signatures that were fixed, too, but they're

Re: [OMPI users] Disable network interface selection

2018-07-09 Thread Jeff Squyres (jsquyres) via users
Can you send the full verbose output with "--mca btl_base_verbose 100"? > On Jul 4, 2018, at 4:36 PM, carlos aguni wrote: > > Hi Gilles. > > Thank you for your reply! :) > I'm now using a compiled version of OpenMPI 3.0.2 and all seems to work fine > now. > Running `mpirun -n 3 -host

Re: [OMPI users] Seg fault in opal_progress

2018-07-11 Thread Jeff Squyres (jsquyres) via users
Ok, that would be great -- thanks. Recompiling Open MPI with --enable-debug will turn on several debugging/sanity checks inside Open MPI, and it will also enable debugging symbols. Hence, if you can get a failure with a debug Open MPI build, it might give you a core file that can be used to
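
The typical debug-build workflow, sketched (the prefix and app name are illustrative):

    ./configure --enable-debug --prefix=/opt/ompi-debug ...
    make all install

    ulimit -c unlimited        # allow core dumps in this shell
    mpirun -np 4 ./my_app      # reproduce the crash
    gdb ./my_app core          # then "bt" for the backtrace
                               # (core file naming varies by system)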

Re: [OMPI users] Seg fault in opal_progress

2018-07-12 Thread Jeff Squyres (jsquyres) via users
Noam and I actually talked on the phone (what!?) and worked through this a bit more. Oddly, he can generate core files if he runs in /tmp, but not if he runs in an NFS-mounted directory (!). I haven't seen that before -- if someone knows why that would happen, I'd love to hear the

Re: [OMPI users] Seg fault in opal_progress

2018-07-11 Thread Jeff Squyres (jsquyres) via users
M, Jeff Squyres (jsquyres) via users >> wrote: >>>> >>> >>> After more extensive testing it’s clear that it still happens with 2.1.3, >>> but much less frequently. I’m going to try to get more detailed info with >>> version 3.1.1, wh

Re: [OMPI users] Seg fault in opal_progress

2018-07-12 Thread Jeff Squyres (jsquyres) via users
On Jul 12, 2018, at 10:59 AM, Noam Bernstein wrote: > >> Do you get core files? >> >> Loading up the core file in a debugger might give us more information. > > No, I don’t, despite setting "ulimit -c unlimited”. I’m not sure what’s > going on with that (or the lack of line info in the

Re: [OMPI users] Seg fault in opal_progress

2018-07-12 Thread Jeff Squyres (jsquyres) via users
Do you get core files? Loading up the core file in a debugger might give us more information. > On Jul 12, 2018, at 9:35 AM, Noam Bernstein > wrote: > > >> On Jul 12, 2018, at 8:37 AM, Noam Bernstein >> wrote: >> >> I’m going to try the 3.1.x 20180710 nightly snapshot next. > > Same

Re: [OMPI users] Seg fault in opal_progress

2018-07-12 Thread Jeff Squyres (jsquyres) via users
On Jul 12, 2018, at 11:45 AM, Noam Bernstein wrote: > >> E.g., if you "ulimit -c" in your interactive shell and see "unlimited", but >> if you "ulimit -c" in a launched job and see "0", then the job scheduler is >> doing that to your environment somewhere. > > I am using a scheduler

Re: [OMPI users] *** Error in `orted': double free or corruption (out): 0x00002aaab4001680 ***, in some node combos.

2018-09-11 Thread Jeff Squyres (jsquyres) via users
Thanks for reporting the issue. First, you can workaround the issue by using: mpirun --mca oob tcp ... This uses a different out-of-band plugin (TCP) instead of verbs unreliable datagrams. Second, I just filed a fix for our current release branches (v2.1.x, v3.0.x, and v3.1.x):

Re: [OMPI users] stdout/stderr question

2018-09-11 Thread Jeff Squyres (jsquyres) via users
Gilles: Can you submit a PR to fix these 2 places? Thanks! > On Sep 11, 2018, at 9:10 AM, emre brookes wrote: > > Gilles Gouaillardet wrote: >> It seems I got it wrong :-( > Ah, you've joined the rest of us :) >> >> Can you please give the attached patch a try ? >> > Working with a git clone

Re: [OMPI users] opal_pmix_base_select failed for master and 4.0.0

2018-10-05 Thread Jeff Squyres (jsquyres) via users
): >> >> opal_pmix_base_select failed >> --> Returned value Not found (-13) instead of ORTE_SUCCESS >> -- >> loki hello_1 118 >> >> >> I don't know, if you have already appli

Re: [OMPI users] [version 2.1.5] invalid memory reference

2018-10-11 Thread Jeff Squyres (jsquyres) via users
Patrick -- You might want to update your HDF code to not use MPI_LB and MPI_UB -- these constants were deprecated in MPI-2.1 in 2009 (an equivalent function, MPI_TYPE_CREATE_RESIZED was added in MPI-2.0 in 1997), and were removed from the MPI-3.0 standard in 2012. Meaning: the death of these
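
A hedged C sketch of the replacement (oldtype and stride are placeholders): instead of building a struct type padded with the removed MPI_LB/MPI_UB markers, resize the type directly:

    MPI_Datatype padded;
    MPI_Type_create_resized(oldtype,
                            (MPI_Aint) 0,       /* new lower bound */
                            (MPI_Aint) stride,  /* new extent */
                            &padded);
    MPI_Type_commit(&padded);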

Re: [OMPI users] Cannot run MPI code on multiple cores with PBS

2018-10-11 Thread Jeff Squyres (jsquyres) via users
>> On Oct 4, 2018, at 10:30 AM, John Hearns via users >>> wrote: >>> >>> Michele, one tip: log into a compute node using ssh and as your own >>> username. >>> If you use the Modules environment then load the modules you use in >>> the

Re: [OMPI users] openmpi-v4.0.0rc5: ORTE_ERROR_LOG: Data unpack would read past end of buffer

2018-10-23 Thread Jeff Squyres (jsquyres) via users
Siegmar: the issue appears to be using the rank mapper. We should get that fixed, but it may not be fixed for v4.0.0. Howard opened the following GitHub issue to track it: https://github.com/open-mpi/ompi/issues/5965 > On Oct 23, 2018, at 9:29 AM, Siegmar Gross > wrote: > > Hi, > >

Re: [OMPI users] (no subject)

2018-11-01 Thread Jeff Squyres (jsquyres) via users
That's pretty weird. I notice that you're using 3.1.0rc2. Does the same thing happen with Open MPI 3.1.3? > On Oct 31, 2018, at 9:08 PM, Dmitry N. Mikushin wrote: > > Dear all, > > ompi_info reports pml components are available: > > $ /usr/mpi/gcc/openmpi-3.1.0rc2/bin/ompi_info -a | grep

Re: [OMPI users] check for CUDA support

2018-10-30 Thread Jeff Squyres (jsquyres) via users
wrote: > > +1 to what Jeff said. > > So you would need --with-cuda pointing to a CUDA installation to have > cuda-awareness in OpenMPI. > > On Tue, Oct 30, 2018 at 12:47 PM Jeff Squyres (jsquyres) via users > wrote: > The "Configure command line" shows you the comm

Re: [OMPI users] check for CUDA support

2018-10-30 Thread Jeff Squyres (jsquyres) via users
The "Configure command line" shows you the command line that was given to "configure" when building Open MPI. The "MPI extensions" line just indicates which Open MPI "extensions" were built. CUDA is one of the possible extensions that can get built. The CUDA Open MPI extension is actually an

Re: [OMPI users] Wrapper Compilers

2018-10-26 Thread Jeff Squyres (jsquyres) via users
On Oct 25, 2018, at 5:30 PM, Reuti wrote: > >> The program 'mpic++' can be found in the following packages: >> * lam4-dev >> * libmpich-dev >> * libopenmpi-dev > > PS: Interesting that they still include LAM/MPI, which was superseded by Open > MPI some time ago. ZOMG. As one of the last

Re: [OMPI users] Need Help - Thank you for this great tool

2018-11-07 Thread Jeff Squyres (jsquyres) via users
What is the exact problem you are trying to solve? Please send all the information listed here: https://www.open-mpi.org/community/help/ > On Nov 5, 2018, at 4:44 AM, saad alosaimi wrote: > > Dear All, > > First of all, thank you for this great tool. > Actually, I try to bind rank or

Re: [OMPI users] Cannot run MPI code on multiple cores with PBS

2018-10-04 Thread Jeff Squyres (jsquyres) via users
OS this cluster is installed with? >> >> >> On Thu, 4 Oct 2018 at 00:02, Castellana Michele >> wrote: >> >> I fixed it, the correct file was in /lib64, not in /lib. >> >> Thank you for your help. >> >> On Oct 3, 2018, at 11:

Re: [OMPI users] opal_pmix_base_select failed for master and 4.0.0

2018-10-02 Thread Jeff Squyres (jsquyres) via users
(Ralph sent me Siegmar's pmix config.log, which Siegmar sent to him off-list) It looks like Siegmar passed --with-hwloc=internal. Open MPI's configure understood this and did the appropriate things. PMIX's configure didn't. I think we need to add an adjustment into the PMIx configure.m4 in

Re: [OMPI users] --mca btl params

2018-10-10 Thread Jeff Squyres (jsquyres) via users
On Oct 9, 2018, at 8:55 PM, Noam Bernstein wrote: > >> That's basically what the output of >> ompi_info -a >> says. You actually probably want: ompi_info | grep btl That will show you the names and versions of the "btl" plugins that are available on your system. For example, this is what I

Re: [OMPI users] Cannot run MPI code on multiple cores with PBS

2018-10-03 Thread Jeff Squyres (jsquyres) via users
It's probably in your Linux distro somewhere -- I'd guess you're missing a package (e.g., an RPM or a deb) out on your compute nodes...? > On Oct 3, 2018, at 4:24 PM, Castellana Michele > wrote: > > Dear Ralph, > Thank you for your reply. Do you know where I could find libcrypto.so.0.9.8 ?

Re: [OMPI users] error building openmpi-master-201809290304-73075b8 with Sun C 5.15

2018-10-01 Thread Jeff Squyres (jsquyres) via users
Siegmar -- I created a GitHub issue for this: https://github.com/open-mpi/ompi/issues/5814 Nathan posted a test program on there for you to try; can you try it and reply on the issue? Thanks. > On Oct 1, 2018, at 9:23 AM, Siegmar Gross > wrote: > > Hi, > > I've tried to install

Re: [OMPI users] [version 2.1.5] invalid memory reference

2018-09-19 Thread Jeff Squyres (jsquyres) via users
Yeah, it's a bit terrible, but we didn't reliably reproduce this problem for many months, either. :-\ As George noted, it's been ported to all the release branches but is not yet in an official release. Until an official release (4.0.0 just had an rc; it will be released soon, and 3.0.3 will

Re: [OMPI users] OpenMPI building fails on Windows Linux Subsystem(WLS).

2018-09-19 Thread Jeff Squyres (jsquyres) via users
I can't say that we've tried to build on WSL; the fact that it fails is probably not entirely surprising. :-( I looked at your logs, and although I see the compile failure, I don't see any reason *why* it failed. Here's the relevant failure from the tar_openmpi_fail file: - 5523 Making

Re: [OMPI users] How do I build 3.1.0 (or later) with mellanox's libraries

2018-09-19 Thread Jeff Squyres (jsquyres) via users
Alan -- Sorry for the delay. I agree with Gilles: Brian's commit had to do with "reachable" plugins in Open MPI -- they do not appear to be the problem here. From the config.log you sent, it looks like configure aborted because you requested UCX support (via --with-ucx) but configure wasn't

Re: [OMPI users] Difficulties when trying to download files?

2018-09-25 Thread Jeff Squyres (jsquyres) via users
Must have been some kind of temporary DNS glitch. Shrug. Next time it happens, also be sure to check https://downforeveryoneorjustme.com/download.open-mpi.org > On Sep 25, 2018, at 9:13 AM, Jorge D'Elia wrote: > > - Mensaje original - >> De: "Llolsten Kaonga" >> Para: "Jorge

Re: [OMPI users] --with-mpi-f90-size in openmpi-3.0.2

2018-09-27 Thread Jeff Squyres (jsquyres) via users
On Sep 27, 2018, at 12:16 AM, Zeinab Salah wrote: > > I have a problem in running an air quality model, maybe because of the size > of calculations, so I tried different versions of openmpi. > I want to install openmpi-3.0.2 with the option of > "--with-mpi-f90-size=medium", but this option

Re: [OMPI users] --with-mpi-f90-size in openmpi-3.0.2

2018-09-27 Thread Jeff Squyres (jsquyres) via users
On Sep 27, 2018, at 1:52 PM, Zeinab Salah wrote: > > Thank you so much for your detailed answers. > I use gfortran 4.8.3, what should I do? or what is the suitable openmpi > version for this version? If you build Open MPI v3.1.2 with gfortran 4.8.3, you will automatically get the "old"

Re: [OMPI users] One question about progression of operations in MPI

2018-11-16 Thread Jeff Squyres (jsquyres) via users
On Nov 13, 2018, at 8:52 PM, Weicheng Xue wrote: > > I am a student whose research work includes using MPI and OpenACC to > accelerate our in-house research CFD code on multiple GPUs. I am having a big > issue related to the "progression of operations in MPI" and am thinking your > inputs

Re: [OMPI users] Help Getting Started with Open MPI and PMIx and UCX

2019-01-18 Thread Jeff Squyres (jsquyres) via users
On Jan 18, 2019, at 12:43 PM, Matt Thompson wrote: > > With some help, I managed to build an Open MPI 4.0.0 with: We can discuss each of these params to let you know what they are. > ./configure --disable-wrapper-rpath --disable-wrapper-runpath Did you have a reason for disabling these?

Re: [OMPI users] v2.1.1 How to utilise multiple NIC ports

2018-12-20 Thread Jeff Squyres (jsquyres) via users
On Dec 20, 2018, at 3:33 PM, Bob Beattie wrote: > > I'm working on OpenFOAM v5 and have been successful in getting two nodes > working together. (both 18.04 LTS connected via GbE) > As both machines have a quad port gigabit NIC I have been trying to persuade > mpirun to use more than a single

Re: [OMPI users] v2.1.1 How to utilise multiple NIC ports

2018-12-22 Thread Jeff Squyres (jsquyres) via users
On Dec 22, 2018, at 10:56 AM, Bob Beattie wrote: > > How do I now go about setting up /etc/hosts, -hostfile entries and bringing > them all together on the mpirun run line ? > For example, my 2nd machine is a quad core Dell T3500. Should I create a > separate entry in /etc/hosts for each NIC

Re: [OMPI users] It's possible to get mpi working without ssh?

2018-12-19 Thread Jeff Squyres (jsquyres) via users
On Dec 19, 2018, at 11:42 AM, Daniel Edreira wrote: > > Does anyone know if there's a possibility to configure a cluster of nodes to > communicate with each other with mpirun without using SSH? > > Someone is asking me about making a cluster with Infiniband that does not use > SSH to

Re: [OMPI users] filesystem-dependent failure building Fortran interfaces

2018-12-04 Thread Jeff Squyres (jsquyres) via users
Hi Dave; thanks for reporting. Yes, we've fixed this -- it should be included in 4.0.1. https://github.com/open-mpi/ompi/pull/6121 If you care, you can try the nightly 4.0.x snapshot tarball -- it should include this fix: https://www.open-mpi.org/nightly/v4.0.x/ > On Dec 4, 2018,

Re: [OMPI users] [Open MPI Announce] Open MPI SC'18 State of the Union BOF slides

2018-11-27 Thread Jeff Squyres (jsquyres) via users
Bert -- Sorry for the slow reply; got caught up in SC'18 and the US Thanksgiving holiday. Yes, you are exactly correct (I saw your GitHub issue/pull request about this before I saw this email). We will fix this in 4.0.1 in the very near future. > On Nov 19, 2018, at 3:10 AM, Bert Wesarg

Re: [OMPI users] One question about progression of operations in MPI

2018-11-27 Thread Jeff Squyres (jsquyres) via users
Sorry for the delay in replying; the SC'18 show and then the US Thanksgiving holiday got in the way. More below. > On Nov 16, 2018, at 10:50 PM, Weicheng Xue wrote: > > Hi Jeff, > > Thank you very much for your reply! I am now using a cluster at my > university

Re: [OMPI users] Suggestion to add one thing to look/check for when running OpenMPI program

2019-01-09 Thread Jeff Squyres (jsquyres) via users
Good suggestion; thank you! > On Jan 8, 2019, at 9:44 PM, Ewen Chan wrote: > > To Whom It May Concern: > > Hello. I'm new here and I got here via OpenFOAM. > > In the FAQ regarding running OpenMPI programs, specifically where someone > might be able to run their OpenMPI program on a local

Re: [OMPI users] Increasing OpenMPI RMA win attach region count.

2019-01-09 Thread Jeff Squyres (jsquyres) via users
You can set this MCA var on a site-wide basis in a file: https://www.open-mpi.org/faq/?category=tuning#setting-mca-params > On Jan 9, 2019, at 1:18 PM, Udayanga Wickramasinghe wrote: > > Thanks. Yes, I am aware of that however, I currently have a requirement to > increase the default.
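
A sketch of the file's "name = value" format (the file path is the documented location under the installation prefix; the parameter name shown is an assumption for illustration, not necessarily the exact variable from this thread):

    # $prefix/etc/openmpi-mca-params.conf -- defaults for every user
    # of this installation; one parameter per line:
    osc_rdma_max_attach = 64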

Re: [OMPI users] No network interfaces were found for out-of-band communications.

2018-09-12 Thread Jeff Squyres (jsquyres) via users
Can you send all the information listed here: https://www.open-mpi.org/community/help/ > On Sep 12, 2018, at 11:03 AM, Greg Russell wrote: > > OpenMPI-3.1.2 > > Sent from my iPhone > > On Sep 12, 2018, at 10:50 AM, Ralph H Castain wrote: > >> What OMPI version are we talking about

Re: [OMPI users] *** Error in `orted': double free or corruption (out): 0x00002aaab4001680 ***, in some node combos.

2018-09-13 Thread Jeff Squyres (jsquyres) via users
On Sep 12, 2018, at 4:54 AM, Balázs Hajgató wrote: > > Setting mca oob to tcp works. I will stick to this solution in our production > environment. Great! > I am not sure that it is relevant, but I also tried the patch on a > non-production OpenMPI 3.1.1, and "mpirun -host nic114,nic151

Re: [OMPI users] OpenMPI 4 and pmi2 support

2019-03-22 Thread Jeff Squyres (jsquyres) via users
Noam -- I believe we fixed this issue after v4.0.0 was released. Can you try the v4.0.1rc3 tarball that was just released today? https://www.open-mpi.org/software/ompi/v4.0/ > On Mar 22, 2019, at 6:07 PM, Noam Bernstein via users > wrote: > > Hi - I'm trying to compile openmpi 4.0.0

Re: [OMPI users] Building PMIx and Slurm support

2019-02-28 Thread Jeff Squyres (jsquyres) via users
On Feb 28, 2019, at 11:27 AM, Bennet Fauber wrote: > > 13bb410b52becbfa140f5791bd50d580 /sw/src/arcts/ompi/openmpi-1.10.7.tar.gz > bcea63d634d05c0f5a821ce75a1eb2b2 openmpi-v1.10-201705170239-5e373bf.tar.gz Bennet -- I'm sorry; I don't think we've updated the 1.10.x branch in forever. The

Re: [OMPI users] Building PMIx and Slurm support

2019-02-28 Thread Jeff Squyres (jsquyres) via users
On Feb 28, 2019, at 12:20 PM, Bennet Fauber wrote: > > I was pointing out why someone might think using `--with-FEATURE=/usr` is > sometimes necessary. True, but only in the case of a bug. :-) ...but in this case, the bug is too old, and we're almost certainly not going to fix it (sorry!

Re: [OMPI users] OpenMPI v4.0.0 signal 11 (Segmentation fault)

2019-02-20 Thread Jeff Squyres (jsquyres) via users
Can you try the latest 4.0.x nightly snapshot and see if the problem still occurs? https://www.open-mpi.org/nightly/v4.0.x/ > On Feb 20, 2019, at 1:40 PM, Adam LeBlanc wrote: > > I do here is the output: > > 2 total processes killed (some possibly by mpirun during cleanup) >

Re: [OMPI users] Does Close_port invalidates the communicactor?

2019-03-01 Thread Jeff Squyres (jsquyres) via users
Close port should only affect new incoming connections. Your established communicator should still be fully functional. > On Mar 1, 2019, at 4:11 AM, Florian Lindner wrote: > > Hello, > > I wasn't able to find anything about that in the standard. Given this > situation: > > >
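
A hedged C sketch of that lifecycle on the server side (buf and count are placeholders):

    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;

    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
    MPI_Close_port(port);   /* blocks only NEW connection attempts */

    /* The established communicator keeps working: */
    MPI_Send(buf, count, MPI_INT, 0, 0, client);
    MPI_Comm_disconnect(&client);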

Re: [OMPI users] [SciPy-Dev] Fwd: Announcement and thanks to Season of Docs survey respondents: Season of Docs has launched

2019-03-14 Thread Jeff Squyres (jsquyres) via users
Hmm -- yes, this could be quite interesting. We always have a need for documentation to be updated! I'm afraid that I already have direct responsibility for an intern this summer and some other "manage people" kinds of duties that mean that I will not have time to be a mentor in this program,

Re: [OMPI users] error while using open mpi to compile parallel-netcdf-1.8.1

2019-03-07 Thread Jeff Squyres (jsquyres) via users
FWIW, the error message is telling you exactly what is wrong and provides a link to the FAQ item on how to fix it. It's a bit inelegant that it's segv'ing after that, but the real issue is what is described in the help message. > On Mar 6, 2019, at 3:07 PM, Zhifeng Yang wrote: > > Hi > > I

Re: [OMPI users] Web page update needed?

2019-03-12 Thread Jeff Squyres (jsquyres) via users
Good catch! I will go fix. Thanks for the heads up! > On Mar 11, 2019, at 3:40 PM, Bennet Fauber wrote: > > From the web page at > > https://www.open-mpi.org/nightly/ > >Before deciding which series to download, be sure to read Open > MPI's philosophy on >version numbers. The short

Re: [OMPI users] Best way to send on mpi c, architecture dependent data type

2019-03-15 Thread Jeff Squyres (jsquyres) via users
On Mar 15, 2019, at 2:02 PM, Sergio None wrote: > > Yes, I know that C99 has these fixed-width types in the standard library. > > The point is that many commonly used functions, rand for example, return non > fixed-width types. And then you need to do casts, typically to bigger types. > It can be a bit annoying.
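
One common answer, as a sketch (not necessarily what Jeff went on to say; dest and tag are placeholders): cast once at the boundary and send with the matching fixed-width MPI datatype:

    #include <stdint.h>

    int64_t v = (int64_t) rand();   /* widen the non-fixed return once */
    MPI_Send(&v, 1, MPI_INT64_T, dest, tag, MPI_COMM_WORLD);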

Re: [OMPI users] Double free in collectives

2019-03-14 Thread Jeff Squyres (jsquyres) via users
Lee Ann -- Thanks for your bug report. I'm not able to find a call to NBC_Schedule_request() in ompi_coll_libnbc_iallreduce(). I see 2 calls to NBC_Schedule_request() in ompi/mca/coll/libnbc/nbc_iallreduce.c, but they are in different functions. Can you clarify exactly which one(s) you're

Re: [OMPI users] Double free in collectives

2019-03-15 Thread Jeff Squyres (jsquyres) via users
On Mar 14, 2019, at 8:46 PM, Gilles Gouaillardet wrote: > > The first location is indeed in ompi_coll_libnbc_iallreduce() What on earth was I looking at this morning? Scheesh. Thanks for setting me straight...! -- Jeff Squyres jsquy...@cisco.com

Re: [OMPI users] Error initializing an UCX / OpenFabrics device. #6300

2019-03-22 Thread Jeff Squyres (jsquyres) via users
Greetings Charlie. Yes, it looks like you replied to a closed issue on Github -- would you mind opening a new issue about it? You can certainly refer to the old issue for context. But replying to closed issues is a bit dangerous: if we miss the initial email from GitHub (and all of us have

Re: [OMPI users] _init function being called for every linked OpenMPI library

2019-03-22 Thread Jeff Squyres (jsquyres) via users
Yes, it's the DLL init function. It's not in our source code; it's put there automatically by the compiler/linker. > On Mar 22, 2019, at 2:12 PM, Simone Atzeni wrote: > > Hi, > > I was debugging a program compiled with `mpicxx` and noticed that when the > program is being launched the

Re: [OMPI users] Cannot install open mpi (Mac Mojave 10.14.2 (18C54))

2019-02-07 Thread Jeff Squyres (jsquyres) via users
The same steps you list for Open MPI v2.0.2 should work for Open MPI v4.0.0. Can you send the full set of information listed here: https://www.open-mpi.org/community/help/ > On Feb 1, 2019, at 4:00 PM, Neil Teng wrote: > > Hi, > > I am following the following these steps to install the

Re: [OMPI users] relocating an installation

2019-04-09 Thread Jeff Squyres (jsquyres) via users
Reuti's right. Sorry about the potentially misleading use of "--prefix" -- we basically inherited that CLI option from a different MPI implementation (i.e., people asked for it). So we were locked into that meaning for the "--prefix" CLI options. > On Apr 9, 2019, at 9:14 AM, Reuti wrote:

Re: [OMPI users] allow over-subscription by default

2019-04-16 Thread Jeff Squyres (jsquyres) via users
Steffen -- What version of Open MPI are you using? > On Apr 16, 2019, at 9:21 AM, Steffen Christgau > wrote: > > Hi Tim, > > it helps, up to four processes. But it has two drawbacks. 1) Using more > cores/threads than the machine provides (so the actual > over-subscription) is still not

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Jeff Squyres (jsquyres) via users
On Jun 20, 2019, at 1:34 PM, Noam Bernstein wrote: > > Aha - using Mellanox’s OFED packaging seems to essentially (if not 100%) > fixed the issue. There still appears to be some small leak, but it’s of > order 1 GB, not 10s of GB, and it doesn’t grow continuously. And on later > runs of

Re: [OMPI users] Intel Compilers

2019-06-20 Thread Jeff Squyres (jsquyres) via users
Can you send the exact ./configure line you are using to configure Open MPI? > On Jun 20, 2019, at 12:32 PM, Charles A Taylor via users > wrote: > > > >> On Jun 20, 2019, at 12:10 PM, Carlson, Timothy S >> wrote: >> >> I’ve never seen that error and have built some flavor of this

Re: [OMPI users] OpenMPI 4 and pmi2 support

2019-06-20 Thread Jeff Squyres (jsquyres) via users
Ok. Perhaps we still missed something in the configury. Worst case, you can: $ ./configure CPPFLAGS=-I/usr/include/slurm ...rest of your configure params... That will add the -I to CPPFLAGS, and it will preserve that you set that value in the top few lines of config.log. On Jun 20, 2019,

Re: [OMPI users] OpenMPI 4 and pmi2 support

2019-06-20 Thread Jeff Squyres (jsquyres) via users
On Jun 14, 2019, at 2:02 PM, Noam Bernstein via users wrote: > > Hi Jeff - do you remember this issue from a couple of months ago? Noam: I'm sorry, I totally missed this email. My INBOX is a continual disaster. :-( > Unfortunately, the failure to find pmi.h is still happening. I just

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Jeff Squyres (jsquyres) via users
On Jun 20, 2019, at 9:31 AM, Noam Bernstein via users wrote: > > One thing that I’m wondering if anyone familiar with the internals can > explain is how you get a memory leak that isn’t freed when then program ends? > Doesn’t that suggest that it’s something lower level, like maybe a kernel

Re: [OMPI users] Problems in 4.0.1 version and printf function

2019-05-09 Thread Jeff Squyres (jsquyres) via users
Stdout forwarding should continue to work in v4.0.x just like it did in v3.0.x. I.e., printf's from your app should appear in the stdout of mpirun. Sometimes they can get buffered, however, such as if you redirect the stdout to a file or to a pipe. Such shell buffering may only emit output
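
If redirected output seems to vanish or arrive late, flushing on the application side usually cures it; a hedged C sketch:

    setvbuf(stdout, NULL, _IOLBF, 0);      /* line-buffer stdout */
    printf("rank %d checking in\n", rank);
    fflush(stdout);                        /* force it out before long compute */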

Re: [OMPI users] MPI failing on Infiniband (queue pair error)

2019-05-09 Thread Jeff Squyres (jsquyres) via users
You might want to try two things: 1. Upgrade to Open MPI v4.0.1. 2. Use the UCX PML instead of the openib BTL. You may need to download/install UCX first. Then configure Open MPI: ./configure --with-ucx --without-verbs --enable-mca-no-build=btl-uct ... This will build the UCX PML, and that

Re: [OMPI users] error running mpirun command

2019-05-14 Thread Jeff Squyres (jsquyres) via users
Looks like this thread accidentally got dropped; sorry! More below. > On May 4, 2019, at 10:40 AM, Eric F. Alemany via users > wrote: > > Hi Gilles, > > Thank you for your message and your suggestion. As you suggested I tried > mpirun -np 84 --hostfile hostsfile --mca routed direct

Re: [OMPI users] Packaging issue with linux spec file when not build_all_in_one_rpm due to empty grep

2019-05-14 Thread Jeff Squyres (jsquyres) via users
Daniel -- Many thanks for bringing this to our attention. I've filed a PR with the fix: https://github.com/open-mpi/ompi/pull/6659 > On Apr 16, 2019, at 2:58 PM, Daniel Letai wrote: > > In src rpm version 4.0.1 if building with --define 'build_all_in_one_rpm 0' > the grep -v _mandir

Re: [OMPI users] Segmentation fault when using 31 or 32 ranks

2019-07-10 Thread Jeff Squyres (jsquyres) via users
It might be worth trying the latest v4.0.x nightly snapshot -- we just updated the internal PMIx on the v4.0.x branch: https://www.open-mpi.org/nightly/v4.0.x/ > On Jul 10, 2019, at 1:29 PM, Steven Varga via users > wrote: > > Hi i am fighting similar. Did you try to update the pmix

Re: [OMPI users] relocating an installation

2019-04-10 Thread Jeff Squyres (jsquyres) via users
To be clear, --prefix and OPAL_PREFIX do two different things: 1. "mpirun --prefix" and "/abs/path/to/mpirun" (both documented in mpirun(1)) are used to set PATH and LD_LIBRARY_PATH on remote nodes when invoked via ssh/rsh. It's a way of not having to set your shell startup files to point to
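
A sketch of the two mechanisms side by side (paths and host names are illustrative):

    # 1) Set PATH/LD_LIBRARY_PATH on ssh-launched remote nodes
    #    (equivalently, invoke mpirun by its absolute path):
    mpirun --prefix /opt/openmpi-4.0.1 -np 8 -host n1,n2 ./app

    # 2) Relocate an installation built with a different --prefix:
    export OPAL_PREFIX=/new/home/for/openmpi
    $OPAL_PREFIX/bin/mpirun -np 8 ./app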

Re: [OMPI users] Error when Building an MPI Java program

2019-04-09 Thread Jeff Squyres (jsquyres) via users
Can you provide a small, standalone example + recipe to show the problem? > On Apr 8, 2019, at 6:45 AM, Benson Muite wrote: > > Hi > > Am trying to build an MPI Java program using OpenMPI 4.0.1: > > I get the following error: > > Compilation failure >

Re: [OMPI users] allow over-subscription by default

2019-04-17 Thread Jeff Squyres (jsquyres) via users
On Apr 17, 2019, at 3:38 AM, Steffen Christgau wrote: > > as written in my original post, I'm using a custom build of 4.0.0 which I'm sorry -- I missed that (it was at the bottom; my bad). > was configured with nothing more than a --prefix and > --enable-mpi-fortran. I checked for updates and

Re: [OMPI users] 3.0.4, 4.0.1 build failure on OSX Mojave with LLVM

2019-04-23 Thread Jeff Squyres (jsquyres) via users
The version of LLVM that I have installed on my Mac 10.14.4 is: $ where clang /usr/bin/clang $ clang --version Apple LLVM version 10.0.1 (clang-1001.0.46.4) Target: x86_64-apple-darwin18.5.0 Thread model: posix InstalledDir: