Re: [OMPI users] prterun: symbol lookup error: /usr/lib/libprrte.so.3: undefined symbol: PMIx_Session_control

2024-08-15 Thread Jeff Squyres (jsquyres) via users
This isn't enough information to provide a definitive answer. Can you provide more information about your setup, how you built and installed Open MPI, ... etc.? In general, that is the standard Linux error message for a symbol that cannot be found at run time. In particular, mp

Re: [OMPI users] Fwd: Unable to run basic mpirun command (OpenMPI v5.0.3)

2024-05-05 Thread Jeff Squyres (jsquyres) via users
Note that, depending on your environment, you might need to set these env variables on every node where you're running the Open MPI job. For example: https://docs.open-mpi.org/en/v5.0.x/launching-apps/quickstart.html#launching-in-a-non-scheduled-environments-via-ssh and https://docs.open-mpi.o
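For a non-scheduled ssh launch, the setup described in those docs can be sketched roughly like this (a sketch: /opt/openmpi is a placeholder prefix, my_mpi_app a placeholder program; use your actual installation path):

```shell
# Make Open MPI visible to non-interactive ssh logins on every node.
# These lines belong in a startup file that non-interactive shells read
# (e.g. ~/.bashrc on many Linux distros).
export PATH=/opt/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH

# Alternatively, ask mpirun to forward a variable to the remote nodes:
mpirun -x LD_LIBRARY_PATH -np 4 --hostfile hosts ./my_mpi_app
```

The key point is that the variables must be set for *non-interactive* shells, which on some distros read a different startup file than interactive logins.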

Re: [OMPI users] Fwd: Unable to run basic mpirun command (OpenMPI v5.0.3)

2024-05-04 Thread Jeff Squyres (jsquyres) via users
You might want to see if your OS has Open MPI installed into default binary / library search paths; you might be able to uninstall it easily. Otherwise, even if you explicitly run the mpirun you just built+installed, it might find the libmpi.so from some other copy of Open MPI. Alternatively,
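One way to check which copy of Open MPI actually gets resolved at run time (a diagnostic sketch; my_mpi_app is a placeholder for your binary):

```shell
# Which mpirun is first in PATH, and what version is it?
which mpirun
mpirun --version

# Which libmpi.so does the dynamic linker actually resolve for your app?
ldd ./my_mpi_app | grep libmpi
```

If the libmpi.so path shown by ldd does not live under the prefix you installed to, a system copy is shadowing your build.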

Re: [OMPI users] Fwd: Unable to run basic mpirun command (OpenMPI v5.0.3)

2024-05-03 Thread Jeff Squyres (jsquyres) via users
Your config.log file shows that you are trying to build Open MPI 2.1.6 and that configure failed. I'm not sure how to square this with the information that you provided in your message... did you upload the wrong config.log? Can you provide all the information from https://docs.open-mpi.org/en

Re: [OMPI users] [EXTERNAL] Help deciphering error message

2024-03-08 Thread Jeff Squyres (jsquyres) via users
(sorry this is so long – it's a bunch of explanations followed by 2 suggestions at the bottom) One additional thing worth mentioning is that your mpirun command line does not seem to be explicitly asking for the "ucx" PML component, but the error message you're getting indicates that you specif

Re: [OMPI users] Seg error when using v5.0.1

2024-01-31 Thread Jeff Squyres (jsquyres) via users
No worries – glad you figured it out! From: users on behalf of afernandez via users Sent: Wednesday, January 31, 2024 10:56 AM To: Open MPI Users Cc: afernandez Subject: Re: [OMPI users] Seg error when using v5.0.1 Hello, I'm sorry as I totally messed up here.

Re: [OMPI users] MPI Wireshark Packet Dissector

2023-12-11 Thread Jeff Squyres (jsquyres) via users
Cool! I dimly remember this project; it was written independently of the main Open MPI project. It looks like it supports the TCP OOB and TCP BTL. The TCP OOB has since moved from Open MPI's "ORTE" sub-project to the independent PRRTE project. Regardless, TCP OOB traffic is effectively about

Re: [OMPI users] OpenMPI 5.0.0 & Intel OneAPI 2023.2.0 on MacOS 14.0:

2023-11-06 Thread Jeff Squyres (jsquyres) via users
We develop and build with clang on macOS frequently; it would be surprising if it didn't work. That being said, I was able to replicate both errors reported here. On macOS Sonoma with Xcode 15.x and the OneAPI compilers: * configure fails in the PMIx libevent section, complaining about how

[OMPI users] Open MPI BOF at SC'23

2023-11-06 Thread Jeff Squyres (jsquyres) via users
We're excited to see everyone next week in Denver, Colorado, USA at SC23! Open MPI will be hosting our usual State of the Union Birds of a Feather (BOF) session on Wednesday, November 15, 2023, from 12:15-1:15pm US Mounta

Re: [OMPI users] OpenMPI 5.0.0 & Intel OneAPI 2023.2.0 on MacOS 14.0:

2023-10-30 Thread Jeff Squyres (jsquyres) via users
Volker -- If that doesn't work, send all the information requested here: https://docs.open-mpi.org/en/v5.0.x/getting-help.html From: users on behalf of Volker Blum via users Sent: Saturday, October 28, 2023 8:47 PM To: Matt Thompson Cc: Volker Blum ; Open MPI

Re: [OMPI users] MPI4Py Only Using Rank 0

2023-10-25 Thread Jeff Squyres (jsquyres) via users
(let's keep users@lists.open-mpi.org in the CC list so that others can reply, too) I don't know exactly how conda installs / re-installs mpi4py -- e.g., I don't know which MPI implementation it compiles and links against. You can check to see which MPI implementation mpiexec uses -- for Open MP

Re: [OMPI users] MPI4Py Only Using Rank 0

2023-10-25 Thread Jeff Squyres (jsquyres) via users
This usually means that you have accidentally switched to using a different MPI implementation under the covers somehow. E.g., did you somehow accidentally start using mpiexec from MPICH instead of Open MPI? Or did MPI4Py somehow get upgraded or otherwise re-build itself for MPICH, but you're
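A quick way to check which implementation both mpiexec and mpi4py are actually bound to (a diagnostic sketch; requires mpiexec and mpi4py to be installed):

```shell
# The version banner identifies the implementation (Open MPI vs. MPICH/Hydra):
mpiexec --version

# Ask mpi4py what it was built and linked against:
python -c "import mpi4py; print(mpi4py.get_config())"
python -c "from mpi4py import MPI; print(MPI.Get_library_version())"
```

If mpiexec reports one implementation and MPI.Get_library_version() reports another, the mismatch described above is your problem.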

Re: [OMPI users] Binding to thread 0

2023-09-08 Thread Jeff Squyres (jsquyres) via users
In addition to what Gilles mentioned, I'm curious: is there a reason you have hardware threads enabled? You could disable them in the BIOS, and then each of your MPI processes can use the full core, not just a single hardware thread. From: users on behalf of Lui

Re: [OMPI users] Segmentation fault

2023-08-09 Thread Jeff Squyres (jsquyres) via users
Without knowing anything about SU2, we can't really help debug the issue. The seg fault stack trace that you provided was quite deep; we don't really have the resources to go learn about how a complex application like SU2 is implemented -- sorry! Can you or they provide a small, simple MPI app

Re: [OMPI users] Segmentation fault

2023-08-09 Thread Jeff Squyres (jsquyres) via users
Ok, thanks for upgrading. Are you also using the latest version of SU2? Without knowing what that application is doing, it's a little hard to debug the issue from our side. At first glance, it looks like it is crashing when it has completed writing a file and is attempting to close it. But th

Re: [OMPI users] Segmentation fault

2023-08-09 Thread Jeff Squyres (jsquyres) via users
I'm afraid I don't know anything about the SU2 application. You are using Open MPI v4.0.3, which is fairly old. Many bug fixes have been released since that version. Can you upgrade to the latest version of Open MPI (v4.1.5)? From: users on behalf of Aziz Ogut

Re: [OMPI users] [EXT] Re: Error handling

2023-07-19 Thread Jeff Squyres (jsquyres) via users
MPI_Allreduce should work just fine, even with negative numbers. If you are seeing something different, can you provide a small reproducer program that shows the problem? We can dig deeper into it if we can reproduce the problem. mpirun's exit status can't distinguish between MPI processes who
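A minimal reproducer of the kind requested might look like this (a sketch, not the poster's actual code: each rank contributes one negative value; compile with mpicc, which requires an MPI installation):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, nprocs, local, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Negative contribution from each rank */
    local = -(rank + 1);
    MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    /* Every rank should print the same negative total */
    printf("rank %d of %d: sum = %d\n", rank, nprocs, sum);

    MPI_Finalize();
    return 0;
}
```

Run with, e.g., "mpirun -np 4 ./reproducer"; all ranks should agree on the (negative) sum.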

Re: [OMPI users] libnuma.so error

2023-07-19 Thread Jeff Squyres (jsquyres) via users
It's not clear if that message is being emitted by Open MPI. It does say it's falling back to a different behavior if libnuma.so is not found, so it appears if it's treating it as a warning, not an error. From: users on behalf of Luis Cebamanos via users Sent:

Re: [OMPI users] Error build Open MPI 4.1.5 with GCC 11.3

2023-07-18 Thread Jeff Squyres (jsquyres) via users
The GNU-generated Makefile dependencies may not be removed during "make clean" -- they may only be removed during "make distclean" (which is kinda equivalent to rm -rf'ing the tree and extracting a fresh tarball). From: Jeffrey Layton Sent: Tuesday, July 18, 2023

Re: [OMPI users] Error build Open MPI 4.1.5 with GCC 11.3

2023-07-18 Thread Jeff Squyres (jsquyres) via users
There were probably quite a few differences from the output of "configure" between GCC 9.4 and GCC 11.3. For example, your original post cited "/usr/lib/gcc/x86_64-linux-gnu/9/include/float.h", which, I assume, does not exist on your new GCC 11.3-based system. Meaning: if you had run make clea

Re: [OMPI users] Error build Open MPI 4.1.5 with GCC 11.3

2023-07-17 Thread Jeff Squyres (jsquyres) via users
That's a little odd. Usually, the specific .h files that are listed as dependencies came from somewhere -- usually either part of the GNU Autotools dependency analysis. I'm guessing that /usr/lib/gcc/x86_64-linux-gnu/9/include/float.h doesn't actually exist on your system -- but then how did

Re: [OMPI users] OMPI compilation error in Making all datatypes

2023-07-12 Thread Jeff Squyres (jsquyres) via users
If the file opal/datatype/.lib/libdatatype_reliable.a does not exist after running "ar cru .libs/libdatatype_reliable.a .libs/libdataty...etc.", then there is something wrong with your system. Specifically, "ar" is a Linux command that makes an archive file; this command is not part of Open MPI

Re: [OMPI users] OMPI compilation error in Making all datatypes

2023-07-12 Thread Jeff Squyres (jsquyres) via users
The output you sent (in the attached tarball) doesn't really make much sense:

libtool: link: ar cru .libs/libdatatype_reliable.a .libs/libdatatype_reliable_la-opal_datatype_pack.o .libs/libdatatype_reliable_la-opal_datatype_unpack.o
libtool: link: ranlib .libs/libdatatype_reliable.a

ranlib

Re: [OMPI users] Issue with Running MPI Job on CentOS 7

2023-06-14 Thread Jeff Squyres (jsquyres) via users
is common to use Modules in an HPC environment https://www.admin-magazine.com/HPC/Articles/Lmod-Alternative-Environment-Modules For compiling software packages and creating Modules files investigate these frameworks: https://spack.io/ https://easybuild.io/ On Mon, 12 Jun 2023 at 22:44, Jeff Sq

Re: [OMPI users] Issue with Running MPI Job on CentOS 7

2023-06-12 Thread Jeff Squyres (jsquyres) via users
Your steps are generally correct, but I cannot speak for whether your /home/wude/.bashrc file is executed for both non-interactive and interactive logins. If /home/wude is your $HOME, it probably is, but I don't know about your specific system. Also, you should be aware that MPI applications b

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread Jeff Squyres (jsquyres) via users
the UCX PML is selected make my above comment moot. Sorry for any confusion! From: users on behalf of Jeff Squyres (jsquyres) via users Sent: Monday, March 6, 2023 10:40 AM To: Chandran, Arun ; Open MPI Users Cc: Jeff Squyres (jsquyres) Subject: Re: [OMPI use

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread Jeff Squyres (jsquyres) via users
Per George's comments, I stand corrected: UCX does work fine in single-node cases -- he confirmed to me that he tested it on his laptop, and it worked for him. That being said, you're passing "--mca pml ucx" in the correct place now, and you're therefore telling Open MPI "_only_ use the UCX PM

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread Jeff Squyres (jsquyres) via users
If this run was on a single node, then UCX probably disabled itself since it wouldn't be using InfiniBand or RoCE to communicate between peers. Also, I'm not sure your command line was correct: perf_benchmark $ mpirun -np 32 --map-by core --bind-to core ./perf --mca pml ucx You probably need

Re: [OMPI users] Compile options to disable Infiniband

2022-12-12 Thread Jeff Squyres (jsquyres) via users
You can use: ./configure --enable-mca-no-build=btl-openib,pml-ucx,mtl-psm That should probably do it in the 3.x and 4.x series. You can double check after it installs: look in $prefix/lib/openmpi for any files with "ucx", "openib", or "psm" in them. If they're there, remove them (those ar
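The build-and-verify steps described above can be scripted roughly like this (a sketch: $HOME/ompi is a placeholder prefix; adjust to your own --prefix):

```shell
# Configure without the InfiniBand/UCX/PSM components:
./configure --prefix=$HOME/ompi --enable-mca-no-build=btl-openib,pml-ucx,mtl-psm
make -j4 install

# Double-check after installing: this should print nothing if the
# components were really excluded from the build.
ls "$HOME/ompi/lib/openmpi" | grep -E 'ucx|openib|psm'
```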

Re: [OMPI users] mpi program gets stuck

2022-12-07 Thread Jeff Squyres (jsquyres) via users
To tie up this issue for the web mail archives... There were a bunch more off-list emails exchanged on this thread. It was determined that something is going wrong down in the IB networking stack. It looks like it may be a problem in the environment itself, not Open MPI. The user is continui

Re: [OMPI users] Can't run an MPI program through mpirun command

2022-12-04 Thread Jeff Squyres (jsquyres) via users
Can you try steps 1-3 in https://docs.open-mpi.org/en/v5.0.x/validate.html#testing-your-open-mpi-installation ? -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Blaze Kort via users Sent: Saturday, December 3, 2022 5:52 AM To: users@lists.open-mpi.o

Re: [OMPI users] mpi program gets stuck

2022-12-01 Thread Jeff Squyres (jsquyres) via users
Ok, this looks like the same type of output running ring_c as your Python MPI app -- good. Using a C MPI program for testing just eliminates some possible variables / issues. Ok, let's try running again, but add some more command line parameters:

mpirun -n 2 --machinefile hosts --mca plm_base_

Re: [OMPI users] mpi program gets stuck

2022-11-29 Thread Jeff Squyres (jsquyres) via users
(we've conversed a bit off-list; bringing this back to the list with a good subject to differentiate it from other digest threads) I'm glad the tarball I provided (that included the PMIx fix) resolved running "uptime" for you. Can you try running a plain C MPI program instead of a Python MPI pr

Re: [OMPI users] CephFS and striping_factor

2022-11-29 Thread Jeff Squyres (jsquyres) via users
More specifically, Gilles created a skeleton "ceph" component in this draft pull request: https://github.com/open-mpi/ompi/pull/11122 If anyone has any cycles to work on it and develop it beyond the skeleton that is currently there, that would be great! -- Jeff Squyres jsquy...@cisco.com __

Re: [OMPI users] Question about "mca" parameters

2022-11-29 Thread Jeff Squyres (jsquyres) via users
Also, you probably want to add "vader" into your BTL specification. Although the name is counter-intuitive, "vader" in Open MPI v3.x and v4.x is the shared memory transport. Hence, if you run with "btl=tcp,self", you are only allowing MPI processes to talk via the TCP stack or process loopback
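In other words, for Open MPI v3.x/v4.x the shared-memory transport should be listed explicitly (my_mpi_app is a placeholder program name):

```shell
# "vader" is the shared-memory BTL in Open MPI v3.x/v4.x; without it,
# same-node peers fall back to the TCP loopback path.
mpirun --mca btl tcp,vader,self -np 4 ./my_mpi_app
```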

Re: [OMPI users] users Digest, Vol 4818, Issue 1

2022-11-25 Thread Jeff Squyres (jsquyres) via users
Ok, this is a good / consistent output. That being said, I don't grok what is happening here: it says it finds 2 slots, but then it tells you it doesn't have enough slots. Let me dig deeper and get back to you... -- Jeff Squyres jsquy...@cisco.com From: timesir

Re: [OMPI users] users Digest, Vol 4818, Issue 1

2022-11-25 Thread Jeff Squyres (jsquyres) via users
Thanks for the output. I'm seeing inconsistent output between your different outputs, however. For example, one of your outputs seems to ignore the hostfile and only show slots on the local host, but another output shows 2 hosts with 1 slot each. But I don't know what was in the hosts file fo

Re: [OMPI users] users Digest, Vol 4818, Issue 1

2022-11-25 Thread Jeff Squyres (jsquyres) via users
Yes, Gilles responded within a few hours: https://www.mail-archive.com/users@lists.open-mpi.org/msg35057.html Looking closer, we should still be seeing more output compared to what you posted. It's almost like you have a busted Open MPI installation -- perhaps it's missing the "hostfile" compo

Re: [OMPI users] users Digest, Vol 4818, Issue 1

2022-11-25 Thread Jeff Squyres (jsquyres) via users
I see 2 config.log files -- can you also send the other information requested on that page? I.e., the version you're using (I think you said in a prior email that it was 5.0rc9, but I'm not 100% sure), and the output from ompi_info --all. -- Jeff Squyres jsquy...@cisco.com

Re: [OMPI users] Tracing of openmpi internal functions

2022-11-14 Thread Jeff Squyres (jsquyres) via users
Open MPI uses plug-in modules for its implementations of the MPI collective algorithms. From that perspective, once you understand that infrastructure, it's exactly the same regardless of whether the MPI job is using intra-node or inter-node collectives. We don't have much in the way of detail

Re: [OMPI users] [OMPI devel] There are not enough slots available in the system to satisfy the 2, slots that were requested by the application

2022-11-14 Thread Jeff Squyres (jsquyres) via users
Yes, somehow I'm not seeing all the output that I expect to see. Can you ensure that if you're copy-and-pasting from the email, that it's actually using "dash dash" in front of "mca" and "machinefile" (vs. a copy-and-pasted "em dash")? -- Jeff Squyres jsquy...@cisco.com ___
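The copy-and-paste problem described here (an email client silently turning "--mca" into an em-dash "—mca") can be checked mechanically. A small hypothetical helper, not part of Open MPI:

```python
# Hypothetical helper: replace typographic dashes (which mpirun's option
# parser does not understand) with the plain ASCII hyphens it expects.
TYPOGRAPHIC_DASHES = {
    "\u2014": "--",  # em dash, often auto-substituted for a double hyphen
    "\u2013": "--",  # en dash
    "\u2011": "-",   # non-breaking hyphen
}

def fix_dashes(cmd: str) -> str:
    for bad, good in TYPOGRAPHIC_DASHES.items():
        cmd = cmd.replace(bad, good)
    return cmd

print(fix_dashes("mpirun \u2014mca plm_base_verbose 10 ./app"))
# prints: mpirun --mca plm_base_verbose 10 ./app
```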

Re: [OMPI users] [OMPI devel] There are not enough slots available in the system to satisfy the 2, slots that were requested by the application

2022-11-13 Thread Jeff Squyres (jsquyres) via users
Interesting. It says: [computer01:106117] AVAILABLE NODES FOR MAPPING: [computer01:106117] node: computer01 daemon: 0 slots_available: 1 This is why it tells you you're out of slots: you're asking for 2, but it only found 1. This means it's not seeing your hostfile somehow. I should have aske

Re: [OMPI users] --mca btl_base_verbose 30 not working in version 5.0

2022-11-07 Thread Jeff Squyres (jsquyres) via users
Sorry for the delay in replying. To tie up this thread for the web mail archives: this same question was cross-posted over in the devel list; I replied there. -- Jeff Squyres jsquy...@cisco.com From: users on behalf of mrlong via users Sent: Sunday, October 30

Re: [OMPI users] [OMPI devel] There are not enough slots available in the system to satisfy the 2, slots that were requested by the application

2022-11-07 Thread Jeff Squyres (jsquyres) via users
In the future, can you please just mail one of the lists? This particular question is probably more of a users type of question (since we're not talking about the internals of Open MPI itself), so I'll reply just on the users list. For what it's worth, I'm unable to replicate your error: $ mp

Re: [OMPI users] [EXTERNAL] Beginner Troubleshooting OpenMPI Installation - pmi.h Error

2022-10-06 Thread Jeff Squyres (jsquyres) via users
Hmm; that's a little unexpected, but it actually helps simplify the debugging process. It looks like you are using an external hwloc build from /cm/shared/apps/hwloc/1.11.11. Is there a libhwloc.la file in there somewhere? If so, can you see if "-lnuma" and "-ludev" are in this file? If that'

Re: [OMPI users] [EXTERNAL] Beginner Troubleshooting OpenMPI Installation - pmi.h Error

2022-10-05 Thread Jeff Squyres (jsquyres) via users
Actually, I think the problem might be a little more subtle. I see that you configured with both --enable-static and --enable-shared. My gut reaction is that there might be some kind of issue with enabling both of those options (by default, shared is enabled and static is disabled). If you con

Re: [OMPI users] openmpi compile failure

2022-09-28 Thread Jeff Squyres (jsquyres) via users
Looking at the detailed compile line in the "make" output that you sent, I don't see anything too unusual (e.g., in -I or other preprocessor directives). You might want to look around your machine and see if there's an alternate signal.h that is somehow getting found and included. If that doesn

Re: [OMPI users] openmpi compile failure

2022-09-27 Thread Jeff Squyres (jsquyres) via users
I'm not sure why that would happen; it does sound like some kind of misconfiguration on your system. If I compile this trivial application on Ubuntu 18.04:

#include <stdio.h>
#include <signal.h>

int main() { printf("NSIG is %d\n", NSIG); return 0; }

Like this:

$ gcc foo.c -o foo && ./foo
NSIG is

Re: [OMPI users] openmpi compile failure

2022-09-27 Thread Jeff Squyres (jsquyres) via users
Can you re-try with the latest Open MPI v4.1.x release (v4.1.4)? There have been many bug fixes since v4.1.0. -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Zilore Mumba via users Sent: Tuesday, September 27, 2022 5:10 AM To: users@lists.open-mpi

Re: [OMPI users] --mca parameter explainer; mpirun WARNING: There was an error initializing an OpenFabrics device

2022-09-26 Thread Jeff Squyres (jsquyres) via users
Just to follow up for the email web archives: this issue was followed up in https://github.com/open-mpi/ompi/issues/10841. -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Rob Kudyba via users Sent: Thursday, September 22, 2022 2:15 PM To: users@list

Re: [OMPI users] Hardware topology influence

2022-09-14 Thread Jeff Squyres (jsquyres) via users
It was pointed out to me off-list that I should update my worldview on HPC in VMs. :-) So let me clarify my remarks about VMs: yes, many organizations run bare-metal HPC environments. However, it is no longer unusual to run HPC in VMs. Using modern VM technology, especially when tuned for HP

Re: [OMPI users] Hardware topology influence

2022-09-13 Thread Jeff Squyres (jsquyres) via users
Let me add a little more color on what Gilles stated. First, you should probably upgrade to the latest v4.1.x release: v4.1.4. It has a bunch of bug fixes compared to v4.1.0. Second, you should know that it is relatively uncommon to run HPC/MPI apps inside VMs because the virtualization infras

Re: [OMPI users] Disabling barrier in MPI_Finalize

2022-09-09 Thread Jeff Squyres (jsquyres) via users
No, it does not, sorry. What are you trying to do? -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Mccall, Kurt E. (MSFC-EV41) via users Sent: Friday, September 9, 2022 2:30 PM To: OpenMpi User List (users@lists.open-mpi.org) Cc: Mccall, Kurt E. (M

Re: [OMPI users] MPI with RoCE

2022-09-06 Thread Jeff Squyres (jsquyres) via users
You can think of RoCE as "IB over IP" -- RoCE is essentially the IB protocol over IP packets (which is different than IPoIB, which is emulating IP and TCP over the InfiniBand protocol). You'll need to consult the docs for your Mellanox cards, but if you have Ethernet cards, you'll want to set t

Re: [OMPI users] ucx problems

2022-08-31 Thread Jeff Squyres (jsquyres) via users
Yes, that is the intended behavior: Open MPI basically only uses UCX for IB transports (and shared memory -- but only when also used with IB transports). If IB can't be used, the UCX PML disqualifies itself. This is by design, even though UCX can handle other transports (including TCP and share

Re: [OMPI users] Oldest version of SLURM in use?

2022-08-17 Thread Jeff Squyres (jsquyres) via users
tional Laboratory | www.pnnl.gov<http://www.pnnl.gov/> 509.371.6435 | t...@pnnl.gov<mailto:t...@pnnl.gov> From: users on behalf of "Jeff Squyres (jsquyres) via users" Reply-To: Open MPI Users Date: Wednesday, August 17, 2022 at 8:18 AM To: Open MPI Users Cc: "

Re: [OMPI users] Oldest version of SLURM in use?

2022-08-17 Thread Jeff Squyres (jsquyres) via users
in the version-mismatch corner. Pardon my rambling, the upshot is, some lazy/disorganized people rely on third-party packagers, and do get pretty far behind. On Tue, Aug 16, 2022 at 9:54 AM Jeff Squyres (jsquyres) via users mailto:users@lists.open-mpi.org>> wrote: I have a curiosity question f

[OMPI users] Oldest version of SLURM in use?

2022-08-16 Thread Jeff Squyres (jsquyres) via users
I have a curiosity question for the Open MPI user community: what version of SLURM are you using? I ask because we're honestly curious about what the expectations are regarding new versions of Open MPI supporting older versions of SLURM. I believe that SchedMD's policy is that they support up t

Re: [OMPI users] RUNPATH vs. RPATH

2022-08-11 Thread Jeff Squyres (jsquyres) via users
ff Squyres jsquy...@cisco.com From: Reuti Sent: Tuesday, August 9, 2022 12:03 PM To: Open MPI Users Cc: Jeff Squyres (jsquyres); zuelc...@staff.uni-marburg.de Subject: Re: [OMPI users] RUNPATH vs. RPATH Hi Jeff, > Am 09.08.2022 um 16:17 schrieb Jeff Squyres (

Re: [OMPI users] RUNPATH vs. RPATH

2022-08-10 Thread Jeff Squyres (jsquyres) via users
uy...@cisco.com From: Reuti Sent: Tuesday, August 9, 2022 12:03 PM To: Open MPI Users Cc: Jeff Squyres (jsquyres); zuelc...@staff.uni-marburg.de Subject: Re: [OMPI users] RUNPATH vs. RPATH Hi Jeff, > Am 09.08.2022 um 16:17 schrieb Jeff Squyres (jsquyres) via users > : > > Just

[OMPI users] Open MPI Java MPI bindings

2022-08-09 Thread Jeff Squyres (jsquyres) via users
During a planning meeting for Open MPI v5.0.0 today, the question came up: is anyone using the Open MPI Java bindings? These bindings are not official MPI Forum bindings -- they are an Open MPI-specific extension. They were added a few years ago as a result of a research project. We ask this

Re: [OMPI users] RUNPATH vs. RPATH

2022-08-09 Thread Jeff Squyres (jsquyres) via users
Just to follow up on this thread... Reuti: I merged the PR on to the main docs branch. They're now live -- we changed the text: * here: https://docs.open-mpi.org/en/main/installing-open-mpi/configure-cli-options/installation.html * and here: https://docs.open-mpi.org/en/main/insta

Re: [OMPI users] Problem with OpenMPI as Third pary library

2022-08-09 Thread Jeff Squyres (jsquyres) via users
I can't see the image that you sent; it seems to be broken. But I think you're asking about this: https://www.open-mpi.org/faq/?category=building#installdirs -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Sebastian Gutierrez via users Sent: Tuesda

Re: [OMPI users] RUNPATH vs. RPATH

2022-08-06 Thread Jeff Squyres (jsquyres) via users
Reuti -- See my disclaimers on other posts about apologies for taking so long to reply! This code was written forever ago; I had to dig through it a bit, read the comments and commit messages, and try to remember why it was done this way. What I thought would be a 5-minute search turned into a

Re: [OMPI users] Multiple IPs on network interface

2022-07-07 Thread Jeff Squyres (jsquyres) via users
Can you send the full output of "ifconfig" (or "ip addr") from one of your compute nodes? -- Jeff Squyres jsquy...@cisco.com From: users on behalf of George Johnson via users Sent: Monday, July 4, 2022 11:06 AM To: users@lists.open-mpi.org Cc: George J

Re: [OMPI users] Intercommunicator issue (any standard about communicator?)

2022-06-24 Thread Jeff Squyres (jsquyres) via users
Open MPI and MPICH are completely unrelated -- we're entirely different code bases (note that Intel MPI is derived from MPICH). Case in point is what Gilles cited: Open MPI chose to implement MPI_Comm handles as pointers, but MPICH chose to implement MPI_Comm handles as integers. Hence, you ca

Re: [OMPI users] Intercommunicator issue (any standard about communicator?)

2022-06-24 Thread Jeff Squyres (jsquyres) via users
Guillaume -- There is an MPI Standard document that you can obtain from mpi-forum.org. Open MPI v4.x adheres to MPI version 3.1 (the latest version of the MPI standard is v4.0, but that is unrelated to Open MPI's version number). Frankly, Open MPI's support of the dynamic API functionality (c

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-24 Thread Jeff Squyres (jsquyres) via users
I think the files suggested by Gilles are more about the underlying call to get the hostname; those won't be problematic. The regex Open MPI modules are where Open MPI is running into a problem with your hostnames (i.e., your hostnames don't fit into Open MPI's expectations of the format of the

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Jeff Squyres (jsquyres) via users
Ah; this is a slightly different error than what Gilles was guessing from your prior description. This is what you're running in to: https://github.com/open-mpi/ompi/blob/v4.0.x/orte/mca/regx/fwd/regx_fwd.c#L130-L134 Try running with: mpirun --mca regex naive ... Specifically: the "fwd" regex
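The constraint can be roughly illustrated like this (an approximation for illustration only, not Open MPI's actual source logic -- see the regx_fwd.c link above for the real check):

```python
import re

# Rough illustration: the "fwd" regex component expects cluster-style
# hostnames it can compress -- an alphabetic prefix followed by an
# optional numeric suffix, like "node001". Names outside that shape
# (e.g. starting with a digit) trigger the error discussed above;
# "--mca regex naive" skips the compression entirely.
def fwd_style(hostname: str) -> bool:
    return re.fullmatch(r"[A-Za-z][A-Za-z_.-]*\d*", hostname) is not None

print(fwd_style("node001"), fwd_style("3dcluster"))
# prints: True False
```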

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Jeff Squyres (jsquyres) via users
What exactly is the error that is occurring? -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Patrick Begou via users Sent: Thursday, June 16, 2022 3:21 AM To: Open MPI Users Cc: Patrick Begou Subject: [OMPI users] OpenMPI and names of the no

[OMPI users] Passing of an MPI luminary: Rusty Lusk

2022-05-23 Thread Jeff Squyres (jsquyres) via users
In case you had not heard, Dr. Ewing "Rusty" Lusk passed away at age 78 last week. Rusty was one of the founders and prime movers of the entire MPI ecosystem: the MPI Forum, the MPI standard, and MPICH. Without Rusty, our community would not exist. In addition to all of that, he was an all-ar

Re: [OMPI users] Network traffic packets documentation

2022-05-17 Thread Jeff Squyres (jsquyres) via users
Just to clarify: Open MPI's "out of band" messaging *used* to be called "OOB". Then PMIx split off into its own project, and Open MPI effectively offloaded our out-of-band messaging to PMIx. If you want to inspect PMIx messages, you'll need to look at the headers in its source code repo: https

Re: [OMPI users] Network traffic packets documentation

2022-05-16 Thread Jeff Squyres (jsquyres) via users
Open MPI is generally structured in layers, but adjacent layers don't necessarily have any knowledge of each other. For example, the PML (point-to-point messaging layer) is the first layer behind MPI point-to-point functions such as MPI_SEND and MPI_RECV. Different PMLs do not have the same pa

Re: [OMPI users] Network traffic packets documentation

2022-05-16 Thread Jeff Squyres (jsquyres) via users
Open MPI doesn't prescribe a specific network protocol for anything. Indeed, each network transport uses its own protocols, headers, etc. It's basically "each Open MPI plugin needs to be able to talk to itself", and therefore no commonality is needed (or desired). Which network and Open M

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Jeff Squyres (jsquyres) via users
t here on the mailing list. -- Jeff Squyres jsquy...@cisco.com ________ From: users on behalf of Jeff Squyres (jsquyres) via users Sent: Thursday, May 5, 2022 3:31 PM To: George Bosilca; Open MPI Users Cc: Jeff Squyres (jsquyres) Subject: Re: [OMPI users] mpirun hangs on m

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Jeff Squyres (jsquyres) via users
Scott and I conversed a bit off list, and I got more data. I posted everything in https://github.com/open-mpi/ompi/issues/10358 -- let's follow up on this issue there. -- Jeff Squyres jsquy...@cisco.com From: George Bosilca Sent: Thursday, May 5, 2022

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Jeff Squyres (jsquyres) via users
You can use "lldb -p PID" to attach to a running process. -- Jeff Squyres jsquy...@cisco.com From: Scott Sayres Sent: Thursday, May 5, 2022 11:22 AM To: Jeff Squyres (jsquyres) Cc: Open MPI Users Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Jeff Squyres (jsquyres) via users
Scott -- Sorry; something I should have clarified in my original email: I meant you to run the "ps" command **while mpirun was still hung**. I.e., do it in another terminal, before you hit ctrl-C to exit mpirun. I want to see if mpirun has launched the foo.sh or not. Gilles' test is a differ

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Jeff Squyres (jsquyres) via users
That backtrace seems to imply that the launch may not have completed. Can you make an executable script foo.sh with:

#!/bin/bash
i=0
while test $i -lt 10; do
    date
    sleep 1
    let i=$i+1
done

Make sure that foo.sh is executable and then run it via:

mpirun -np 1 foo.sh

If you sta

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Jeff Squyres (jsquyres) via users
George beat me to the reply. :-) His advice is the correct one (check out what's happening in a debugger). This will likely work better with a hand-built Open MPI (vs. Homebrew), because then you can configure/build Open MPI with -g so that the debugger will be able to see the source code. E

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Jeff Squyres (jsquyres) via users
Are you able to use mpirun to launch a non-MPI application? E.g.:

mpirun -np 2 hostname

And if that works, can you run the simple example MPI apps in the "examples" directory of the MPI source tarball (the "hello world" and "ring" programs)? E.g.:

cd examples
make
mpirun -np 4 hello_c
mpirun

Re: [OMPI users] help with M1 chip macOS openMPI installation

2022-04-22 Thread Jeff Squyres (jsquyres) via users
topper, because you can run x86 code on the M1 chip, using Rosetta. However, MARE2DEM relies on MKL, the Intel Math Library, and that library will not run on a M1 chip. George. On Thu, Apr 21, 2022 at 7:02 AM Jeff Squyres (jsquyres) via users mailto:users@lists.open-mpi.org>> wrote

Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-22 Thread Jeff Squyres (jsquyres) via users
With THREAD_FUNNELED, it means that there can only be one thread in MPI at a time -- and it needs to be the same thread as the one that called MPI_INIT_THREAD. Is that the case in your app? Also, what is your app doing at src/pcorona_main.f90:627? It is making a call to MPI, or something else

Re: [OMPI users] help with M1 chip macOS openMPI installation

2022-04-21 Thread Jeff Squyres (jsquyres) via users
A little more color on Gilles' answer: I believe that we had some Open MPI community members work on adding M1 support to Open MPI, but Gilles is absolutely correct: the underlying compiler has to support the M1, or you won't get anywhere. -- Jeff Squyres jsquy...@cisco.com ___

Re: [OMPI users] mixed OpenMP/MPI

2022-03-15 Thread Jeff Squyres (jsquyres) via users
Thanks for the poke! Sorry we missed replying to your github issue. Josh replied to it this morning. -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Bernstein, Noam CIV USN NRL (6393) Washington DC (USA) via users Sent: Tuesday, March 15,

Re: [OMPI users] handle_wc() in openib and IBV_WC_DRIVER2/MLX5DV_WC_RAW_WQE completion code

2022-02-23 Thread Jeff Squyres (jsquyres) via users
The short answer is likely that UCX and Open MPI v4.1.x are your way forward. openib has basically been unmaintained for quite a while -- Nvidia (Mellanox) made it quite clear long ago that UCX was their path forward. openib was kept around until UCX became stable enough to become the preferred

Re: [OMPI users] Unknown breakdown (Transport retry count exceeded on mlx5_0:1/IB)

2022-02-23 Thread Jeff Squyres (jsquyres) via users
I can't comment much on UCX; you'll need to ask Nvidia for support on that. But transport retry count exceeded errors mean that the underlying IB network tried to send a message a bunch of times but never received the corresponding ACK from the receiver indicating that the receiver successfully

Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-02-23 Thread Jeff Squyres (jsquyres) via users
I'd recommend against using Open MPI v3.1.0 -- it's quite old. If you have to use Open MPI v3.1.x, I'd at least suggest using v3.1.6, which has all the rolled-up bug fixes on the v3.1.x series. That being said, Open MPI v4.1.2 is the most current. Open MPI v4.1.2 does restrict which versions

Re: [OMPI users] Building Open MPI without zlib: what might go wrong/different?

2022-01-31 Thread Jeff Squyres (jsquyres) via users
It's used for compressing the startup time messages in PMIx. I.e., the traffic for when you "mpirun ...". It's mostly beneficial when launching very large MPI jobs. If you're only launching across several nodes, the performance improvement isn't really noticeable. -- Jeff Squyres jsquy...@ci

Re: [OMPI users] RES: OpenMPI - Intel MPI

2022-01-27 Thread Jeff Squyres (jsquyres) via users
This is part of the challenge of HPC: there are general solutions, but no specific silver bullet that works in all scenarios. In short: everyone's setup is different. So we can offer advice, but not necessarily a 100%-guaranteed solution that will work in your environment. In general, we advi

Re: [OMPI users] Gadget2 error 818 when using more than 1 process?

2022-01-27 Thread Jeff Squyres (jsquyres) via users
s jsquy...@cisco.com From: users on behalf of Diego Zuccato via users Sent: Wednesday, January 26, 2022 2:06 AM To: users@lists.open-mpi.org Cc: Diego Zuccato Subject: Re: [OMPI users] Gadget2 error 818 when using more than 1 process? Il 26/01/2022 02:10, Jeff Sq

Re: [OMPI users] Gadget2 error 818 when using more than 1 process?

2022-01-25 Thread Jeff Squyres (jsquyres) via users
I'm afraid I don't know anything about Gadget, so I can't comment there. How exactly does the application fail? Can you try upgrading to Open MPI v4.1.2? What networking are you using? -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Diego

Re: [OMPI users] NAG Fortran 2018 bindings with Open MPI 4.1.2

2022-01-04 Thread Jeff Squyres (jsquyres) via users
ven't fixed everything yet. -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Paul Kapinos via users Sent: Tuesday, January 4, 2022 4:27 AM To: Jeff Squyres (jsquyres) via users Cc: Paul Kapinos Subject: Re: [OMPI users] NAG Fortran 2018 bindings wit

Re: [OMPI users] NAG Fortran 2018 bindings with Open MPI 4.1.2

2021-12-30 Thread Jeff Squyres (jsquyres) via users
@cisco.com From: users on behalf of Jeff Squyres (jsquyres) via users Sent: Thursday, December 30, 2021 4:39 PM To: Matt Thompson Cc: Jeff Squyres (jsquyres); Open MPI Users Subject: Re: [OMPI users] NAG Fortran 2018 bindings with Open MPI 4.1.2 Sweet; thanks! The top-level Fortran test is he

Re: [OMPI users] NAG Fortran 2018 bindings with Open MPI 4.1.2

2021-12-30 Thread Jeff Squyres (jsquyres) via users
Sweet; thanks! The top-level Fortran test is here: https://github.com/open-mpi/ompi/blob/master/config/ompi_setup_mpi_fortran.m4 That file invokes a lot of subtests, all of which are named config/ompi_fortran_*.m4. People who aren't familiar with the GNU Autotools may make the mistake of tryi

Re: [OMPI users] Mac OS + openmpi-4.1.2 + intel oneapi

2021-12-30 Thread Jeff Squyres (jsquyres) via users
Fair enough. For the moment, then, we should probably just document the workaround. I'll add it to README.md for the 4.0.x/4.1.x series and the upcoming 5.0 RST-based docs. I wasn't too excited about making a patch for Libtool -- such that the workaround wouldn't be needed -- because that pro

Re: [OMPI users] NAG Fortran 2018 bindings with Open MPI 4.1.2

2021-12-30 Thread Jeff Squyres (jsquyres) via users
Snarky comments from the NAG tech support people aside, if they could be a little more specific about what non-conformant Fortran code they're referring to, we'd be happy to work with them to get it fixed. I'm one of the few people in the Open MPI dev community who has a clue about Fortran, and

Re: [OMPI users] Mac OS + openmpi-4.1.2 + intel oneapi

2021-12-30 Thread Jeff Squyres (jsquyres) via users
The conclusion we came to on that issue was that this was an issue with Intel ifort. Was anyone able to raise this with Intel ifort tech support? -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Matt Thompson via users Sent: Thursday, Decem

Re: [OMPI users] stdout scrambled in file

2021-12-07 Thread Jeff Squyres (jsquyres) via users
Open MPI launches a single "helper" process on each node (in Open MPI <= v4.x, that helper process is called "orted"). This process is responsible for launching all the individual MPI processes, and it's also responsible for capturing all the stdout/stderr from those processes and sending it ba

Re: [OMPI users] stdout scrambled in file

2021-12-05 Thread Jeff Squyres (jsquyres) via users
FWIW: Open MPI 4.1.2 has been released -- you can probably stop using an RC release. I think you're probably running into an issue that is just a fact of life. Especially when there's a lot of output simultaneously from multiple MPI processes (potentially on different nodes), the stdout/stderr
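The interleaving described above is easy to reproduce without MPI at all; a minimal shell simulation, with background jobs standing in for ranks (an assumption of this sketch):

```shell
# Four "ranks" append concurrently to one shared stream: every line arrives,
# but the global ordering across ranks is not guaranteed.
shared=$(mktemp)
for rank in 0 1 2 3; do
    ( for i in 1 2 3; do echo "rank $rank line $i"; done >> "$shared" ) &
done
wait
sort "$shared" | head -3     # per-rank order survives; global order may not
wc -l < "$shared"            # all 12 lines are present, possibly interleaved
```

The practical workaround is one output file per writer; Open MPI's `mpirun` offers this via its `--output-filename` option, so each rank's stdout/stderr lands in its own file instead of a shared stream.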
