[OMPI users] mpirun seemingly requires --host and --oversubscribe when running more than -np 2 on some nodes

2023-05-19 Thread Morgan via users
Hi All, I am seeing some funky behavior and am hoping someone has some ideas on where to start looking. I have installed openmpi 4.1.4 via spack on this cluster, Slurm-aware. I then built Orca against that via spack as well (for context). Orca calls MPI under the hood with a simple `mpirun -np X

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Jeff Squyres (jsquyres) via users
t here on the mailing list. -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Jeff Squyres (jsquyres) via users Sent: Thursday, May 5, 2022 3:31 PM To: George Bosilca; Open MPI Users Cc: Jeff Squyres (jsquyres) Subject: Re: [OMPI users] mpirun hangs on m

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Jeff Squyres (jsquyres) via users
2022 3:19 PM To: Open MPI Users Cc: Jeff Squyres (jsquyres); Scott Sayres Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3 That is weird, but maybe it is not a deadlock, but a very slow progress. In the child can you print the fdmax and i in the frame do_child. George. On Thu,

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread George Bosilca via users
That is weird, but maybe it is not a deadlock, but a very slow progress. In the child can you print the fdmax and i in the frame do_child. George. On Thu, May 5, 2022 at 11:50 AM Scott Sayres via users < users@lists.open-mpi.org> wrote: > Jeff, thanks. > from 1: > > (lldb) process attach --pid 9

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Scott Sayres via users
Jeff, thanks. from 1: (lldb) process attach --pid 95083 Process 95083 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP frame #0: 0x0001bde25628 libsystem_kernel.dylib`close + 8 libsystem_kernel.dylib`close: -> 0x1bde25628 <+8>: b.lo 0x1bde25648

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Jeff Squyres (jsquyres) via users
You can use "lldb -p PID" to attach to a running process. -- Jeff Squyres jsquy...@cisco.com From: Scott Sayres Sent: Thursday, May 5, 2022 11:22 AM To: Jeff Squyres (jsquyres) Cc: Open MPI Users Subject: Re: [OMPI users] mpirun hangs on m1 mac

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Scott Sayres via users
Jeff, It does launch two mpirun processes (when hung from another terminal window) scottsayres 95083 99.0 0.0 408918416 1472 s002 R 8:20AM 0:04.48 mpirun -np 4 foo.sh scottsayres 95085 0.0 0.0 408628368 1632 s006 S+8:20AM 0:00.00 egrep mpirun|foo.sh scottsayres

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Bennet Fauber via users
happens immediately after forking the > child process... which is weird). > > -- > Jeff Squyres > jsquy...@cisco.com > > ____ > From: Scott Sayres > Sent: Wednesday, May 4, 2022 4:02 PM > To: Jeff Squyres (jsquyres) > Cc: Open MPI Users > Subject: Re: [OMPI users] mp

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Jeff Squyres (jsquyres) via users
ild process... which is weird). -- Jeff Squyres jsquy...@cisco.com From: Scott Sayres Sent: Wednesday, May 4, 2022 4:02 PM To: Jeff Squyres (jsquyres) Cc: Open MPI Users Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3 foo.sh is executabl

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Gilles Gouaillardet via users
it via: >> >> mpirun -np 1 foo.sh >> >> If you start seeing output, good! If it completes, better! >> >> If it hangs, and/or if you don't see any output at all, do this: >> >> ps auxwww | egrep 'mpirun|foo.sh' >> >> It should show mp

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Scott Sayres via users
ut at all, do this: > > ps auxwww | egrep 'mpirun|foo.sh' > > It should show mpirun and 2 copies of foo.sh (and probably a grep). Does > it? > > -- > Jeff Squyres > jsquy...@cisco.com > > ________ > From: Scott Sayres >

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Jeff Squyres (jsquyres) via users
uy...@cisco.com From: Scott Sayres Sent: Wednesday, May 4, 2022 2:47 PM To: Open MPI Users Cc: Jeff Squyres (jsquyres) Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3 Following Jeff's advice, I have rebuilt open-mpi by hand using the -g option. This shows more

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Scott Sayres via users
Following Jeff's advice, I have rebuilt open-mpi by hand using the -g option. This shows more information, as below. I am attempting George's advice on how to track the child, but notice that gdb does not support arm64; attempting to update lldb. scottsayres@scotts-mbp openmpi-4.1.3 % lldb mpir

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Jeff Squyres (jsquyres) via users
Sent: Wednesday, May 4, 2022 12:35 PM To: Open MPI Users Cc: George Bosilca Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3 I compiled a fresh copy of the 4.1.3 branch on my M1 laptop, and I can run both MPI and non-MPI apps without any issues. Try running `lldb mpirun -- -np 1 hos

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread George Bosilca via users
/ >>> >>> [scotts-mbp.3500.dhcp.###:05469] [[48286,0],0] Releasing job data for >>> [INVALID] >>> >>> Can you recommend a way to find where mpirun gets stuck? >>> Thanks! >>> Scott >>> >>> On Wed, May 4, 2022 at 6:06 AM

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Scott Sayres via users
>> On Wed, May 4, 2022 at 6:06 AM Jeff Squyres (jsquyres) < >> jsquy...@cisco.com> wrote: > >>> Are you able to use mpirun to launch a non-MPI application? E.g.: >>> >>> mpirun -np 2 hostname >>> >>> And if that works, can you r

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread George Bosilca via users
;hello world" and >> "ring" programs)? E.g.: >> >> cd examples >> make >> mpirun -np 4 hello_c >> mpirun -np 4 ring_c >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> >> >

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Scott Sayres via users
> cd examples > make > mpirun -np 4 hello_c > mpirun -np 4 ring_c > > -- > Jeff Squyres > jsquy...@cisco.com > > > From: users on behalf of Scott Sayres > via users > Sent: Tuesday, May 3, 2022 1:07 PM >

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Jeff Squyres (jsquyres) via users
ples make mpirun -np 4 hello_c mpirun -np 4 ring_c -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Scott Sayres via users Sent: Tuesday, May 3, 2022 1:07 PM To: users@lists.open-mpi.org Cc: Scott Sayres Subject: [OMPI users] mpirun hangs on m1 ma

[OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-03 Thread Scott Sayres via users
Hello, I am new to openmpi, but would like to use it for ORCA calculations, and plan to run codes on the 10 processors of my macbook pro. I installed this manually and also through homebrew with similar results. I am able to compile codes with mpicc and run them as native codes, but everything th

[OMPI users] mpirun hostfile not running from node00

2021-10-15 Thread Cee Lee via users
I'm having an issue with OpenMPI that just started today. A couple of days ago everything was fine. I could run mpiexec/mpirun using --hostfile flag. I didn't touch the system for those couple of days. I'm just messing around learning MPI using C. These are simple programs from "Parallel Programmin

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-11-15 Thread Jorge SILVA via users
Hello, I used Brice's workaround and now mpirun works well on all computers! Thank you all for your help. Jorge On 14/11/2020 at 23:11, Brice Goglin via users wrote: Hello The hwloc/X11 stuff is caused by OpenMPI using a hwloc that was built with the GL backend enabled (in your case, it'

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-11-14 Thread Brice Goglin via users
Hello The hwloc/X11 stuff is caused by OpenMPI using a hwloc that was built with the GL backend enabled (in your case, it's because package libhwloc-plugins is installed). That backend is used for querying the locality of X11 displays running on NVIDIA GPUs (using libxnvctrl). Does running "lstopo

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-11-14 Thread Jorge Silva via users
Sorry, if I execute mpirun in a *really* bare terminal, without an X server running, it works! But with an error message: Invalid MIT-MAGIC-COOKIE-1 key. So the problem is related to X, but I still have no solution. Jorge On 14/11/2020 at 12:33, Jorge Silva via users wrote: Hello, In spite

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-11-14 Thread Jorge Silva via users
Hello, In spite of the delay, I was not able to solve my problem. Thanks to Joseph and Prentice for their interesting suggestions. I uninstalled AppArmor (SELinux is not installed) as suggested by Prentice, but there was no change: mpirun still hangs. The result of the gdb stack trace is the

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-22 Thread Joseph Schuchart via users
Hi Jorge, Can you try to get a stack trace of mpirun using the following command in a separate terminal? sudo gdb -batch -ex "thread apply all bt" -p $(ps -C mpirun -o pid= | head -n 1) Maybe that will give some insight where mpirun is hanging. Cheers, Joseph On 10/21/20 9:58 PM, Jorge SI

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-21 Thread Jeff Squyres (jsquyres) via users
There's huge differences between Open MPI v2.1.1 and v4.0.3 (i.e., years of development effort); it would be very hard to categorize them all; sorry! What happens if you mpirun -np 1 touch /tmp/foo (Yes, you can run non-MPI apps through mpirun) Is /tmp/foo created? (i.e., did the job run,

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-21 Thread Jorge SILVA via users
Hello Jeff, The program is not executed; it seems to wait for something to connect to (why Ctrl-C twice?) jorge@gcp26:~/MPIRUN$ mpirun -np 1 touch /tmp/foo ^C^C jorge@gcp26:~/MPIRUN$ ls -l /tmp/foo ls: cannot access '/tmp/foo': No such file or directory No file is created.

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-21 Thread Gilles Gouaillardet via users
Hi Jorge, If a firewall is running on your nodes, I suggest you disable it and try again Cheers, Gilles On Wed, Oct 21, 2020 at 5:50 AM Jorge SILVA via users wrote: > > Hello, > > I installed kubuntu20.4.1 with openmpi 4.0.3-0ubuntu in two different > computers in the standard way. Compiling w
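A minimal sketch of that check on a Kubuntu box, assuming ufw is the firewall in use (adapt for firewalld or raw iptables; whether a firewall is present at all is an assumption):

$ sudo ufw status                # is a firewall active?
$ sudo ufw disable               # temporarily disable it for the test
$ mpirun -np 1 hostname          # retry a simple launch
$ sudo ufw enable                # re-enable once done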

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-21 Thread Jorge SILVA via users
Hello Gus, Thank you for your answer. Unfortunately my problem is much more basic. I didn't try to run the program on both computers, but just to run something on one computer. I just installed the new OS and openmpi on two different computers, in the standard way, with the same result.

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-21 Thread Gus Correa via users
Hi Jorge, You may have an active firewall protecting either computer or both, preventing mpirun from starting the connection. Your /etc/hosts file may also not have the computers' IP addresses. You may also want to try the --hostfile option. Likewise, the --verbose option may also help diagnose the pr

[OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-20 Thread Jorge SILVA via users
Hello, I installed Kubuntu 20.4.1 with openmpi 4.0.3-0ubuntu on two different computers in the standard way. Compiling with mpif90 works, but mpirun hangs with no output on both systems. Even the mpirun command without parameters hangs, and only typing Ctrl-C twice can end the sleeping program. Onl

Re: [OMPI users] mpirun only work for 1 processor

2020-06-04 Thread Hà Chi Nguyễn Nhật via users
Dear Patrick and all, Finally I solved the problem. I needed to NFS-mount the host's home directory onto each node's /home, and then I could run on the cluster. Thank you for your time. Best regards, Ha Chi On Thu, 4 Jun 2020 at 17:09, Patrick Bégou < patrick.be...@legi.grenoble-inp.fr> wrote: > Ha Chi,
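A minimal sketch of the NFS mount described above; the hostname and export path are illustrative, and the frontend must already export the directory:

# on each compute node (as root)
$ mount -t nfs frontend:/home /home
# or make it persistent with an /etc/fstab entry
frontend:/home  /home  nfs  defaults  0 0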

Re: [OMPI users] mpirun only work for 1 processor

2020-06-04 Thread Patrick Bégou via users
Ha Chi, first, running MPI applications as root is not a good idea. You should create users in your Rocks cluster without admin rights for everything that is not system management. Let me know a little more about how you launch this: 1) Do you run "mpirun" from the Rocks frontend or from a node? 2) Ok fro

Re: [OMPI users] mpirun only work for 1 processor

2020-06-04 Thread Hà Chi Nguyễn Nhật via users
Dear Patrick, Thanks so much for your reply. Yes, we use ssh to log on to the nodes. From the frontend, we can ssh to the nodes without a password. The mpirun --version on all 3 nodes is identical, openmpi 2.1.1, and in the same place when testing with "whereis mpirun". So is there any problem with mpirun causi

Re: [OMPI users] mpirun only work for 1 processor

2020-06-04 Thread Patrick Bégou via users
Hi Ha Chi, do you use a batch scheduler with Rocks Cluster or do you log on to the nodes with ssh? If ssh, can you check that you can ssh from one node to the other without a password? Ping just says the network is alive, not that you can connect. Patrick On 04/06/2020 at 09:06, Hà Chi Nguyễn Nhật vi

[OMPI users] mpirun only work for 1 processor

2020-06-04 Thread Hà Chi Nguyễn Nhật via users
Dear Open MPI users, Please help me find a solution to a problem using mpirun with a Rocks cluster, 3 nodes. I use the command: mpirun -np 12 --machinefile machinefile.txt --allow-run-as-root ./wrf.exe But mpirun was unable to access the other nodes (see the photo below). Yet I actually checked

Re: [OMPI users] mpirun error only with one node

2020-04-08 Thread Garrett, Charles via users
I hope this replies correctly. I previously had a problem with replies. Anyhow, thank you for the advice. It turns out NUMA was disabled in the BIOS. All other nodes showed 2 NUMA nodes but node125 showed 1 NUMA node. I was able to see this by diffing lscpu on node125 and another node. Afte

Re: [OMPI users] mpirun error only with one node

2020-04-03 Thread John Hearns via users
Are you SURE node125 is identical to the others? Systems can boot up and disable DIMMs, for instance. I would log on there and run free, lscpu, lspci, and dmidecode. Take those outputs and run a diff against the outputs from a known good node. Also, hwloc/lstopo might show some difference? On Thu, 2 A
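A sketch of that comparison, assuming passwordless ssh to both nodes; node names are illustrative and dmidecode usually needs root:

for c in free lscpu lspci dmidecode; do
    ssh node125 "$c" > node125.$c 2>&1
    ssh node124 "$c" > node124.$c 2>&1
    diff node125.$c node124.$c
done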

[OMPI users] mpirun error only with one node

2020-04-02 Thread Garrett, Charles via users
I'm getting the following error with openmpi/3.1.4 and openmpi/3.1.6 compiled with intel/19.5 (openmpi/2 and openmpi/4 do not exhibit the problem). When I run 'mpirun -display-devel-allocation hostname' over 2 nodes including node125 of our cluster, I get an error stating there are not enough s

Re: [OMPI users] mpirun CLI parsing

2020-03-30 Thread Ralph Castain via users
I'm afraid the short answer is "no" - there is no way to do that today. > On Mar 30, 2020, at 1:45 PM, Jean-Baptiste Skutnik via users > wrote: > > Hello, > > I am writing a wrapper around `mpirun` which requires pre-processing of the > user's program. To achieve this, I need to isolate the

[OMPI users] mpirun CLI parsing

2020-03-30 Thread Jean-Baptiste Skutnik via users
Hello, I am writing a wrapper around `mpirun` which requires pre-processing of the user's program. To achieve this, I need to isolate the program from the `mpirun` arguments on the command-line. The manual describes the program as: ``` The program executable. This is identified as the first no

Re: [OMPI users] mpirun --output-filename behavior

2019-11-01 Thread Jeff Squyres (jsquyres) via users
On Nov 1, 2019, at 10:14 AM, Reuti <re...@staff.uni-marburg.de> wrote: For the most part, this whole thing needs to get documented. Especially that the colon is a disallowed character in the directory name. Any suffix :foo will just be removed AFAICS without any error output about foo b

Re: [OMPI users] mpirun --output-filename behavior

2019-11-01 Thread Reuti via users
> On 01.11.2019, at 14:46, Jeff Squyres (jsquyres) via users wrote: > > On Nov 1, 2019, at 9:34 AM, Jeff Squyres (jsquyres) via users > wrote: >> >>> Point to make: it would be nice to have an option to suppress the output on >>> stdout and/or stderr when output redirection to file is re

Re: [OMPI users] mpirun --output-filename behavior

2019-11-01 Thread Jeff Squyres (jsquyres) via users
On Nov 1, 2019, at 9:34 AM, Jeff Squyres (jsquyres) via users wrote: > >> Point to make: it would be nice to have an option to suppress the output on >> stdout and/or stderr when output redirection to file is requested. In my >> case, having stdout still visible on the terminal is desirable bu

Re: [OMPI users] mpirun --output-filename behavior

2019-11-01 Thread Jeff Squyres (jsquyres) via users
On Oct 31, 2019, at 6:43 PM, Joseph Schuchart via users wrote: > > Just to throw in my $0.02: I recently found that the output to stdout/stderr > may not be desirable: in an application that writes a lot of log data to > stderr on all ranks, stdout was significantly slower than the files I >

Re: [OMPI users] mpirun --output-filename behavior

2019-11-01 Thread Gilles GOUAILLARDET via users
Joseph, I had to use the absolute path of the fork agent. I may have misunderstood your request. Now it seems you want each task's stderr redirected to a unique file, but not duplicated to mpirun's stderr. Is that right? If so, instead of the --output-filename option, you can do it "manu
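A sketch of the "manual" per-task redirection hinted at here, using the OMPI_COMM_WORLD_RANK variable that Open MPI sets in each task's environment; the wrapper name and log directory are illustrative:

$ cat logstderr
#!/bin/sh
# send this rank's stderr to its own file
exec 2> "logs/stderr.${OMPI_COMM_WORLD_RANK}"
exec "$@"
$ chmod +x logstderr && mkdir -p logs
$ mpirun -np 4 ./logstderr ./a.out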

Re: [OMPI users] mpirun --output-filename behavior

2019-11-01 Thread Joseph Schuchart via users
Gilles, Thanks for your suggestions! I just tried both of them, see below: On 11/1/19 1:15 AM, Gilles Gouaillardet via users wrote: Joseph, you can achieve this via an agent (and it works with DDT too) For example, the nostderr script below redirects each MPI task's stderr to /dev/null (so

Re: [OMPI users] mpirun --output-filename behavior

2019-10-31 Thread Gilles Gouaillardet via users
Joseph, you can achieve this via an agent (and it works with DDT too) For example, the nostderr script below redirects each MPI task's stderr to /dev/null (so it is not forwarded to mpirun) $ cat nostderr #!/bin/sh exec 2> /dev/null exec "$@" and then you can simply $ mpirun --mca or
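Reconstructing the full agent invocation from the truncated snippet above; the orte_fork_agent parameter name is an assumption based on the later mention of a "fork agent" and should be verified with ompi_info on your release:

$ cat nostderr
#!/bin/sh
# drop each task's stderr so it is not forwarded to mpirun
exec 2> /dev/null
exec "$@"
$ chmod +x nostderr
$ mpirun --mca orte_fork_agent /absolute/path/to/nostderr -np 4 ./a.out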

Re: [OMPI users] mpirun --output-filename behavior

2019-10-31 Thread Joseph Schuchart via users
On 10/30/19 2:06 AM, Jeff Squyres (jsquyres) via users wrote: Oh, did the prior behavior *only* output to the file and not to stdout/stderr?  Huh. I guess a workaround for that would be:     mpirun  ... > /dev/null Just to throw in my $0.02: I recently found that the output to stdout/std

Re: [OMPI users] mpirun --output-filename behavior

2019-10-31 Thread Kulshrestha, Vipul via users
Thanks Jeff. “:nojobid” worked well for me and helps me remove 1 extra level of hierarchy for log files. Regards Vipul From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com] Sent: Thursday, October 31, 2019 6:21 PM To: Kulshrestha, Vipul Cc: Open MPI User's List Subject: Re: [OMPI
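For reference, the invocation this thread converges on looks roughly like the following; the :nojobid suffix is taken from the thread itself, and the path, process count, and layout comment are illustrative (check the mpirun man page of your release):

$ mpirun --output-filename /path/to/app.log:nojobid -np 4 ./a.out
# output then lands under /path/to/app.log/rank.*/ without the extra
# job-id directory level discussed above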

Re: [OMPI users] mpirun --output-filename behavior

2019-10-31 Thread Jeff Squyres (jsquyres) via users
On Oct 30, 2019, at 2:16 PM, Kulshrestha, Vipul <vipul_kulshres...@mentor.com> wrote: Given that this is an intended behavior, I have a couple of follow-up questions: 1. What is the purpose of the directory “1” that gets created currently? (in /app.log/1/rank./stdout ) Is this hard

Re: [OMPI users] mpirun --output-filename behavior

2019-10-30 Thread Kulshrestha, Vipul via users
] Sent: Tuesday, October 29, 2019 9:07 PM To: Open MPI User's List Cc: Kulshrestha, Vipul Subject: Re: [OMPI users] mpirun --output-filename behavior On Oct 29, 2019, at 7:30 PM, Kulshrestha, Vipul via users mailto:users@lists.open-mpi.org>> wrote: Hi, We recently shifted from openM

Re: [OMPI users] mpirun --output-filename behavior

2019-10-29 Thread Jeff Squyres (jsquyres) via users
On Oct 29, 2019, at 7:30 PM, Kulshrestha, Vipul via users <users@lists.open-mpi.org> wrote: Hi, We recently shifted from openMPI 2.0.1 to 4.0.1 and are seeing an important behavior change with respect to the above option. We invoke mpirun as % mpirun -output-filename /app.log -np With 2

[OMPI users] mpirun --output-filename behavior

2019-10-29 Thread Kulshrestha, Vipul via users
Hi, We recently shifted from openMPI 2.0.1 to 4.0.1 and are seeing an important behavior change with respect to above option. We invoke mpirun as % mpirun -output-filename /app.log -np With 2.0.1, the above produced /app.log. file for stdout of the application, where is the rank of the pro

Re: [OMPI users] mpirun noticed that process rank 5 with PID 0 on node localhost exited on signal 9 (Killed).

2018-09-28 Thread Ralph H Castain
Ummm…looks like you have a problem in your input deck to that application. Not sure what we can say about it… > On Sep 28, 2018, at 9:47 AM, Zeinab Salah wrote: > > Hi everyone, > I use openmpi-3.0.2 and I want to run chimere model with 8 processors, but in > the step of parallel mode, the ru

[OMPI users] mpirun noticed that process rank 5 with PID 0 on node localhost exited on signal 9 (Killed).

2018-09-28 Thread Zeinab Salah
Hi everyone, I use openmpi-3.0.2 and I want to run chimere model with 8 processors, but in the step of parallel mode, the run stopped with the following error message, Please could you help me? Thank you in advance Zeinab +++ CHIMERE RUNNING IN PARALLEL MODE +++ MPI SUB-DOMAINS

Re: [OMPI users] mpirun hangs

2018-08-15 Thread Jeff Squyres (jsquyres) via users
There can be lots of reasons that this happens. Can you send all the information listed here? https://www.open-mpi.org/community/help/ > On Aug 15, 2018, at 10:55 AM, Mota, Thyago wrote: > > Hello. > > I have openmpi 2.0.4 installed on a Cent OS 7. When I try to run "mpirun" it > hang

[OMPI users] mpirun hangs

2018-08-15 Thread Mota, Thyago
Hello. I have openmpi 2.0.4 installed on a Cent OS 7. When I try to run "mpirun" it *hangs*. Below is the output I get using the debug option: $ mpirun -d [elm:07778] procdir: /tmp/openmpi-sessions-551034197@elm_0/12011/0/0 [elm:07778] jobdir: /tmp/openmpi-sessions-551034197@elm_0/12011/0 [elm

Re: [OMPI users] mpirun issue using more than 64 hosts

2018-02-12 Thread Adam Sylvester
A... thanks Gilles. That makes sense. I was stuck thinking there was an ssh problem on rank 0; it never occurred to me mpirun was doing something clever there and that those ssh errors were from a different instance altogether. It's no problem to put my private key on all instances - I'll go

Re: [OMPI users] mpirun issue using more than 64 hosts

2018-02-12 Thread Gilles Gouaillardet
Adam, by default, when more than 64 hosts are involved, mpirun uses a tree spawn in order to remote launch the orted daemons. That means you have two options here : - allow all compute nodes to ssh each other (e.g. the ssh private key of *all* the nodes should be in *all* the authorized_keys -
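A sketch of the second option (launching every orted directly from mpirun instead of the tree spawn); the plm_rsh_no_tree_spawn parameter name is an assumption from common Open MPI usage, so confirm it with ompi_info first:

$ ompi_info --param plm rsh --level 9 | grep tree    # confirm the parameter exists
$ mpirun --mca plm_rsh_no_tree_spawn 1 -np 128 --hostfile hosts ./a.out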

[OMPI users] mpirun issue using more than 64 hosts

2018-02-12 Thread Adam Sylvester
I'm running OpenMPI 2.1.0, built from source, on RHEL 7. I'm using the default ssh-based launcher, where I have my private ssh key on rank 0 and the associated public key on all ranks. I create a hosts file with a list of unique IPs, with the host that I'm running mpirun from on the first line, a

Re: [OMPI users] mpirun 2.1.1 refuses to start a Torque 6.1.1.1 job if I change the scheduler to Maui 3.3.1

2017-08-10 Thread A M
All solved and it now works well! The culprit was a missing line in the "maui.cfg" file: JOBNODEMATCHPOLICY EXACTNODE The default value for this variable is EXACTPROC and, when that default is in effect, Maui completely ignores the "-l nodes=N:ppn=M" PBS instruction and allocates the first M available cores inside
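For reference, the fix amounts to one line in maui.cfg (restart the Maui daemon afterwards); the PBS directive shown is the kind of request EXACTNODE makes Maui honor:

# maui.cfg
JOBNODEMATCHPOLICY  EXACTNODE

# job script request that should now yield one slot on each of two nodes
#PBS -l nodes=2:ppn=1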

Re: [OMPI users] mpirun 2.1.1 refuses to start a Torque 6.1.1.1 job if I change the scheduler to Maui 3.3.1 [SOLVED]

2017-08-10 Thread A M
All solved and it now works well! The culprit was a missing line in the "maui.cfg" file: JOBNODEMATCHPOLICY EXACTNODE The default value for this variable is EXACTPROC and, when that default is in effect, Maui completely ignores the "-l nodes=N:ppn=M" PBS instruction and allocates the first M available cores inside

Re: [OMPI users] mpirun 2.1.1 refuses to start a Torque 6.1.1.1 job if I change the scheduler to Maui 3.3.1

2017-08-09 Thread A M
Thanks! In fact there seems to be a problem with Maui's node allocation setting. I have checked the $PBS_NODEFILE contents (this may also be seen with "qstat -n1"): while the default Torque scheduler correctly allocates one slot on node1 and another slot on node2, with Maui I always see tha

Re: [OMPI users] mpirun 2.1.1 refuses to start a Torque 6.1.1.1 job if I change the scheduler to Maui 3.3.1

2017-08-09 Thread r...@open-mpi.org
sounds to me like your maui scheduler didn’t provide any allocated slots on the nodes - did you check $PBS_NODEFILE? > On Aug 9, 2017, at 12:41 PM, A M wrote: > > > Hello, > > I have just ran into a strange issue with "mpirun". Here is what happened: > > I successfully installed Torque 6.1.1

[OMPI users] mpirun 2.1.1 refuses to start a Torque 6.1.1.1 job if I change the scheduler to Maui 3.3.1

2017-08-09 Thread A M
Hello, I have just run into a strange issue with "mpirun". Here is what happened: I successfully installed Torque 6.1.1.1 with the plain pbs_sched on a minimal set of 2 IB nodes. Then I added openmpi 2.1.1 compiled with verbs and tm, and verified that mpirun works as it should with a small "

Re: [OMPI users] mpirun with ssh tunneling

2017-01-01 Thread Adam Sylvester
Thanks Gilles - I appreciate all the detail. Ahh, that's great that Open MPI now supports specifying an ssh port simply through the hostfile. That'll make things a little simpler when I have that use case in the future. Oh of course - that makes sense that Open MPI requires TCP ports too rather

Re: [OMPI users] mpirun with ssh tunneling

2016-12-25 Thread Gilles Gouaillardet
Adam, there are several things here with an up-to-date master, you can specify an alternate ssh port via a hostfile see https://github.com/open-mpi/ompi/issues/2224 Open MPI requires more than just ssh. - remote nodes (orted) need to call back mpirun (oob/tcp) - nodes (MPI tasks) need t
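A sketch of pointing the ssh launcher at a non-standard port on a 1.10-era release, since the hostfile port syntax mentioned above only exists on master at this point; plm_rsh_args is an assumption to verify with ompi_info, and remember that the oob/tcp (orted back to mpirun) and btl/tcp (task to task) ports must also be reachable, not just ssh:

$ mpirun --mca plm_rsh_args "-p 2222" -np 2 --hostfile hosts ./a.out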

[OMPI users] mpirun with ssh tunneling

2016-12-25 Thread Adam Sylvester
I'm trying to use OpenMPI 1.10.4 to communicate between two Docker containers running on two different physical machines. Docker doesn't have much to do with my question (unless someone has a suggestion for a better way to do what I'm trying to :o) )... each Docker container is running an OpenSSH

Re: [OMPI users] mpirun --map-by-node

2016-11-09 Thread Mahesh Nanavalla
k.. Thank you all. That has solved it. On Fri, Nov 4, 2016 at 8:24 PM, r...@open-mpi.org wrote: > All true - but I reiterate. The source of the problem is that the > "--map-by node” on the cmd line must come *before* your application. > Otherwise, none of these suggestions will help. > > > On Nov 4

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
All true - but I reiterate. The source of the problem is that the "--map-by node” on the cmd line must come *before* your application. Otherwise, none of these suggestions will help. > On Nov 4, 2016, at 6:52 AM, Jeff Squyres (jsquyres) > wrote: > > In your case, using slots or --npernode or

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread Jeff Squyres (jsquyres)
In your case, using slots or --npernode or --map-by node will result in the same distribution of processes because you're only launching 1 process per node (a.k.a. "1ppn"). They have more pronounced differences when you're launching more than 1ppn. Let's take a step back: you should know that O

Re: [OMPI users] OMPI users] mpirun --map-by-node

2016-11-04 Thread Gilles Gouaillardet
As long as you run 3 MPI tasks, both options will produce the same mapping. If you want to run up to 12 tasks, then --map-by node is the way to go. Mahesh Nanavalla wrote: >s... > > >Thanks for responding me. > >i have solved that as below by limiting slots in hostfile > > >root@OpenWrt:~#
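A sketch contrasting the two approaches discussed in this thread; the hostfile is the poster's, and ./a.out stands in for the real application:

# Option 1: cap each node at one slot
$ cat myhostfile
root@10.73.145.1 slots=1
root@10.74.25.1 slots=1
root@10.74.46.1 slots=1
$ mpirun --hostfile myhostfile -np 3 ./a.out

# Option 2: keep the full slot counts and spread ranks round-robin by node,
# which still works when -np grows beyond 3
$ mpirun --hostfile myhostfile --map-by node -np 12 ./a.out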

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread Bennet Fauber
Mahesh, Depending on what you are trying to accomplish, might using the mpirun option -pernode (or --pernode) work for you? That requests that only one process be spawned per available node. We generally use this for hybrid codes, where the single process will spawn threads onto the remaining proc

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread Mahesh Nanavalla
s... Thanks for responding to me. I have solved that as below by limiting *slots* in the hostfile: root@OpenWrt:~# cat myhostfile root@10.73.145.1 slots=1 root@10.74.25.1 slots=1 root@10.74.46.1 slots=1 I want to know the difference between limiting *slots* in myhostfile and running *--map-by node

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
My apologies - the problem is that you list the option _after_ your executable name, and so we think it is an argument for your executable. You need to list the option _before_ your executable on the cmd line > On Nov 4, 2016, at 4:44 AM, Mahesh Nanavalla > wrote: > > Thanks for reply, > >
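The corrected command line implied by this thread, with --map-by node moved before the executable (paths are the poster's):

$ /usr/bin/mpirun --allow-run-as-root -np 3 --map-by node --hostfile myhostfile /usr/bin/openmpiWiFiBulb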

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread Mahesh Nanavalla
Thanks for the reply. But with the space it is also not running one process on each node: root@OpenWrt:~# /usr/bin/mpirun --allow-run-as-root -np 3 --hostfile myhostfile /usr/bin/openmpiWiFiBulb --map-by node And if I use it like this, it's working fine (running one process on each node): */root@OpenWrt:~#/usr/bi

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
you mistyped the option - it is “--map-by node”. Note the space between “by” and “node” - you had typed it with a “-“ instead of a “space” > On Nov 4, 2016, at 4:28 AM, Mahesh Nanavalla > wrote: > > Hi all, > > I am using openmpi-1.10.3,using quad core processor(node). > > I am running 3 pr

[OMPI users] mpirun --map-by-node

2016-11-04 Thread Mahesh Nanavalla
Hi all, I am using openmpi-1.10.3 on quad-core processors (nodes). I am running 3 processes on three nodes (provided by a hostfile), each node limited to one process by --map-by-node, as below: *root@OpenWrt:~# /usr/bin/mpirun --allow-run-as-root -np 3 --hostfile myhostfile /usr/bin/openmpiWiFiBulb --map

Re: [OMPI users] mpirun works with cmd line call , but not with app context file arg

2016-10-16 Thread MM
On 16 October 2016 at 14:50, Gilles Gouaillardet wrote: > Out of curiosity, why do you specify both --hostfile and -H ? > Do you observe the same behavior without --hostfile ~/.mpihosts ? When I specify only -H like so: mpirun -H localhost -np 1 prog1 : -H A.lan -np 4 prog2 : -H B.lan -np 4 pro

Re: [OMPI users] mpirun works with cmd line call , but not with app context file arg

2016-10-16 Thread Gilles Gouaillardet
Out of curiosity, why do you specify both --hostfile and -H ? Do you observe the same behavior without --hostfile ~/.mpihosts ? Also, do you have at least 4 cores on both A.lan and B.lan ? Cheers, Gilles On Sunday, October 16, 2016, MM wrote: > Hi, > > openmpi 1.10.3 > > this call: > > mpirun

[OMPI users] mpirun works with cmd line call , but not with app context file arg

2016-10-16 Thread MM
Hi, openmpi 1.10.3 this call: mpirun --hostfile ~/.mpihosts -H localhost -np 1 prog1 : -H A.lan -np 4 prog2 : -H B.lan -np 4 prog2 works, yet this one: mpirun --hostfile ~/.mpihosts --app ~/.mpiapp doesn't, where ~/.mpiapp contains: -H localhost -np 1 prog1 -H A.lan -np 4 prog2 -H B.lan -np 4 prog2

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-08-13 Thread Schneider, David A.
thanks! Glad to help. best, David Schneider SLAC/LCLS From: users [users-boun...@lists.open-mpi.org] on behalf of Reuti [re...@staff.uni-marburg.de] Sent: Friday, August 12, 2016 12:00 PM To: Open MPI Users Subject: Re: [OMPI users] mpirun won't

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-08-12 Thread Reuti
defensive practice, but it is more cumbersome; the actual path looks like >> >> mpirun -n 1 $PWD/arch/x86_64-rhel7-gcc48-opt/bin/psana >> >> best, >> >> David Schneider >> SLAC/LCLS >> >> From: users

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-08-12 Thread r...@open-mpi.org
na > > best, > > David Schneider > SLAC/LCLS > > From: users [users-boun...@lists.open-mpi.org > <mailto:users-boun...@lists.open-mpi.org>] on behalf of Phil Regier > [preg...@penguincomputing.com <mailto:preg...@penguincomput

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-07-29 Thread Phil Regier
n -n 1 $PWD/arch/x86_64-rhel7-gcc48-opt/bin/psana > > best, > > David Schneider > SLAC/LCLS > > From: users [users-boun...@lists.open-mpi.org] on behalf of Phil Regier [ > preg...@penguincomputing.com] > Sent: Friday, July 29, 2016 5:12 PM > To: Open MPI Users > Subject

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-07-29 Thread Ralph Castain
> David Schneider > SLAC/LCLS > > From: users [users-boun...@lists.open-mpi.org] on behalf of Ralph Castain > [r...@open-mpi.org] > Sent: Friday, July 29, 2016 5:19 PM > To: Open MPI Users > Subject: Re: [OMPI users] mpirun won't find programs from the PATH > environ

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-07-29 Thread Schneider, David A.
Open MPI Users Subject: Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths Typical practice would be to put a ./myprogram in there to avoid any possible confusion with a “myprogram” sitting in your $PATH. We should

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-07-29 Thread Schneider, David A.
/LCLS From: users [users-boun...@lists.open-mpi.org] on behalf of Phil Regier [preg...@penguincomputing.com] Sent: Friday, July 29, 2016 5:12 PM To: Open MPI Users Subject: Re: [OMPI users] mpirun won't find programs from the PATH environment variable t

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-07-29 Thread Ralph Castain
Typical practice would be to put a ./myprogram in there to avoid any possible confusion with a “myprogram” sitting in your $PATH. We should search the PATH to find your executable, but the issue might be that it isn’t your PATH on a remote node. So the question is: are you launching strictly lo
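A sketch of the two workarounds that come up in this thread (an explicit ./ prefix or an absolute path built from $PWD); the program layout matches the original post:

$ cd /path/containing/dir
$ mpirun -np 2 ./dir/bin/myprogram        # explicit relative path
$ mpirun -np 2 $PWD/dir/bin/myprogram     # or an absolute path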

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-07-29 Thread Phil Regier
I might be three steps behind you here, but does "mpirun pwd" show that all your launched processes are running in the same directory as the mpirun command? I assume that "mpirun env" would show that your PATH variable is being passed along correctly, since you don't have any problems with absol

[OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-07-29 Thread Schneider, David A.
I am finding, on Linux (RHEL 7), with openmpi 1.8.8 and 1.10.3, that mpirun won't find apps that are specified on a relative path, i.e., if I have PATH=dir/bin and I am in a directory which has dir/bin as a subdirectory, and an executable dir/bin/myprogram, I can't do 'mpirun myprogram'. I get the

Re: [OMPI users] mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking

2016-07-19 Thread Ralph Castain
Afraid I have no brilliant ideas to offer - I’m not seeing that problem. It usually indicates that the orte_schizo plugin is being pulled from an incorrect location. You might just look in your install directory and ensure that the plugin is there. Also ensure that your install lib is at the fro

Re: [OMPI users] mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking

2016-07-19 Thread Nathaniel Graham
I've also blown away the install directory and done a complete reinstall in case there was something old left in the directory. -Nathan On Tue, Jul 19, 2016 at 2:21 PM, Nathaniel Graham wrote: > The prefix location has to be there. Otherwise ompi attempts to install > to a read only directory. >

Re: [OMPI users] mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking

2016-07-19 Thread Nathaniel Graham
The prefix location has to be there. Otherwise ompi attempts to install to a read only directory. I have the install bin directory added to my path and the lib directory added to the LD_LIBRARY_PATH. When I run: which mpirun it is pointing to the expected place. -Nathan On Tue, Jul 19, 2016 at

Re: [OMPI users] mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking

2016-07-19 Thread Ralph Castain
Sounds to me like you have a confused build - I’d whack your prefix location and do a “make install” again > On Jul 19, 2016, at 1:04 PM, Nathaniel Graham wrote: > > Hello, > > I am trying to run the OSU tests for some results for a poster, but I am > getting the following error: > > mpi
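A sketch of the clean reinstall plus environment check suggested in this thread; the prefix path is illustrative:

$ rm -rf $HOME/ompi-install               # "whack" the old prefix
$ make install                            # from the build tree, same --prefix
$ export PATH=$HOME/ompi-install/bin:$PATH
$ export LD_LIBRARY_PATH=$HOME/ompi-install/lib:$LD_LIBRARY_PATH
$ which mpirun                            # should point into the new prefix
$ ls $HOME/ompi-install/lib/openmpi | grep schizo   # schizo components should be present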

[OMPI users] mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking

2016-07-19 Thread Nathaniel Graham
Hello, I am trying to run the OSU tests for some results for a poster, but I am getting the following error: mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking I am building off master with gcc on Red Hat Enterprise Linux Server release 6.7. My config comm

Re: [OMPI users] mpirun has exited due to process rank N

2016-07-07 Thread Gilles Gouaillardet
Andrea, On top of what Ralph just wrote, you might want to upgrade OpenMPI to the latest stable version (1.10.3); 1.6.5 is pretty antique and is no longer maintained. The message indicates that one process died, and many things could cause a process crash (since the crash occurs only wi

Re: [OMPI users] mpirun has exited due to process rank N

2016-07-07 Thread Ralph Castain
Try running one of the OMPI example codes and verify that things run correctly if N > 25. I suspect you have an error in your code that causes it to fail if its rank is > 25. > On Jul 7, 2016, at 2:49 PM, Alberti, Andrea wrote: > > Hi, > > my name is Andrea and I am a new openMPI user. > >
