[OMPI users] mpirun seemingly requires --host and --oversubscribe when running more than -np 2 on some nodes

2023-05-19 Thread Morgan via users
Hi All, I am seeing some funky behavior and am hoping someone has some ideas on where to start looking. I have installed openmpi 4.1.4 via spack on this cluster, Slurm aware. I then built Orca against that via spack as well (for context). Orca calls MPI under the hood with a simple `mpirun -np
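
A quick way to narrow a report like this before Orca is involved at all is to check what slot count mpirun itself sees inside the Slurm allocation. A minimal sketch (allocation sizes and the grep target are illustrative assumptions):

    # request a small allocation and try a trivial launch without --host/--oversubscribe
    $ salloc -N 2 -n 8
    $ mpirun -np 8 hostname

    # if even this demands --host or --oversubscribe, look at the Open MPI <-> Slurm
    # integration (the slurm ras/plm components), not at Orca's wrapper invocation
    $ ompi_info | grep -i slurm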

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Jeff Squyres (jsquyres) via users
- Jeff Squyres jsquy...@cisco.com From: users on behalf of Jeff Squyres (jsquyres) via users Sent: Thursday, May 5, 2022 3:31 PM To: George Bosilca; Open MPI Users Cc: Jeff Squyres (jsquyres) Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3 Scott a

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Jeff Squyres (jsquyres) via users
3:19 PM To: Open MPI Users Cc: Jeff Squyres (jsquyres); Scott Sayres Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3 That is weird, but maybe it is not a deadlock, but a very slow progress. In the child can you print the fdmax and i in the frame do_child. George. On Thu, May 5

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread George Bosilca via users
That is weird, but maybe it is not a deadlock, but a very slow progress. In the child can you print the fdmax and i in the frame do_child. George. On Thu, May 5, 2022 at 11:50 AM Scott Sayres via users < users@lists.open-mpi.org> wrote: > Jeff, thanks. > from 1: > > (lldb) process attach --pid
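
For reference, a hedged sketch of inspecting those variables with lldb on macOS (the PID comes from the ps output later in this thread; the frame number depends on what the backtrace actually shows):

    $ lldb -p 95083                  # attach to the hung mpirun
    (lldb) thread backtrace all      # locate the do_child frame
    (lldb) frame select 1            # select that frame (number taken from the backtrace)
    (lldb) frame variable fdmax i    # print the loop bound and current descriptor index
    (lldb) process detach            # let the process continue afterwards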

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Scott Sayres via users
Jeff, thanks. from 1: (lldb) process attach --pid 95083 Process 95083 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP frame #0: 0x0001bde25628 libsystem_kernel.dylib`close + 8 libsystem_kernel.dylib`close: -> 0x1bde25628 <+8>: b.lo 0x1bde25648

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Jeff Squyres (jsquyres) via users
You can use "lldb -p PID" to attach to a running process. -- Jeff Squyres jsquy...@cisco.com From: Scott Sayres Sent: Thursday, May 5, 2022 11:22 AM To: Jeff Squyres (jsquyres) Cc: Open MPI Users Subject: Re: [OMPI users] mpirun hangs on m1 mac

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Scott Sayres via users
Jeff, It does launch two mpirun processes (when hung from another terminal window) scottsayres 95083 99.0 0.0 408918416 1472 s002 R 8:20AM 0:04.48 mpirun -np 4 foo.sh scottsayres 95085 0.0 0.0 408628368 1632 s006 S+8:20AM 0:00.00 egrep mpirun|foo.sh scottsayres

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Bennet Fauber via users
the > child process... which is weird). > > -- > Jeff Squyres > jsquy...@cisco.com > > > From: Scott Sayres > Sent: Wednesday, May 4, 2022 4:02 PM > To: Jeff Squyres (jsquyres) > Cc: Open MPI Users > Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Jeff Squyres (jsquyres) via users
- Jeff Squyres jsquy...@cisco.com From: Scott Sayres Sent: Wednesday, May 4, 2022 4:02 PM To: Jeff Squyres (jsquyres) Cc: Open MPI Users Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3 foo.sh is executable, again hangs without output. I co

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Gilles Gouaillardet via users
> >> mpirun -np 1 foo.sh >> >> If you start seeing output, good! If it completes, better! >> >> If it hangs, and/or if you don't see any output at all, do this: >> >> ps auxwww | egrep 'mpirun|foo.sh' >> >> It should show mpirun and 2 copies of foo.sh (and pr

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Scott Sayres via users
> ps auxwww | egrep 'mpirun|foo.sh' > > It should show mpirun and 2 copies of foo.sh (and probably a grep). Does > it? > > -- > Jeff Squyres > jsquy...@cisco.com > > > From: Scott Sayres > Sent: Wednesday, May 4, 2022 2:47

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Jeff Squyres (jsquyres) via users
From: Scott Sayres Sent: Wednesday, May 4, 2022 2:47 PM To: Open MPI Users Cc: Jeff Squyres (jsquyres) Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3 Following Jeff's advice, I have rebuilt open-mpi by hand using the -g option. This shows more information as below

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Scott Sayres via users
Following Jeff's advice, I have rebuilt open-mpi by hand using the -g option. This shows more information as below. I am attempting George's advice of how to track the child but notice that gdb does not support arm64. attempting to update lldb. scottsayres@scotts-mbp openmpi-4.1.3 % lldb

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Jeff Squyres (jsquyres) via users
: Wednesday, May 4, 2022 12:35 PM To: Open MPI Users Cc: George Bosilca Subject: Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3 I compiled a fresh copy of the 4.1.3 branch on my M1 laptop, and I can run both MPI and non-MPI apps without any issues. Try running `lldb mpirun -- -np 1 hostname

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread George Bosilca via users
dhcp.###:05469] [[48286,0],0] Releasing job data for >>> [INVALID] >>> >>> Can you recommend a way to find where mpirun gets stuck? >>> Thanks! >>> Scott >>> >>> On Wed, May 4, 2022 at 6:06 AM Jeff Squyres (jsquyres) < >>> js

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Scott Sayres via users
ff Squyres (jsquyres) < >> jsquy...@cisco.com> wrote: >> >>> Are you able to use mpirun to launch a non-MPI application? E.g.: >>> >>> mpirun -np 2 hostname >>> >>> And if that works, can you run the simple example MPI apps in the >>

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread George Bosilca via users
hello world" and >> "ring" programs)? E.g.: >> >> cd examples >> make >> mpirun -np 4 hello_c >> mpirun -np 4 ring_c >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> >> >&g

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Scott Sayres via users
> cd examples > make > mpirun -np 4 hello_c > mpirun -np 4 ring_c > > -- > Jeff Squyres > jsquy...@cisco.com > > > From: users on behalf of Scott Sayres > via users > Sent: Tuesday, May 3, 2022 1:07 PM > To:

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread Jeff Squyres (jsquyres) via users
ples make mpirun -np 4 hello_c mpirun -np 4 ring_c -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Scott Sayres via users Sent: Tuesday, May 3, 2022 1:07 PM To: users@lists.open-mpi.org Cc: Scott Sayres Subject: [OMPI users] mpirun hangs on m1 ma

[OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-03 Thread Scott Sayres via users
Hello, I am new to openmpi, but would like to use it for ORCA calculations, and plan to run codes on the 10 processors of my macbook pro. I installed this manually and also through homebrew with similar results. I am able to compile codes with mpicc and run them as native codes, but everything

[OMPI users] mpirun hostfile not running from node00

2021-10-15 Thread Cee Lee via users
I'm having an issue with OpenMPI that just started today. A couple of days ago everything was fine. I could run mpiexec/mpirun using --hostfile flag. I didn't touch the system for those couple of days. I'm just messing around learning MPI using C. These are simple programs from "Parallel

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-11-15 Thread Jorge SILVA via users
Hello, I used Brice's workaround and now mpirun works well on all computers! Thank you all for your help. Jorge On 14/11/2020 at 23:11, Brice Goglin via users wrote: Hello The hwloc/X11 stuff is caused by OpenMPI using a hwloc that was built with the GL backend enabled (in your case,

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-11-14 Thread Brice Goglin via users
Hello The hwloc/X11 stuff is caused by OpenMPI using a hwloc that was built with the GL backend enabled (in your case, it's because package libhwloc-plugins is installed). That backend is used for querying the locality of X11 displays running on NVIDIA GPUs (using libxnvctrl). Does running
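
Brice's actual workaround is cut off in this archive view. Two hwloc-level possibilities consistent with his description (assumptions, not a quote of the original suggestion) are removing the libhwloc-plugins package or excluding the GL backend through hwloc's component filter:

    # exclude hwloc's "gl" backend for this shell before launching
    $ export HWLOC_COMPONENTS=-gl
    $ mpirun -np 2 hostname

    # or drop the plugin package that provides the GL backend
    $ sudo apt remove libhwloc-plugins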

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-11-14 Thread Jorge Silva via users
Sorry, if I execute mpirun in a *really* bare terminal, without the X server running, it works! but with an error message: Invalid MIT-MAGIC-COOKIE-1 key. So the problem is related to X, but I still have no solution. Jorge On 14/11/2020 at 12:33, Jorge Silva via users wrote: Hello, In spite

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-11-14 Thread Jorge Silva via users
Hello, In spite of the delay, I was not able to solve my problem. Thanks to Joseph and Prentice for their interesting suggestions. I uninstalled AppArmor (SELinux is not installed) as suggested by Prentice but there were no changes; mpirun still hangs. The result of the gdb stack trace is

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-22 Thread Joseph Schuchart via users
Hi Jorge, Can you try to get a stack trace of mpirun using the following command in a separate terminal? sudo gdb -batch -ex "thread apply all bt" -p $(ps -C mpirun -o pid= | head -n 1) Maybe that will give some insight where mpirun is hanging. Cheers, Joseph On 10/21/20 9:58 PM, Jorge

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-21 Thread Jeff Squyres (jsquyres) via users
There's huge differences between Open MPI v2.1.1 and v4.0.3 (i.e., years of development effort); it would be very hard to categorize them all; sorry! What happens if you mpirun -np 1 touch /tmp/foo (Yes, you can run non-MPI apps through mpirun) Is /tmp/foo created? (i.e., did the job

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-21 Thread Jorge SILVA via users
Hello Jeff, The program is not executed; it seems to wait for something to connect to (why ctrl-C twice?) jorge@gcp26:~/MPIRUN$ mpirun -np 1 touch /tmp/foo ^C^C jorge@gcp26:~/MPIRUN$ ls -l /tmp/foo ls: cannot access '/tmp/foo': No such file or directory no file is

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-21 Thread Gilles Gouaillardet via users
Hi Jorge, If a firewall is running on your nodes, I suggest you disable it and try again Cheers, Gilles On Wed, Oct 21, 2020 at 5:50 AM Jorge SILVA via users wrote: > > Hello, > > I installed kubuntu20.4.1 with openmpi 4.0.3-0ubuntu in two different > computers in the standard way. Compiling
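
On Kubuntu the host firewall is usually managed through ufw, so a minimal check along the lines of Gilles' suggestion might look like this (assuming ufw; adapt for firewalld or raw iptables setups):

    $ sudo ufw status          # is a firewall active at all?
    $ sudo ufw disable         # temporarily switch it off for the test
    $ mpirun -np 1 hostname    # retry the trivial launch that was hanging
    $ sudo ufw enable          # re-enable it once the test is done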

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-21 Thread Jorge SILVA via users
Hello Gus, Thank you for your answer. Unfortunately my problem is much more basic. I didn't try to run the program on both computers, but just to run something on one computer. I just installed the new OS and openmpi on two different computers, in the standard way, with the same result.

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-21 Thread Gus Correa via users
Hi Jorge You may have an active firewall protecting either computer or both, preventing mpirun from starting the connection. Your /etc/hosts file may also not have the computer IP addresses. You may also want to try the --hostfile option. Likewise, the --verbose option may also help diagnose the

[OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-20 Thread Jorge SILVA via users
Hello, I installed kubuntu20.4.1 with openmpi 4.0.3-0ubuntu in two different computers in the standard way. Compiling with mpif90 works, but mpirun hangs with no output on both systems. Even the mpirun command without parameters hangs, and only typing ctrl-C twice can end the sleeping program.

Re: [OMPI users] mpirun only work for 1 processor

2020-06-04 Thread Hà Chi Nguyễn Nhật via users
Dear Patrick and all, Finally I solved the problem. I needed to mount -t nfs the host's home directory onto the node's /home, and then I could run on the cluster. Thank you for your time. Best regards Ha Chi On Thu, 4 Jun 2020 at 17:09, Patrick Bégou < patrick.be...@legi.grenoble-inp.fr> wrote: > Ha Chi,

Re: [OMPI users] mpirun only work for 1 processor

2020-06-04 Thread Patrick Bégou via users
Ha Chi, first running MPI applications as root is not a good idea. You must create users in your rocks cluster without admin rights for all that is not system management. Let me know a little more about how you launch this: 1) Do you run "mpirun" from the rocks frontend or from a node? 2) Ok

Re: [OMPI users] mpirun only work for 1 processor

2020-06-04 Thread Hà Chi Nguyễn Nhật via users
Dear Patrick, Thanks so much for your reply. Yes, we use ssh to log on the node. From the frontend, we can ssh to the nodes without a password. The mpirun --version on all 3 nodes is identical, openmpi 2.1.1, and in the same place when testing with "whereis mpirun". So is there any problem with mpirun

Re: [OMPI users] mpirun only work for 1 processor

2020-06-04 Thread Patrick Bégou via users
Hi Ha Chi do you use a batch scheduler with Rocks Cluster or do you log on the node with ssh? If ssh, can you check that you can ssh from one node to the other without a password? Ping just says the network is alive, not that you can connect. Patrick On 04/06/2020 at 09:06, Hà Chi Nguyễn Nhật

[OMPI users] mpirun only work for 1 processor

2020-06-04 Thread Hà Chi Nguyễn Nhật via users
Dear Open MPI users, Please help me to find the solution for a problem using mpirun on a Rocks cluster, 3 nodes. I use the command: mpirun -np 12 --machinefile machinefile.txt --allow-run-as-root ./wrf.exe But mpirun was unable to access the other nodes (as in the photo below). But actually I checked

Re: [OMPI users] mpirun error only with one node

2020-04-08 Thread Garrett, Charles via users
I hope this replies correctly. I previously had a problem with replies. Anyhow, thank you for the advice. It turns out NUMA was disabled in the BIOS. All other nodes showed 2 NUMA nodes but node125 showed 1 NUMA node. I was able to see this by diffing lscpu on node125 and another node.
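
The comparison described above can be scripted in one line per command, assuming passwordless ssh to both nodes (node names here are illustrative):

    $ diff <(ssh node124 lscpu) <(ssh node125 lscpu)            # CPU / NUMA summary
    $ diff <(ssh node124 numactl -H) <(ssh node125 numactl -H)  # NUMA topology, if numactl is installed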

Re: [OMPI users] mpirun error only with one node

2020-04-03 Thread John Hearns via users
Are you SURE node125 is identical to the others? Systems can boot up and disable DIMMs, for instance. I would log on there and run free, lscpu, lspci, dmidecode. Take those outputs and run a diff against outputs from a known good node. Also hwloc/lstopo might show some difference? On Thu, 2

[OMPI users] mpirun error only with one node

2020-04-02 Thread Garrett, Charles via users
I'm getting the following error with openmpi/3.1.4 and openmpi/3.1.6 compiled with intel/19.5 (openmpi/2 and openmpi/4 do not exhibit the problem). When I run 'mpirun -display-devel-allocation hostname' over 2 nodes including node125 of our cluster, I get an error stating there are not enough

Re: [OMPI users] mpirun CLI parsing

2020-03-30 Thread Ralph Castain via users
I'm afraid the short answer is "no" - there is no way to do that today. > On Mar 30, 2020, at 1:45 PM, Jean-Baptiste Skutnik via users > wrote: > > Hello, > > I am writing a wrapper around `mpirun` which requires pre-processing of the > user's program. To achieve this, I need to isolate the

[OMPI users] mpirun CLI parsing

2020-03-30 Thread Jean-Baptiste Skutnik via users
Hello, I am writing a wrapper around `mpirun` which requires pre-processing of the user's program. To achieve this, I need to isolate the program from the `mpirun` arguments on the command-line. The manual describes the program as: ``` The program executable. This is identified as the first

Re: [OMPI users] mpirun --output-filename behavior

2019-11-01 Thread Jeff Squyres (jsquyres) via users
On Nov 1, 2019, at 10:14 AM, Reuti <re...@staff.uni-marburg.de> wrote: For the most part, this whole thing needs to get documented. Especially that the colon is a disallowed character in the directory name. Any suffix :foo will just be removed AFAICS without any error output about foo

Re: [OMPI users] mpirun --output-filename behavior

2019-11-01 Thread Reuti via users
> On 01.11.2019 at 14:46, Jeff Squyres (jsquyres) via users wrote: > > On Nov 1, 2019, at 9:34 AM, Jeff Squyres (jsquyres) via users > wrote: >> >>> Point to make: it would be nice to have an option to suppress the output on >>> stdout and/or stderr when output redirection to file is

Re: [OMPI users] mpirun --output-filename behavior

2019-11-01 Thread Jeff Squyres (jsquyres) via users
On Nov 1, 2019, at 9:34 AM, Jeff Squyres (jsquyres) via users wrote: > >> Point to make: it would be nice to have an option to suppress the output on >> stdout and/or stderr when output redirection to file is requested. In my >> case, having stdout still visible on the terminal is desirable

Re: [OMPI users] mpirun --output-filename behavior

2019-11-01 Thread Jeff Squyres (jsquyres) via users
On Oct 31, 2019, at 6:43 PM, Joseph Schuchart via users wrote: > > Just to throw in my $0.02: I recently found that the output to stdout/stderr > may not be desirable: in an application that writes a lot of log data to > stderr on all ranks, stdout was significantly slower than the files I >

Re: [OMPI users] mpirun --output-filename behavior

2019-11-01 Thread Gilles GOUAILLARDET via users
Joseph, I had to use the absolute path of the fork agent. I may have misunderstood your request. Now it seems you want each task's stderr redirected to a unique file but not duplicated to mpirun's stderr. Is that right? If so, instead of the --output-filename option, you can do it

Re: [OMPI users] mpirun --output-filename behavior

2019-10-31 Thread Gilles Gouaillardet via users
Joseph, you can achieve this via an agent (and it works with DDT too) For example, the nostderr script below redirects each MPI task's stderr to /dev/null (so it is not forwarded to mpirun) $ cat nostderr #!/bin/sh exec 2> /dev/null exec "$@" and then you can simply $ mpirun --mca
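
The mpirun line above is truncated after "--mca". A sketch of the complete invocation, assuming the parameter being set is the ORTE fork agent (orte_fork_agent; verify the exact name for your release with ompi_info) and using an absolute path, as noted in Gilles' follow-up earlier in this listing:

    $ cat nostderr
    #!/bin/sh
    # drop this task's stderr so nothing is forwarded back to mpirun
    exec 2> /dev/null
    exec "$@"

    $ chmod +x nostderr
    $ mpirun --mca orte_fork_agent "$PWD/nostderr" -np 4 ./my_app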

Re: [OMPI users] mpirun --output-filename behavior

2019-10-31 Thread Joseph Schuchart via users
On 10/30/19 2:06 AM, Jeff Squyres (jsquyres) via users wrote: Oh, did the prior behavior *only* output to the file and not to stdout/stderr?  Huh. I guess a workaround for that would be:     mpirun  ... > /dev/null Just to throw in my $0.02: I recently found that the output to

Re: [OMPI users] mpirun --output-filename behavior

2019-10-31 Thread Kulshrestha, Vipul via users
Thanks Jeff. “:nojobid” worked well for me and helps me remove 1 extra level of hierarchy for log files. Regards Vipul From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com] Sent: Thursday, October 31, 2019 6:21 PM To: Kulshrestha, Vipul Cc: Open MPI User's List Subject: Re: [OMPI users
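
A hedged sketch of the suffix syntax being discussed, for the 4.0.x --output-filename option; the resulting layouts are inferred from this thread rather than quoted from documentation:

    # default: an extra job-id directory level ("1") appears under the target
    $ mpirun --output-filename /tmp/app.log -np 2 ./my_app
    #   -> /tmp/app.log/1/rank.0/stdout, /tmp/app.log/1/rank.1/stdout, ...

    # with the :nojobid suffix that level is dropped
    $ mpirun --output-filename /tmp/app.log:nojobid -np 2 ./my_app
    #   -> /tmp/app.log/rank.0/stdout, /tmp/app.log/rank.1/stdout, ...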

Re: [OMPI users] mpirun --output-filename behavior

2019-10-31 Thread Jeff Squyres (jsquyres) via users
On Oct 30, 2019, at 2:16 PM, Kulshrestha, Vipul <vipul_kulshres...@mentor.com> wrote: Given that this is an intended behavior, I have a couple of follow-up questions: 1. What is the purpose of the directory “1” that gets created currently? (in /app.log/1/rank.<N>/stdout) Is this

Re: [OMPI users] mpirun --output-filename behavior

2019-10-30 Thread Kulshrestha, Vipul via users
] Sent: Tuesday, October 29, 2019 9:07 PM To: Open MPI User's List Cc: Kulshrestha, Vipul Subject: Re: [OMPI users] mpirun --output-filename behavior On Oct 29, 2019, at 7:30 PM, Kulshrestha, Vipul via users <users@lists.open-mpi.org> wrote: Hi, We recently shifted from openMPI

Re: [OMPI users] mpirun --output-filename behavior

2019-10-29 Thread Jeff Squyres (jsquyres) via users
On Oct 29, 2019, at 7:30 PM, Kulshrestha, Vipul via users <users@lists.open-mpi.org> wrote: Hi, We recently shifted from openMPI 2.0.1 to 4.0.1 and are seeing an important behavior change with respect to the above option. We invoke mpirun as % mpirun -output-filename /app.log -np <N> With

[OMPI users] mpirun --output-filename behavior

2019-10-29 Thread Kulshrestha, Vipul via users
Hi, We recently shifted from openMPI 2.0.1 to 4.0.1 and are seeing an important behavior change with respect to the above option. We invoke mpirun as % mpirun -output-filename /app.log -np <N> With 2.0.1, the above produced a /app.log.<rank> file for stdout of the application, where <rank> is the rank of the

Re: [OMPI users] mpirun noticed that process rank 5 with PID 0 on node localhost exited on signal 9 (Killed).

2018-09-28 Thread Ralph H Castain
Ummm…looks like you have a problem in your input deck to that application. Not sure what we can say about it… > On Sep 28, 2018, at 9:47 AM, Zeinab Salah wrote: > > Hi everyone, > I use openmpi-3.0.2 and I want to run chimere model with 8 processors, but in > the step of parallel mode, the

[OMPI users] mpirun noticed that process rank 5 with PID 0 on node localhost exited on signal 9 (Killed).

2018-09-28 Thread Zeinab Salah
Hi everyone, I use openmpi-3.0.2 and I want to run chimere model with 8 processors, but in the step of parallel mode, the run stopped with the following error message, Please could you help me? Thank you in advance Zeinab +++ CHIMERE RUNNING IN PARALLEL MODE +++ MPI SUB-DOMAINS

Re: [OMPI users] mpirun hangs

2018-08-15 Thread Jeff Squyres (jsquyres) via users
There can be lots of reasons that this happens. Can you send all the information listed here? https://www.open-mpi.org/community/help/ > On Aug 15, 2018, at 10:55 AM, Mota, Thyago wrote: > > Hello. > > I have openmpi 2.0.4 installed on a Cent OS 7. When I try to run "mpirun" it >

[OMPI users] mpirun hangs

2018-08-15 Thread Mota, Thyago
Hello. I have openmpi 2.0.4 installed on a Cent OS 7. When I try to run "mpirun" it *hangs*. Below is the output I get using the debug option: $ mpirun -d [elm:07778] procdir: /tmp/openmpi-sessions-551034197@elm_0/12011/0/0 [elm:07778] jobdir: /tmp/openmpi-sessions-551034197@elm_0/12011/0

Re: [OMPI users] mpirun issue using more than 64 hosts

2018-02-12 Thread Adam Sylvester
A... thanks Gilles. That makes sense. I was stuck thinking there was an ssh problem on rank 0; it never occurred to me mpirun was doing something clever there and that those ssh errors were from a different instance altogether. It's no problem to put my private key on all instances - I'll

Re: [OMPI users] mpirun issue using more than 64 hosts

2018-02-12 Thread Gilles Gouaillardet
Adam, by default, when more than 64 hosts are involved, mpirun uses a tree spawn in order to remote launch the orted daemons. That means you have two options here : - allow all compute nodes to ssh each other (e.g. the ssh private key of *all* the nodes should be in *all* the authorized_keys -
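
For the second option, a sketch of turning off the tree spawn so that only mpirun's node needs ssh access to every host (the MCA parameter name is the usual one for the 2.x/3.x rsh launcher; confirm it with ompi_info):

    # launch every orted directly from the mpirun node instead of via a tree
    $ mpirun --mca plm_rsh_no_tree_spawn 1 --hostfile hosts.txt -np 128 ./my_app

    # or make it the per-user default
    $ echo "plm_rsh_no_tree_spawn = 1" >> ~/.openmpi/mca-params.conf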

[OMPI users] mpirun issue using more than 64 hosts

2018-02-12 Thread Adam Sylvester
I'm running OpenMPI 2.1.0, built from source, on RHEL 7. I'm using the default ssh-based launcher, where I have my private ssh key on rank 0 and the associated public key on all ranks. I create a hosts file with a list of unique IPs, with the host that I'm running mpirun from on the first line,

Re: [OMPI users] mpirun 2.1.1 refuses to start a Torque 6.1.1.1 job if I change the scheduler to Maui 3.3.1

2017-08-10 Thread A M
All solved and now works well! The culprit was the missing line in the "maui.cfg" file: JOBNODEMATCHPOLICY EXACTNODE The default value for this variable is EXACTPROC and, in its presence, Maui completely ignores the "-l nodes=N:ppn=M" PBS instruction and allocates the first M available cores
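
For reference, the fragment being described (the config path is a typical default and may differ per installation):

    # /var/spool/maui/maui.cfg
    JOBNODEMATCHPOLICY  EXACTNODE

    # with EXACTNODE set, a request such as
    #   qsub -l nodes=2:ppn=1 job.sh
    # is honoured as two distinct nodes with one slot each, instead of the
    # first two free cores wherever they happen to be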

Re: [OMPI users] mpirun 2.1.1 refuses to start a Torque 6.1.1.1 job if I change the scheduler to Maui 3.3.1 [SOLVED]

2017-08-10 Thread A M
All solved and now works well! The culprit was the missing line in the "maui.cfg" file: JOBNODEMATCHPOLICY EXACTNODE The default value for this variable is EXACTPROC and, in its presence, Maui completely ignores the "-l nodes=N:ppn=M" PBS instruction and allocates the first M available cores

Re: [OMPI users] mpirun 2.1.1 refuses to start a Torque 6.1.1.1 job if I change the scheduler to Maui 3.3.1

2017-08-09 Thread A M
Thanks! In fact there should be a problem with Maui's node allocation setting. I have checked the $PBS_NODEFILE contents (this may also be seen with "qstat -n1"): while the default Torque scheduler correctly allocates one slot on node1 and another slot on node2, in the case of Maui I always see

Re: [OMPI users] mpirun 2.1.1 refuses to start a Torque 6.1.1.1 job if I change the scheduler to Maui 3.3.1

2017-08-09 Thread r...@open-mpi.org
sounds to me like your maui scheduler didn’t provide any allocated slots on the nodes - did you check $PBS_NODEFILE? > On Aug 9, 2017, at 12:41 PM, A M wrote: > > > Hello, > > I have just ran into a strange issue with "mpirun". Here is what happened: > > I successfully

[OMPI users] mpirun 2.1.1 refuses to start a Torque 6.1.1.1 job if I change the scheduler to Maui 3.3.1

2017-08-09 Thread A M
Hello, I have just run into a strange issue with "mpirun". Here is what happened: I successfully installed Torque 6.1.1.1 with the plain pbs_sched on a minimal set of 2 IB nodes. Then I added openmpi 2.1.1 compiled with verbs and tm, and have verified that mpirun works as it should with a small

Re: [OMPI users] mpirun with ssh tunneling

2017-01-01 Thread Adam Sylvester
Thanks Gilles - I appreciate all the detail. Ahh, that's great that Open MPI now supports specifying an ssh port simply through the hostfile. That'll make things a little simpler when I have that use case in the future. Oh of course - that makes sense that Open MPI requires TCP ports too rather

Re: [OMPI users] mpirun with ssh tunneling

2016-12-25 Thread Gilles Gouaillardet
Adam, there are several things here with an up-to-date master, you can specify an alternate ssh port via a hostfile see https://github.com/open-mpi/ompi/issues/2224 Open MPI requires more than just ssh. - remote nodes (orted) need to call back mpirun (oob/tcp) - nodes (MPI tasks) need
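
When only a handful of TCP ports can be opened or tunnelled between the containers, the usual approach is to pin both channels (oob for orted/mpirun wire-up, btl/tcp for MPI traffic) to known ranges. The parameter names below are assumptions for recent releases and differ slightly across versions, so check them with ompi_info before relying on them:

    $ mpirun --mca oob_tcp_static_ipv4_ports 12000-12010 \
             --mca btl_tcp_port_min_v4 13000 \
             --mca btl_tcp_port_range_v4 100 \
             --hostfile hosts -np 2 ./my_app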

[OMPI users] mpirun with ssh tunneling

2016-12-25 Thread Adam Sylvester
I'm trying to use OpenMPI 1.10.4 to communicate between two Docker containers running on two different physical machines. Docker doesn't have much to do with my question (unless someone has a suggestion for a better way to do what I'm trying to :o) )... each Docker container is running an OpenSSH

Re: [OMPI users] mpirun --map-by-node

2016-11-09 Thread Mahesh Nanavalla
k..Thank you all. That has solved. On Fri, Nov 4, 2016 at 8:24 PM, r...@open-mpi.org wrote: > All true - but I reiterate. The source of the problem is that the > "--map-by node” on the cmd line must come *before* your application. > Otherwise, none of these suggestions will

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
All true - but I reiterate. The source of the problem is that the "--map-by node” on the cmd line must come *before* your application. Otherwise, none of these suggestions will help. > On Nov 4, 2016, at 6:52 AM, Jeff Squyres (jsquyres) > wrote: > > In your case, using

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread Jeff Squyres (jsquyres)
In your case, using slots or --npernode or --map-by node will result in the same distribution of processes because you're only launching 1 process per node (a.k.a. "1ppn"). They have more pronounced differences when you're launching more than 1ppn. Let's take a step back: you should know that

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread Gilles Gouaillardet
As long as you run 3 MPI tasks, both options will produce the same mapping. If you want to run up to 12 tasks, then --map-by node is the way to go Mahesh Nanavalla wrote: >s... > > >Thanks for responding me. > >i have solved that as below by limiting slots

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread Bennet Fauber
Mahesh, Depending on what you are trying to accomplish, might using the mpirun option -pernode -or- --pernode work for you? That requests that only one process be spawned per available node. We generally use this for hybrid codes, where the single process will spawn threads to the remaining

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread Mahesh Nanavalla
s... Thanks for responding to me. I have solved that as below by limiting *slots in the hostfile* root@OpenWrt:~# cat myhostfile root@10.73.145.1 slots=1 root@10.74.25.1 slots=1 root@10.74.46.1 slots=1 I want to know the difference between the *slots* limit in myhostfile and running *--map-by

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
My apologies - the problem is that you list the option _after_ your executable name, and so we think it is an argument for your executable. You need to list the option _before_ your executable on the cmd line > On Nov 4, 2016, at 4:44 AM, Mahesh Nanavalla >

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread Mahesh Nanavalla
Thanks for the reply. But with the space it is also not running one process on each node root@OpenWrt:~# /usr/bin/mpirun --allow-run-as-root -np 3 --hostfile myhostfile /usr/bin/openmpiWiFiBulb --map-by node And if I use it like this it's working fine (running one process on each node)
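
Putting the two corrections in this thread together (a space rather than a dash in the option name, and the option placed before the executable), the working invocation would be:

    $ /usr/bin/mpirun --allow-run-as-root -np 3 --hostfile myhostfile \
          --map-by node /usr/bin/openmpiWiFiBulb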

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
you mistyped the option - it is “--map-by node”. Note the space between “by” and “node” - you had typed it with a “-“ instead of a “space” > On Nov 4, 2016, at 4:28 AM, Mahesh Nanavalla > wrote: > > Hi all, > > I am using openmpi-1.10.3,using quad core

[OMPI users] mpirun --map-by-node

2016-11-04 Thread Mahesh Nanavalla
Hi all, I am using openmpi-1.10.3, using a quad core processor (node). I am running 3 processes on three nodes (provided by hostfile); each node's process is limited by --map-by-node as below *root@OpenWrt:~# /usr/bin/mpirun --allow-run-as-root -np 3 --hostfile myhostfile /usr/bin/openmpiWiFiBulb

Re: [OMPI users] mpirun works with cmd line call , but not with app context file arg

2016-10-16 Thread MM
On 16 October 2016 at 14:50, Gilles Gouaillardet wrote: > Out of curiosity, why do you specify both --hostfile and -H ? > Do you observe the same behavior without --hostfile ~/.mpihosts ? When I specify only -H like so: mpirun -H localhost -np 1 prog1 : -H A.lan

Re: [OMPI users] mpirun works with cmd line call , but not with app context file arg

2016-10-16 Thread Gilles Gouaillardet
Out of curiosity, why do you specify both --hostfile and -H ? Do you observe the same behavior without --hostfile ~/.mpihosts ? Also, do you have at least 4 cores on both A.lan and B.lan ? Cheers, Gilles On Sunday, October 16, 2016, MM wrote: > Hi, > > openmpi 1.10.3 >

[OMPI users] mpirun works with cmd line call , but not with app context file arg

2016-10-16 Thread MM
Hi, openmpi 1.10.3 this call: mpirun --hostfile ~/.mpihosts -H localhost -np 1 prog1 : -H A.lan -np 4 prog2 : -H B.lan -np 4 prog2 works, yet this one: mpirun --hostfile ~/.mpihosts --app ~/.mpiapp doesn't, where ~/.mpiapp contains: -H localhost -np 1 prog1 -H A.lan -np 4 prog2 -H B.lan -np 4 prog2

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-08-13 Thread Schneider, David A.
thanks! Glad to help. best, David Schneider SLAC/LCLS From: users [users-boun...@lists.open-mpi.org] on behalf of Reuti [re...@staff.uni-marburg.de] Sent: Friday, August 12, 2016 12:00 PM To: Open MPI Users Subject: Re: [OMPI users] mpirun won't find

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-08-12 Thread Reuti
ree that is a >> good defensive practice, but it is more cumbersome, the actually path looks >> >> mpirun -n 1 $PWD/arch/x86_64-rhel7-gcc48-opt/bin/psana >> >> best, >> >> David Schneider >> SLAC/LCLS >> ___

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-08-12 Thread r...@open-mpi.org
4-rhel7-gcc48-opt/bin/psana > > best, > > David Schneider > SLAC/LCLS > > From: users [users-boun...@lists.open-mpi.org > <mailto:users-boun...@lists.open-mpi.org>] on behalf of Phil Regier > [preg...@penguincomputing.c

Re: [OMPI users] mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking

2016-07-19 Thread Ralph Castain
Afraid I have no brilliant ideas to offer - I’m not seeing that problem. It usually indicates that the orte_schizo plugin is being pulled from an incorrect location. You might just look in your install directory and ensure that the plugin is there. Also ensure that your install lib is at the

Re: [OMPI users] mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking

2016-07-19 Thread Nathaniel Graham
I've also blown away the install directory and did a complete reinstall in case there was something old left in the directory. -Nathan On Tue, Jul 19, 2016 at 2:21 PM, Nathaniel Graham wrote: > The prefix location has to be there. Otherwise ompi attempts to install > to a

Re: [OMPI users] mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking

2016-07-19 Thread Nathaniel Graham
The prefix location has to be there. Otherwise ompi attempts to install to a read only directory. I have the install bin directory added to my path and the lib directory added to the LD_LIBRARY_PATH. When I run: which mpirun it is pointing to the expected place. -Nathan On Tue, Jul 19, 2016

Re: [OMPI users] mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking

2016-07-19 Thread Ralph Castain
Sounds to me like you have a confused build - I’d whack your prefix location and do a “make install” again > On Jul 19, 2016, at 1:04 PM, Nathaniel Graham wrote: > > Hello, > > I am trying to run the OSU tests for some results for a poster, but I am > getting the

[OMPI users] mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking

2016-07-19 Thread Nathaniel Graham
Hello, I am trying to run the OSU tests for some results for a poster, but I am getting the following error: mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking I am building off master with gcc on Red Hat Enterprise Linux Server release 6.7. My config

Re: [OMPI users] mpirun has exited due to process rank N

2016-07-07 Thread Gilles Gouaillardet
Andrea, On top of what Ralph just wrote, you might want to upgrade OpenMPI to the latest stable version (1.10.3); 1.6.5 is pretty antique and is no longer maintained. The message indicates that one process died, and so many things could cause a process crash. (since the crash occurs only

Re: [OMPI users] mpirun has exited due to process rank N

2016-07-07 Thread Ralph Castain
Try running one of the OMPI example codes and verify that things run correctly if N > 25. I suspect you have an error in your code that causes it to fail if its rank is > 25. > On Jul 7, 2016, at 2:49 PM, Alberti, Andrea wrote: > > Hi, > > my name is Andrea and I am a
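
Concretely, from the top of the Open MPI source tree, the stock examples can be built and launched at the failing scale (add --oversubscribe on releases that refuse to start more ranks than detected slots):

    $ cd examples
    $ make
    $ mpirun -n 30 hello_c
    $ mpirun -n 30 ring_c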

[OMPI users] mpirun has exited due to process rank N

2016-07-07 Thread Alberti, Andrea
Hi, my name is Andrea and I am a new openMPI user. I have a code compiled with: intel/16.0.3 openmpi/1.6.5 --> When I try to run my code with: mpirun -n N ./code.exe a) the code correctly runs and gives results if N<=25 b) the code gives the following error if N>25:

Re: [OMPI users] mpirun and Torque

2016-06-08 Thread Ralph Castain
I can confirm that mpirun will not direct-launch the applications under Torque. This is done for wireup support - if/when Torque natively supports PMIx, then we could revisit that design. Gilles: the benefit is two-fold: * Torque has direct visibility of the application procs. When we launch

Re: [OMPI users] mpirun and Torque

2016-06-07 Thread Gilles Gouaillardet
Ken, iirc, and under torque when Open MPI is configure'd with --with-tm (this is the default, so assuming your torque headers/libs can be found, you do not even have to specify --with-tm), mpirun does tm_spawn the orted daemon on all nodes except the current one. then mpirun and orted
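
A quick way to confirm that the tm (Torque) support described here was actually compiled in is to list the launcher and allocation components and look for "tm" entries:

    $ ompi_info | grep -E "MCA (plm|ras)"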

[OMPI users] mpirun and Torque

2016-06-07 Thread Ken Nielson
I am using openmpi version 1.10.2 with Torque 6.0.1. I launch a job with the following syntax: qsub -L tasks=2:lprocs=2:maxtpn=1 -I This starts an interactive job which is using two nodes. I then use mpirun as follows from the command line of the interactive job. mpirun -np 4 sleep

Re: [OMPI users] mpirun command won't run unless the firewalld daemon is disabled

2016-06-03 Thread Llolsten Kaonga
ct: Re: [OMPI users] mpirun command won't run unless the firewalld daemon is disabled I was basically suggesting you open a few ports to anyone (e.g. any IP address), and Jeff suggests you open all ports to a few trusted IP addresses. btw, how many network ports do you have ? if you have two

Re: [OMPI users] mpirun java

2016-05-23 Thread Howard Pritchard
HI Ralph, Yep, If you could handle this that would be great. I guess we'd like a fix in 1.10.x and for 2.0.1 that would be great. Howard 2016-05-23 14:59 GMT-06:00 Ralph Castain : > Looks to me like there is a bug in the orterun parser that is trying to > add java library

Re: [OMPI users] mpirun java

2016-05-23 Thread Claudio Stamile
Hi Howard. Thank you for your reply. I'm using version 1.10.2 I executed the following command: mpirun -np 2 --mca odls_base_verbose 100 java -cp alot:of:jarfile -Djava.library.path=/Users/stamile/Applications/IBM/ILOG/CPLEX_Studio1263/cplex/bin/x86-64_osx clustering.TensorClusterinCplexMPI

Re: [OMPI users] mpirun java

2016-05-23 Thread Saliya Ekanayake
I tested with OpenMPI 1.10.1 and it works. See this example, which prints java.library.path mpijavac LibPath.java mpirun -np 2 java -Djava.library.path=path LibPath On Mon, May 23, 2016 at 1:38 PM, Howard Pritchard wrote: > Hello Claudio, > > mpirun should be combining
