Re: [OMPI users] Error using hostfile
Thank you, Ralph. I was able to ssh back and forth between the nodes, and the environment variables all appeared to be set correctly. Turning the firewall off made this work.

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Friday, July 08, 2011 1:58 PM
To: Open MPI Users
Subject: Re: [OMPI users] Error using hostfile

> Is there a firewall in the way? The error indicates that daemons were
> launched on the remote machines, but failed to communicate back.
>
> Also, check that your remote PATH and LD_LIBRARY_PATH are being set to
> the right place to pick up this version of OMPI. Lots of systems deploy
> with default versions that may not be compatible, so if you wind up
> running a daemon on the remote node that comes from another
> installation, things won't work.
Re: [OMPI users] Error using hostfile
Hi

If your LD_LIBRARY_PATH is not set for a non-interactive startup, then successful runs on the remote machines may not be sufficient evidence. Check this FAQ:
http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path

To see whether your variables are set correctly for non-interactive sessions on your nodes, you can execute

    mpirun --hostfile hostfile -np 4 printenv

and scan the output for PATH and LD_LIBRARY_PATH.

Hope this helps
Jody

On Sat, Jul 9, 2011 at 12:25 AM, Mohan, Ashwin wrote:
> Thanks Ralph.
>
> I have emailed the network admin about the firewall issue.
>
> About the PATH and LD_LIBRARY_PATH issue: is it sufficient evidence that
> the paths are set up correctly if I am able to compile and run
> successfully on the individual nodes listed in the machine file?
>
> Thanks,
> Ashwin
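Jody's printenv suggestion prints the full environment once per process; a quick filter keeps only the two variables that matter. The sample lines below are hypothetical stand-ins for real remote output (the openmpi1.4.3 prefix matches the one used in this thread); in practice you would pipe `mpirun --hostfile hostfile -np 4 printenv` into the same grep.

```shell
# Filter printenv-style output down to PATH and LD_LIBRARY_PATH.
# printf stands in here for real `mpirun ... printenv` output.
printf 'HOME=/home/amohan\nPATH=/usr/local/openmpi1.4.3/bin:/usr/bin\nLD_LIBRARY_PATH=/usr/local/openmpi1.4.3/lib\n' |
  grep -E '^(PATH|LD_LIBRARY_PATH)='
```

If a node's output lacks the openmpi1.4.3 prefix in either variable, that node is the likely culprit.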
Re: [OMPI users] Error using hostfile
Thanks Ralph.

I have emailed the network admin about the firewall issue.

About the PATH and LD_LIBRARY_PATH issue: is it sufficient evidence that the paths are set up correctly if I am able to compile and run successfully on the individual nodes listed in the machine file?

Thanks,
Ashwin

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Friday, July 08, 2011 1:58 PM
To: Open MPI Users
Subject: Re: [OMPI users] Error using hostfile

> Is there a firewall in the way? The error indicates that daemons were
> launched on the remote machines, but failed to communicate back.
>
> Also, check that your remote PATH and LD_LIBRARY_PATH are being set to
> the right place to pick up this version of OMPI. Lots of systems deploy
> with default versions that may not be compatible, so if you wind up
> running a daemon on the remote node that comes from another
> installation, things won't work.
Re: [OMPI users] Error using hostfile
Is there a firewall in the way? The error indicates that daemons were launched on the remote machines, but failed to communicate back.

Also, check that your remote PATH and LD_LIBRARY_PATH are being set to the right place to pick up this version of OMPI. Lots of systems deploy with default versions that may not be compatible, so if you wind up running a daemon on the remote node that comes from another installation, things won't work.

On Jul 8, 2011, at 10:52 AM, Mohan, Ashwin wrote:

> I use the following run command: mpirun --hostfile hostfile -np 4 new46
> and receive a blank screen. Hence, I have to kill the job.
>
> OUTPUT ON KILLING JOB:
> mpirun: killing job...
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> myocyte46 - daemon did not report back when launched
> myocyte47 - daemon did not report back when launched
> myocyte49 - daemon did not report back when launched
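For reference, the hostfile in this thread lists one bare hostname per line, which Open MPI treats as one slot per host. Since each node actually has 4 slots, the count can be stated explicitly; this is a sketch following the hostfile syntax documented for Open MPI's rsh/ssh launcher:

```
# hostfile: one host per line; slots tells mpirun how many
# processes each node can accept before it is considered full
myocyte46 slots=4
myocyte47 slots=4
myocyte48 slots=4
myocyte49 slots=4
```

Note the mapping consequence: with slots declared, the default byslot scheduling of `-np 4` would place all four ranks on the first listed host; spreading them across nodes takes an option such as `--bynode` (behavior as documented for the 1.4 series; check `mpirun --help` for your build).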
Re: [OMPI users] Error using hostfile
Hi,

I am following up on a previously posted error. Based on the earlier recommendation, I did set up a passwordless SSH login.

I created a hostfile comprising 4 nodes (each node having 4 slots). I tried to run my job on 4 slots but get no output, so I end up killing the job. I am trying to run a simple MPI program on 4 nodes and trying to figure out what the issue could be. What could I check to ensure that I can run jobs on 4 nodes (each node has 4 slots)?

Here is the simple MPI program I am trying to execute on 4 nodes:

**
if (my_rank != 0)
{
    sprintf(message, "Greetings from the process %d!", my_rank);
    dest = 0;
    MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag,
             MPI_COMM_WORLD);
}
else
{
    for (source = 1; source < p; source++)
    {
        MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD,
                 &status);
        printf("%s\n", message);
    }
}
**

My hostfile looks like this:

[amohan@myocyte48 ~]$ cat hostfile
myocyte46
myocyte47
myocyte48
myocyte49

***

I use the following run command:

    mpirun --hostfile hostfile -np 4 new46

and receive a blank screen. Hence, I have to kill the job.

OUTPUT ON KILLING JOB:
mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
myocyte46 - daemon did not report back when launched
myocyte47 - daemon did not report back when launched
myocyte49 - daemon did not report back when launched

Thanks,
Ashwin.

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, July 06, 2011 6:46 PM
To: Open MPI Users
Subject: Re: [OMPI users] Error using hostfile

> Please see http://www.open-mpi.org/faq/?category=rsh#ssh-keys

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Error using hostfile
Hi,

On Jul 7, 2011, at 1:09 AM, Mohan, Ashwin wrote:

> I use the following command (mpirun --prefix /usr/local/openmpi1.4.3
> -np 4 hello) to successfully execute a simple hello world command on a
> single node. Each node has 4 slots. Following the successful execution
> on one node, I wish to employ 4 nodes and for this purpose wrote a
> hostfile. I submitted my job using the following command:
>
> mpirun --prefix /usr/local/openmpi1.4.3 -np 4 --hostfile hostfile hello

It looks like you will either have to set up a passphraseless ssh login for each user between the machines, or do it one time inside the cluster using hostbased authentication:
http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html

Do you have the same users on all machines, with the same UID and GID?

-- Reuti
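The per-user route Reuti mentions can be sketched as follows. This is a hedged sketch rather than a definitive recipe: it assumes OpenSSH defaults, that ssh-copy-id is available, and it reuses the myocyte host names from this thread; the hostbased-authentication route is covered by the linked howto instead.

```shell
# One-time passphraseless key setup for the current user (sketch).
# Generate a key with an empty passphrase only if none exists, then
# push the public key into each node's authorized_keys.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for host in myocyte46 myocyte47 myocyte48 myocyte49; do
    ssh-copy-id "$host"   # prompts for the password once per host
done
# Afterwards this should return without any password prompt:
ssh myocyte46 true
```

If the final check still prompts, look at permissions on ~/.ssh (700) and ~/.ssh/authorized_keys (600) on the remote side, a common cause of the "Permission denied (publickey,...)" output seen earlier in the thread.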
Re: [OMPI users] Error using hostfile
Please see http://www.open-mpi.org/faq/?category=rsh#ssh-keys

On Jul 6, 2011, at 5:09 PM, Mohan, Ashwin wrote:

> Hi,
>
> I use the following command (mpirun --prefix /usr/local/openmpi1.4.3
> -np 4 hello) to successfully execute a simple hello world command on a
> single node. Each node has 4 slots. Following the successful execution
> on one node, I wish to employ 4 nodes and for this purpose wrote a
> hostfile. I submitted my job using the following command:
>
> mpirun --prefix /usr/local/openmpi1.4.3 -np 4 --hostfile hostfile hello
[OMPI users] Error using hostfile
Hi,

I use the following command (mpirun --prefix /usr/local/openmpi1.4.3 -np 4 hello) to successfully execute a simple hello world program on a single node. Each node has 4 slots. Following the successful execution on one node, I wish to employ 4 nodes and for this purpose wrote a hostfile. I submitted my job using the following command:

    mpirun --prefix /usr/local/openmpi1.4.3 -np 4 --hostfile hostfile hello

Copied below is the output. How do I go about fixing this issue?

**

amohan@myocyte48's password: amohan@myocyte47's password:
Permission denied, please try again.
amohan@myocyte48's password:
Permission denied, please try again.
amohan@myocyte47's password:
Permission denied, please try again.
amohan@myocyte47's password:
Permission denied, please try again.
amohan@myocyte48's password:

Permission denied (publickey,gssapi-with-mic,password).
--------------------------------------------------------------------------
A daemon (pid 22085) died unexpectedly with status 255 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have
the location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
myocyte47 - daemon did not report back when launched
myocyte48 - daemon did not report back when launched

**

Thanks,
Ashwin.