Exxxcellent.

Good luck!


On Jun 7, 2012, at 3:43 AM, Duke wrote:

> On 6/7/12 5:32 PM, Jeff Squyres wrote:
>> Check to ensure that you have firewalls disabled between your machines; 
>> that's a common cause of hanging (i.e., Open MPI is trying to open 
>> connections and/or send data between your nodes, and the packets are 
>> getting black-holed at the other side).
>> 
>> Open MPI needs to be able to communicate on random TCP ports between all 
>> machines that will be used in MPI jobs.
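>> 
>> If you would rather not disable the firewall outright, a rough (untested) 
>> alternative is to accept all traffic from the cluster's private subnet on 
>> each node. Assuming your nodes all sit on 192.168.0.0/24 (the 192.168.0.101 
>> in your secure log suggests so), something like:
>> 
>>   iptables -I INPUT -s 192.168.0.0/24 -j ACCEPT   # allow anything from the cluster subnet
>>   service iptables save                           # persist the rule across reboots
>> 
>> Adjust the subnet to match your network.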
> 
> Thanks!!! After switching iptables off on all the machines, I got it working:
> 
> [mpiuser@fantomfs40a ~]$ mpirun -np 8 --machinefile 
> /home/mpiuser/.mpi_hostfile ./test/mpihello
> Hello world!  I am process number: 0 on host fantomfs40a
> Hello world!  I am process number: 1 on host fantomfs40a
> Hello world!  I am process number: 2 on host hp430a
> Hello world!  I am process number: 3 on host hp430a
> Hello world!  I am process number: 4 on host hp430a
> Hello world!  I am process number: 5 on host hp430a
> Hello world!  I am process number: 6 on host hp430b
> Hello world!  I am process number: 7 on host hp430b
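> 
> (For the record, "switching iptables off" on these Scientific Linux 6.2 
> boxes amounted to roughly the following on each node:)
> 
>   service iptables stop      # stop the firewall now
>   chkconfig iptables off     # keep it off after reboot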
> 
> Thanks so much for all the answers/suggestions. I am excited now :).
> 
> D.
> 
>> 
>> 
>> On Jun 7, 2012, at 3:06 AM, Duke wrote:
>> 
>>> Hi again,
>>> 
>>> Somehow the verbose flag (-v) did not work for me. I tried --debug-daemons 
>>> and got:
>>> 
>>> [mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile 
>>> /home/mpiuser/.mpi_hostfile ./test/mpihello
>>> Daemon was launched on hp430a - beginning to initialize
>>> Daemon [[34432,0],1] checking in as pid 3011 on host hp430a
>>> <stuck here>
>>> 
>>> Somehow the program got stuck after the daemon checked in on hp430a. The 
>>> secure log on hp430a showed that mpiuser logged in just fine:
>>> 
>>> tail /var/log/secure
>>> Jun  7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser from 
>>> 192.168.0.101 port 34037 ssh2
>>> Jun  7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session opened 
>>> for user mpiuser by (uid=0)
>>> 
>>> Any idea where, how, or what I should check?
>>> 
>>> Thanks,
>>> 
>>> D.
>>> 
>>> On 6/7/12 4:38 PM, Duke wrote:
>>>> Hi Jingcha,
>>>> 
>>>> On 6/7/12 4:28 PM, Jingcha Joba wrote:
>>>>> Hello Duke,
>>>>> Welcome to the forum.
>>>>> 
>>>>> By default, Open MPI schedules processes to fill all the slots on a 
>>>>> host before moving on to the next host.
>>>>> 
>>>>> Check this link for some info:
>>>>> http://www.open-mpi.org/faq/?category=running#mpirun-scheduling
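>>>>> 
>>>>> For example (untested sketch, using the machinefile from your mail), 
>>>>> asking mpirun to place ranks round-robin by node instead of by slot 
>>>>> should spread them across the hosts:
>>>>> 
>>>>>   mpirun -np 4 --bynode --machinefile /home/mpiuser/.mpi_hostfile \
>>>>>     ./test/mpihello
>>>>> 
>>>>> With the default by-slot policy, ranks fill fantomfs40a's two slots 
>>>>> before any are placed on hp430a or hp430b.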
>>>> Thanks for the quick answer. I checked the FAQ and tried with more than 
>>>> 2 processes, but somehow it stalled:
>>>> 
>>>> [mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile 
>>>> /home/mpiuser/.mpi_hostfile ./test/mpihello
>>>> ^Cmpirun: killing job...
>>>> 
>>>> I tried the --host flag and it stalled as well:
>>>> 
>>>> [mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b 
>>>> ./test/mpihello
>>>> 
>>>> 
>>>> My configuration must be wrong somewhere. Any idea how I can check the 
>>>> system?
>>>> 
>>>> Thanks,
>>>> 
>>>> D.
>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Jingcha
>>>>> On Thu, Jun 7, 2012 at 2:11 AM, Duke<duke.li...@gmx.com>  wrote:
>>>>> Hi folks,
>>>>> 
>>>>> Please be gentle with the newest member of the Open MPI community; I am 
>>>>> totally new to this field. I just built a test cluster of 3 boxes running 
>>>>> Scientific Linux 6.2 and Open MPI 1.5.3, and I want to test how the 
>>>>> cluster works, but I can't figure out what is happening. On my master 
>>>>> node, I have the hostfile:
>>>>> 
>>>>> [mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
>>>>> # The Hostfile for Open MPI
>>>>> fantomfs40a slots=2
>>>>> hp430a slots=4 max_slots=4
>>>>> hp430b slots=4 max_slots=4
>>>>> 
>>>>> To test, I used the following C code:
>>>>> 
>>>>> [mpiuser@fantomfs40a ~]$ cat test/mpihello.c
>>>>> /* program hello */
>>>>> /* Adapted from mpihello.f by drs */
>>>>> 
>>>>> #include <mpi.h>
>>>>> #include <stdio.h>
>>>>> #include <unistd.h>  /* for gethostname() */
>>>>> 
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>>  int rank;            /* rank of this process in MPI_COMM_WORLD */
>>>>>  char hostname[256];
>>>>> 
>>>>>  MPI_Init(&argc, &argv);
>>>>>  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>  gethostname(hostname, 255);
>>>>>  printf("Hello world!  I am process number: %d on host %s\n", rank,
>>>>>         hostname);
>>>>>  MPI_Finalize();
>>>>>  return 0;
>>>>> }
>>>>> 
>>>>> and then compiled and ran:
>>>>> 
>>>>> [mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
>>>>> [mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile 
>>>>> /home/mpiuser/.mpi_hostfile ./test/mpihello
>>>>> Hello world!  I am process number: 0 on host fantomfs40a
>>>>> Hello world!  I am process number: 1 on host fantomfs40a
>>>>> 
>>>>> Unfortunately the result did not show what I wanted. I expected to see 
>>>>> something like:
>>>>> 
>>>>> Hello world!  I am process number: 0 on host hp430a
>>>>> Hello world!  I am process number: 1 on host hp430b
>>>>> 
>>>>> Does anybody have any idea what I am doing wrong?
>>>>> 
>>>>> Thank you in advance,
>>>>> 
>>>>> D.
>>>>> 
>>>> 
>>>> 
>> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

