Exxxcellent. Good luck!
On Jun 7, 2012, at 3:43 AM, Duke wrote:

> On 6/7/12 5:32 PM, Jeff Squyres wrote:
>> Check to ensure that you have firewalls disabled between your two machines;
>> that's a common cause of hanging (i.e., Open MPI is trying to open
>> connections and/or send data between your two nodes, and the packets are
>> getting black-holed at the other side).
>>
>> Open MPI needs to be able to communicate on random TCP ports between all
>> machines that will be used in MPI jobs.
>
> Thanks!!! After switching iptables off on all the machines, I got it working:
>
> [mpiuser@fantomfs40a ~]$ mpirun -np 8 --machinefile /home/mpiuser/.mpi_hostfile ./test/mpihello
> Hello world! I am process number: 0 on host fantomfs40a
> Hello world! I am process number: 1 on host fantomfs40a
> Hello world! I am process number: 2 on host hp430a
> Hello world! I am process number: 3 on host hp430a
> Hello world! I am process number: 4 on host hp430a
> Hello world! I am process number: 5 on host hp430a
> Hello world! I am process number: 6 on host hp430b
> Hello world! I am process number: 7 on host hp430b
>
> Thanks so much for all the answers/suggestions. I am excited now :).
>
> D.
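(An aside on the firewall fix: disabling iptables works, but Open MPI can
usually be confined to a fixed TCP port range instead, so that only that range
plus ssh has to be open. The lines below are a minimal sketch, assuming the
1.5-era MCA parameter names; confirm the exact names on your build with
`ompi_info --param btl tcp` and `ompi_info --param oob tcp` before relying on
them.)

    # on every node, as root: open the range the job will use (example range)
    iptables -A INPUT -p tcp --dport 10000:10099 -j ACCEPT

    # then pin Open MPI's TCP traffic to that range when launching
    mpirun -np 8 --machinefile /home/mpiuser/.mpi_hostfile \
        --mca btl_tcp_port_min_v4 10000 --mca btl_tcp_port_range_v4 100 \
        --mca oob_tcp_port_min_v4 10000 --mca oob_tcp_port_range_v4 100 \
        ./test/mpihello

The same range has to be reachable on every node that will run MPI processes
or an ORTE daemon.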
>> On Jun 7, 2012, at 3:06 AM, Duke wrote:
>>
>>> Hi again,
>>>
>>> Somehow the verbose flag (-v) did not work for me. I tried --debug-daemons
>>> and got:
>>>
>>> [mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile /home/mpiuser/.mpi_hostfile ./test/mpihello
>>> Daemon was launched on hp430a - beginning to initialize
>>> Daemon [[34432,0],1] checking in as pid 3011 on host hp430a
>>> <stuck here>
>>>
>>> Somehow the program got stuck while checking in on the hosts. The secure
>>> log on hp430a showed that mpiuser logged in just fine:
>>>
>>> tail /var/log/secure
>>> Jun 7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser from 192.168.0.101 port 34037 ssh2
>>> Jun 7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session opened for user mpiuser by (uid=0)
>>>
>>> Any idea where/how/what to process/check?
>>>
>>> Thanks,
>>>
>>> D.
>>>
>>> On 6/7/12 4:38 PM, Duke wrote:
>>>> Hi Jingcha,
>>>>
>>>> On 6/7/12 4:28 PM, Jingcha Joba wrote:
>>>>> Hello Duke,
>>>>> Welcome to the forum.
>>>>>
>>>>> The way Open MPI schedules by default is to fill all the slots in a
>>>>> host before moving on to the next host.
>>>>>
>>>>> Check this link for some info:
>>>>> http://www.open-mpi.org/faq/?category=running#mpirun-scheduling
>>>>
>>>> Thanks for the quick answer. I checked the FAQ and tried with more than
>>>> 2 processes, but somehow it got stalled:
>>>>
>>>> [mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile /home/mpiuser/.mpi_hostfile ./test/mpihello
>>>> ^Cmpirun: killing job...
>>>>
>>>> I tried the --host flag and it got stalled as well:
>>>>
>>>> [mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b ./test/mpihello
>>>>
>>>> My configuration must be wrong somewhere. Any idea how I can check the
>>>> system?
>>>>
>>>> Thanks,
>>>>
>>>> D.
>>>>
>>>>> --
>>>>> Jingcha
>>>>>
>>>>> On Thu, Jun 7, 2012 at 2:11 AM, Duke <duke.li...@gmx.com> wrote:
>>>>> Hi folks,
>>>>>
>>>>> Please be gentle to the newest member of Open MPI; I am totally new to
>>>>> this field. I just built a test cluster with 3 boxes on Scientific
>>>>> Linux 6.2 and Open MPI 1.5.3, and I wanted to test how the cluster
>>>>> works, but I can't figure out what was/is happening. On my master node,
>>>>> I have the hostfile:
>>>>>
>>>>> [mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
>>>>> # The Hostfile for Open MPI
>>>>> fantomfs40a slots=2
>>>>> hp430a slots=4 max-slots=4
>>>>> hp430b slots=4 max-slots=4
>>>>>
>>>>> To test, I used the following C code:
>>>>>
>>>>> [mpiuser@fantomfs40a ~]$ cat test/mpihello.c
>>>>> /* program hello */
>>>>> /* Adapted from mpihello.f by drs */
>>>>>
>>>>> #include <mpi.h>
>>>>> #include <stdio.h>
>>>>> #include <unistd.h>   /* for gethostname() */
>>>>>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>>     int rank;
>>>>>     char hostname[256];
>>>>>
>>>>>     MPI_Init(&argc, &argv);
>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>     gethostname(hostname, 255);
>>>>>     printf("Hello world! I am process number: %d on host %s\n", rank, hostname);
>>>>>     MPI_Finalize();
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> and then compiled and ran it:
>>>>>
>>>>> [mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
>>>>> [mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile /home/mpiuser/.mpi_hostfile ./test/mpihello
>>>>> Hello world! I am process number: 0 on host fantomfs40a
>>>>> Hello world! I am process number: 1 on host fantomfs40a
>>>>>
>>>>> Unfortunately the result did not show what I wanted. I expected to see
>>>>> something like:
>>>>>
>>>>> Hello world! I am process number: 0 on host hp430a
>>>>> Hello world! I am process number: 1 on host hp430b
>>>>>
>>>>> Does anybody have an idea what I am doing wrong?
>>>>>
>>>>> Thank you in advance,
>>>>>
>>>>> D.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
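(A closing note on the scheduling question that started the thread: with the
default by-slot mapping, "mpirun -np 2" fills the two slots on fantomfs40a,
the first host in the machinefile, before touching hp430a or hp430b, which is
exactly the output Duke saw. A minimal sketch of the alternatives, assuming
the --bynode behaviour of the 1.5-era mpirun described in the FAQ linked
above:)

    # round-robin one rank per host instead of filling each host's slots
    # first; rank 0 should land on fantomfs40a and rank 1 on hp430a
    mpirun -np 2 --bynode --machinefile /home/mpiuser/.mpi_hostfile ./test/mpihello

    # or skip the head node entirely so the two ranks land on hp430a and hp430b
    mpirun -np 2 --host hp430a,hp430b ./test/mpihello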