Hi everybody,

It's me again ;) First: my Oscar / CentOS 5 cluster is up and running, mostly...
Now I have problems with the parallel environment. I tried to run some MPI-based programs like the following:

---
[root@lcc102 helloworld]# cat helloworld.c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    /* Start MPI, then query the job size, this rank and the host name. */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);

    printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);

    MPI_Finalize();
    return 0;
}
---

It's a sample program from OpenMPI, so it should work...

I started the MPI program with a bash script:

---
[root@lcc102 helloworld]# cat openmpi-test
#!/bin/bash
#$ -N openmpi-helloworld
# Here we tell the queue that we want the orte parallel environment and request 5 slots
# This option takes the following form: -pe nameOfEnv min-Max
# Where you request a min and max number of slots
#$ -pe make 6-10
#$ -cwd
#$ -j y
/opt/mpich-ch_p4-gcc-1.2.7/bin/mpirun -n $NSLOTS helloworld
exit 0
---

After I submit the job to SGE, it ends after a few minutes and writes the following to the combined output / error file:

---
[root@lcc102 helloworld]# cat openmpi-helloworld.o129
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
p0_5868: (1999.558060) Procgroup:
p0_5868: (1999.558127) entry 0: lcc105.ch.power.alstom.com 0 0 /home/helloworld/helloworld root
p0_5868: (1999.558137) entry 1: oscar-rhel5.osl.iu.edu 1 1 /home/helloworld/helloworld root
p0_5868: (1999.558144) entry 2: oscar-rhel5.osl.iu.edu 1 2 /home/helloworld/helloworld root
p0_5868: (1999.558151) entry 3: oscar-rhel5.osl.iu.edu 1 3 /home/helloworld/helloworld root
p0_5868: (1999.558158) entry 4: oscar-rhel5.osl.iu.edu 1 4 /home/helloworld/helloworld root
p0_5868: (1999.558165) entry 5: oscar-rhel5.osl.iu.edu 1 5 /home/helloworld/helloworld root
p0_5868: (1999.558172) entry 6: oscar-rhel5.osl.iu.edu 1 6 /home/helloworld/helloworld root
p0_5868: (1999.558179) entry 7: oscar-rhel5.osl.iu.edu 1 7 /home/helloworld/helloworld root
p0_5868: (1999.558186) entry 8: oscar-rhel5.osl.iu.edu 1 8 /home/helloworld/helloworld root
p0_5868: (1999.558192) entry 9: oscar-rhel5.osl.iu.edu 1 9 /home/helloworld/helloworld root
p0_5868: p4_error: Could not gethostbyname for host oscar-rhel5.osl.iu.edu; may be invalid name : 1999
---

The first two lines are normal, from what I have read, but the rest looks strange... Has anybody seen an error like this before? The name "oscar-rhel5.osl.iu.edu" isn't used in my cluster and cannot be resolved by any DNS. I don't know where the system gets this name from... The MPI program should print one line per process, of the form "Process x on <host> out of y". I think it's related to OpenMPI, but I'm not sure.
I'm using the following versions:

---
[root@lcc102 helloworld]# rpm -qa | grep oscar-base
oscar-base-6.0.5-1
oscar-base-server-6.0.5r9167-1
oscar-base-lib-6.0.5-1
oscar-base-scripts-6.0.5-1
oscar-base-client-6.0.5-1
[root@lcc102 helloworld]# rpm -qa | grep sge
opkg-sge-server-6.1.4-1
sge-6.0u9-9oscar
opkg-sge-6.1.4-1
sge-modulefile-6.0u9-9oscar
[root@lcc102 helloworld]# rpm -qa | grep mpi
openmpi-switcher-modulefile-1.2.4-1
opkg-openmpi-server-1.2.4-1
mpi-selector-1.0.2-1.el5
opkg-mpich-1.2.7-9
opkg-mpich-server-1.2.7-9
openmpi-libs-1.4-4.el5
opkg-openmpi-client-1.2.4-1
mpich-ch_p4-gcc-oscar-module-1.2.7-8
opkg-openmpi-1.2.4-1
mpich-ch_p4-gcc-oscar-1.2.7-8
openmpi-1.4-4.el5
---

Can somebody help?

cheers
Patrick

2011/1/28 Patrick Schmid <patrick.sch...@encodingit.ch>:
> I was able to solve the problem myself, but thanks for your help.
>
> The problem was that gethostname returns the FQDN (name + domain),
> and gethostbyname then uses that FQDN to look up an IP address.
> But the hosts file mapped each IP only to the short name (not the FQDN),
> so gethostbyname failed, because lcc103 is not the same as
> lcc103.ch.power.alstom.com.
>
> After I modified the hosts file like this, everything worked:
>
> # addresses
> 10.128.88.103 lcc103.ch.power.alstom.com lcc103
> 10.128.88.104 lcc104.ch.power.alstom.com lcc104
> 10.128.88.105 lcc105.ch.power.alstom.com lcc105
>
> (Before, there were only the entries for lcc103, lcc104 and lcc105.)
>
> Thanks, now my cluster is up and running.
>
> For everybody who's interested, I wrote a howto about Oscar (in German)
> here:
> http://blog.encodingit.ch/2011/01/linux-high-performance-cluster-mit-oscar/
>
> cheers
> Patrick
>
> 2011/1/28 siavash ghiasvand <siavash.ghiyasv...@gmail.com>:
>> It was my pleasure ;)
>> I don't know how, but it seems that "gethostname -name" does not work
>> correctly. Take a look at "/etc/host.conf" to see if these lines exist:
>>
>> # Lookup names via DNS first then fall back to /etc/hosts.
>> order bind,hosts
>>
>> The line above tells "gethostbyname" to check DNS first and then the
>> local /etc/hosts entries.
>> P.S.:
>> 1- You can remove "bind" from /etc/host.conf to check the correctness of
>> /etc/hosts.
>> 2- "10.128.88.102" is in a private IP range, but
>> "lcc102.ch.power.alstom.com" (if it exists) will be resolved to a public
>> IP, so with two IPs (one private and one public) the IP resolution will
>> always fail! (You can replace "lcc102.ch.power.alstom.com" with any
>> other name to correct it.)
>>
>> Sincerely yours,
>> Siavash Ghiasvand
>>
>> _______________________________________________
>> Oscar-users mailing list
>> Oscar-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/oscar-users
>
> --
> Patrick Schmid
>
> www.encodingit.ch
> patrick.sch...@encodingit.ch

--
Patrick Schmid
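For anyone hitting the same p4_error: the hosts-file fix described in this thread can be checked with a small script before resubmitting the job. The following is only a sketch under assumptions. The function name hosts_has_name and the two temporary files are made up for illustration, and the script only inspects a hosts-style file; it does not go through the real resolver. It reports whether a given name appears as a host name or alias on any non-comment line.

```shell
# hosts_has_name FILE NAME (hypothetical helper, not part of any tool):
# succeed if NAME appears as a host name or alias (any field after the
# IP address) on a non-comment line of the hosts-style FILE.
# Host names are compared case-insensitively.
hosts_has_name() {
    awk -v name="$2" '
        BEGIN { name = tolower(name) }
        { sub(/#.*/, "") }    # drop comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if (tolower($i) == name) found = 1 }
        END { exit !found }   # exit status 0 only if the name was seen
    ' "$1"
}

# Illustration with the lcc103 entry from this thread, written to
# temporary files so that the real /etc/hosts is never touched.
before=$(mktemp)
after=$(mktemp)
printf '# addresses\n10.128.88.103 lcc103\n' > "$before"
printf '# addresses\n10.128.88.103 lcc103.ch.power.alstom.com lcc103\n' > "$after"

for f in "$before" "$after"; do
    if hosts_has_name "$f" lcc103.ch.power.alstom.com; then
        echo "FQDN found"
    else
        echo "FQDN missing"
    fi
done

rm -f "$before" "$after"
```

With the short-name-only file the loop prints "FQDN missing", and with the corrected file it prints "FQDN found". On a real node, running getent hosts "$(hostname -f)" should exercise the actual lookup order, including DNS, which this sketch deliberately leaves out.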