Hi everybody

It's me again ;)
First: My Oscar / CentOS 5 cluster is up and running, mostly...

Now I have problems with the parallel environment.
I tried to run some MPI-based programs like the following:

---
[root@lcc102 helloworld]# cat helloworld.c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
  int numprocs, rank, namelen;
  char processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(processor_name, &namelen);

  printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);

  MPI_Finalize();
  return 0;
}
---

It's a sample program from OpenMPI, so it should work...

I started the MPI program with a bash script:

---
[root@lcc102 helloworld]# cat openmpi-test
#!/bin/bash
#$ -N openmpi-helloworld
# Here we tell the queue that we want the orte parallel environment
# and request 5 slots
# This option takes the following form: -pe nameOfEnv min-Max
# where you request a min and max number of slots
#$ -pe make 6-10
#$ -cwd
#$ -j y
/opt/mpich-ch_p4-gcc-1.2.7/bin/mpirun -n $NSLOTS helloworld
exit 0
---
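
The job script above passes only the slot count to mpirun. As a minimal sketch (not something I have tested on this cluster), the hosts granted by SGE could also be passed explicitly. This assumes that SGE writes them to the file named in $PE_HOSTFILE, one line per host in the form "host slots queue processor-range", and that this mpirun accepts -np and -machinefile. The function name is made up for illustration.

```shell
#!/bin/bash
# Sketch: expand an SGE PE host file into a plain machine file.
# Assumed input format, one line per host:
#   <host> <slots> <queue> <processor-range>
pe_hostfile_to_machines() {
  # Print each host name once per granted slot; skip lines
  # whose second field is not a number.
  awk '$2 ~ /^[0-9]+$/ { for (i = 0; i < $2; i++) print $1 }' "$1"
}

# Hypothetical use inside the job script (not run here):
#   pe_hostfile_to_machines "$PE_HOSTFILE" > "$TMPDIR/machines"
#   /opt/mpich-ch_p4-gcc-1.2.7/bin/mpirun -np "$NSLOTS" \
#       -machinefile "$TMPDIR/machines" helloworld
```

With such a list, the host names mpirun uses would come from the scheduler rather than from whatever default list the MPI installation ships with.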

After I submit the job to SGE, it ends after a few minutes and
writes the following to the output/error file:

---
[root@lcc102 helloworld]# cat openmpi-helloworld.o129
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
p0_5868: (1999.558060) Procgroup:
p0_5868: (1999.558127)     entry 0: lcc105.ch.power.alstom.com 0 0 /home/helloworld/helloworld root
p0_5868: (1999.558137)     entry 1: oscar-rhel5.osl.iu.edu 1 1 /home/helloworld/helloworld root
p0_5868: (1999.558144)     entry 2: oscar-rhel5.osl.iu.edu 1 2 /home/helloworld/helloworld root
p0_5868: (1999.558151)     entry 3: oscar-rhel5.osl.iu.edu 1 3 /home/helloworld/helloworld root
p0_5868: (1999.558158)     entry 4: oscar-rhel5.osl.iu.edu 1 4 /home/helloworld/helloworld root
p0_5868: (1999.558165)     entry 5: oscar-rhel5.osl.iu.edu 1 5 /home/helloworld/helloworld root
p0_5868: (1999.558172)     entry 6: oscar-rhel5.osl.iu.edu 1 6 /home/helloworld/helloworld root
p0_5868: (1999.558179)     entry 7: oscar-rhel5.osl.iu.edu 1 7 /home/helloworld/helloworld root
p0_5868: (1999.558186)     entry 8: oscar-rhel5.osl.iu.edu 1 8 /home/helloworld/helloworld root
p0_5868: (1999.558192)     entry 9: oscar-rhel5.osl.iu.edu 1 9 /home/helloworld/helloworld root
p0_5868:  p4_error: Could not gethostbyname for host oscar-rhel5.osl.iu.edu; may be invalid name
: 1999
---

From what I have read, the first two lines are normal, but the rest looks
strange... Has anybody ever seen an error like this?
The name "oscar-rhel5.osl.iu.edu" isn't used anywhere in my cluster and
cannot be resolved by any DNS server. I don't know where the system gets
this name from...
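
To see at a glance which host names the run tried to use, the "entry N:" lines of the output can be reduced to a list of distinct names. This is only a rough sketch that assumes the Procgroup format shown above; the function name is my own.

```shell
#!/bin/bash
# Sketch: list the distinct host names in a ch_p4 "Procgroup" dump.
# Reads the named files, or standard input if none are given.
procgroup_hosts() {
  # The host name is the second field after the word "entry",
  # e.g. "... entry 1: oscar-rhel5.osl.iu.edu 1 1 ...".
  awk '{
    for (i = 1; i + 2 <= NF; i++)
      if ($i == "entry" && $(i + 1) ~ /^[0-9]+:$/) print $(i + 2)
  }' "$@" | sort -u
}

# Hypothetical use: report every listed host that does not resolve.
#   procgroup_hosts openmpi-helloworld.o129 | while read -r h; do
#     getent hosts "$h" > /dev/null || echo "cannot resolve: $h"
#   done
```

Every name printed by the loop is one that neither DNS nor /etc/hosts could resolve on the node where it runs.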

The MPI program should print a line like "Process x on host out of y"
for each process.

I think it's related to OpenMPI, but I'm not sure.
I'm using the following versions:

---
[root@lcc102 helloworld]# rpm -qa | grep oscar-base
oscar-base-6.0.5-1
oscar-base-server-6.0.5r9167-1
oscar-base-lib-6.0.5-1
oscar-base-scripts-6.0.5-1
oscar-base-client-6.0.5-1

[root@lcc102 helloworld]# rpm -qa | grep sge
opkg-sge-server-6.1.4-1
sge-6.0u9-9oscar
opkg-sge-6.1.4-1
sge-modulefile-6.0u9-9oscar

[root@lcc102 helloworld]# rpm -qa | grep mpi
openmpi-switcher-modulefile-1.2.4-1
opkg-openmpi-server-1.2.4-1
mpi-selector-1.0.2-1.el5
opkg-mpich-1.2.7-9
opkg-mpich-server-1.2.7-9
openmpi-libs-1.4-4.el5
opkg-openmpi-client-1.2.4-1
mpich-ch_p4-gcc-oscar-module-1.2.7-8
opkg-openmpi-1.2.4-1
mpich-ch_p4-gcc-oscar-1.2.7-8
openmpi-1.4-4.el5
---

Can somebody help?

cheers
Patrick


2011/1/28 Patrick Schmid <patrick.sch...@encodingit.ch>:
> I was able to solve the problem myself, but thanks for your help.
>
> The problem was that gethostname returns the FQDN (name + domain),
> and the program then calls gethostbyname with that FQDN to look up an IP.
> But the hosts file mapped each IP only to the short name (not the FQDN), so
> gethostbyname failed, because lcc103 is not the same as
> lcc103.ch.power.alstom.com.
>
> After I modified the hosts file like this, everything worked:
>
> #  addresses
> 10.128.88.103        lcc103.ch.power.alstom.com lcc103
> 10.128.88.104        lcc104.ch.power.alstom.com lcc104
> 10.128.88.105        lcc105.ch.power.alstom.com lcc105
>
> (Before, there were only the entries for lcc103, lcc104 and lcc105.)
>
> Thanks, now my cluster is up and running.
>
> For anybody who's interested, I wrote a howto about OSCAR (in German) here:
> http://blog.encodingit.ch/2011/01/linux-high-performance-cluster-mit-oscar/
>
> cheers
> Patrick
>
> 2011/1/28 siavash ghiasvand <siavash.ghiyasv...@gmail.com>:
>> It was my pleasure ;)
>> I don't know how, but it seems that "gethostname -name" does not work correctly.
>> Take a look at "/etc/host.conf" to see whether these lines exist:
>>
>> # Lookup names via DNS first then fall back to /etc/hosts.
>> order bind,hosts
>>
>> The above line tells the resolver to check DNS first and then
>> the local /etc/hosts entries.
>> P.S.:
>> 1- You can remove "bind" from /etc/host.conf to check the correctness of
>> /etc/hosts.
>> 2- "10.128.88.102" is in a private IP range, but
>> "lcc102.ch.power.alstom.com" (if it exists) will be resolved to a public IP, so
>> with two IPs (one private and one public) the resolution process will
>> always fail! (You can replace "lcc102.ch.power.alstom.com" with any
>> other name to correct it.)
>>
>> Sincerely yours,
>> Siavash Ghiasvand
>>
>> ------------------------------------------------------------------------------
>> Special Offer-- Download ArcSight Logger for FREE (a $49 USD value)!
>> Finally, a world-class log management solution at an even better price-free!
>> Download using promo code Free_Logger_4_Dev2Dev. Offer expires
>> February 28th, so secure your free ArcSight Logger TODAY!
>> http://p.sf.net/sfu/arcsight-sfd2d
>> _______________________________________________
>> Oscar-users mailing list
>> Oscar-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/oscar-users
>>
>>
>
>
>
> --
> Patrick Schmid
>
> www.encodingit.ch
> patrick.sch...@encodingit.ch
>
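
The hosts file fix quoted above can be checked with a small script. This is just a sketch under the assumption that the file uses the usual "address name alias..." layout with "#" comments; the function name is made up, and the comparison is case-sensitive for simplicity.

```shell
#!/bin/bash
# Sketch: succeed if NAME is listed as a host name or alias
# in a hosts-format file.
# Usage: hosts_has_name FILE NAME
hosts_has_name() {
  awk -v name="$2" '
    { sub(/#.*/, "") }    # drop comments before looking at the fields
    { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit found ? 0 : 1 }
  ' "$1"
}

# Hypothetical use on a node: check both the FQDN and the short name.
#   hosts_has_name /etc/hosts "$(hostname -f)" || echo "FQDN missing"
#   hosts_has_name /etc/hosts "$(hostname -s)" || echo "short name missing"
```

Before the fix, the FQDN check would have failed on every node, which matches the gethostbyname error in the job output.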



-- 
Patrick Schmid
