[OMPI users] Question on ./configure error on Tru64unix (OSF1) v5.1B-6 for openmpi-1.6

2012-06-07 Thread Bill Glessner

Hello,

I am having trouble with the *** Assembler section of the GNU autoconf
step in trying to build OpenMPI version 1.6 on an HP AlphaServer GS160
running Tru64unix version 5.1B-6:

# uname -a
OSF1 zozma.cts.cwu.edu V5.1 2650 alpha

The output of the ./configure run
zozma(bash)% ./configure --prefix=/usr/local/OpenMPI \
                         --enable-shared --enable-static
is as follows:

...

*** Assembler
checking dependency style of gcc... gcc3
checking for BSD- or MS-compatible name lister (nm)... /usr/local/bin/nm -B
checking the name lister (/usr/local/bin/nm -B) interface... BSD nm
checking for fgrep... /usr/local/bin/grep -F
checking if need to remove -g from CCASFLAGS... no
checking whether to enable smp locks... yes
checking if .proc/endp is needed... no
checking directive for setting text section... .text
checking directive for exporting symbols... .globl
checking for objdump... objdump
checking if .note.GNU-stack is needed... no
checking suffix for labels... :
checking prefix for global symbol labels... none
configure: error: Could not determine global symbol label prefix

The ./config.log is appended.

Can anyone provide some information or suggestions on how to resolve this
issue?
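
For reference, the check that fails here probes whether the C compiler
prepends a prefix (typically an underscore) to global symbol names. A rough
manual equivalent, assuming the gcc and GNU nm already detected above (a
sketch, not the exact test configure runs):

echo 'int ompi_prefix_probe = 1;' > prefixtest.c
gcc -c prefixtest.c
/usr/local/bin/nm -B prefixtest.o | grep prefix_probe
# "_ompi_prefix_probe" would indicate a "_" prefix, "ompi_prefix_probe"
# no prefix at all; configure aborts when the nm output matches neither
# form it knows about.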

Thank you for your assistance,
Bill Glessner - System programmer, Central Washington University

**

This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.6, which was
generated by GNU Autoconf 2.68.  Invocation command line was

  $ ./configure --prefix=/usr/local/OpenMPI --disable-shared --enable-static

## - ##
## Platform. ##
## - ##

hostname = zozma.cts.cwu.edu
uname -m = alpha
uname -r = V5.1
uname -s = OSF1
uname -v = 2650

/usr/bin/uname -p = alpha
/bin/uname -X = unknown

/bin/arch  = unknown
/usr/bin/arch -k   = unknown
/usr/convex/getsysinfo = unknown
/usr/bin/hostinfo  = unknown
/bin/machine   = alpha
/usr/bin/oslevel   = unknown
/bin/universe  = unknown

PATH: /usr/local/GNU463/bin
PATH: /usr/local/sbin
PATH: /usr/local/bin
PATH: /sbin
PATH: /usr/sbin
PATH: /usr/bin
PATH: /usr/bin/X11


## --- ##
## Core tests. ##
## --- ##

configure:4851: checking build system type
configure:4865: result: alphaev68-dec-osf5.1b
configure:4885: checking host system type
configure:4898: result: alphaev68-dec-osf5.1b
configure:4918: checking target system type
configure:4931: result: alphaev68-dec-osf5.1b
configure:5018: checking for gcc
configure:5045: result: gcc
configure:5274: checking for C compiler version
configure:5283: gcc --version >&5
gcc (GCC) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

configure:5294: $? = 0
configure:5283: gcc -v >&5
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/GNU463/bin/../libexec/gcc/alphaev68-dec-osf5.1b/4.6.3/lto-wrapper
Target: alphaev68-dec-osf5.1b
Configured with: /Area51/JunqueYard/gcc-4.6.3/configure ac_cv_prog_cc_c99= 
--prefix=/usr/local/GNU463 --enable-languages=c,c++,fortran --disable-nls 
--with-libiconv=/usr/local --with-gmp=/usr/local --with-mpfr=/usr/local 
--with-mpc=/usr/local
Thread model: posix
gcc version 4.6.3 (GCC) 
configure:5294: $? = 0
configure:5283: gcc -V >&5
gcc: error: unrecognized option '-V'
gcc: fatal error: no input files
compilation terminated.
configure:5294: $? = 1
configure:5283: gcc -qversion >&5
gcc: error: unrecognized option '-qversion'
gcc: fatal error: no input files
compilation terminated.
configure:5294: $? = 1
configure:5314: checking whether the C compiler works
configure:5336: gcc -mieee   conftest.c  >&5
configure:5340: $? = 0
configure:5388: result: yes
configure:5391: checking for C compiler default output file name
configure:5393: result: a.out
configure:5399: checking for suffix of executables
configure:5406: gcc -o conftest -mieee   conftest.c  >&5
configure:5410: $? = 0
configure:5432: result: 
configure:5454: checking whether we are cross compiling
configure:5462: gcc -o conftest -mieee   conftest.c  >&5
configure:5466: $? = 0
configure:5473: ./conftest
configure:5477: $? = 0
configure:5492: result: no
configure:5497: checking for suffix of object files
configure:5519: gcc -c -mieee  conftest.c >&5
configure:5523: $? = 0
configure:5544: result: o
configure:5548: checking whether we are using the GNU C compiler
configure:5567: gcc -c -mieee  conftest.c >&5
configure:5567: $? = 0
configure:5576: result: yes
configure:5585: checking whether gcc accepts -g
configure:5605: gcc -c -g  conftest.c >&5
configure:5605: $? = 0
configure:5646: result: yes
configure:5663: checking for gcc option to accept ISO C89
configure:5727: gcc  -c -mieee  conftest.c >&5
configure:5727: $? = 0
configure:5740: result: none needed
configure:5766: checking how to run the C preprocessor
configure:5797: gcc -E  conftest.c

Re: [OMPI users] problems compiling openmpi-1.6 on some platforms

2012-06-07 Thread Siegmar Gross
Hello Jeff,

thank you very much for your help. You were right with your suggestion
that one of our system commands is responsible for the segmentation
fault. After splitting the command in config.status I found out that
gawk was responsible. We installed the latest version and now
everything works fine. Thank you very much once more.
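
For anyone who runs into the same symptom, a quick way to see which awk the
configure machinery picks up, and to re-run only the failing step, is
something like this (a sketch; the paths and the file regenerated are
site-specific):

command -v awk
command -v gawk
gawk --version | head -1
# re-run just the failing substitution step by hand, e.g. for libltdl:
cd opal/libltdl && ./config.status Makefile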

> > ...
> > configure: creating ./config.status
> > config.status: creating Makefile
> > ./config.status: line 1197: 26418 Done(141)   eval sed \"\$ac_sed_extra\" "$ac_file_inputs"
> > 26419 Segmentation Fault  (core dumped) | $AWK -f "$ac_tmp/subs.awk" >$ac_tmp/out
> > config.status: error: could not create Makefile
> > ...
> 
> I'm looking through the tarball you sent now...
> 
> Yow!  This segv happens *3 times* during configure -- during the
> configuration of 3 separate sub-packages in Open MPI: ROMIO,
> VampirTrace, and libltdl.  And all 3 failures were pretty much
> identical (the line numbers were different, but that was to be
> expected).  The failures of the first 2 are not fatal (OMPI will
> just ignore the ROMIO and VT sub-packages), but we need libltdl,
> hence that failure is treated as fatal, and configure aborts.
> 
> I *suspect* that if you configure with --disable-dlopen, then
> the ROMIO and VT sub-configures will still fail in the same way,
> but the libltdl stuff will be skipped and therefore not fail.
>  BUT: I suspect that OMPI's final configuration step (i.e.,
> where it invokes the top-level config.status) will fail with the
> segv as well.  Meaning: this looks like a systemic problem on
> your system with some shell command (awk, sed, eval, ...something).
> 
> With that aside, let's look at exactly what is happening here.
> 
> The segv is occurring in opal/libltdl/config.status.  This is a
> script that's run right near the very end of the libltdl configure
> script (which is invoked from OMPI's top-level configure script). 
> config.status normally does things like creating Makefile's from
> Makefile.in's, etc.
> 
> It looks like you *should* be able to re-create the problem by:
> 
> cd opal/libltdl
> ./config.status
> 
> I.e., just invoke ./config.status; it should segv just like it does
> when you run the top-level configure, etc.
> 
> Anyhoo, I see the exact line in the config.status where it's
> failing -- line 1256:
> 
> eval sed \"\$ac_sed_extra\" "$ac_file_inputs" | $AWK -f "$ac_tmp/subs.awk" \
>   >$ac_tmp/out || as_fn_error $? "could not create $ac_file" "$LINENO" 5
> 
> $ac_sed_extra is defined right above that, and it's a pretty lengthy
> sed command.  I think what you want to do here is edit config.status
> and determine exactly which command is seg faulting, and see if you
> can get a core file from it.  For example, I'd probably split that
> line into multiple lines and see what's going on, maybe something
> like this (I just typed this in my mail client -- forgive typos):
> 
> # Enable core dumps
> ulimit -c unlimited
> echo ac_sed_extra is: $ac_sed_extra
> echo ac_file_inputs is: $ac_file_inputs
> echo AWK is: $AWK
> echo ac_tmp is: $ac_tmp
> echo ac_file is: $ac_file
> echo contents of subs.awk:
> cat "$ac_tmp/subs.awk"
> 
> echo = output from sed command
> sed \"\$ac_sed_extra\" "$ac_file_inputs"
> 
> echo = eval'ed output from sed command
> eval sed \"\$ac_sed_extra\" "$ac_file_inputs"
> 
> echo = piping to awk command
> eval sed \"\$ac_sed_extra\" "$ac_file_inputs" | $AWK -f "$ac_tmp/subs.awk" 
> 
> ... or something like that.
> 
> See where that takes you.


Kind regards

Siegmar



Re: [OMPI users] problem with sctp.h on Solaris

2012-06-07 Thread Siegmar Gross
Hello,

> Can you try the attached patch and tell me if you get sctp configured?

Yes, it works! Thank you very much for your help.


> > This looks like a missing check in the sctp configure.m4.  I am 
> > working on a patch.
> >
> > --td
> >
> > On 6/5/2012 10:10 AM, Siegmar Gross wrote:
> >> Hello,
> >>
> >> I compiled "openmpi-1.6" on "Solaris 10 sparc" and "Solaris 10 x86"
> >> with "gcc-4.6.2" and "Sun C 5.12". Today I searched my log-files for
> >> "WARNING" and found the following message.
> >>
> >> WARNING: netinet/sctp.h: present but cannot be compiled
> >> WARNING: netinet/sctp.h: check for missing prerequisite headers?
> >> WARNING: netinet/sctp.h: see the Autoconf documentation
> >> WARNING: netinet/sctp.h: section "Present But Cannot Be Compiled"
> >> WARNING: netinet/sctp.h: proceeding with the compiler's result
> >> WARNING: ## -- ##
> >> WARNING: ## Report this to http://www.open-mpi.org/community/help/  ##
> >> WARNING: ## -- ##
> >>
> >> Looking in "config.log" showed that some types are undefined.
> >>
> >> tyr openmpi-1.6-SunOS.sparc.64_cc 323 grep sctp config.log
> >> configure:119568: result: elan, mx, ofud, openib, portals, sctp, sm, tcp,
> >> udapl
> >> configure:125730: checking for MCA component btl:sctp compile mode
> >> configure:125752: checking --with-sctp value
> >> configure:125862: checking --with-sctp-libdir value
> >> configure:125946: checking netinet/sctp.h usability
> >> "/usr/include/netinet/sctp.h", line 228:
> >>incomplete struct/union/enum sockaddr_storage: spc_aaddr
> >> "/usr/include/netinet/sctp.h", line 530: syntax error before or at:
> >> socklen_t
> >> "/usr/include/netinet/sctp.h", line 533: syntax error before or at:
> >> socklen_t
> >> "/usr/include/netinet/sctp.h", line 537: syntax error before or at:
> >> socklen_t
> >> "/usr/include/netinet/sctp.h", line 772: syntax error before or at:
> >> ipaddr_t
> >> "/usr/include/netinet/sctp.h", line 779: syntax error before or at:
> >> in6_addr_t
> >> | #include <netinet/sctp.h>
> >> ...
> >>
> >> The missing types are defined via <netinet/in.h>. In which files must
> >> I include this header file to avoid the warning? Thank you very much
> >> for any help in advance.


Kind regards

Siegmar
Index: ompi/mca/btl/sctp/configure.m4
===
--- ompi/mca/btl/sctp/configure.m4  (revision 26562)
+++ ompi/mca/btl/sctp/configure.m4  (working copy)
@@ -11,6 +11,7 @@
 # Copyright (c) 2004-2005 The Regents of the University of California.
 # All rights reserved.
 # Copyright (c) 2009  Cisco Systems, Inc.  All rights reserved.
+# Copyright (c) 2012  Oracle and/or its affiliates.  All rights reserved.
 # $COPYRIGHT$
 # 
 # Additional copyrights may follow
@@ -100,6 +101,18 @@
AS_IF([test ! -z "$with_sctp_libdir" -a "$with_sctp_libdir" != "yes"],
[ompi_check_sctp_libdir="$with_sctp_libdir"])

+# Check for in.h dependency outside OMPI_CHECK_PACKAGE because it cannot
+# handle non-system detected dependencies.  This is specifically an issue
+# with Oracle Solaris because sctp.h requires in.h to define some types where
+# Linux does not.
+AC_CHECK_HEADER([netinet/in.h])
+AC_CHECK_HEADER([netinet/sctp.h],
+[],
+[],
+[
+#ifdef HAVE_NETINET_IN_H
+#include <netinet/in.h>
+#endif])
OMPI_CHECK_PACKAGE([$1],
[netinet/sctp.h],
[$ompi_sctp_api_libname],
Index: config/ompi_check_package.m4
===
--- config/ompi_check_package.m4(revision 26562)
+++ config/ompi_check_package.m4(working copy)
@@ -10,6 +10,7 @@
 # University of Stuttgart.  All rights reserved.
 # Copyright (c) 2004-2005 The Regents of the University of California.
 # All rights reserved.
+# Copyright (c) 2012  Oracle and/or its affiliates.  All rights reserved.
 # $COPYRIGHT$
 # 
 # Additional copyrights may follow
@@ -35,18 +36,31 @@
 AS_IF([test "$3" = "/usr" -o "$3" = "/usr/local"],
[ # try as is...
 AC_VERBOSE([looking for header without includes])
-AC_CHECK_HEADER([$2], [ompi_check_package_header_happy="yes"],
-[ompi_check_package_header_happy="no"])
+# check to see if the header file was detected previously and only
+# do a check if it was not.  This is necessary to do for
+# things like sctp.h that has a dependency that we cannot detect
+# in this part of the code.
+AS_IF(AS_VAR_IF([ompi_Header], [no],
+  [AC_CHECK_HEADER([$2], [ompi_check_package_header_happy="yes"],
+ 
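
The Solaris behaviour being patched above can be reproduced, and the fix
verified, with a tiny conftest by hand. A sketch, assuming cc or gcc on the
Solaris 10 box:

cat > conftest.c <<'EOF'
#include <netinet/in.h>    /* supplies types that netinet/sctp.h uses on Solaris */
#include <netinet/sctp.h>  /* fails on Solaris when included without in.h first */
int main(void) { return 0; }
EOF
cc -c conftest.c && echo "netinet/sctp.h compiles once netinet/in.h comes first"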

Re: [OMPI users] testing for openMPI

2012-06-07 Thread TERRY DONTJE

Try: ps -elf | grep hello
This should list out all the processes named hello.
In that output is the pid (should be the 4th column) of the process, and 
you give your debugger that pid.  For example, if the pid was 1234 you'd 
run "gdb -p 1234".


Actually Jeff's suggestion of this being a firewall issue is something 
to look into.


--td


On 6/7/2012 6:36 AM, Duke wrote:

On 6/7/12 5:31 PM, TERRY DONTJE wrote:
Can you get on one of the nodes and see the job's processes?  If so 
can you then attach a debugger to it and get a stack?  I wonder if 
the processes are stuck in MPI_Init?


Thanks Terry for your suggestion, but please let me know how would I 
do it? I can ssh to the nodes, but how do I check the job's process? I 
am new to this.


Thanks,

D.



--td

On 6/7/2012 6:06 AM, Duke wrote:

Hi again,

Somehow the verbose flag (-v) did not work for me. I tried 
--debug-daemon and got:


[mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

Daemon was launched on hp430a - beginning to initialize
Daemon [[34432,0],1] checking in as pid 3011 on host hp430a


Somehow the program got stuck when checking on hosts. The secure log 
on hp430a showed that mpiuser logged in just fine:


tail /var/log/secure
Jun  7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser 
from 192.168.0.101 port 34037 ssh2
Jun  7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session 
opened for user mpiuser by (uid=0)


Any idea where/how/what to process/check?

Thanks,

D.

On 6/7/12 4:38 PM, Duke wrote:

Hi Jingha,

On 6/7/12 4:28 PM, Jingcha Joba wrote:

Hello Duke,
Welcome to the forum.
The way openmpi schedules by default is to fill all the slots in a 
host, before moving on to next host.

Check this link for some info:
http://www.open-mpi.org/faq/?category=running#mpirun-scheduling


Thanks for quick answer. I checked the FAQ, and tried with 
processes more than 2, but somehow it got stalled:


[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

^Cmpirun: killing job...

I tried --host flag and it got stalled as well:

[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b 
./test/mpihello



My configuration must be wrong somewhere. Anyidea how I can check 
the system?


Thanks,

D.




--
Jingcha
On Thu, Jun 7, 2012 at 2:11 AM, Duke > wrote:


Hi folks,

Please be gentle to the newest member of openMPI, I am totally
new to this field. I just built a test cluster with 3 boxes on
Scientific Linux 6.2 and openMPI (Open MPI 1.5.3), and I
wanted to test how the cluster works but I cant figure out
what was/is happening. On my master node, I have the hostfile:

[mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
# The Hostfile for Open MPI
fantomfs40a slots=2
hp430a slots=4 max-slots=4
hp430b slots=4 max-slots=4

To test, I used the following c code:

[mpiuser@fantomfs40a ~]$ cat test/mpihello.c
/* program hello */
/* Adapted from mpihello.f by drs */

#include 
#include 

int main(int argc, char **argv)
{
 int *buf, i, rank, nints, len;
 char hostname[256];

 MPI_Init(,);
 MPI_Comm_rank(MPI_COMM_WORLD, );
 gethostname(hostname,255);
 printf("Hello world!  I am process number: %d on host %s\n",
rank, hostname);
 MPI_Finalize();
 return 0;
}

and then compiled and ran:

[mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
[mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile
/home/mpiuser/.mpi_hostfile ./test/mpihello
Hello world!  I am process number: 0 on host fantomfs40a
Hello world!  I am process number: 1 on host fantomfs40a

Unfortunately the result did not show what I wanted. I
expected to see somethign like:

Hello world!  I am process number: 0 on host hp430a
Hello world!  I am process number: 1 on host hp430b

Anybody has any idea what I am doing wrong?

Thank you in advance,

D.



















--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 






Re: [OMPI users] testing for openMPI

2012-06-07 Thread Jeff Squyres
Exxxcellent.

Good luck!


On Jun 7, 2012, at 3:43 AM, Duke wrote:

> On 6/7/12 5:32 PM, Jeff Squyres wrote:
>> Check to ensure that you have firewalls disabled between your two machines; 
>> that's a common cause of hanging (i.e., Open MPI is trying to open 
>> connections and/or send data between your two nodes, and the packets are 
>> getting black-holed at the other side).
>> 
>> Open MPI needs to be able to communicate on random TCP ports between all 
>> machines that will be used in MPI jobs.
> 
> Thanks!!! After switching iptables off on all the machines, I got it working:
> 
> [mpiuser@fantomfs40a ~]$ mpirun -np 8 --machinefile 
> /home/mpiuser/.mpi_hostfile ./test/mpihello
> Hello world!  I am process number: 0 on host fantomfs40a
> Hello world!  I am process number: 1 on host fantomfs40a
> Hello world!  I am process number: 2 on host hp430a
> Hello world!  I am process number: 3 on host hp430a
> Hello world!  I am process number: 4 on host hp430a
> Hello world!  I am process number: 5 on host hp430a
> Hello world!  I am process number: 6 on host hp430b
> Hello world!  I am process number: 7 on host hp430b
> 
> Thanks so much for all the answers/suggestions. I am excited now :).
> 
> D.
> 
>> 
>> 
>> On Jun 7, 2012, at 3:06 AM, Duke wrote:
>> 
>>> Hi again,
>>> 
>>> Somehow the verbose flag (-v) did not work for me. I tried --debug-daemon 
>>> and got:
>>> 
>>> [mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile 
>>> /home/mpiuser/.mpi_hostfile ./test/mpihello
>>> Daemon was launched on hp430a - beginning to initialize
>>> Daemon [[34432,0],1] checking in as pid 3011 on host hp430a
>>> 
>>> 
>>> Somehow the program got stuck when checking on hosts. The secure log on 
>>> hp430a showed that mpiuser logged in just fine:
>>> 
>>> tail /var/log/secure
>>> Jun  7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser from 
>>> 192.168.0.101 port 34037 ssh2
>>> Jun  7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session opened 
>>> for user mpiuser by (uid=0)
>>> 
>>> Any idea where/how/what to process/check?
>>> 
>>> Thanks,
>>> 
>>> D.
>>> 
>>> On 6/7/12 4:38 PM, Duke wrote:
 Hi Jingha,
 
 On 6/7/12 4:28 PM, Jingcha Joba wrote:
> Hello Duke,
> Welcome to the forum.
> 
> The way openmpi schedules by default is to fill all the slots in a host, 
> before moving on to next host.
> 
> Check this link for some info:
> http://www.open-mpi.org/faq/?category=running#mpirun-scheduling
 Thanks for quick answer. I checked the FAQ, and tried with processes more 
 than 2, but somehow it got stalled:
 
 [mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile 
 /home/mpiuser/.mpi_hostfile ./test/mpihello
 ^Cmpirun: killing job...
 
 I tried --host flag and it got stalled as well:
 
 [mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b 
 ./test/mpihello
 
 
 My configuration must be wrong somewhere. Anyidea how I can check the 
 system?
 
 Thanks,
 
 D.
 
> 
> 
> --
> Jingcha
> On Thu, Jun 7, 2012 at 2:11 AM, Duke  wrote:
> Hi folks,
> 
> Please be gentle to the newest member of openMPI, I am totally new to 
> this field. I just built a test cluster with 3 boxes on Scientific Linux 
> 6.2 and openMPI (Open MPI 1.5.3), and I wanted to test how the cluster 
> works but I cant figure out what was/is happening. On my master node, I 
> have the hostfile:
> 
> [mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
> # The Hostfile for Open MPI
> fantomfs40a slots=2
> hp430a slots=4 max-slots=4
> hp430b slots=4 max-slots=4
> 
> To test, I used the following c code:
> 
> [mpiuser@fantomfs40a ~]$ cat test/mpihello.c
> /* program hello */
> /* Adapted from mpihello.f by drs */
> 
> #include
> #include
> 
> int main(int argc, char **argv)
> {
>  int *buf, i, rank, nints, len;
>  char hostname[256];
> 
>  MPI_Init(,);
>  MPI_Comm_rank(MPI_COMM_WORLD,);
>  gethostname(hostname,255);
>  printf("Hello world!  I am process number: %d on host %s\n", rank, 
> hostname);
>  MPI_Finalize();
>  return 0;
> }
> 
> and then compiled and ran:
> 
> [mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
> [mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile 
> /home/mpiuser/.mpi_hostfile ./test/mpihello
> Hello world!  I am process number: 0 on host fantomfs40a
> Hello world!  I am process number: 1 on host fantomfs40a
> 
> Unfortunately the result did not show what I wanted. I expected to see 
> somethign like:
> 
> Hello world!  I am process number: 0 on host hp430a
> Hello world!  I am process number: 1 on host hp430b
> 
> Anybody has any idea what I am doing wrong?
> 
> Thank you in advance,
> 
> D.
> 
> 
> 
> 

Re: [OMPI users] testing for openMPI

2012-06-07 Thread TERRY DONTJE
Another sanity thing to try is to see if you can run your test program on 
just one of the nodes.  If that works, then more than likely MPI is having 
issues setting up connections between the nodes.
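
Concretely, using the hosts from the hostfile earlier in this thread, that
single-node sanity check might look like this (a sketch; each mention of a
host on --host contributes one slot):

# first entirely on the head node:
mpirun -np 2 --host fantomfs40a,fantomfs40a ./test/mpihello
# then on a single compute node:
mpirun -np 2 --host hp430a,hp430a ./test/mpihello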


--td

On 6/7/2012 6:06 AM, Duke wrote:

Hi again,

Somehow the verbose flag (-v) did not work for me. I tried 
--debug-daemon and got:


[mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

Daemon was launched on hp430a - beginning to initialize
Daemon [[34432,0],1] checking in as pid 3011 on host hp430a


Somehow the program got stuck when checking on hosts. The secure log 
on hp430a showed that mpiuser logged in just fine:


tail /var/log/secure
Jun  7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser from 
192.168.0.101 port 34037 ssh2
Jun  7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session 
opened for user mpiuser by (uid=0)


Any idea where/how/what to process/check?

Thanks,

D.

On 6/7/12 4:38 PM, Duke wrote:

Hi Jingha,

On 6/7/12 4:28 PM, Jingcha Joba wrote:

Hello Duke,
Welcome to the forum.
The way openmpi schedules by default is to fill all the slots in a 
host, before moving on to next host.

Check this link for some info:
http://www.open-mpi.org/faq/?category=running#mpirun-scheduling


Thanks for quick answer. I checked the FAQ, and tried with processes 
more than 2, but somehow it got stalled:


[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

^Cmpirun: killing job...

I tried --host flag and it got stalled as well:

[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b 
./test/mpihello



My configuration must be wrong somewhere. Anyidea how I can check the 
system?


Thanks,

D.




--
Jingcha
On Thu, Jun 7, 2012 at 2:11 AM, Duke > wrote:


Hi folks,

Please be gentle to the newest member of openMPI, I am totally
new to this field. I just built a test cluster with 3 boxes on
Scientific Linux 6.2 and openMPI (Open MPI 1.5.3), and I wanted
to test how the cluster works but I cant figure out what was/is
happening. On my master node, I have the hostfile:

[mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
# The Hostfile for Open MPI
fantomfs40a slots=2
hp430a slots=4 max-slots=4
hp430b slots=4 max-slots=4

To test, I used the following c code:

[mpiuser@fantomfs40a ~]$ cat test/mpihello.c
/* program hello */
/* Adapted from mpihello.f by drs */

#include 
#include 

int main(int argc, char **argv)
{
 int *buf, i, rank, nints, len;
 char hostname[256];

 MPI_Init(,);
 MPI_Comm_rank(MPI_COMM_WORLD, );
 gethostname(hostname,255);
 printf("Hello world!  I am process number: %d on host %s\n",
rank, hostname);
 MPI_Finalize();
 return 0;
}

and then compiled and ran:

[mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
[mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile
/home/mpiuser/.mpi_hostfile ./test/mpihello
Hello world!  I am process number: 0 on host fantomfs40a
Hello world!  I am process number: 1 on host fantomfs40a

Unfortunately the result did not show what I wanted. I expected
to see somethign like:

Hello world!  I am process number: 0 on host hp430a
Hello world!  I am process number: 1 on host hp430b

Anybody has any idea what I am doing wrong?

Thank you in advance,

D.



















--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] testing for openMPI

2012-06-07 Thread Duke

On 6/7/12 5:32 PM, Jeff Squyres wrote:

Check to ensure that you have firewalls disabled between your two machines; 
that's a common cause of hanging (i.e., Open MPI is trying to open connections 
and/or send data between your two nodes, and the packets are getting 
black-holed at the other side).

Open MPI needs to be able to communicate on random TCP ports between all 
machines that will be used in MPI jobs.


Thanks!!! After switching iptables off on all the machines, I got it 
working:


[mpiuser@fantomfs40a ~]$ mpirun -np 8 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

Hello world!  I am process number: 0 on host fantomfs40a
Hello world!  I am process number: 1 on host fantomfs40a
Hello world!  I am process number: 2 on host hp430a
Hello world!  I am process number: 3 on host hp430a
Hello world!  I am process number: 4 on host hp430a
Hello world!  I am process number: 5 on host hp430a
Hello world!  I am process number: 6 on host hp430b
Hello world!  I am process number: 7 on host hp430b

Thanks so much for all the answers/suggestions. I am excited now :).

D.




On Jun 7, 2012, at 3:06 AM, Duke wrote:


Hi again,

Somehow the verbose flag (-v) did not work for me. I tried --debug-daemon and 
got:

[mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello
Daemon was launched on hp430a - beginning to initialize
Daemon [[34432,0],1] checking in as pid 3011 on host hp430a


Somehow the program got stuck when checking on hosts. The secure log on hp430a 
showed that mpiuser logged in just fine:

tail /var/log/secure
Jun  7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser from 
192.168.0.101 port 34037 ssh2
Jun  7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session opened for 
user mpiuser by (uid=0)

Any idea where/how/what to process/check?

Thanks,

D.

On 6/7/12 4:38 PM, Duke wrote:

Hi Jingha,

On 6/7/12 4:28 PM, Jingcha Joba wrote:

Hello Duke,
Welcome to the forum.

The way openmpi schedules by default is to fill all the slots in a host, before 
moving on to next host.

Check this link for some info:
http://www.open-mpi.org/faq/?category=running#mpirun-scheduling

Thanks for quick answer. I checked the FAQ, and tried with processes more than 
2, but somehow it got stalled:

[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello
^Cmpirun: killing job...

I tried --host flag and it got stalled as well:

[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b ./test/mpihello


My configuration must be wrong somewhere. Anyidea how I can check the system?

Thanks,

D.




--
Jingcha
On Thu, Jun 7, 2012 at 2:11 AM, Duke  wrote:
Hi folks,

Please be gentle to the newest member of openMPI, I am totally new to this 
field. I just built a test cluster with 3 boxes on Scientific Linux 6.2 and 
openMPI (Open MPI 1.5.3), and I wanted to test how the cluster works but I cant 
figure out what was/is happening. On my master node, I have the hostfile:

[mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
# The Hostfile for Open MPI
fantomfs40a slots=2
hp430a slots=4 max-slots=4
hp430b slots=4 max-slots=4

To test, I used the following c code:

[mpiuser@fantomfs40a ~]$ cat test/mpihello.c
/* program hello */
/* Adapted from mpihello.f by drs */

#include
#include

int main(int argc, char **argv)
{
  int *buf, i, rank, nints, len;
  char hostname[256];

  MPI_Init(,);
  MPI_Comm_rank(MPI_COMM_WORLD,);
  gethostname(hostname,255);
  printf("Hello world!  I am process number: %d on host %s\n", rank, hostname);
  MPI_Finalize();
  return 0;
}

and then compiled and ran:

[mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
[mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile /home/mpiuser/.mpi_hostfile 
./test/mpihello
Hello world!  I am process number: 0 on host fantomfs40a
Hello world!  I am process number: 1 on host fantomfs40a

Unfortunately the result did not show what I wanted. I expected to see 
somethign like:

Hello world!  I am process number: 0 on host hp430a
Hello world!  I am process number: 1 on host hp430b

Anybody has any idea what I am doing wrong?

Thank you in advance,

D.


















Re: [OMPI users] testing for openMPI

2012-06-07 Thread Duke

On 6/7/12 5:31 PM, TERRY DONTJE wrote:
Can you get on one of the nodes and see the job's processes?  If so 
can you then attach a debugger to it and get a stack?  I wonder if the 
processes are stuck in MPI_Init?


Thanks Terry for your suggestion, but please let me know how I would do 
it. I can ssh to the nodes, but how do I check the job's processes? I am 
new to this.


Thanks,

D.



--td

On 6/7/2012 6:06 AM, Duke wrote:

Hi again,

Somehow the verbose flag (-v) did not work for me. I tried 
--debug-daemon and got:


[mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

Daemon was launched on hp430a - beginning to initialize
Daemon [[34432,0],1] checking in as pid 3011 on host hp430a


Somehow the program got stuck when checking on hosts. The secure log 
on hp430a showed that mpiuser logged in just fine:


tail /var/log/secure
Jun  7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser 
from 192.168.0.101 port 34037 ssh2
Jun  7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session 
opened for user mpiuser by (uid=0)


Any idea where/how/what to process/check?

Thanks,

D.

On 6/7/12 4:38 PM, Duke wrote:

Hi Jingha,

On 6/7/12 4:28 PM, Jingcha Joba wrote:

Hello Duke,
Welcome to the forum.
The way openmpi schedules by default is to fill all the slots in a 
host, before moving on to next host.

Check this link for some info:
http://www.open-mpi.org/faq/?category=running#mpirun-scheduling


Thanks for quick answer. I checked the FAQ, and tried with processes 
more than 2, but somehow it got stalled:


[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

^Cmpirun: killing job...

I tried --host flag and it got stalled as well:

[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b 
./test/mpihello



My configuration must be wrong somewhere. Anyidea how I can check 
the system?


Thanks,

D.




--
Jingcha
On Thu, Jun 7, 2012 at 2:11 AM, Duke > wrote:


Hi folks,

Please be gentle to the newest member of openMPI, I am totally
new to this field. I just built a test cluster with 3 boxes on
Scientific Linux 6.2 and openMPI (Open MPI 1.5.3), and I wanted
to test how the cluster works but I cant figure out what was/is
happening. On my master node, I have the hostfile:

[mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
# The Hostfile for Open MPI
fantomfs40a slots=2
hp430a slots=4 max-slots=4
hp430b slots=4 max-slots=4

To test, I used the following c code:

[mpiuser@fantomfs40a ~]$ cat test/mpihello.c
/* program hello */
/* Adapted from mpihello.f by drs */

#include 
#include 

int main(int argc, char **argv)
{
 int *buf, i, rank, nints, len;
 char hostname[256];

 MPI_Init(,);
 MPI_Comm_rank(MPI_COMM_WORLD, );
 gethostname(hostname,255);
 printf("Hello world!  I am process number: %d on host %s\n",
rank, hostname);
 MPI_Finalize();
 return 0;
}

and then compiled and ran:

[mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
[mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile
/home/mpiuser/.mpi_hostfile ./test/mpihello
Hello world!  I am process number: 0 on host fantomfs40a
Hello world!  I am process number: 1 on host fantomfs40a

Unfortunately the result did not show what I wanted. I expected
to see somethign like:

Hello world!  I am process number: 0 on host hp430a
Hello world!  I am process number: 1 on host hp430b

Anybody has any idea what I am doing wrong?

Thank you in advance,

D.



















--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 









Re: [OMPI users] testing for openMPI

2012-06-07 Thread Jeff Squyres
Check to ensure that you have firewalls disabled between your two machines; 
that's a common cause of hanging (i.e., Open MPI is trying to open connections 
and/or send data between your two nodes, and the packets are getting 
black-holed at the other side).

Open MPI needs to be able to communicate on random TCP ports between all 
machines that will be used in MPI jobs.
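
On the Scientific Linux 6.2 nodes described earlier in this thread, a quick
way to check for (and temporarily rule out) the firewall is something like
the following sketch; opening the specific ports is of course the better
long-term fix:

# run on every node in the hostfile:
service iptables status      # is a firewall active?
service iptables stop        # stop it for a quick test
chkconfig iptables off       # optionally keep it off across reboots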


On Jun 7, 2012, at 3:06 AM, Duke wrote:

> Hi again,
> 
> Somehow the verbose flag (-v) did not work for me. I tried --debug-daemon and 
> got:
> 
> [mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile 
> /home/mpiuser/.mpi_hostfile ./test/mpihello
> Daemon was launched on hp430a - beginning to initialize
> Daemon [[34432,0],1] checking in as pid 3011 on host hp430a
> 
> 
> Somehow the program got stuck when checking on hosts. The secure log on 
> hp430a showed that mpiuser logged in just fine:
> 
> tail /var/log/secure
> Jun  7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser from 
> 192.168.0.101 port 34037 ssh2
> Jun  7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session opened for 
> user mpiuser by (uid=0)
> 
> Any idea where/how/what to process/check?
> 
> Thanks,
> 
> D.
> 
> On 6/7/12 4:38 PM, Duke wrote:
>> Hi Jingha,
>> 
>> On 6/7/12 4:28 PM, Jingcha Joba wrote:
>>> Hello Duke,
>>> Welcome to the forum.
>>>  
>>> The way openmpi schedules by default is to fill all the slots in a host, 
>>> before moving on to next host.
>>>  
>>> Check this link for some info:
>>> http://www.open-mpi.org/faq/?category=running#mpirun-scheduling
>> 
>> Thanks for quick answer. I checked the FAQ, and tried with processes more 
>> than 2, but somehow it got stalled:
>> 
>> [mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile 
>> /home/mpiuser/.mpi_hostfile ./test/mpihello
>> ^Cmpirun: killing job...
>> 
>> I tried --host flag and it got stalled as well:
>> 
>> [mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b ./test/mpihello
>> 
>> 
>> My configuration must be wrong somewhere. Anyidea how I can check the system?
>> 
>> Thanks,
>> 
>> D.
>> 
>>>  
>>> 
>>> --
>>> Jingcha
>>> On Thu, Jun 7, 2012 at 2:11 AM, Duke  wrote:
>>> Hi folks,
>>> 
>>> Please be gentle to the newest member of openMPI, I am totally new to this 
>>> field. I just built a test cluster with 3 boxes on Scientific Linux 6.2 and 
>>> openMPI (Open MPI 1.5.3), and I wanted to test how the cluster works but I 
>>> cant figure out what was/is happening. On my master node, I have the 
>>> hostfile:
>>> 
>>> [mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
>>> # The Hostfile for Open MPI
>>> fantomfs40a slots=2
>>> hp430a slots=4 max-slots=4
>>> hp430b slots=4 max-slots=4
>>> 
>>> To test, I used the following c code:
>>> 
>>> [mpiuser@fantomfs40a ~]$ cat test/mpihello.c
>>> /* program hello */
>>> /* Adapted from mpihello.f by drs */
>>> 
>>> #include 
>>> #include 
>>> 
>>> int main(int argc, char **argv)
>>> {
>>>  int *buf, i, rank, nints, len;
>>>  char hostname[256];
>>> 
>>>  MPI_Init(,);
>>>  MPI_Comm_rank(MPI_COMM_WORLD, );
>>>  gethostname(hostname,255);
>>>  printf("Hello world!  I am process number: %d on host %s\n", rank, 
>>> hostname);
>>>  MPI_Finalize();
>>>  return 0;
>>> }
>>> 
>>> and then compiled and ran:
>>> 
>>> [mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
>>> [mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile 
>>> /home/mpiuser/.mpi_hostfile ./test/mpihello
>>> Hello world!  I am process number: 0 on host fantomfs40a
>>> Hello world!  I am process number: 1 on host fantomfs40a
>>> 
>>> Unfortunately the result did not show what I wanted. I expected to see 
>>> somethign like:
>>> 
>>> Hello world!  I am process number: 0 on host hp430a
>>> Hello world!  I am process number: 1 on host hp430b
>>> 
>>> Anybody has any idea what I am doing wrong?
>>> 
>>> Thank you in advance,
>>> 
>>> D.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] testing for openMPI

2012-06-07 Thread TERRY DONTJE
Can you get on one of the nodes and see the job's processes?  If so can 
you then attach a debugger to it and get a stack?  I wonder if the 
processes are stuck in MPI_Init?


--td

On 6/7/2012 6:06 AM, Duke wrote:

Hi again,

Somehow the verbose flag (-v) did not work for me. I tried 
--debug-daemon and got:


[mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

Daemon was launched on hp430a - beginning to initialize
Daemon [[34432,0],1] checking in as pid 3011 on host hp430a


Somehow the program got stuck when checking on hosts. The secure log 
on hp430a showed that mpiuser logged in just fine:


tail /var/log/secure
Jun  7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser from 
192.168.0.101 port 34037 ssh2
Jun  7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session 
opened for user mpiuser by (uid=0)


Any idea where/how/what to process/check?

Thanks,

D.

On 6/7/12 4:38 PM, Duke wrote:

Hi Jingha,

On 6/7/12 4:28 PM, Jingcha Joba wrote:

Hello Duke,
Welcome to the forum.
The way openmpi schedules by default is to fill all the slots in a 
host, before moving on to next host.

Check this link for some info:
http://www.open-mpi.org/faq/?category=running#mpirun-scheduling


Thanks for quick answer. I checked the FAQ, and tried with processes 
more than 2, but somehow it got stalled:


[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

^Cmpirun: killing job...

I tried --host flag and it got stalled as well:

[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b 
./test/mpihello



My configuration must be wrong somewhere. Anyidea how I can check the 
system?


Thanks,

D.




--
Jingcha
On Thu, Jun 7, 2012 at 2:11 AM, Duke > wrote:


Hi folks,

Please be gentle to the newest member of openMPI, I am totally
new to this field. I just built a test cluster with 3 boxes on
Scientific Linux 6.2 and openMPI (Open MPI 1.5.3), and I wanted
to test how the cluster works but I cant figure out what was/is
happening. On my master node, I have the hostfile:

[mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
# The Hostfile for Open MPI
fantomfs40a slots=2
hp430a slots=4 max-slots=4
hp430b slots=4 max-slots=4

To test, I used the following c code:

[mpiuser@fantomfs40a ~]$ cat test/mpihello.c
/* program hello */
/* Adapted from mpihello.f by drs */

#include 
#include 

int main(int argc, char **argv)
{
 int *buf, i, rank, nints, len;
 char hostname[256];

 MPI_Init(,);
 MPI_Comm_rank(MPI_COMM_WORLD, );
 gethostname(hostname,255);
 printf("Hello world!  I am process number: %d on host %s\n",
rank, hostname);
 MPI_Finalize();
 return 0;
}

and then compiled and ran:

[mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
[mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile
/home/mpiuser/.mpi_hostfile ./test/mpihello
Hello world!  I am process number: 0 on host fantomfs40a
Hello world!  I am process number: 1 on host fantomfs40a

Unfortunately the result did not show what I wanted. I expected
to see somethign like:

Hello world!  I am process number: 0 on host hp430a
Hello world!  I am process number: 1 on host hp430b

Anybody has any idea what I am doing wrong?

Thank you in advance,

D.



















--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] testing for openMPI

2012-06-07 Thread Duke

Hi again,

Somehow the verbose flag (-v) did not work for me. I tried 
--debug-daemons and got:


[mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

Daemon was launched on hp430a - beginning to initialize
Daemon [[34432,0],1] checking in as pid 3011 on host hp430a


Somehow the program got stuck when checking on hosts. The secure log on 
hp430a showed that mpiuser logged in just fine:


tail /var/log/secure
Jun  7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser from 
192.168.0.101 port 34037 ssh2
Jun  7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session 
opened for user mpiuser by (uid=0)


Any idea where/how/what to process/check?
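
One way to get more detail at this point is to raise the launcher's
verbosity on top of --debug-daemons (a sketch; MCA component names can vary
a little between Open MPI releases):

mpirun --debug-daemons --leave-session-attached \
       --mca plm_base_verbose 5 \
       -np 3 --machinefile /home/mpiuser/.mpi_hostfile ./test/mpihello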

Thanks,

D.

On 6/7/12 4:38 PM, Duke wrote:

Hi Jingha,

On 6/7/12 4:28 PM, Jingcha Joba wrote:

Hello Duke,
Welcome to the forum.
The way openmpi schedules by default is to fill all the slots in a 
host, before moving on to next host.

Check this link for some info:
http://www.open-mpi.org/faq/?category=running#mpirun-scheduling


Thanks for quick answer. I checked the FAQ, and tried with processes 
more than 2, but somehow it got stalled:


[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

^Cmpirun: killing job...

I tried --host flag and it got stalled as well:

[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b 
./test/mpihello



My configuration must be wrong somewhere. Anyidea how I can check the 
system?


Thanks,

D.




--
Jingcha
On Thu, Jun 7, 2012 at 2:11 AM, Duke > wrote:


Hi folks,

Please be gentle to the newest member of openMPI, I am totally
new to this field. I just built a test cluster with 3 boxes on
Scientific Linux 6.2 and openMPI (Open MPI 1.5.3), and I wanted
to test how the cluster works but I cant figure out what was/is
happening. On my master node, I have the hostfile:

[mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
# The Hostfile for Open MPI
fantomfs40a slots=2
hp430a slots=4 max-slots=4
hp430b slots=4 max-slots=4

To test, I used the following c code:

[mpiuser@fantomfs40a ~]$ cat test/mpihello.c
/* program hello */
/* Adapted from mpihello.f by drs */

#include 
#include 

int main(int argc, char **argv)
{
 int *buf, i, rank, nints, len;
 char hostname[256];

 MPI_Init(,);
 MPI_Comm_rank(MPI_COMM_WORLD, );
 gethostname(hostname,255);
 printf("Hello world!  I am process number: %d on host %s\n",
rank, hostname);
 MPI_Finalize();
 return 0;
}

and then compiled and ran:

[mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
[mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile
/home/mpiuser/.mpi_hostfile ./test/mpihello
Hello world!  I am process number: 0 on host fantomfs40a
Hello world!  I am process number: 1 on host fantomfs40a

Unfortunately the result did not show what I wanted. I expected
to see somethign like:

Hello world!  I am process number: 0 on host hp430a
Hello world!  I am process number: 1 on host hp430b

Anybody has any idea what I am doing wrong?

Thank you in advance,

D.

















Re: [OMPI users] testing for openMPI

2012-06-07 Thread Duke

Hi Jingcha,

On 6/7/12 4:28 PM, Jingcha Joba wrote:

Hello Duke,
Welcome to the forum.
The way openmpi schedules by default is to fill all the slots in a 
host, before moving on to next host.

Check this link for some info:
http://www.open-mpi.org/faq/?category=running#mpirun-scheduling


Thanks for the quick answer. I checked the FAQ and tried with more than 
2 processes, but somehow it got stalled:


[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

^Cmpirun: killing job...

I tried --host flag and it got stalled as well:

[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b 
./test/mpihello



My configuration must be wrong somewhere. Any idea how I can check the 
system?


Thanks,

D.




--
Jingcha
On Thu, Jun 7, 2012 at 2:11 AM, Duke > wrote:


Hi folks,

Please be gentle to the newest member of openMPI, I am totally new
to this field. I just built a test cluster with 3 boxes on
Scientific Linux 6.2 and openMPI (Open MPI 1.5.3), and I wanted to
test how the cluster works but I cant figure out what was/is
happening. On my master node, I have the hostfile:

[mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
# The Hostfile for Open MPI
fantomfs40a slots=2
hp430a slots=4 max-slots=4
hp430b slots=4 max-slots=4

To test, I used the following c code:

[mpiuser@fantomfs40a ~]$ cat test/mpihello.c
/* program hello */
/* Adapted from mpihello.f by drs */

#include 
#include 

int main(int argc, char **argv)
{
 int *buf, i, rank, nints, len;
 char hostname[256];

 MPI_Init(,);
 MPI_Comm_rank(MPI_COMM_WORLD, );
 gethostname(hostname,255);
 printf("Hello world!  I am process number: %d on host %s\n",
rank, hostname);
 MPI_Finalize();
 return 0;
}

and then compiled and ran:

[mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
[mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile
/home/mpiuser/.mpi_hostfile ./test/mpihello
Hello world!  I am process number: 0 on host fantomfs40a
Hello world!  I am process number: 1 on host fantomfs40a

Unfortunately the result did not show what I wanted. I expected to
see somethign like:

Hello world!  I am process number: 0 on host hp430a
Hello world!  I am process number: 1 on host hp430b

Anybody has any idea what I am doing wrong?

Thank you in advance,

D.













Re: [OMPI users] testing for openMPI

2012-06-07 Thread Jingcha Joba
Hello Duke,
Welcome to the forum.

The way openmpi schedules by default is to fill all the slots in a host,
before moving on to the next host.

Check this link for some info:
http://www.open-mpi.org/faq/?category=running#mpirun-scheduling
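
To see ranks land on the other hosts under this default policy, you can
either ask for more processes than the first host has slots, or override the
mapping. A sketch using the hostfile from the original post (--bynode spreads
ranks round-robin across hosts instead of packing one host first):

mpirun -np 2 --bynode --machinefile /home/mpiuser/.mpi_hostfile ./test/mpihello
mpirun -np 4 --machinefile /home/mpiuser/.mpi_hostfile ./test/mpihello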


--
Jingcha
On Thu, Jun 7, 2012 at 2:11 AM, Duke  wrote:

> Hi folks,
>
> Please be gentle to the newest member of openMPI, I am totally new to this
> field. I just built a test cluster with 3 boxes on Scientific Linux 6.2 and
> openMPI (Open MPI 1.5.3), and I wanted to test how the cluster works but I
> cant figure out what was/is happening. On my master node, I have the
> hostfile:
>
> [mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
> # The Hostfile for Open MPI
> fantomfs40a slots=2
> hp430a slots=4 max-slots=4
> hp430b slots=4 max-slots=4
>
> To test, I used the following c code:
>
> [mpiuser@fantomfs40a ~]$ cat test/mpihello.c
> /* program hello */
> /* Adapted from mpihello.f by drs */
>
> #include 
> #include 
>
> int main(int argc, char **argv)
> {
>  int *buf, i, rank, nints, len;
>  char hostname[256];
>
>  MPI_Init(,);
>  MPI_Comm_rank(MPI_COMM_WORLD, );
>  gethostname(hostname,255);
>  printf("Hello world!  I am process number: %d on host %s\n", rank,
> hostname);
>  MPI_Finalize();
>  return 0;
> }
>
> and then compiled and ran:
>
> [mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
> [mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile
> /home/mpiuser/.mpi_hostfile ./test/mpihello
> Hello world!  I am process number: 0 on host fantomfs40a
> Hello world!  I am process number: 1 on host fantomfs40a
>
> Unfortunately the result did not show what I wanted. I expected to see
> somethign like:
>
> Hello world!  I am process number: 0 on host hp430a
> Hello world!  I am process number: 1 on host hp430b
>
> Anybody has any idea what I am doing wrong?
>
> Thank you in advance,
>
> D.
>
>
>
>
>
>


[OMPI users] testing for openMPI

2012-06-07 Thread Duke

Hi folks,

Please be gentle to the newest member of openMPI, I am totally new to 
this field. I just built a test cluster with 3 boxes on Scientific Linux 
6.2 and openMPI (Open MPI 1.5.3), and I wanted to test how the cluster 
works but I can't figure out what was/is happening. On my master node, I 
have the hostfile:


[mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
# The Hostfile for Open MPI
fantomfs40a slots=2
hp430a slots=4 max-slots=4
hp430b slots=4 max-slots=4

To test, I used the following c code:

[mpiuser@fantomfs40a ~]$ cat test/mpihello.c
/* program hello */
/* Adapted from mpihello.f by drs */

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  int *buf, i, rank, nints, len;
  char hostname[256];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  gethostname(hostname,255);
  printf("Hello world!  I am process number: %d on host %s\n", rank, 
hostname);

  MPI_Finalize();
  return 0;
}

and then compiled and ran:

[mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
[mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

Hello world!  I am process number: 0 on host fantomfs40a
Hello world!  I am process number: 1 on host fantomfs40a

Unfortunately the result did not show what I wanted. I expected to see 
something like:


Hello world!  I am process number: 0 on host hp430a
Hello world!  I am process number: 1 on host hp430b

Anybody has any idea what I am doing wrong?

Thank you in advance,

D.