Re: [OMPI users] mpirun example program fail on multiple nodes - unable to launch specified application on client node

2009-11-05 Thread Terry Frankcombe
For small ad hoc COWs (clusters of workstations) I'd vote for sshfs too.  It
may well be as slow as a dog, but it actually has some security, unlike NFS,
and it is a doddle to make work with no superuser access on the server, unlike NFS.
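
For anyone trying the same approach, a minimal sshfs setup might look like the
following sketch (user, host, and paths are hypothetical):

  # on each client node, as an unprivileged user
  mkdir -p ~/openmpi ~/work
  sshfs gordon@master:/usr/local/openmpi ~/openmpi   # the Open MPI install
  sshfs gordon@master:/home/gordon/work  ~/work      # the application directory
  # unmount later with: fusermount -u ~/openmpi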


On Thu, 2009-11-05 at 17:53 -0500, Jeff Squyres wrote:
> On Nov 5, 2009, at 5:34 PM, Douglas Guptill wrote:
> 
> > I am currently using sshfs to mount both OpenMPI and my application on
> > the "other" computers/nodes.  The advantage to this is that I have
> > only one copy of OpenMPI and my application.  There may be a
> > performance penalty, but I haven't seen it yet.
> >
> 
> 
> For a small number of nodes (where small <=32 or sometimes even <=64),  
> I find that simple NFS works just fine.  If your apps aren't IO  
> intensive, that can greatly simplify installation and deployment of  
> both Open MPI and your MPI applications IMNSHO.
> 
> But -- every app is different.  :-)  YMMV.
> 



Re: [OMPI users] mpirun example program fail on multiple nodes - unable to launch specified application on client node

2009-11-05 Thread Jeff Squyres

On Nov 5, 2009, at 5:34 PM, Douglas Guptill wrote:


I am currently using sshfs to mount both OpenMPI and my application on
the "other" computers/nodes.  The advantage to this is that I have
only one copy of OpenMPI and my application.  There may be a
performance penalty, but I haven't seen it yet.




For a small number of nodes (where small <=32 or sometimes even <=64),  
I find that simple NFS works just fine.  If your apps aren't IO  
intensive, that can greatly simplify installation and deployment of  
both Open MPI and your MPI applications IMNSHO.


But -- every app is different.  :-)  YMMV.
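
As a rough sketch of the kind of simple NFS setup being described (paths,
subnet, and service commands are hypothetical and distribution-dependent):

  # on the head node: /etc/exports
  /home    192.168.0.0/24(rw,sync,no_subtree_check)
  # reload the export table
  exportfs -ra

  # on each compute node, mount the share at the same path
  mount headnode:/home /home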

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] mpirun example program fail on multiple nodes - unable to launch specified application on client node

2009-11-05 Thread Douglas Guptill
On Thu, Nov 05, 2009 at 03:15:33PM -0600, Qing Pang wrote:

> Thank you Jeff! That solves the problem. :-) You are a lifesaver!
> So does that mean I always need to copy my application to all the
> nodes? Or should I give the pathname of my executable in a different
> way to avoid this? Do I need a network file system for that?

I am currently using sshfs to mount both OpenMPI and my application on
the "other" computers/nodes.  The advantage to this is that I have
only one copy of OpenMPI and my application.  There may be a
performance penalty, but I haven't seen it yet.

Douglas.


Re: [OMPI users] mpirun example program fail on multiple nodes - unable to launch specified application on client node

2009-11-05 Thread Jeff Squyres

On Nov 5, 2009, at 4:15 PM, Qing Pang wrote:


Thank you Jeff! That solves the problem. :-) You are a lifesaver!
So does that mean I always need to copy my application to all the
nodes? Or should I give the pathname of my executable in a different
way to avoid this? Do I need a network file system for that?



Your executable needs to be available on all nodes, yes, whether you  
have copied it out there or whether you use a network filesystem.  For  
a small number of nodes, using a network filesystem is likely much  
more convenient.


See http://www.open-mpi.org/faq/?category=running#do-i-need-a-common-filesystem
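
If a shared filesystem is not available, one way to satisfy the "same path on
every node" requirement is to copy the binary out by hand; a hedged sketch,
assuming the machinefile contains plain hostnames:

  # copy the executable to the identical absolute path on every node
  for node in $(cat machine.linux); do
    ssh $node "mkdir -p $(pwd)"
    scp $(pwd)/hello_c.out $node:$(pwd)/
  done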

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] mpirun example program fail on multiple nodes - unable to launch specified application on client node

2009-11-05 Thread Qing Pang

Thank you Jeff! That solves the problem. :-) You are a lifesaver!
So does that mean I always need to copy my application to all the
nodes? Or should I give the pathname of my executable in a different
way to avoid this? Do I need a network file system for that?



Jeff Squyres wrote:
The short version of the answer is to check to see that the executable 
is in the same location on both nodes (apparently: 
/home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out).  Open MPI is 
complaining that it can't find that specific executable on the .194 node.


See below for more detail.


On Nov 5, 2009, at 3:19 PM, qing pang wrote:


1) I'm trying to run OpenMPI with the following setting:

1 PC (as master node) and 1 notebook (as client node) connected to an
ethernet router through ethernet cable. Both running Ubuntu 8.10.
There are no other connections. - Is this setup OK to run OpenMPI?



Yes.


2) Prerequisites

SSH has been set up so that the master node can access the client node
through passwordless ssh. I do notice that it takes 10~15 seconds
between me entering '>ssh 'command and getting onto
the client node.
--- Could this be too slow for OpenMPI to run properly?



Nope -- should be ok.


I do not have programs like network file system, network time protocol,
resource management, scheduler, etc. installed.
--- Does OpenMPI need any prerequisites other than passwordless ssh?



Not in this case, no.


3) OpenMPI is installed on both nodes - downloaded from open-mpi.org and built
with configure/make all using the default settings.

4) PATH and LD_LIBRARY_PATH
On both nodes,
PATH is
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games,
which is the default setting in ubuntu.
LD_LIBRARY_PATH is set in ~/.bashrc - I added one line at the end of the
file, 'export LD_LIBRARY_PATH=usr/local/lib:usr/lib'
So when I echo them on both nodes, I get:
 >echo $PATH
 >/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games 


 >echo $LD_LIBRARY_PATH
 >usr/local/lib:usr/lib

But, if I do
 >ssh  'echo $LD_LIBRARY_PATH'
nothing comes back.

while
 >ssh  'echo $PATH'
comes back with the right path.

Is that a problem?



No.
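
If the remote library path ever does become an issue, mpirun's --prefix option
is the usual workaround: it sets PATH and LD_LIBRARY_PATH for the remote Open
MPI daemons from the given install prefix.  A hedged sketch, assuming Open MPI
was installed under /usr/local on every node:

  mpirun --prefix /usr/local -machinefile machine.linux -np 2 $(pwd)/hello_c.out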


5) Problem:
I compiled the example hello_c using
 >mpicc hello_c.c -o hello_c.out
and ran it on both nodes locally; everything works fine.

But when I tried to run it on 2 nodes (-np 2)
 >mpirun -machinefile machine.linux -np 2 $(pwd)/hello_c.out
I got the following error:

 


gordon@gordon-desktop:~/Desktop/openmpi-1.3.3/examples$ mpirun
--machinefile machine.linux -np 2 $(pwd)/hello_c.out
-- 

mpirun was unable to launch the specified application as it could not 
access

or execute an executable:

Executable: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
Node: 192.168.0.194



You are giving an absolute pathname in the mpirun command line:

mpirun -machinefile machine.linux -np 2 $(pwd)/hello_c.out


Hence, it's looking for exactly 
/home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out on both 
nodes.  If the executable is in a different directory on the other 
node, that's where you're probably running into the problem.
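
A quick way to check is to ask the remote node directly, using the path and
address from the error message above:

  ssh 192.168.0.194 ls -l /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out

If that reports "No such file or directory", copy the executable to that exact
path on the other node (or put it on a shared filesystem) before re-running
mpirun.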






Re: [OMPI users] mpirun example program fail on multiple nodes - unable to launch specified application on client node

2009-11-05 Thread Jeff Squyres
The short version of the answer is to check to see that the executable
is in the same location on both nodes (apparently:
/home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out).  Open MPI is
complaining that it can't find that specific executable on the .194 node.


See below for more detail.


On Nov 5, 2009, at 3:19 PM, qing pang wrote:


1) I'm trying to run OpenMPI with the following setting:

1 PC (as master node) and 1 notebook (as client node) connected to an
ethernet router through ethernet cable. Both running Ubuntu 8.10.
There are no other connections. - Is this setup OK to run OpenMPI?



Yes.


2) Prerequisites

SSH has been set up so that the master node can access the client node
through passwordless ssh. I do notice that it takes 10~15 seconds
between me entering '>ssh 'command and getting onto
the client node.
--- Could this be too slow for OpenMPI to run properly?



Nope -- should be ok.

I do not have programs like network file system, network time protocol,
resource management, scheduler, etc. installed.
--- Does OpenMPI need any prerequisites other than passwordless ssh?



Not in this case, no.


3) OpenMPI is installed on both nodes - downloaded from open-mpi.org and built
with configure/make all using the default settings.

4) PATH and LD_LIBRARY_PATH
On both nodes,
PATH is
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games,
which is the default setting in ubuntu.
LD_LIBRARY_PATH is set in ~/.bashrc - I added one line at the end of the
file, 'export LD_LIBRARY_PATH=usr/local/lib:usr/lib'
So when I echo them on both nodes, I get:
 >echo $PATH
 >/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games

 >echo $LD_LIBRARY_PATH
 >usr/local/lib:usr/lib

But, if I do
 >ssh  'echo $LD_LIBRARY_PATH'
nothing comes back.

while
 >ssh  'echo $PATH'
comes back with the right path.

Is that a problem?



No.


5) Problem:
I compiled the example hello_c using
 >mpicc hello_c.c -o hello_c.out
and ran it on both nodes locally; everything works fine.

But when I tried to run it on 2 nodes (-np 2)
 >mpirun -machinefile machine.linux -np 2 $(pwd)/hello_c.out
I got the following error:


gordon@gordon-desktop:~/Desktop/openmpi-1.3.3/examples$ mpirun
--machinefile machine.linux -np 2 $(pwd)/hello_c.out
--
mpirun was unable to launch the specified application as it could not access
or execute an executable:

Executable: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
Node: 192.168.0.194



You are giving an absolute pathname in the mpirun command line:

mpirun -machinefile machine.linux -np 2 $(pwd)/hello_c.out


Hence, it's looking for exactly
/home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out on both nodes.  If the
executable is in a different directory on the other node, that's where you're
probably running into the problem.


--
Jeff Squyres
jsquy...@cisco.com



[OMPI users] mpirun example program fail on multiple nodes - unable to launch specified application on client node

2009-11-05 Thread qing pang

Dear Sir/Madam,

I'm having a problem running the example program. Please kindly help --- I've
been fooling with it for days and I'm kind of getting lost.


-
MPIRUN fails on example hello program
-unable to launch the specified application on client node
-

1) I'm trying to run OpenMPI with the following setting:

1 PC (as master node) and 1 notebook (as client node) connected to an 
ethernet router through ethernet cable. Both running Ubuntu 8.10. 
There are no other connections. - Is this setup OK to run OpenMPI?


2) Prerequisites

SSH has been set up so that the master node can access the client node 
through passwordless ssh. I do notice that it takes 10~15 seconds 
between me entering '>ssh 'command and getting onto 
the client node.

--- Could this be too slow for OpenMPI to run properly?

I do not have programs like network file system, network time protocol, 
resource management, scheduler, etc. installed.

--- Does OpenMPI need any prerequisites other than passwordless ssh?

3) OpenMPI is installed on both nodes - downloaded from open-mpi.org and built
with configure/make all using the default settings.


4) PATH and LD_LIBRARY_PATH
On both nodes,
PATH is 
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games, 
which is the default setting in ubuntu.
LD_LIBRARY_PATH is set in ~/.bashrc - I added one line at the end of the 
file, 'export LD_LIBRARY_PATH=usr/local/lib:usr/lib'

So when I echo them on both nodes, I get:
>echo $PATH
>/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
>echo $LD_LIBRARY_PATH
>usr/local/lib:usr/lib

But, if I do
>ssh  'echo $LD_LIBRARY_PATH'
nothing comes back.

while
>ssh  'echo $PATH'
comes back with the right path.

Is that a problem?

5) Problem:
I compiled the example hello_c using
>mpicc hello_c.c -o hello_c.out
and ran it on both nodes locally; everything works fine.

But when I tried to run it on 2 nodes (-np 2)
>mpirun -machinefile machine.linux -np 2 $(pwd)/hello_c.out
I got the following error:


gordon@gordon-desktop:~/Desktop/openmpi-1.3.3/examples$ mpirun 
--machinefile machine.linux -np 2 $(pwd)/hello_c.out

--
mpirun was unable to launch the specified application as it could not access
or execute an executable:

Executable: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
Node: 192.168.0.194

while attempting to start process rank 1.
--

Sometimes I get one other error message after that:
--
[gordon-desktop:30748] [[25975,0],0]-[[25975,1],0] mca_oob_tcp_msg_recv: 
readv failed: Connection reset by peer (104)

--

6) Information attached:
ifconfig_masternode - output of ifconfig on masternode
ifconfig_slavenode - output of ifconfig on slavenode
ompi_info.txt - output of ompi_info -all
config.log - OpenMPI logfile
machine.linux - the machinefile used in mpirun command

--
Sincerely,
Qing Pang
(601) 979 0270



mpirun_info.tar.gz
Description: application/gzip


Re: [OMPI users] Openmpi on Heterogeneous environment

2009-11-05 Thread Pallab Datta
I have had issues running across platforms, i.e. Mac OS X and Linux (Ubuntu),
and haven't got them resolved. Check the firewalls in case they are blocking
any communication.
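
For reference, when each architecture has its own binary, the usual approach is
Open MPI's MPMD-style command line, which lets each host run its own executable
(both installs built with --enable-heterogeneous, as described below); the
hostnames and binary names in this sketch are hypothetical:

  mpirun -np 1 -host intel-node  ./mycode.intel : \
         -np 1 -host x86_64-node ./mycode.x86_64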

> Dear Open-mpi users,
>
> I have installed openmpi on 2 different machines with different
> architectures (INTEL and x86_64) separately (command: ./configure
> --enable-heterogeneous). Compiled executables of the same code for these 2
> arch. Kept these executables on individual machines. Prepared a hostfile
> containing the names of those 2 machines.
> Now, when I want to execute the code (giving command - ./mpirun -hostfile
> machines executable), it doesn't work, giving error message:
>
> MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --
> --
> mpirun has exited due to process rank 2 with PID 1712 on
> node studpc1.xxx..xx exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here)
>
> When I keep only one machine-name in the hostfile, then the execution
> works
> perfect.
>
> Will anybody please guide me to run the program on heterogeneous
> environment
> using mpirun!
>
> Thanking you,
>
> Sincerely,
> Yogesh
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



[OMPI users] Openmpi on Heterogeneous environment

2009-11-05 Thread Yogesh Aher
Dear Open-mpi users,

I have installed openmpi on 2 different machines with different
architectures (INTEL and x86_64) separately (command: ./configure
--enable-heterogeneous). I compiled executables of the same code for these 2
architectures and kept them on the individual machines, and prepared a
hostfile containing the names of those 2 machines.
Now, when I want to execute the code (giving the command - ./mpirun -hostfile
machines executable), it doesn't work, giving the error message:

MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--
--
mpirun has exited due to process rank 2 with PID 1712 on
node studpc1.xxx..xx exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here)

When I keep only one machine name in the hostfile, the execution works
perfectly.

Would anybody please guide me on running the program in a heterogeneous
environment using mpirun?

Thanking you,

Sincerely,
Yogesh


Re: [OMPI users] Help: Firewall problems

2009-11-05 Thread Terry Dontje
Technically the MPI spec may not put a requirement on TCP/IP; however, Open
MPI's runtime environment needs some way to launch jobs and pass data around
in a standard way, and it currently uses TCP/IP.  That being said, there have
been rumblings for some time about using other protocols, but that has not
yet come into being.


--td
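
As a concrete illustration (the subnet is hypothetical, and the persistence
step is RHEL/CentOS-style): rather than stopping iptables entirely, it is
usually enough to open the firewall just between the cluster nodes, since Open
MPI's TCP connections use dynamically chosen ports in both directions:

  # on each node, accept all traffic from the other cluster nodes
  iptables -I INPUT -s 192.168.0.0/24 -j ACCEPT
  service iptables save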


*Subject:* [OMPI users] Help: Firewall problems
*From:* Lee Amy (/openlinuxsource_at_[hidden]/)
*Date:* 2009-11-05 11:28:50

Hi,

I remember that MPI does not depend on TCP/IP, so why does the default
iptables configuration prevent MPI programs from running? After I stop
iptables, the programs run well. I use Ethernet as the connection.

Could anyone give me tips on fixing this problem?

Thank you very much.

Amy



Re: [OMPI users] Segmentation fault whilst running RaXML-MPI

2009-11-05 Thread Jeff Squyres
FWIW, I think Intel released 11.1.059 earlier today (I've been trying  
to download it all morning).  I doubt it's an issue in this case, but  
I thought I'd mention it as a public service announcement.  ;-)


Seg faults are *usually* an application issue (never say "never", but  
they *usually* are).  You might want to first contact the RaXML team  
to see if there are any known issues with their software and Open MPI  
1.3.3...?  (Sorry, I'm totally unfamiliar with RaXML)
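
If it does come to digging into it locally, a core-file backtrace is usually
the quickest way to see where the crash happens; a hedged sketch (the exact
core file name is system-dependent):

  # in the qlogin shell, allow core dumps, then re-run the same mpirun command
  ulimit -c unlimited
  # after the crash, load the core from the failing rank
  gdb /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI core
  # and type "bt" at the (gdb) prompt to get the backtrace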


On Nov 5, 2009, at 12:30 PM, Nick Holway wrote:


Dear all,

I'm trying to run RAxML 7.0.4 on my 64-bit Rocks 5.1 cluster (i.e. CentOS
5.2). I compiled Open MPI 1.3.3 using the Intel compilers v 11.1.056
using ./configure CC=icc CXX=icpc F77=ifort FC=ifort --with-sge
--prefix=/usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man
--with-memory-manager=none.

When I run RAxML in a qlogin session using
/usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man/bin/mpirun -np 8
/usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI
-f a -x 12345 -p12345 -# 10 -m GTRGAMMA -s
/users/holwani1/jay/ornodko-1582 -n mpitest39

I get the following output:

This is the RAxML MPI Worker Process Number: 1
This is the RAxML MPI Worker Process Number: 3

This is the RAxML MPI Master process

This is the RAxML MPI Worker Process Number: 7

This is the RAxML MPI Worker Process Number: 4

This is the RAxML MPI Worker Process Number: 5

This is the RAxML MPI Worker Process Number: 2

This is the RAxML MPI Worker Process Number: 6
IMPORTANT WARNING: Alignment column 1695 contains only undetermined
values which will be treated as missing data


IMPORTANT WARNING: Sequences A4_H10 and A3ii_E11 are exactly identical


IMPORTANT WARNING: Sequences A2_A08 and A9_C10 are exactly identical


IMPORTANT WARNING: Sequences A3ii_B03 and A3ii_C06 are exactly  
identical



IMPORTANT WARNING: Sequences A9_D08 and A9_F10 are exactly identical


IMPORTANT WARNING: Sequences A3ii_F07 and A9_C08 are exactly identical


IMPORTANT WARNING: Sequences A6_F05 and A6_F11 are exactly identical

IMPORTANT WARNING
Found 6 sequences that are exactly identical to other sequences in the
alignment.
Normally they should be excluded from the analysis.


IMPORTANT WARNING
Found 1 column that contains only undetermined values which will be
treated as missing data.
Normally these columns should be excluded from the analysis.

An alignment file with undetermined columns and sequence duplicates
removed has already
been printed to file /users/holwani1/jay/ornodko-1582.reduced


You are using RAxML version 7.0.4 released by Alexandros Stamatakis in
April 2008

Alignment has 1280 distinct alignment patterns

Proportion of gaps and completely undetermined characters in this
alignment: 0.124198

RAxML rapid bootstrapping and subsequent ML search


Executing 10 rapid bootstrap inferences and thereafter a thorough ML  
search


All free model parameters will be estimated by RAxML
GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter
GAMMA Model parameters will be estimated up to an accuracy of
0.10 Log Likelihood units

Partition: 0
Name: No Name Provided
DataType: DNA
Substitution Matrix: GTR
Empirical Base Frequencies:
pi(A): 0.261129 pi(C): 0.228570 pi(G): 0.315946 pi(T): 0.194354


Switching from GAMMA to CAT for rapid Bootstrap, final ML search will
be conducted under the GAMMA model you specified
Bootstrap[10]: Time 44.442728 bootstrap likelihood -inf, best
rearrangement setting 5
Bootstrap[0]: Time 44.814948 bootstrap likelihood -inf, best
rearrangement setting 5
Bootstrap[6]: Time 46.470371 bootstrap likelihood -inf, best
rearrangement setting 6
[compute-0-11:08698] *** Process received signal ***
[compute-0-11:08698] Signal: Segmentation fault (11)
[compute-0-11:08698] Signal code: Address not mapped (1)
[compute-0-11:08698] Failing at address: 0x408
[compute-0-11:08698] [ 0] /lib64/libpthread.so.0 [0x3fb580de80]
[compute-0-11:08698] [ 1]
/usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC- 
MPI(hookup+0)

[0x413ca0]
[compute-0-11:08698] [ 2]
/usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC- 
MPI(restoreTL+0xd9)

[0x442c09]
[compute-0-11:08698] [ 3]
/usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI
[0x42c968]
[compute-0-11:08698] [ 4]
/usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC- 
MPI(doAllInOne+0x91a)

[0x42b21a]
[compute-0-11:08698] [ 5]
/usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC- 
MPI(main+0xc25)

[0x4063f5]
[compute-0-11:08698] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x3fb501d8b4]
[compute-0-11:08698] [ 7]
/usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI
[0x405719]
[compute-0-11:08698] *** End of error message ***
Bootstrap[1]: Time 8.400332 bootstrap likelihood -inf, best
rearrangement setting 5
--
mpirun noticed that process rank 1 with PID 8698 on node
compute-0-11.local exited on 

[OMPI users] Segmentation fault whilst running RaXML-MPI

2009-11-05 Thread Nick Holway
Dear all,

I'm trying to run RAxML 7.0.4 on my 64-bit Rocks 5.1 cluster (i.e. CentOS
5.2). I compiled Open MPI 1.3.3 using the Intel compilers v 11.1.056
using ./configure CC=icc CXX=icpc F77=ifort FC=ifort --with-sge
--prefix=/usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man
--with-memory-manager=none.

When I run RAxML in a qlogin session using
/usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man/bin/mpirun -np 8
/usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI
-f a -x 12345 -p12345 -# 10 -m GTRGAMMA -s
/users/holwani1/jay/ornodko-1582 -n mpitest39

I get the following output:

This is the RAxML MPI Worker Process Number: 1
This is the RAxML MPI Worker Process Number: 3

This is the RAxML MPI Master process

This is the RAxML MPI Worker Process Number: 7

This is the RAxML MPI Worker Process Number: 4

This is the RAxML MPI Worker Process Number: 5

This is the RAxML MPI Worker Process Number: 2

This is the RAxML MPI Worker Process Number: 6
IMPORTANT WARNING: Alignment column 1695 contains only undetermined
values which will be treated as missing data


IMPORTANT WARNING: Sequences A4_H10 and A3ii_E11 are exactly identical


IMPORTANT WARNING: Sequences A2_A08 and A9_C10 are exactly identical


IMPORTANT WARNING: Sequences A3ii_B03 and A3ii_C06 are exactly identical


IMPORTANT WARNING: Sequences A9_D08 and A9_F10 are exactly identical


IMPORTANT WARNING: Sequences A3ii_F07 and A9_C08 are exactly identical


IMPORTANT WARNING: Sequences A6_F05 and A6_F11 are exactly identical

IMPORTANT WARNING
Found 6 sequences that are exactly identical to other sequences in the
alignment.
Normally they should be excluded from the analysis.


IMPORTANT WARNING
Found 1 column that contains only undetermined values which will be
treated as missing data.
Normally these columns should be excluded from the analysis.

An alignment file with undetermined columns and sequence duplicates
removed has already
been printed to file /users/holwani1/jay/ornodko-1582.reduced


You are using RAxML version 7.0.4 released by Alexandros Stamatakis in
April 2008

Alignment has 1280 distinct alignment patterns

Proportion of gaps and completely undetermined characters in this
alignment: 0.124198

RAxML rapid bootstrapping and subsequent ML search


Executing 10 rapid bootstrap inferences and thereafter a thorough ML search

All free model parameters will be estimated by RAxML
GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter
GAMMA Model parameters will be estimated up to an accuracy of
0.10 Log Likelihood units

Partition: 0
Name: No Name Provided
DataType: DNA
Substitution Matrix: GTR
Empirical Base Frequencies:
pi(A): 0.261129 pi(C): 0.228570 pi(G): 0.315946 pi(T): 0.194354


Switching from GAMMA to CAT for rapid Bootstrap, final ML search will
be conducted under the GAMMA model you specified
Bootstrap[10]: Time 44.442728 bootstrap likelihood -inf, best
rearrangement setting 5
Bootstrap[0]: Time 44.814948 bootstrap likelihood -inf, best
rearrangement setting 5
Bootstrap[6]: Time 46.470371 bootstrap likelihood -inf, best
rearrangement setting 6
[compute-0-11:08698] *** Process received signal ***
[compute-0-11:08698] Signal: Segmentation fault (11)
[compute-0-11:08698] Signal code: Address not mapped (1)
[compute-0-11:08698] Failing at address: 0x408
[compute-0-11:08698] [ 0] /lib64/libpthread.so.0 [0x3fb580de80]
[compute-0-11:08698] [ 1]
/usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(hookup+0)
[0x413ca0]
[compute-0-11:08698] [ 2]
/usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(restoreTL+0xd9)
[0x442c09]
[compute-0-11:08698] [ 3]
/usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI
[0x42c968]
[compute-0-11:08698] [ 4]
/usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(doAllInOne+0x91a)
[0x42b21a]
[compute-0-11:08698] [ 5]
/usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(main+0xc25)
[0x4063f5]
[compute-0-11:08698] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x3fb501d8b4]
[compute-0-11:08698] [ 7]
/usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI
[0x405719]
[compute-0-11:08698] *** End of error message ***
Bootstrap[1]: Time 8.400332 bootstrap likelihood -inf, best
rearrangement setting 5
--
mpirun noticed that process rank 1 with PID 8698 on node
compute-0-11.local exited on signal 11 (Segmentation fault).
--



My $PATH is 

[OMPI users] Help: Firewall problems

2009-11-05 Thread Lee Amy
Hi,

I remember that MPI does not depend on TCP/IP, so why does the default
iptables configuration prevent MPI programs from running? After I stop
iptables, the programs run well. I use Ethernet as the connection.

Could anyone give me tips on fixing this problem?

Thank you very much.

Amy


Re: [OMPI users] Mac OSX 10.6 (SL) + openMPI 1.3.3 + Intel Compilers 11.1.076

2009-11-05 Thread Jeff Squyres

On Nov 5, 2009, at 9:00 AM, Christophe Peyret wrote:

How can I deactivate Xgrid launching in order to be able to use open-mpi
under Snow Leopard?


Easiest way is to just remove the xgrid plugin:

  rm where_you_installed_ompi/lib/openmpi/mca_*xgrid*
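
An alternative that may also work, if you would rather not delete files from
the install, is to deselect the xgrid launcher via an MCA parameter (untested
sketch; the executable name is hypothetical):

  # one-off, on the mpirun command line
  mpirun --mca plm ^xgrid -np 4 ./a.out
  # or persistently, in $HOME/.openmpi/mca-params.conf
  plm = ^xgrid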

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] Mac OSX 10.6 (SL) + openMPI 1.3.3 + Intel Compilers 11.1.076

2009-11-05 Thread Christophe Peyret


How can I deactivate Xgrid launching in order to be able to use open-mpi
under Snow Leopard?



Le 5 nov. 2009 à 13:18, Christophe Peyret a écrit :


Hello,

I'm trying to launch a job with mpirun on my Mac Pro and I get a strange
error message.

Any idea?

Christophe


[santafe.onera:00235] orte:plm:xgrid: Connection to XGrid controller  
unexpectedly closed: (600) The operation couldn’t be completed.  
(BEEP error 600.)
2009-11-05 13:13:53.973 orted[235:903] *** Terminating app due to  
uncaught exception 'NSInvalidArgumentException', reason: '*** - 
[XGConnection<0x100224df0> finalize]: called when collecting not  
enabled'

*** Call stack at first throw:
(
	0   CoreFoundation  0x7fff8712c5a4  
__exceptionPreprocess + 180
	1   libobjc.A.dylib 0x7fff87b8d313  
objc_exception_throw + 45
	2   CoreFoundation  0x7fff87147251 - 
[NSObject(NSObject) finalize] + 129
	3   mca_plm_xgrid.so0x000100149720 - 
[PlmXGridClient dealloc] + 64
	4   mca_plm_xgrid.so0x0001001480e0  
orte_plm_xgrid_finalize + 64
	5   mca_plm_xgrid.so0x000100147fa1  
orte_plm_xgrid_component_query + 529
	6   libopen-pal.0.dylib 0x0001000811ea  
mca_base_select + 186

)
terminate called after throwing an instance of 'NSException'
[santafe:00235] *** Process received signal ***
[santafe:00235] Signal: Abort trap (6)
[santafe:00235] Signal code:  (0)
[santafe:00235] *** End of error message ***
[santafe.onera:00233] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to  
start a daemon on the local node in file ess_singleton_module.c at  
line 381
[santafe.onera:00233] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to  
start a daemon on the local node in file ess_singleton_module.c at  
line 143
[santafe.onera:00233] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to  
start a daemon on the local node in file runtime/orte_init.c at line  
132

--
It looks like orte_init failed for some reason; your parallel  
process is

likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Unable to start a daemon on the local node  
(-128) instead of ORTE_SUCCESS

--
--
It looks like MPI_INIT failed for some reason; your parallel process  
is

likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or  
environment

problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Unable to start a daemon on the local node" (-128)  
instead of "Success" (0)

--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[santafe.onera:233] Abort before MPI_INIT completed successfully;  
not able to guarantee that all other processes were killed!

santafe:Example peyret$






Re: [OMPI users] Mac OSX 10.6 (SL) + openMPI 1.3.3 + Intel Compilers 11.1.076

2009-11-05 Thread Jeff Squyres
I'm afraid that Open MPI v1.3.x's xgrid support is currently broken --  
we haven't had anyone with the knowledge or experience available to  
fix it.  :-(  Patches would be welcome...


Note that Open MPI itself works fine on Snow Leopard -- it's just the  
xgrid launching support that is broken.



On Nov 5, 2009, at 7:18 AM, Christophe Peyret wrote:


Hello,

I'm trying to launch a job with mpirun on my Mac Pro and I get a strange
error message.

Any idea?

Christophe


[santafe.onera:00235] orte:plm:xgrid: Connection to XGrid controller  
unexpectedly closed: (600) The operation couldn’t be completed.  
(BEEP error 600.)
2009-11-05 13:13:53.973 orted[235:903] *** Terminating app due to  
uncaught exception 'NSInvalidArgumentException', reason: '*** - 
[XGConnection<0x100224df0> finalize]: called when collecting not  
enabled'

*** Call stack at first throw:
(
	0   CoreFoundation  0x7fff8712c5a4  
__exceptionPreprocess + 180
	1   libobjc.A.dylib 0x7fff87b8d313  
objc_exception_throw + 45
	2   CoreFoundation  0x7fff87147251 - 
[NSObject(NSObject) finalize] + 129
	3   mca_plm_xgrid.so0x000100149720 - 
[PlmXGridClient dealloc] + 64
	4   mca_plm_xgrid.so0x0001001480e0  
orte_plm_xgrid_finalize + 64
	5   mca_plm_xgrid.so0x000100147fa1  
orte_plm_xgrid_component_query + 529
	6   libopen-pal.0.dylib 0x0001000811ea  
mca_base_select + 186

)
terminate called after throwing an instance of 'NSException'
[santafe:00235] *** Process received signal ***
[santafe:00235] Signal: Abort trap (6)
[santafe:00235] Signal code:  (0)
[santafe:00235] *** End of error message ***
[santafe.onera:00233] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to  
start a daemon on the local node in file ess_singleton_module.c at  
line 381
[santafe.onera:00233] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to  
start a daemon on the local node in file ess_singleton_module.c at  
line 143
[santafe.onera:00233] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to  
start a daemon on the local node in file runtime/orte_init.c at line  
132

--
It looks like orte_init failed for some reason; your parallel  
process is

likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Unable to start a daemon on the local node  
(-128) instead of ORTE_SUCCESS

--
--
It looks like MPI_INIT failed for some reason; your parallel process  
is

likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or  
environment

problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Unable to start a daemon on the local node" (-128)  
instead of "Success" (0)

--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[santafe.onera:233] Abort before MPI_INIT completed successfully;  
not able to guarantee that all other processes were killed!

santafe:Example peyret$


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
jsquy...@cisco.com




[OMPI users] Mac OSX 10.6 (SL) + openMPI 1.3.3 + Intel Compilers 11.1.076

2009-11-05 Thread Christophe Peyret

Hello,

I'm trying to launch a job with mpirun on my Mac Pro and I get a strange
error message.

Any idea?

Christophe


[santafe.onera:00235] orte:plm:xgrid: Connection to XGrid controller  
unexpectedly closed: (600) The operation couldn’t be completed. (BEEP  
error 600.)
2009-11-05 13:13:53.973 orted[235:903] *** Terminating app due to  
uncaught exception 'NSInvalidArgumentException', reason: '*** - 
[XGConnection<0x100224df0> finalize]: called when collecting not  
enabled'

*** Call stack at first throw:
(
	0   CoreFoundation  0x7fff8712c5a4  
__exceptionPreprocess + 180
	1   libobjc.A.dylib 0x7fff87b8d313  
objc_exception_throw + 45
	2   CoreFoundation  0x7fff87147251 -[NSObject 
(NSObject) finalize] + 129
	3   mca_plm_xgrid.so0x000100149720 - 
[PlmXGridClient dealloc] + 64
	4   mca_plm_xgrid.so0x0001001480e0  
orte_plm_xgrid_finalize + 64
	5   mca_plm_xgrid.so0x000100147fa1  
orte_plm_xgrid_component_query + 529
	6   libopen-pal.0.dylib 0x0001000811ea  
mca_base_select + 186

)
terminate called after throwing an instance of 'NSException'
[santafe:00235] *** Process received signal ***
[santafe:00235] Signal: Abort trap (6)
[santafe:00235] Signal code:  (0)
[santafe:00235] *** End of error message ***
[santafe.onera:00233] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to  
start a daemon on the local node in file ess_singleton_module.c at  
line 381
[santafe.onera:00233] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to  
start a daemon on the local node in file ess_singleton_module.c at  
line 143
[santafe.onera:00233] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to  
start a daemon on the local node in file runtime/orte_init.c at line 132

--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Unable to start a daemon on the local node  
(-128) instead of ORTE_SUCCESS

--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or  
environment

problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Unable to start a daemon on the local node" (-128)  
instead of "Success" (0)

--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[santafe.onera:233] Abort before MPI_INIT completed successfully; not  
able to guarantee that all other processes were killed!

santafe:Example peyret$




Re: [OMPI users] Question about checkpoint/restart protocol

2009-11-05 Thread Mohamed Adel
Dear Sergio,

Thank you for your reply. I've inserted the modules into the kernel and it all
worked fine. But there is still a weird issue. I use the command "mpirun -n 2
-am ft-enable-cr -H comp001 checkpoint-restart-test" to start an MPI job. I
then use "ompi-checkpoint PID" to checkpoint the job, but ompi-checkpoint
doesn't respond and mpirun produces the following.

--
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.  

The process that invoked fork was:

  Local host:  comp001.local (PID 23514)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--
[login01.local:21425] 1 more process has sent help message help-mpi-runtime.txt 
/ mpi_init:warn-fork
[login01.local:21425] Set MCA parameter "orte_base_help_aggregate" to 0 to see 
all help / error messages

Note: if the -n option has a value greater than 1, then this error occurs; but
if -n is 1, then ompi-checkpoint succeeds, mpirun produces the same message,
and ompi-restart fails with the message
[login01:21417] *** Process received signal ***
[login01:21417] Signal: Segmentation fault (11)
[login01:21417] Signal code: Address not mapped (1)
[login01:21417] Failing at address: (nil)
[login01:21417] [ 0] /lib64/libpthread.so.0 [0x32df20de70]
[login01:21417] [ 1] /home/mab/openmpi-1.3.3/lib/openmpi/mca_crs_blcr.so 
[0x2b093509dfee]
[login01:21417] [ 2] 
/home/mab/openmpi-1.3.3/lib/openmpi/mca_crs_blcr.so(opal_crs_blcr_restart+0xd9) 
[0x2b093509d251]
[login01:21417] [ 3] opal-restart [0x401c3e]
[login01:21417] [ 4] /lib64/libc.so.6(__libc_start_main+0xf4) [0x32dea1d8b4]
[login01:21417] [ 5] opal-restart [0x401399]
[login01:21417] *** End of error message ***
--
mpirun noticed that process rank 0 with PID 21417 on node login01.local exited 
on signal 11 (Segmentation fault).
--

Any help with this will be appreciated.

Thanks in advance,
Mohamed Adel
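
For context, the checkpoint/restart cycle being attempted looks roughly like
the following sketch; the snapshot reference is whatever ompi-checkpoint
actually prints (shown here with a hypothetical name), and the fork() warning
itself can be silenced with the MCA parameter named in the help message:

  # start the job under the C/R framework
  mpirun -n 2 -am ft-enable-cr --mca mpi_warn_on_fork 0 -H comp001 checkpoint-restart-test &
  MPIRUN_PID=$!
  # checkpoint the running job (pass the PID of mpirun, not of an MPI rank)
  ompi-checkpoint -v $MPIRUN_PID
  # later, restart from the snapshot reference printed by ompi-checkpoint
  ompi-restart ompi_global_snapshot_$MPIRUN_PID.ckpt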


From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of 
Sergio Díaz [sd...@cesga.es]
Sent: Thursday, November 05, 2009 11:38 AM
To: Open MPI Users
Subject: Re: [OMPI users] Question about checkpoint/restart protocol

Hi,

Did you load the BLCR modules before compiling OpenMPI?

Regards,
Sergio

Mohamed Adel escribió:
> Dear OMPI users,
>
> I'm a new OpenMPI user. I've configured openmpi-1.3.3 with those options 
> "./configure --prefix=/home/mab/openmpi-1.3.3 --with-sge --enable-ft-thread 
> --with-ft=cr --enable-mpi-threads --enable-static --disable-shared 
> --with-blcr=/home/mab/blcr-0.8.2/" then compiled and installed it 
> successfully.
> Now I'm trying to use the checkpoint/restart protocol. I run a program with 
> the options "mpirun -n 2 -am ft-enable-cr -H localhost 
> prime/checkpoint-restart-test" but I receive the following error:
>
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> [madel:28896] Abort before MPI_INIT completed successfully; not able to 
> guarantee that all other processes were killed!
> --
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_cr_init() failed failed
>   --> Returned value -1 instead of OPAL_SUCCESS
> --
> [madel:28896] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file 
> runtime/orte_init.c at line 77
> --
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open 

Re: [OMPI users] Question about checkpoint/restart protocol

2009-11-05 Thread Sergio Díaz

Hi,

Did you load the BLCR modules before compiling OpenMPI?

Regards,
Sergio
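
A quick way to verify that on each node (a hedged sketch; module names per a
stock BLCR 0.8.x build):

  # the BLCR kernel modules must be loaded when Open MPI is configured with
  # --with-blcr and on every node where jobs are checkpointed or restarted
  lsmod | grep blcr        # expect to see blcr and blcr_imports
  sudo modprobe blcr       # load them if missing
  # confirm that Open MPI actually built the BLCR checkpoint/restart component
  ompi_info | grep -i blcr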

Mohamed Adel escribió:

Dear OMPI users,

I'm a new OpenMPI user. I've configured openmpi-1.3.3 with those options 
"./configure --prefix=/home/mab/openmpi-1.3.3 --with-sge --enable-ft-thread 
--with-ft=cr --enable-mpi-threads --enable-static --disable-shared 
--with-blcr=/home/mab/blcr-0.8.2/" then compiled and installed it successfully.
Now I'm trying to use the checkpoint/restart protocol. I run a program with the options 
"mpirun -n 2 -am ft-enable-cr -H localhost prime/checkpoint-restart-test" but I 
receive the following error:

*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[madel:28896] Abort before MPI_INIT completed successfully; not able to 
guarantee that all other processes were killed!
--
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_cr_init() failed failed
  --> Returned value -1 instead of OPAL_SUCCESS
--
[madel:28896] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file 
runtime/orte_init.c at line 77
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--

I can't find the files mentioned in this post 
"http://www.open-mpi.org/community/lists/users/2009/09/10641.php; 
(mca_crs_blcr.so, mca_crs_blcr.la). Could you please help me with that error?

Thanks in advance
Mohamed Adel

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

  



--
Sergio Díaz Montes
Centro de Supercomputacion de Galicia
Avda. de Vigo. s/n (Campus Sur) 15706 Santiago de Compostela (Spain)
Tel: +34 981 56 98 10 ; Fax: +34 981 59 46 16
email: sd...@cesga.es ; http://www.cesga.es/