I did get MrBayes, compiled with Open MPI, to run under Xgrid. However,
it was set up as more of a "traditional" cluster: the agents and the
controller all share an NFS directory, so basically I'm only using
Xgrid as a job scheduler. MrBayes doesn't seem to be a "grid"
application so much as an application for a traditional cluster.
You will need to have the following in place:
1) An NFS shared directory across all the machines on the grid.
2) Open MPI installed locally on all the machines or via NFS. (You'll
need to compile Open MPI yourself.)
3) Here's the part that may make Xgrid undesirable for MPI
applications:
a) Compile MrBayes with MPI support by setting these Makefile
variables (MPIPATH stands for your Open MPI installation prefix):
MPI = yes
CC = $(MPIPATH)/bin/mpicc
CFLAGS = -fast
b) Make sure that Xgrid is configured to use password-based
authentication.
c) Set the environment variables that tell Open MPI to use Xgrid as
the launcher/scheduler. Assuming bash:
$ export XGRID_CONTROLLER_HOSTNAME=mycomputer.apple.com
$ export XGRID_CONTROLLER_PASSWORD=passwd
You could also add the above to a .bashrc file and have
your .bash_profile source it.
d) Run the MPI application:
$ mpirun -np X ./myapp
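Before launching, it can help to confirm that both controller variables from step c) are actually set in the current shell. The function below is a minimal sketch of such a check, assuming a bash or POSIX shell. check_xgrid_env is a hypothetical name, not part of Open MPI or Xgrid, and it only tests that the variables are non-empty; it does not contact the controller or verify the password.

```shell
# check_xgrid_env: hypothetical helper, not part of Open MPI or Xgrid.
# Prints a message for each Xgrid controller variable that is unset or
# empty, and returns 0 only when both are present. It does not contact
# the controller or validate the password.
check_xgrid_env() {
  missing=0
  if [ -z "$XGRID_CONTROLLER_HOSTNAME" ]; then
    echo "error: XGRID_CONTROLLER_HOSTNAME is not set" >&2
    missing=1
  fi
  if [ -z "$XGRID_CONTROLLER_PASSWORD" ]; then
    echo "error: XGRID_CONTROLLER_PASSWORD is not set" >&2
    missing=1
  fi
  return "$missing"
}
```

The function could live in the same .bashrc as the two exports and be used as `check_xgrid_env && mpirun -np X ./myapp`, so a missing variable is reported before mpirun is started.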
There are a couple of issues:
It turns out that the directory and files that MrBayes creates must
be readable and writable by all the agents. MrBayes does more than
read standard input and write standard output; it also creates and
writes other intermediate files. For an application like HP Linpack
that just reads and writes one file, things work fine. However,
MrBayes writes out and reads back two additional files for each MPI
process that is spawned.
All the files that MrBayes is trying to read/write must have
permissions for user 'nobody'. This is a bit of a problem, since you
probably (in general) don't want to allow user nobody to write all
over your home directory. One solution (if possible) would be to have
the application write into /tmp and then collect the files after the
job completes. However, I don't know how, or whether, MrBayes can be
told to use a temporary working directory. Perhaps your MrBayes
customer can let us know how to specify a tmpdir.
I have tested the basics of this approach by executing an MPI command
to copy the *.nex file to /tmp on all the agents. This seems to allow
everything to work, but I can't easily clean the intermediate files
off the agents after the run, since the MrBayes application created
them and the user doesn't own them.
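The staging step described above can be sketched as a small shell function. This is a sketch under stated assumptions, not the exact command used in testing: stage_input_to_tmp is a hypothetical name; it assumes Open MPI's mpirun is on the PATH (with the Xgrid variables from step c) set when running under Xgrid) and that the source path is visible on every agent through the NFS share. It relies on mpirun's ability to launch ordinary non-MPI programs such as cp. It does not solve the cleanup problem.

```shell
# stage_input_to_tmp: hypothetical helper, not part of MrBayes or Open MPI.
# Usage: stage_input_to_tmp NP /nfs/shared/path/input.nex
# Launches NP copies of cp through mpirun so that every node that receives
# a process gets its own copy of the input file in its local /tmp.
# Assumes the source path is readable on every node (e.g. via the NFS share).
stage_input_to_tmp() {
  np="$1"
  src="$2"
  if [ -z "$np" ] || [ -z "$src" ]; then
    echo "usage: stage_input_to_tmp NP FILE" >&2
    return 2
  fi
  # Check locally first; a missing file is easier to diagnose here than
  # from NP separate cp failures on the agents.
  if [ ! -r "$src" ]; then
    echo "error: cannot read $src" >&2
    return 1
  fi
  # mpirun can start ordinary programs, not just MPI binaries. Nodes that
  # receive several processes simply copy the same file more than once.
  mpirun -np "$np" cp "$src" /tmp/
}
```

For example, from the shared directory: `stage_input_to_tmp 4 "$PWD/mydata.nex"`, where mydata.nex is a placeholder for your own input file.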
I'm hoping the OMPI developers can come to the rescue on some of
these issues, perhaps working in conjunction with some of the Apple
Xgrid engineers.
Lastly, this is from one of the MrBayes folks:
"Getting help with Xgrid among the phylo community will probably be
difficult.
Fredrik can't help and probably not anyone with CIPRES either. Fredrik
recommends MPI since it is Unix-based and more people use it.
He also does not recommend setting up a cluster in your lab to run
MrBayes. This is because of a fault with MrBayes: the way it is
currently set up, the runs are only as fast as the slowest machine, so
if someone sits down to use a machine in the cluster, everything is
processed at that speed.
Here we use MPI for parallel jobs and Condor to distribute non-parallel
ones. And frankly, MrBayes can be somewhat unstable with MPI and seems
to get hung up on occasion.
Unfortunately for you, I think running large jobs will be a lot easier
in a couple of years."
-Warner
Warner Yuen
Apple Computer
email: wy...@apple.com
Tel: 408.718.2859
Fax: 408.715.0133
On Apr 14, 2006, at 8:52 AM, users-requ...@open-mpi.org wrote:
Message: 2
Date: Thu, 13 Apr 2006 14:33:29 -0400 (EDT)
From: liuli...@stat.ohio-state.edu
Subject: Re: [OMPI users] running a job problem
To: "Open MPI Users" <us...@open-mpi.org>
Message-ID:
<1122.164.107.248.223.1144953209.squir...@www.stat.ohio-state.edu>
Content-Type: text/plain;charset=iso-8859-1
Brian,
It worked when I used the latest version of MrBayes. Thanks. By the
way, do you have any idea how to submit an OMPI job on Xgrid? Thanks
again.
Liang
On Apr 12, 2006, at 9:09 AM, liuli...@stat.ohio-state.edu wrote:
We have a Mac network running Xgrid and we have successfully
installed MPI. We want to run a parallel version of MrBayes. We had no
problem compiling MrBayes with mpicc, but when we tried to run the
compiled MrBayes, we got lots of error messages:
mpiexec -np 4 ./mb -i yeast_noclock_imp.txt
Parallel version of
Parallel version of
Parallel version of
Parallel version of
[ea285fltprinter.scc.ohio-state.edu:03327] *** An error occurred in MPI_comm_size
[ea285fltprinter.scc.ohio-state.edu:03327] *** on communicator MPI_COMM_WORLD
[ea285fltprinter.scc.ohio-state.edu:03327] *** MPI_ERR_COMM: invalid communicator
[ea285fltprinter.scc.ohio-state.edu:03327] *** MPI_ERRORS_ARE_FATAL (goodbye)
This indicates that the application is calling an MPI function with
an invalid communicator. Unfortunately, this is a hard one to track
down without more information. What version of mrbayes are you using
and can you share your input deck?
Thanks,
Brian
--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users