We have added clusters with different interconnects and decided to build one
Open MPI 1.4.3 version to handle all the possible interconnects
and run everywhere. I have two questions about this:
1 - Is there a way for Open MPI to print out the interconnect it selected to use
at run time? I am
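A minimal sketch of one way to surface the selection, assuming a 1.4-era
mpirun (the executable name is a placeholder): raising the BTL framework's
verbosity makes Open MPI log which byte-transfer-layer components it opens
and selects at startup.

    # Print BTL component selection at startup; 30 is an arbitrary
    # verbosity level, higher values print more detail.
    mpirun --mca btl_base_verbose 30 -np 4 ./a.out

ompi_info can also confirm which BTLs were compiled into the build, e.g.
ompi_info | grep btl.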
We are eliminating the use of rsh at our company and I'm trying to test out
Open MPI with the NASA Overflow program using ssh.
I've been testing other MPIs (MPICH1 and LAM/MPI), and if I tried to use rsh
the programs would just die when running
under PBS. I submitted my Overflow job using --mca
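A minimal sketch of pointing the launcher at ssh, assuming a 1.3-era build
(1.2-era releases spell the parameter pls_rsh_agent rather than
plm_rsh_agent); the process count and binary are placeholders:

    # Tell the rsh/ssh launch component to use ssh explicitly.
    mpirun --mca plm_rsh_agent ssh -np 16 ./overflow

Note that an Open MPI built with PBS/TM support normally launches through
the TM interface when running under PBS, in which case neither rsh nor ssh
is used at all.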
I'm trying to build a code with Open MPI 1.3.3 that compiles with
LAM/MPI.
It is built with mpicc, and here is the code segment and the error:
void drt_pll_init(int my_rank,int num_processors);
#ifdef DRT_USE_MPI
#include <mpi.h>
MPI_Comm drt_pll_mpi_split_comm_world(int key);
#else
int
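A guess at the intended shape of that header, written out as a
self-contained sketch; the #else branch and the mpi.h include are
reconstructions from the visible fragment, not the actual Overflow source:

    void drt_pll_init(int my_rank, int num_processors);

    #ifdef DRT_USE_MPI
    #include <mpi.h>                /* required so MPI_Comm is declared */
    MPI_Comm drt_pll_mpi_split_comm_world(int key);
    #else
    int drt_pll_mpi_split_comm_world(int key);  /* assumed non-MPI fallback */
    #endif

If DRT_USE_MPI is not defined on the mpicc command line (or mpi.h is not on
the include path), MPI_Comm is an undeclared type and the compile fails,
which is one plausible reading of the truncated error.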
The "Building Open MPI with SGE" FAQ says:
For Open MPI v1.2, SGE support is built automatically; there is nothing
that you need to do. Note that SGE support first appeared in v1.2.
NOTE: For Open MPI v1.3, or starting with trunk revision number r16422,
you will need to explicitly request the
With version 1.3, should I see both the MCA ras and MCA pls components when
doing an ompi_info? After doing my build with 1.3, I only see the ras
component.
Bernie Borenstein
Yes, I know I didn't attach any info, but I'm just trying to determine if
there is a problem or something has changed between
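For reference, a sketch of how to check what actually got built, assuming
the components keep the gridengine name in 1.3:

    # List every component with 'gridengine' in its name.
    ompi_info | grep gridengine

My understanding is that the 1.2 pls framework was reworked into plm for
1.3 and the gridengine launcher was folded into the rsh plm component, so
seeing only the ras entry may be expected rather than a broken build.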
We now have a cluster with Myrinet and another cluster with TCP. I want
to build a static Open MPI that will detect if there is Myrinet on the
cluster and use that; if Myrinet is not available, run with TCP. I see
the --enable-mca-static option but am confused as to how to use
it for what I want.
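A sketch of one possible configure invocation, assuming a GM-based Myrinet
stack and that --enable-mca-static accepts the usual comma-separated
framework-component list (all paths are placeholders):

    ./configure --prefix=/opt/openmpi \
        --enable-static --disable-shared \
        --with-gm=/opt/gm \
        --enable-mca-static=btl-gm,btl-tcp,btl-sm,btl-self

The intent is that both BTLs are linked into the library; at run time Open
MPI's selection logic should prefer the Myrinet BTL where the hardware
answers and fall back to TCP elsewhere.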
We now have a Myrinet cluster along with our GigE clusters, and I was
wondering how to have Open MPI select the right interconnect. We use PBS
and would like to have Open MPI select the right interconnect
automatically, depending on whether we are on the Myrinet cluster or the
GigE cluster. Any
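For what it's worth, BTL selection is normally automatic: each BTL probes
for its hardware at startup, so one build should pick Myrinet on the
Myrinet cluster and TCP on the GigE cluster without any flags. A hedged
sketch of pinning it down explicitly, assuming an MX-based Myrinet stack
(substitute gm if that is what the cluster runs):

    # Force the Myrinet path; the job aborts instead of silently
    # falling back to TCP if MX cannot be used.
    mpirun --mca btl mx,sm,self -np 32 ./a.out

    # Or exclude TCP and let everything else auto-select.
    mpirun --mca btl ^tcp -np 32 ./a.out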
When building the NASA Overflow 2.0aa code with Open MPI 1.1.1b3 using
the Intel 9 compilers on an Opteron cluster running
SLES 9, I get the following warnings when
linking:
/acct/bsb3227/openmpi_1.1.1b3/bin/mpif90 -xW -O2 -convert big_endian -align all -o
I'm running on a cluster with mvapi. I built with mvapi and it runs,
but I want to make absolutely sure that I'm using the IB interconnect
and nothing else. How can I tell specifically what interconnect I'm
using when I run?
Bernie Borenstein
The Boeing Company
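One sketch of making this deterministic, assuming the mvapi BTL name from
builds of that era: restrict the run to the IB path so it fails loudly
rather than quietly using something else:

    # Allow only the mvapi (InfiniBand), shared-memory, and self BTLs.
    mpirun --mca btl mvapi,sm,self -np 16 ./a.out

Adding --mca btl_base_verbose with a nonzero level should also make the
startup log name the components actually selected.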
I built the NASA Overflow code with Open MPI 1.0.1rc5 and now get this
error message:
[w052:19034] *** An error occurred in MPI_Cart_get
[w051:19104] *** An error occurred in MPI_Cart_get
[w051:19104] *** on communicator MPI_COMM_WORLD
[w051:19104] *** MPI_ERR_COMM: invalid communicator
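MPI_ERR_COMM from MPI_Cart_get usually means the communicator handle is not
valid at the point of the call (the error is reported against
MPI_COMM_WORLD because an invalid handle cannot carry its own error
handler). For comparison, a minimal self-contained sketch of the legal
calling sequence, with arbitrary dimensions:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int dims[2] = {0, 0}, periods[2] = {0, 0}, coords[2];
        int nprocs;
        MPI_Comm cart;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Dims_create(nprocs, 2, dims);   /* choose a 2-D factorization */

        /* Create the Cartesian communicator first ... */
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

        /* ... then query it; passing a never-created or already-freed
           communicator here raises MPI_ERR_COMM, as in the log above. */
        MPI_Cart_get(cart, 2, dims, periods, coords);

        MPI_Comm_free(&cart);
        MPI_Finalize();
        return 0;
    }

Whether Overflow mismanages the communicator or the 1.0.1rc5 build
mishandles it is not clear from the log alone.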
I just tried to run a very large case on another cluster with TCP. It
cranks away for quite a while, then I get this message:
FOR GRID 78 AT STEP 733 L2NORM = 0.30385345E-03
FOR GRID 79 AT STEP 733 L2NORM = 0.26182533E+00
[hsd660:02490] spawn: in
Things have improved a lot since I ran the code using the earlier betas,
but it still fails near the end of the run.
The error messages are:
FOR GRID 4 AT STEP 466 L2NORM = 0.74601987E-09
FOR GRID 5 AT STEP 466 L2NORM = 0.86085437E-08
FOR GRID 6 AT
I am again trying to build and run the NASA Overflow 1.8ab version using
Open MPI and have run into this error message:
[hsd653:05053] *** An error occurred in MPI_Allreduce: the reduction
operation MPI_OP_MIN is not defined on the MPI_DBLPREC datatype
[hsd653:05053] *** on communicator
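For reference, the MPI standard requires MPI_MIN to be defined on
double-precision data, so the message appears to point at the op/datatype
support table in that particular Open MPI build rather than at an illegal
call in Overflow. A minimal C sketch of the equivalent reduction (the
Fortran original would pass MPI_DOUBLE_PRECISION):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double local_min, global_min;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        local_min = (double)rank;   /* stand-in for Overflow's residual */

        /* Global minimum across all ranks; MPI_MIN is valid on all
           floating-point datatypes. */
        MPI_Allreduce(&local_min, &global_min, 1, MPI_DOUBLE, MPI_MIN,
                      MPI_COMM_WORLD);

        if (rank == 0) printf("global min = %f\n", global_min);
        MPI_Finalize();
        return 0;
    }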
I built the NASA Overflow 1.8ab code yesterday with openmpi-1.0a1r7632.
It runs fine with 4 or 8 Opteron processors on a Myrinet Linux cluster.
But if I increase the number of processors to 20, I get errors like
this:
[e053:01260] *** An error occurred in MPI_Free_mem
[e030:15585] *** An error
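A common way to trip MPI_Free_mem is handing it a pointer that did not come
from MPI_Alloc_mem, or freeing one twice; on Myrinet, registered-memory
limits can also surface only as the process count grows. Whether either
applies to Overflow here is a guess, but for reference the required pairing
looks like:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        double *buf;

        MPI_Init(&argc, &argv);

        /* Memory released with MPI_Free_mem must have been obtained
           from MPI_Alloc_mem; mixing malloc()/free() with these calls
           on the same pointer is an error. */
        MPI_Alloc_mem(1024 * sizeof(double), MPI_INFO_NULL, &buf);
        buf[0] = 42.0;
        MPI_Free_mem(buf);

        MPI_Finalize();
        return 0;
    }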
I posted an issue with the NASA Overflow 1.8 code and have traced it
further to a program failure in the malloc
areas of the code (data in these areas gets corrupted). Overflow is
mostly Fortran, but since it is an old program,
it uses some C routines to do dynamic memory allocation. I'm still
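For context, such Fortran-callable C allocators usually follow the old
Cray-pointer pattern; a hedged sketch with hypothetical names (the
trailing-underscore mangling is compiler-dependent):

    #include <stdlib.h>

    /* Illustrative only. The Fortran side would declare
           POINTER (IPTR, WORK(1))
           CALL GETMEM(IPTR, NWORDS)
       and then index WORK as an NWORDS-element array. */
    void getmem_(long *iptr, int *nwords)
    {
        *iptr = (long)malloc((size_t)(*nwords) * sizeof(double));
    }

Corruption in a scheme like this often comes from the Fortran side writing
past the length it requested, which scribbles over malloc's bookkeeping and
only surfaces later, consistent with what is described above.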
I was able to get Open MPI working fine on a cluster with GigE, but when
building and trying to run the NASA Overflow
program on a cluster with Myrinet, it does not work. The program starts
to run and then gives the following error:
spawn: in job_state_callback(jobid = 1, state = 0xa)
spawn: in