[OMPI users] OpenMPI with system call -- openib error on SNL tbird

2007-04-16 Thread Adams, Brian M
Hello,

I am attempting to port Sandia's DAKOTA code from MVAPICH to the default
OpenMPI/Intel environment on Sandia's thunderbird cluster.  I can
successfully build DAKOTA in the default tbird software environment, but
I'm having runtime problems when DAKOTA attempts to make a system call.
Typical output looks like:

[0,1,1][btl_openib_component.c:897:mca_btl_openib_component_progress]
from an64 to: an64 error polling HP CQ with status LOCAL LENGTH ERROR
status number 1 for wr_id 5714048 opcode 0

I'm attaching a tarball containing output from `ompi_info --all` as well
as two simple sample programs with output to demonstrate the problem
behavior.  I built them in the default tbird MPI environment
(openmpi-1.1.2-ofed-intel-9.1) with 

  mpicc mpi_syscall.c -i_dynamic -o mpi_syscall
  mpicc mpi_nosyscall.c -i_dynamic -o mpi_nosyscall

where `which mpicc` =
/apps/x86_64/mpi/openmpi/intel-9.1/openmpi-1.1.2-ofed/bin/mpicc The
latter has no system call and runs fine on two processors, whereas the
former gives the openib error (not in the attached output, though dumped
to the screen).  The problem exists regardless of whether -i_dynamic is
included.  I am executing from within an interactive 2 processor job
using 

  /apps/x86_64/mpi/openmpi/intel-9.1/openmpi-1.1.2-ofed/bin/mpiexec ->
orterun

I know some OpenMPI developers have access to thunderbird for testing,
but if you require additional information on the build or runtime
environment, please advise and I will attempt to send it along. 

Note:  Both programs run fine with MVAPICH on tbird, and with OpenMPI or
MPICH on my Linux x86_64 SMP workstation.

Thanks,
Brian

Brian M. Adams, PhD (bria...@sandia.gov) 
Optimization and Uncertainty Estimation 
Sandia National Laboratories 
P.O. Box 5800, Mail Stop 1318 
Albuquerque, NM 87185-1318
Voice: 505-284-8845, FAX: 505-284-2518






Attachment: ompi_tbird_system.tgz


Re: [OMPI users] Open MPI - Signal: Segmentation fault (11) Problem

2007-04-16 Thread Michael Gauckler
Hi George, 

thank you for replying and the hint of using MPI_BOTTOM. I changed this 
part of the code and still receive the same segmentation fault.

Unfortunately I cannot post a full example, but here is the code that 
seems most relevant to the problem.

The mechanism is as follows: from the object that needs to be transmitted,
a list is created that describes its members with their type, offset,
and stride (the MemoryMapDescr).  MemoryMap::mapType is used to put
the members into this list, the so-called MemoryMap.

From this vector of MemoryMapDescr an MPI_Datatype is constructed, which
is then used to transmit the object.

Maybe you could have a look at the code fragments and see if you spot
something that does not play well with Open MPI.

Today's testing again showed that the size of the data structures
triggers the problem. This could be probabilistic (more processing gives
a higher chance that something goes wrong), or there could be a real
dependence, e.g. some buffer is too small or the differences between the
addresses in memory are too large, or I don't know what else to think of.

Thank you for your help.

Regards,
Michael


int createMPIDataType(const std::vector<MemoryMapDescr>& memorymap,
MPI_Datatype& datatype)
{
    int err = MPI_SUCCESS;
    int num = memorymap.size();

    MPI_Datatype *types = new MPI_Datatype[num];
    int *lengths = new int[num];
    MPI_Aint *addresses = new MPI_Aint[num];

    // copy the vector with information about the type into temp.
    // arrays to be handled by MPI_Type_struct
    for (int i = 0; i < num; i++)
    {
        types[i] = MPIDataType[memorymap[i].type];
        lengths[i] = memorymap[i].len;

        // create address map according to actual memory layout
        err = MPI_Address(memorymap[i].addr, &addresses[i]);

        if (err != MPI_SUCCESS)
        {
            std::ostringstream msg;
            msg << "invalid address at index " << i;
            msg << " for type " << DataTypeNames[memorymap[i].type];
            msg << " at address " << memorymap[i].addr;
            GP_THROW_ERR(CommunicationErr, eMPIAddressError, msg.str());
        }
    }

    // create MPI datatype with equivalent information about types and offsets
    err = MPI_Type_struct(num, lengths, addresses, types, &datatype);

    if (err != MPI_SUCCESS)
    {
        GP_THROW_ERR(CommunicationErr, eMPIDatatypeError,
            "invalid MPI datatype");
    }

    err = MPI_Type_commit(&datatype);

    // Invalid datatype argument. May be an uncommitted MPI_Datatype
    // (see MPI_Type_commit).
    if (err != MPI_SUCCESS)
    {
        GP_THROW_ERR(CommunicationErr, eMPIDatatypeError,
            "invalid MPI datatype");
    }

    // delete temp. arrays
    delete [] types;
    delete [] lengths;
    delete [] addresses;

    return err;
}


// Memory map descriptor.
// TODO: Add support for strided vectors.

struct MemoryMapDescr
{
MemoryMapDescr(DataType t, void* a, int l);

//! Data type.
DataType type;

//! Address of data in memory.
void* addr;

//! Number of data elements.
int len;

//! Stride.
// TODO: Add support for strided vectors.
int stride;

//! Type name string.
std::string typeName() const;
};


template <typename T>
void MemoryMap::mapType(const T& var)
{
    memoryMap_.push_back(MemoryMapDescr(DataTypeConverter<T>::type,
        (void*)&var, 1));
}

// With specializations such as the following, exemplified by a vector
// of doubles.
template <>
void MemoryMap::mapType< std::vector<double> >(const std::vector<double>& var)
{
    if (var.size() > 0)
        memoryMap_.push_back(MemoryMapDescr(DataTypeConverter<double>::type,
            (void*)&var[0], var.size()));
}



-

Message: 1
List-Post: users@lists.open-mpi.org
Date: Wed, 11 Apr 2007 12:33:25 -0400
From: George Bosilca 
Subject: Re: [OMPI users] Open MPI - Signal: Segmentation fault (11)
Problem
To: Open MPI Users 

Michael,

The MPI standard is quite clear: in order to have correct and
portable MPI code, you are not allowed to use (void*)0. Use
MPI_BOTTOM instead.

We have plenty of tests which exercise the exact behavior you describe in
your email, and they all pass. I will take a look at what happens,
but I need either the code or at least the part which creates the
datatype.

   Thanks,
 george.

On Apr 11, 2007, at 3:54 AM, Michael Gauckler wrote:

> Dear Open MPI User's and Developers,
>
> I encountered a problem with Open MPI when porting an application,  
> which successfully ran with LAM MPI and MPICH.
>
> The program produces a segmentation fault 

Re: [OMPI users] orte_init failed

2007-04-16 Thread Brian Barrett
That's very odd.  The usual cause for this is /tmp being unwritable  
by the user or full.  Can you check to see if either of those  
conditions are true?


Thanks,

Brian


On Apr 13, 2007, at 2:44 AM, Christine Kreuzer wrote:


Hi,

I run Open MPI on an AMD Opteron machine with two dual-core processors
under SLES 10. Until today everything worked fine, but then I got the
following error message:

[computername:20612][0,0,0] ORTE_ERROR_LOG: Error in file
../../orte/runtime/orte_init_stage1.c at line 302
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

   orte_session_dir failed
   --> Returned value -1 instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[computername:20612] [0,0,0] ORTE_ERROR_LOG: Error in file
../../orte/runtime/orte_system_init.c at line 42
[computername:20612] [0,0,0] ORTE_ERROR_LOG: Error in file
../../orte/runtime/orte_init.c at line 49
--------------------------------------------------------------------------
Open RTE was unable to initialize properly.  The error occured while
attempting to orte_init().  Returned value -1 instead of ORTE_SUCCESS.
--------------------------------------------------------------------------



I would appreciate any help or ideas to solve this problem.
Thanks in advance!

Regards,
Christine
--

Universität des Saarlandes
AG Prof. Dr. Christoph Becher
Fachrichtung 7.3 (Technische Physik)
Geb. E2.6, Zimmer 2.04
D-66123 Saarbrücken

Phone:+49(0)681 302 3418
Fax: +49(0)681 302 4676
E-mail: c.kreu...@mx.uni-saarland.de


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users