Dear Open-MPI Developers,

Our investigation of the segmentation fault (see the previous postings "Signal: Segmentation fault (11) Problem") leads us to suspect that Open-MPI allows only a limited number of elements in the description of a user-defined MPI_Datatype.
Our application segfaults when a large user-defined data structure is passed to MPI_Send. The fault occurs in the function ompi_generic_simple_pack in datatype_pack.c when it tries to access pElem (Bad address). pElem is set on line 276:

    276: pElem = &(description[pos_desc]);

pos_desc is of type uint32_t and holds the value 0xffff929f (4294939295). It is in turn assigned on line 271 of the same function, from a variable of type int16_t holding a negative value:

    271: pos_desc = pStack->index;

The description array is therefore indexed at a negative position, which produces the segmentation fault.

The structure pStack points to is of type dt_stack, defined in ompi/datatype/convertor.h starting at line 65. Its member index is an int16_t, commented as "index in the element description":

    typedef struct dt_stack {
        int16_t   index;  /**< index in the element description */
        int16_t   type;   /**< the type used for the last pack/unpack (original or DT_BYTE) */
        size_t    count;  /**< number of times we still have to do it */
        ptrdiff_t disp;   /**< actual displacement depending on the count field */
    } dt_stack_t;

We therefore conclude that MPI_Datatypes constructed with Open-MPI (release 1.2.1a of April 10th, 2007) are limited to a maximum of 32'768 separate entries, since a larger index no longer fits into an int16_t.

Changing the type of index to int32_t makes the segmentation fault go away. Still, I would be happy if the author or maintainer of the code could have a look at it and decide whether this is a viable fix. Having spent a lot of time tracking the issue down into the Open-MPI code, I would be glad to see it fixed in an upcoming release.

Thanks and regards,
Michael Gauckler