My config.log shows that it found Valgrind even though I didn't specify --with-valgrind. It looks like the issue is in the datatype creation code; looking at the data structure shows unusual values for true_ub and true_lb:

{super = {super = {obj_magic_id = 16046253926196952813, obj_class = 0x5005880, obj_reference_count = 1, cls_init_file_name = 0x4da6f2f "ompi_datatype_create.c", cls_init_lineno = 71}, flags = 276, id = 0, bdt_used = 0, size = 0, true_lb = 9223372036854775807, true_ub = -9223372036854775808, lb = 0, ub = 0, align = 1, nbElems = 0, name = '\000' <repeats 63 times>, desc = { length = 1, used = 0, desc = 0x54348e0}, opt_desc = {length = 0, used = 0, desc = 0x0}, btypes = {0 <repeats 46 times>}}, id = 68, d_f_to_c_index = 68, d_keyhash = 0x0, args = 0x8a7b780, packed_description = 0x0, name = '\000' <repeats 63 times>}

In particular, the true_extent calculated on line 99 of memchecker.h comes out as 1 (because true_ub - true_lb overflows), while the datatype has size 0. This causes the datatype to be treated as non-contiguous while its desc field is NULL; the code then loops over the elements of desc as if it were an array. Fixing true_lb and true_ub might be enough to make the current memchecker code work (since the datatype is actually contiguous).

-- Jeremiah Willcock

On Tue, 25 Sep 2012, Ralph Castain wrote:

IIRC, we found a configure "bug" that allowed you to use --enable-memchecker without 
also including the required --with-valgrind. You might try again with 1.6.2, which 
includes the change - and be sure to add the extra configure flag.


On Sep 25, 2012, at 12:04 PM, Jeremiah Willcock <jewil...@osl.iu.edu> wrote:

The following C program:

#include <mpi.h>

int main(int argc, char** argv) {
 int blocklengths;
 MPI_Aint displacements;
 MPI_Datatype types, dt;
 int x;
 MPI_Init(&argc, &argv);
 MPI_Type_struct(0, &blocklengths, &displacements, &types, &dt);
 MPI_Type_commit(&dt);
 MPI_Send(&x, 1, dt, MPI_PROC_NULL, 0, MPI_COMM_WORLD);
 MPI_Type_free(&dt);
 MPI_Finalize();
 return 0;
}

produces a segmentation fault (caused by a NULL pointer dereference) when run 
with Open MPI 1.6.1, but only when using Valgrind.  Running without Valgrind 
does not cause any issues; the failure appears to be in the code that checks 
whether MPI buffers are valid.  The configure flags I used to build Open MPI 
were a prefix and:

--disable-pretty-print-stacktrace --enable-mpi-thread-multiple 
--enable-memchecker --enable-mca-no-build=btl-openib --enable-debug

and I am using GCC 4.7.1 on Linux.  Is this a known issue?  Thank you for your 
help.

-- Jeremiah Willcock
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


