Please send all the information listed here:

    http://www.open-mpi.org/community/help/

I am able to run your test program with no problem, so I'm not quite sure what 
the issue is...?

If op->o_func.intrinsic.fns[27] initially points to a valid value and 
later points to 0, that could imply that memory corruption is occurring 
somewhere in your application.  Have you tried running your application 
through a memory-checking debugger?
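
As a rough illustration of that kind of check, here is a minimal C sketch that keeps a known-good copy of a function-pointer table and reports the first slot that later changes. Everything in it (reduce_fn_t, NUM_SLOTS, live_table, the helper names) is hypothetical and only stands in for a table like o_func.intrinsic.fns; it is not Open MPI's actual code, and the "corruption" is simulated with an ordinary assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_SLOTS 32  /* hypothetical table size, not Open MPI's real value */

/* Hypothetical signature standing in for an intrinsic reduction function. */
typedef void (*reduce_fn_t)(const double *in, double *inout, int count);

static void sum_doubles(const double *in, double *inout, int count)
{
    for (int i = 0; i < count; ++i) {
        inout[i] += in[i];
    }
}

static reduce_fn_t live_table[NUM_SLOTS];  /* table the application dispatches through */
static reduce_fn_t snapshot[NUM_SLOTS];    /* known-good copy taken at init time */

/* Fill the table and remember a known-good copy of it. */
void table_init(void)
{
    for (int i = 0; i < NUM_SLOTS; ++i) {
        live_table[i] = sum_doubles;
    }
    memcpy(snapshot, live_table, sizeof(live_table));
}

/* Return the first slot whose pointer differs from the snapshot, or -1. */
int table_first_changed_slot(void)
{
    for (int i = 0; i < NUM_SLOTS; ++i) {
        if (live_table[i] != snapshot[i]) {
            return i;
        }
    }
    return -1;
}

/* Demo: clear slot 27 and report which slot the check flags. */
int demo_detects_cleared_slot(void)
{
    table_init();
    assert(table_first_changed_slot() == -1);  /* nothing changed yet */

    /* Stands in for a stray write from elsewhere in the application. */
    live_table[27] = NULL;

    return table_first_changed_slot();
}
```

Calling a check like table_first_changed_slot() at a few points between initialization and the failing MPI_Allreduce() call can help bracket where the pointer gets overwritten; a memory-checking tool or a data breakpoint on the slot can then narrow it down further.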


On May 6, 2011, at 9:56 AM, hi wrote:

> I am observing a crash in the MPI_Allreduce() call from my actual
> application. After debugging, I found that MPI_Allreduce() with
> MPI_DOUBLE_PRECISION returns NULL for the following code in op.h:
> 
> if (0 != (op->o_flags & OMPI_OP_FLAGS_INTRINSIC)) {
>       op->o_func.intrinsic.fns[ompi_op_ddt_map[dtype->id]](source, target,
>                                                            &count, &dtype,
>           op->o_func.intrinsic.modules[ompi_op_ddt_map[dtype->id]]);
> 
> where o_func.intrinsic.fns[27] points to 0.

> On further debugging, I found that it is making a call to
> mca_coll_basic_reduce_lin_intra(); see the trace below...
> 
>>      libmpid.dll!ompi_op_reduce(ompi_op_t * op, void * source, void *
> target, int count, ompi_datatype_t * dtype)  Line 500  C++
>       libmpid.dll!mca_coll_basic_reduce_lin_intra(void * sbuf, void *
> rbuf, int count, ompi_datatype_t * dtype, ompi_op_t * op, int root,
> ompi_communicator_t * comm, mca_coll_base_module_2_0_0_t * module)
> Line 249        C++
>       libmpid.dll!mca_coll_sync_reduce(void * sbuf, void * rbuf, int
> count, ompi_datatype_t * dtype, ompi_op_t * op, int root,
> ompi_communicator_t * comm, mca_coll_base_module_2_0_0_t * module)
> Line 45 + 0xd4 bytes    C++
>       libmpid.dll!mca_coll_basic_allreduce_intra(void * sbuf, void * rbuf,
> int count, ompi_datatype_t * dtype, ompi_op_t * op,
> ompi_communicator_t * comm, mca_coll_base_module_2_0_0_t * module)
> Line 57 + 0x58 bytes    C++
>       libmpid.dll!MPI_Allreduce(void * sendbuf, void * recvbuf, int count,
> ompi_datatype_t * datatype, ompi_op_t * op, ompi_communicator_t *
> comm)  Line 107 + 0x5c bytes    C++
>       libmpi_f77d.dll!mpi_allreduce_f(char * sendbuf, char * recvbuf, int
> * count, int * datatype, int * op, int * comm, int * ierr)  Line 79 +
> 0x34 bytes      C++
>       libmpi_f77d.dll!MPI_ALLREDUCE(char * sendbuf, char * recvbuf, int *
> count, int * datatype, int * op, int * comm, int * ierr)  Line 53 +
> 0x67 bytes      C++
> 
> 
> I wrote the attached test program to simulate this problem. It works
> fine, but I observed a completely different call stack (see attached
> images)...
> 
> Just for information: I am executing my application using the following command:
> c:/openmpi/bin/orterun -mca mca_component_show_load_errors 0 --prefix
> ... -x ... -x ...  --machinefile ... -np 2 myApplication
> 
> And the test program using the following command:
> c:/openmpi/bin/mpirun mar_f_dp.exe
> 
> 
> Please let me know based on what criteria "coll_reduce" points to
> mca_coll_basic_allreduce_intra() or mca_coll_self_allreduce_intra();
> this would help me debug my application further.
> 
> Thank you in advance.
> -Hiral
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

