Re: [OMPI devel] 3.1.2: Datatype errors and segfault in MPI_Allgatherv

2018-11-02 Thread Gilles Gouaillardet

Thanks Ben!


I opened https://github.com/open-mpi/ompi/issues/6016 in order to track 
this issue, and wrote a simpler example that demonstrates it.


We should follow up there from now on.


fwiw, several bug fixes have not been backported into the v3 branches.

Note that using the actual derived datatype (ddt) instead of MPI_DATATYPE_NULL 
could be good enough as a workaround for the time being.


(and unlike forcing the coll/basic component, performance will be 
unaffected).
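
For example (an illustrative sketch only; the variable names S, recvcounts, 
displs, mpitype_vec_nobs and node_comm come from the user's call quoted 
further down the thread, and whether this is sufficient for that code is an 
assumption):

    /* Keep MPI_IN_PLACE, but pass the real derived datatype instead of
     * MPI_DATATYPE_NULL on the send side; the send count and type are
     * ignored when MPI_IN_PLACE is used, so the 0 can stay. */
    ierror = MPI_Allgatherv(MPI_IN_PLACE, 0, mpitype_vec_nobs,
                            S[0], recvcounts, displs, mpitype_vec_nobs,
                            node_comm);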




Cheers,


Gilles


On 11/2/2018 2:34 PM, Ben Menadue wrote:

Hi,

I haven’t heard back from the user yet, but I just put this example 
together which works on 1, 2, and 3 ranks but fails for 4. 
Unfortunately it needs a fair amount of memory, about 14.3GB per 
process, so I was running it with -map-by ppr:1:node.


It doesn’t fail with the segfault as the user’s code does, but it does 
SIGABRT:


16:12 bjm900@r4320 MPI_TESTS> mpirun -mca pml ob1 -mca coll ^fca,hcoll 
-map-by ppr:1:node -np 4 ./a.out

[r4450:11544] ../../../../../opal/datatype/opal_datatype_pack.h:53
Pointer 0x2bb7ceedb010 size 131040 is outside 
[0x2b9ec63cb010,0x2bad1458b010] for

base ptr 0x2b9ec63cb010 count 1 and data
[r4450:11544] Datatype 0x145fe90[] size 3072000 align 4 id 0 
length 7 used 6
true_lb 0 true_ub 6144000 (true_extent 6144000) lb 0 ub 
6144000 (extent 6144000)

nbElems -909934592 loops 4 flags 104 (committed )-c-GD--[---][---]
   contain OPAL_FLOAT4:*
--C[---][---]    OPAL_LOOP_S 192 times the next 2 elements 
extent 8000
--C---P-D--[---][---]    OPAL_FLOAT4 count 2000 disp 0xaba95 
(4608000) blen 0 extent 4 (size 8000)
--C[---][---]    OPAL_LOOP_E prev 2 elements first elem 
displacement 4608000 size of data 8000
--C[---][---]    OPAL_LOOP_S 192 times the next 2 elements 
extent 8000
--C---P-D--[---][---]    OPAL_FLOAT4 count 2000 disp 0x0 (0) blen 
0 extent 4 (size 8000)
--C[---][---]    OPAL_LOOP_E prev 2 elements first elem 
displacement 0 size of data 8000
---G---[---][---]    OPAL_LOOP_E prev 6 elements first elem 
displacement 4608000 size of data 655228928

Optimized description
-cC---P-DB-[---][---]     OPAL_UINT1 count -1819869184 disp 
0xaba95 (4608000) blen 1 extent 1 (size 1536000)
-cC---P-DB-[---][---]     OPAL_UINT1 count -1819869184 disp 0x0 (0) 
blen 1 extent 1 (size 1536000)
---G---[---][---]    OPAL_LOOP_E prev 2 elements first elem 
displacement 4608000

[r4450:11544] *** Process received signal ***
[r4450:11544] Signal: Aborted (6)
[r4450:11544] Signal code:  (-6)

Cheers,
Ben
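
For anyone trying to recreate something similar, here is a rough sketch of 
the kind of reproducer described above. It is NOT the attached 
allgatherv_failure.c; every size below is a hypothetical choice, picked only 
so that the total number of floats gathered on 4 ranks just exceeds 2^31 
(where a signed 32-bit element counter would wrap).

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Hypothetical non-contiguous datatype: 192 blocks of 2000 floats,
         * strided by 4000 floats (loosely modelled on the dump above). */
        MPI_Datatype vec;
        MPI_Type_vector(192, 2000, 4000, MPI_FLOAT, &vec);
        MPI_Type_commit(&vec);

        MPI_Aint lb, extent;
        MPI_Type_get_extent(vec, &lb, &extent);

        /* Hypothetical per-rank count: with 4 ranks this gives
         * 4 * 1400 * 192 * 2000 = 2,150,400,000 floats, just over 2^31.
         * Note this needs a large receive buffer on every rank. */
        int per_rank = 1400;
        int *recvcounts = malloc(size * sizeof(int));
        int *displs     = malloc(size * sizeof(int));
        for (int i = 0; i < size; i++) {
            recvcounts[i] = per_rank;
            displs[i]     = i * per_rank;  /* in units of the recvtype extent */
        }

        char *buf = malloc((size_t)size * per_rank * extent);

        /* Same shape of call as the user's: in-place MPI_Allgatherv on a
         * large derived datatype. */
        MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                       buf, recvcounts, displs, vec, MPI_COMM_WORLD);

        free(buf);
        free(recvcounts);
        free(displs);
        MPI_Type_free(&vec);
        MPI_Finalize();
        return 0;
    }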




On 2 Nov 2018, at 12:09 pm, Ben Menadue wrote:


Hi Gilles,

On 2 Nov 2018, at 11:03 am, Gilles Gouaillardet wrote:
I noted the stack trace refers to opal_cuda_memcpy(). Is this issue 
specific to CUDA environments?


No, this is just on normal CPU-only nodes. But memcpy always goes 
through opal_cuda_memcpy when CUDA support is enabled, even if 
there are no GPUs in use (or indeed, even installed).


The coll/tuned default collective module is known not to work when 
tasks use different datatypes with matching type signatures.
For example, one task sends one vector of N elements, and the other 
task receives N elements.


This is the call that triggers it:

ierror = MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, S[0], 
recvcounts, displs, mpitype_vec_nobs, node_comm);


(and changing the source datatype to MPI_BYTE to avoid the NULL 
handle doesn’t help).



A workaround worth trying is to
mpirun --mca coll basic ...


Thanks — using --mca coll basic,libnbc fixes it (basic on its own 
fails because it can’t work out what to use for Iallgather).
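(For reference, the full invocation would presumably be the earlier command 
with the coll selection swapped, along the lines of

    mpirun --mca pml ob1 --mca coll basic,libnbc ... ./a.out

with the other arguments unchanged.)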


Last but not least, could you please post a minimal example (and the 
number of MPI tasks used) that can evidence the issue?


I’m just waiting for the user to get back to me with the okay to 
share the code. Otherwise, I’ll see what I can put together myself. 
It works on 42 cores (at 14 per node = 3 nodes) but fails for 43 
cores (so 1 rank on the 4th node). The communicator includes 1 rank 
per node, so it’s going from a three-rank communicator to a four-rank 
communicator — perhaps the tuned algorithm changes at that point?
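
(One way to test that hypothesis, assuming the usual coll/tuned MCA 
parameters are available in this build, would be to force a single 
allgatherv algorithm and check whether the 3-to-4-rank boundary still 
matters, e.g.

    mpirun --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_allgatherv_algorithm 2 ...

where ompi_info --param coll tuned --level 9 lists the available algorithm 
numbers.)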


Cheers,
Ben

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] 3.1.2: Datatype errors and segfault in MPI_Allgatherv

2018-11-01 Thread Larry Baker via devel
Things that read like they should be unsigned look suspicious to me:

nbElems -909934592
count -1819869184
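
They do look like 32-bit wraparound. As a quick sanity check (the "true" 
totals below are assumptions inferred from the dump: the printed element 
count plus 2^32, and the matching byte count at 4 bytes per float):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Assume the real number of OPAL_FLOAT4 elements is 3,385,032,704
         * (the printed nbElems plus 2^32). Truncating it to a signed
         * 32-bit field reproduces the nbElems in the dump... */
        uint64_t true_elems = 3385032704ULL;
        int32_t  nb_elems   = (int32_t)(uint32_t)true_elems;  /* -909934592 */

        /* ...and the corresponding byte count truncated to 32 bits matches
         * the "size of data" printed for the final OPAL_LOOP_E. */
        uint32_t size_of_data = (uint32_t)(true_elems * 4);   /* 655228928 */

        printf("nbElems = %d, size of data = %u\n", nb_elems, size_of_data);
        return 0;
    }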

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov




> On Nov 1, 2018, at 10:34 PM, Ben Menadue wrote:
> 
> Hi,
> 
> I haven’t heard back from the user yet, but I just put this example together 
> which works on 1, 2, and 3 ranks but fails for 4. Unfortunately it needs a 
> fair amount of memory, about 14.3GB per process, so I was running it with 
> -map-by ppr:1:node.
> 
> It doesn’t fail with the segfault as the user’s code does, but it does 
> SIGABRT:
> 
> 16:12 bjm900@r4320 MPI_TESTS > mpirun -mca pml ob1 -mca coll ^fca,hcoll 
> -map-by ppr:1:node -np 4 ./a.out
> [r4450:11544] ../../../../../opal/datatype/opal_datatype_pack.h:53
>   Pointer 0x2bb7ceedb010 size 131040 is outside 
> [0x2b9ec63cb010,0x2bad1458b010] for
>   base ptr 0x2b9ec63cb010 count 1 and data 
> [r4450:11544] Datatype 0x145fe90[] size 3072000 align 4 id 0 length 7 
> used 6
> true_lb 0 true_ub 6144000 (true_extent 6144000) lb 0 ub 6144000 
> (extent 6144000)
> nbElems -909934592 loops 4 flags 104 (committed )-c-GD--[---][---]
>contain OPAL_FLOAT4:* 
> --C[---][---]OPAL_LOOP_S 192 times the next 2 elements extent 
> 8000
> --C---P-D--[---][---]OPAL_FLOAT4 count 2000 disp 0xaba95 
> (4608000) blen 0 extent 4 (size 8000)
> --C[---][---]OPAL_LOOP_E prev 2 elements first elem displacement 
> 4608000 size of data 8000
> --C[---][---]OPAL_LOOP_S 192 times the next 2 elements extent 
> 8000
> --C---P-D--[---][---]OPAL_FLOAT4 count 2000 disp 0x0 (0) blen 0 
> extent 4 (size 8000)
> --C[---][---]OPAL_LOOP_E prev 2 elements first elem displacement 
> 0 size of data 8000
> ---G---[---][---]OPAL_LOOP_E prev 6 elements first elem displacement 
> 4608000 size of data 655228928
> Optimized description 
> -cC---P-DB-[---][---] OPAL_UINT1 count -1819869184 disp 0xaba95 
> (4608000) blen 1 extent 1 (size 1536000)
> -cC---P-DB-[---][---] OPAL_UINT1 count -1819869184 disp 0x0 (0) blen 1 
> extent 1 (size 1536000)
> ---G---[---][---]OPAL_LOOP_E prev 2 elements first elem displacement 
> 4608000 
> [r4450:11544] *** Process received signal ***
> [r4450:11544] Signal: Aborted (6)
> [r4450:11544] Signal code:  (-6)
> 
> Cheers,
> Ben
> 

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] 3.1.2: Datatype errors and segfault in MPI_Allgatherv

2018-11-01 Thread Ben Menadue
Hi,

I haven’t heard back from the user yet, but I just put this example together which works on 1, 2, and 3 ranks but fails for 4. Unfortunately it needs a fair amount of memory, about 14.3GB per process, so I was running it with -map-by ppr:1:node.

It doesn’t fail with the segfault as the user’s code does, but it does SIGABRT:

16:12 bjm900@r4320 MPI_TESTS > mpirun -mca pml ob1 -mca coll ^fca,hcoll -map-by ppr:1:node -np 4 ./a.out
[r4450:11544] ../../../../../opal/datatype/opal_datatype_pack.h:53
	Pointer 0x2bb7ceedb010 size 131040 is outside [0x2b9ec63cb010,0x2bad1458b010] for
	base ptr 0x2b9ec63cb010 count 1 and data 
[r4450:11544] Datatype 0x145fe90[] size 3072000 align 4 id 0 length 7 used 6
true_lb 0 true_ub 6144000 (true_extent 6144000) lb 0 ub 6144000 (extent 6144000)
nbElems -909934592 loops 4 flags 104 (committed )-c-GD--[---][---]
   contain OPAL_FLOAT4:* 
--C[---][---]    OPAL_LOOP_S 192 times the next 2 elements extent 8000
--C---P-D--[---][---]    OPAL_FLOAT4 count 2000 disp 0xaba95 (4608000) blen 0 extent 4 (size 8000)
--C[---][---]    OPAL_LOOP_E prev 2 elements first elem displacement 4608000 size of data 8000
--C[---][---]    OPAL_LOOP_S 192 times the next 2 elements extent 8000
--C---P-D--[---][---]    OPAL_FLOAT4 count 2000 disp 0x0 (0) blen 0 extent 4 (size 8000)
--C[---][---]    OPAL_LOOP_E prev 2 elements first elem displacement 0 size of data 8000
---G---[---][---]    OPAL_LOOP_E prev 6 elements first elem displacement 4608000 size of data 655228928
Optimized description 
-cC---P-DB-[---][---]     OPAL_UINT1 count -1819869184 disp 0xaba95 (4608000) blen 1 extent 1 (size 1536000)
-cC---P-DB-[---][---]     OPAL_UINT1 count -1819869184 disp 0x0 (0) blen 1 extent 1 (size 1536000)
---G---[---][---]    OPAL_LOOP_E prev 2 elements first elem displacement 4608000 
[r4450:11544] *** Process received signal ***
[r4450:11544] Signal: Aborted (6)
[r4450:11544] Signal code:  (-6)

Cheers,
Ben

allgatherv_failure.c
Description: Binary data
On 2 Nov 2018, at 12:09 pm, Ben Menadue wrote:

Hi Gilles,

On 2 Nov 2018, at 11:03 am, Gilles Gouaillardet wrote:
I noted the stack trace refers to opal_cuda_memcpy(). Is this issue specific to CUDA environments?

No, this is just on normal CPU-only nodes. But memcpy always goes through opal_cuda_memcpy when CUDA support is enabled, even if there are no GPUs in use (or indeed, even installed).

The coll/tuned default collective module is known not to work when tasks use different datatypes with matching type signatures.
For example, one task sends one vector of N elements, and the other task receives N elements.

This is the call that triggers it:

	ierror = MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, S[0], recvcounts, displs, mpitype_vec_nobs, node_comm);

(and changing the source datatype to MPI_BYTE to avoid the NULL handle doesn’t help).

A workaround worth trying is to
mpirun --mca coll basic ...

Thanks — using --mca coll basic,libnbc fixes it (basic on its own fails because it can’t work out what to use for Iallgather).

Last but not least, could you please post a minimal example (and the number of MPI tasks used) that can evidence the issue?

I’m just waiting for the user to get back to me with the okay to share the code. Otherwise, I’ll see what I can put together myself. It works on 42 cores (at 14 per node = 3 nodes) but fails for 43 cores (so 1 rank on the 4th node). The communicator includes 1 rank per node, so it’s going from a three-rank communicator to a four-rank communicator — perhaps the tuned algorithm changes at that point?

Cheers,
Ben

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] 3.1.2: Datatype errors and segfault in MPI_Allgatherv

2018-11-01 Thread Ben Menadue
Hi Gilles,

> On 2 Nov 2018, at 11:03 am, Gilles Gouaillardet wrote:
> I noted the stack trace refers to opal_cuda_memcpy(). Is this issue specific to 
> CUDA environments?

No, this is just on normal CPU-only nodes. But memcpy always goes through 
opal_cuda_memcpy when CUDA support is enabled, even if there are no GPUs in use 
(or indeed, even installed).

> The coll/tuned default collective module is known not to work when tasks use 
> different datatypes with matching type signatures.
> For example, one task sends one vector of N elements, and the other task 
> receives N elements.


This is the call that triggers it:

ierror = MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, S[0], 
recvcounts, displs, mpitype_vec_nobs, node_comm);

(and changing the source datatype to MPI_BYTE to avoid the NULL handle doesn’t 
help).

> A workaround worth trying is to
> mpirun --mca coll basic ...


Thanks — using --mca coll basic,libnbc fixes it (basic on its own fails because 
it can’t work out what to use for Iallgather).

> Last but not least, could you please post a minimal example (and the number 
> of MPI tasks used) that can evidence the issue?


I’m just waiting for the user to get back to me with the okay to share the 
code. Otherwise, I’ll see what I can put together myself. It works on 42 cores 
(at 14 per node = 3 nodes) but fails for 43 cores (so 1 rank on the 4th node). 
The communicator includes 1 rank per node, so it’s going from a three-rank 
communicator to a four-rank communicator — perhaps the tuned algorithm changes 
at that point?

Cheers,
Ben

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] 3.1.2: Datatype errors and segfault in MPI_Allgatherv

2018-11-01 Thread Gilles Gouaillardet

Hi Ben,


I noted the stack trace refers to opal_cuda_memcpy(). Is this issue 
specific to CUDA environments?



The coll/tuned default collective module is known not to work when tasks 
use different datatypes with matching type signatures.


For example, one task sends one vector of N elements, and the other task 
receives N elements.
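
A minimal sketch of that pattern (not taken from the user's code), 
illustrated with point-to-point for brevity:

    #include <mpi.h>

    /* Rank 0 sends one element of an "N floats" derived datatype while
     * rank 1 receives N plain MPI_FLOATs: the type signatures match
     * (N floats on both sides) even though the datatype arguments differ. */
    static void matched_signature_example(int rank, int N, float *buf)
    {
        if (rank == 0) {
            MPI_Datatype vecN;
            MPI_Type_contiguous(N, MPI_FLOAT, &vecN);
            MPI_Type_commit(&vecN);
            MPI_Send(buf, 1, vecN, 1, 0, MPI_COMM_WORLD);
            MPI_Type_free(&vecN);
        } else if (rank == 1) {
            MPI_Recv(buf, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
    }

This is legal MPI, but some coll/tuned algorithms presumably base their 
segmentation decisions on the datatype they are handed on each side, which 
is how the two forms can end up being treated differently inside a 
collective.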



A workaround worth trying is to

mpirun --mca coll basic ...


Last but not least, could you please post a minimal example (and the 
number of MPI tasks used) that can evidence the issue?



Cheers,


Gilles


On 11/2/2018 7:59 AM, Ben Menadue wrote:

Hi,

One of our users is reporting an issue using MPI_Allgatherv with a 
large derived datatype — it segfaults inside OpenMPI. Using a debug 
build of OpenMPI 3.1.2 produces a ton of messages like this before the 
segfault:


[r3816:50921] ../../../../../opal/datatype/opal_datatype_pack.h:53
Pointer 0x2acd0121b010 size 131040 is outside 
[0x2ac5ed268010,0x2ac980ad8010] for

base ptr 0x2ac5ed268010 count 1 and data
[r3816:50921] Datatype 0x42998b0[] size 592000 align 4 id 0 length 
7 used 6
true_lb 0 true_ub 1536000 (true_extent 1536000) lb 0 ub 
1536000 (extent 1536000)

nbElems 148000 loops 4 flags 104 (committed )-c-GD--[---][---]
contain OPAL_FLOAT4:*
--C[---][---]   OPAL_LOOP_S 4 times the next 2 elements extent 
8000
--C---P-D--[---][---]   OPAL_FLOAT4 count 2000 disp 0x380743000 
(1504000) blen 0 extent 4 (size 8000)
--C[---][---]   OPAL_LOOP_E prev 2 elements first elem 
displacement 1504000 size of data 8000
--C[---][---]   OPAL_LOOP_S 70 times the next 2 elements 
extent 8000
--C---P-D--[---][---]   OPAL_FLOAT4 count 2000 disp 0x0 (0) blen 0 
extent 4 (size 8000)
--C[---][---]   OPAL_LOOP_E prev 2 elements first elem 
displacement 0 size of data 8000
---G---[---][---]   OPAL_LOOP_E prev 6 elements first elem 
displacement 1504000 size of data 1625032704

Optimized description
-cC---P-DB-[---][---]     OPAL_UINT1 count 32000 disp 0x380743000 
(1504000) blen 1 extent 1 (size 32000)
-cC---P-DB-[---][---]     OPAL_UINT1 count 1305032704 disp 0x0 (0) 
blen 1 extent 1 (size 56)
---G---[---][---]   OPAL_LOOP_E prev 2 elements first elem 
displacement 1504000 size of d


Here is the backtrace:

 backtrace 
 0 0x0008987b memcpy()  ???:0
 1 0x000639b6 opal_cuda_memcpy() 
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_datatype_cuda.c:99
 2 0x0005cd7a pack_predefined_data() 
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_datatype_pack.h:56
 3 0x0005e845 opal_generic_simple_pack() 
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_datatype_pack.c:319
 4 0x0004ce6e opal_convertor_pack() 
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_convertor.c:272
 5 0xe3b6 mca_btl_openib_prepare_src() 
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib.c:1609
 6 0x00023c75 mca_bml_base_prepare_src() 
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/bml/bml.h:341
 7 0x00027d2a mca_pml_ob1_send_request_schedule_once() 
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.c:995
 8 0x0002473c mca_pml_ob1_send_request_schedule_exclusive() 
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.h:313
 9 0x0002479d mca_pml_ob1_send_request_schedule() 
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.h:337
10 0x000256fe mca_pml_ob1_frag_completion() 
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.c:321
11 0x0001baaf handle_wc() 
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3565
12 0x0001c20c poll_device() 
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3719
13 0x0001c6c0 progress_one_device() 
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3829
14 0x0001c763 

[OMPI devel] 3.1.2: Datatype errors and segfault in MPI_Allgatherv

2018-11-01 Thread Ben Menadue
Hi,

One of our users is reporting an issue using MPI_Allgatherv with a large 
derived datatype — it segfaults inside OpenMPI. Using a debug build of OpenMPI 
3.1.2 produces a ton of messages like this before the segfault:

[r3816:50921] ../../../../../opal/datatype/opal_datatype_pack.h:53
Pointer 0x2acd0121b010 size 131040 is outside 
[0x2ac5ed268010,0x2ac980ad8010] for
base ptr 0x2ac5ed268010 count 1 and data 
[r3816:50921] Datatype 0x42998b0[] size 592000 align 4 id 0 length 7 used 6
true_lb 0 true_ub 1536000 (true_extent 1536000) lb 0 ub 1536000 
(extent 1536000)
nbElems 148000 loops 4 flags 104 (committed )-c-GD--[---][---]
   contain OPAL_FLOAT4:* 
--C[---][---]OPAL_LOOP_S 4 times the next 2 elements extent 8000
--C---P-D--[---][---]OPAL_FLOAT4 count 2000 disp 0x380743000 
(1504000) blen 0 extent 4 (size 8000)
--C[---][---]OPAL_LOOP_E prev 2 elements first elem displacement 
1504000 size of data 8000
--C[---][---]OPAL_LOOP_S 70 times the next 2 elements extent 
8000
--C---P-D--[---][---]OPAL_FLOAT4 count 2000 disp 0x0 (0) blen 0 extent 
4 (size 8000)
--C[---][---]OPAL_LOOP_E prev 2 elements first elem displacement 0 
size of data 8000
---G---[---][---]OPAL_LOOP_E prev 6 elements first elem displacement 
1504000 size of data 1625032704
Optimized description 
-cC---P-DB-[---][---] OPAL_UINT1 count 32000 disp 0x380743000 
(1504000) blen 1 extent 1 (size 32000)
-cC---P-DB-[---][---] OPAL_UINT1 count 1305032704 disp 0x0 (0) blen 1 
extent 1 (size 56)
---G---[---][---]OPAL_LOOP_E prev 2 elements first elem displacement 
1504000 size of d

Here is the backtrace:

 backtrace 
 0 0x0008987b memcpy()  ???:0
 1 0x000639b6 opal_cuda_memcpy()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_datatype_cuda.c:99
 2 0x0005cd7a pack_predefined_data()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_datatype_pack.h:56
 3 0x0005e845 opal_generic_simple_pack()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_datatype_pack.c:319
 4 0x0004ce6e opal_convertor_pack()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/datatype/../../../../../opal/datatype/opal_convertor.c:272
 5 0xe3b6 mca_btl_openib_prepare_src()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib.c:1609
 6 0x00023c75 mca_bml_base_prepare_src()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/bml/bml.h:341
 7 0x00027d2a mca_pml_ob1_send_request_schedule_once()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.c:995
 8 0x0002473c mca_pml_ob1_send_request_schedule_exclusive()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.h:313
 9 0x0002479d mca_pml_ob1_send_request_schedule()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.h:337
10 0x000256fe mca_pml_ob1_frag_completion()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/ompi/mca/pml/ob1/../../../../../../../ompi/mca/pml/ob1/pml_ob1_sendreq.c:321
11 0x0001baaf handle_wc()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3565
12 0x0001c20c poll_device()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3719
13 0x0001c6c0 progress_one_device()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3829
14 0x0001c763 btl_openib_component_progress()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/mca/btl/openib/../../../../../../../opal/mca/btl/openib/btl_openib_component.c:3853
15 0x0002ff90 opal_progress()  
/short/z00/bjm900/build/openmpi-mofed4.2/openmpi-3.1.2/build/gcc/debug-1/opal/../../../../opal/runtime/opal_progress.c:228
16 0x0001114c ompi_request_wait_completion()