Re: [OMPI devel] [EXTERNAL] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-12 Thread Larry Baker via devel
"allowing us to weakly synchronize two threads" concerns me if the 
synchronization is important or must be reliable.  I do not understand how 
volatile alone provides reliable synchronization without a mechanism to order 
visible changes to memory.  If the flag(s) in question are supposed to 
indicate some state has changed in this weakly synchronized behavior, without 
proper memory barriers, there is no guarantee that memory changes will be 
viewed by the two threads in the same order they were issued.  It is quite 
possible that the updated state that is flagged as being "good" or "done" or 
whatever will not yet be visible across multiple cores, even though the updated 
flag indicator may have become visible.  Only if the flag itself is the data 
can this work, it seems to me.  If it is a flag that something has been 
completed, volatile is not sufficient to guarantee the corresponding changes in 
state will be visible.  I have had exactly this experience with code that used 
volatile as a proxy for memory barriers.  I was told "it has never been a problem".  
Rare events can, and do, occur.  In my case, one did, after the code had run 
without interruption for over 3 years.  I doubt anyone had ever run the code for such a 
long sample interval.  We found out because we missed recording an important 
earthquake a week after the race condition was tripped.  Murphy's law triumphs 
again. :)
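
To make the ordering problem concrete, here is a minimal C11 sketch (illustration 
only, not Open MPI code) contrasting a plain volatile flag with a flag published 
using release/acquire ordering:

    /* Illustration only -- not Open MPI code.  A producer fills in some state
     * and then raises a flag; a consumer polls the flag and then reads the
     * state. */
    #include <stdatomic.h>
    #include <stdbool.h>

    static int payload;                    /* the state the flag is guarding  */
    static volatile bool ready_v = false;  /* volatile only: no ordering      */
    static atomic_bool ready_a = false;    /* C11 atomic: can order the store */

    /* Broken: volatile keeps the loads and stores, but the consumer may see
     * ready_v == true before it sees payload == 42. */
    void producer_volatile(void) {
        payload = 42;
        ready_v = true;                    /* may become visible first        */
    }
    int consumer_volatile(void) {
        while (!ready_v) ;                 /* flag is re-loaded each time     */
        return payload;                    /* may still observe a stale value */
    }

    /* Correct: the release store cannot be reordered before the payload
     * store, and the acquire load orders the consumer's later reads. */
    void producer_atomic(void) {
        payload = 42;
        atomic_store_explicit(&ready_a, true, memory_order_release);
    }
    int consumer_atomic(void) {
        while (!atomic_load_explicit(&ready_a, memory_order_acquire)) ;
        return payload;                    /* guaranteed to observe 42        */
    }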

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



> On 12 Nov 2019, at 1:05:31 PM, George Bosilca via devel 
>  wrote:
> 
> If the issue were some kind of memory consistency between threads, then 
> printing that variable in the context of the debugger would show the value of 
> debugger_event_active being false.
> 
> volatile is not a memory barrier; it simply forces a load for each access of 
> the data, allowing us to weakly synchronize two threads, as long as we don't 
> expect the synchronization to be immediate.
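
In other words, the pattern under discussion is a polling loop that re-reads a 
volatile flag while the progress engine runs; a rough sketch for illustration 
(not the actual OMPI_WAIT_FOR_COMPLETION macro):

    /* Rough sketch -- not the real OMPI_WAIT_FOR_COMPLETION macro.  The
     * volatile qualifier forces the flag to be re-loaded on every iteration,
     * so a store made by another thread is eventually observed; nothing here
     * says when, or orders any other memory relative to that store. */
    #include <stdbool.h>

    static volatile bool debugger_event_active = true;

    static void wait_for_debugger_release(void)
    {
        while (debugger_event_active) {
            /* let the progress engine run (e.g. opal_progress()) so the
               event that clears the flag can be handled */
        }
    }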
> 
> Anyway, good to see that the issue has been solved.
> 
>   George.
> 
> 
> On Tue, Nov 12, 2019 at 2:25 PM John DelSignore via devel 
> <devel@lists.open-mpi.org> wrote:
> Hi Austen,
> 
> Thanks for the reply. What I am seeing is consistent with your thought, in 
> that when I see the hang, one or more processes did not have a flag updated. 
> I don't understand how the Open MPI code works well enough to say if it is a 
> memory barrier problem or not. It almost looks like an event delivery or 
> dropped event problem to me.
> The place in the MPI_Init() code where the MPI processes hang and the number 
> of "hung" processes seem to vary from run to run. In some cases the 
> processes are waiting for an event or waiting for a fence (whatever that is).
> I did the following run today, which shows that it can hang waiting for an 
> event that apparently was not generated or was dropped:
> 
> 1. Started TV on mpirun: totalview -args mpirun -np 4 ./cpi
> 2. Ran the mpirun process until it hit the MPIR_Breakpoint() event.
> 3. TV attached to all four of the MPI processes and left all five processes stopped.
> 4. Continued all of the processes/threads and let them run freely for about 60 seconds. They should have run to completion in that amount of time.
> 5. Halted all of the processes. I included an aggregated backtrace of all of the processes below.
> In this particular run, all four MPI processes were waiting in 
> ompi_rte_wait_for_debugger() in rte_orte_module.c at line 196, which is:
> 
> /* let the MPI progress engine run while we wait for debugger release 
> */
> OMPI_WAIT_FOR_COMPLETION(debugger_event_active);
> 
> I don't know how that is supposed to work, but I can clearly see that 
> debugger_event_active was true in all of the processes, even though TV set 
> MPIR_debug_gate to 1:
> d1.<> f {2.1 3.1 4.1 5.1} p debugger_event_active
> Thread 2.1:
>  debugger_event_active = true (1)
> Thread 3.1:
>  debugger_event_active = true (1)
> Thread 4.1:
>  debugger_event_active = true (1)
> Thread 5.1:
>  debugger_event_active = true (1)
> d1.<> f {2.1 3.1 4.1 5.1} p MPIR_debug_gate
> Thread 2.1:
>  MPIR_debug_gate = 0x0001 (1)
> Thread 3.1:
>  MPIR_debug_gate = 0x0001 (1)
> Thread 4.1:
>  MPIR_debug_gate = 0x0001 (1)
> Thread 5.1:
>  MPIR_debug_gate = 0x0001 (1)
> d1.<> 
> 
> I think the _release_fn() function in rte_orte_module.c is supposed to set 
> debugger_event_active to false, but that apparently did not happen in this 
> case. So, AFAICT, the reason debugger_event_active would not be set to false 
> is that the event was never delivered, so the _release_fn() function was 
> never called. If that's the case, then the lack of a memory barrier is 
> probably a moot point, and the problem is likely related to event generation 
> or dropped events.
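
The mechanism John is describing reduces to a flag that only an event callback 
ever clears; a hedged sketch of that shape (not the actual rte_orte_module.c 
code, names simplified):

    /* Hedged sketch -- not the actual rte_orte_module.c code.  The callback
     * runs only if the debugger-release event is delivered; if the event is
     * lost, the flag stays true and OMPI_WAIT_FOR_COMPLETION() never returns. */
    #include <stdbool.h>

    static volatile bool debugger_event_active = true;

    /* registered as the handler for the debugger-release event */
    static void release_fn(void *cbdata)
    {
        (void)cbdata;                       /* unused in this sketch          */
        debugger_event_active = false;      /* lets the waiting process leave
                                               ompi_rte_wait_for_debugger()   */
    }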
> Cheers, John D.
> 
> FWIW: Here's the aggregated backtrace after the whole job was allowed to run 
> freely for about 60 seconds, and then stopped:
> 
> d1.<> f g w -g f+l
> +/
>  +__clone : 5:12[0-3.2-3, p1.2-5]
>  

Re: [OMPI devel] 3.1.2: Datatype errors and segfault in MPI_Allgatherv

2018-11-01 Thread Larry Baker via devel
Things that read like they should be unsigned look suspicious to me:

nbElems -909934592
count -1819869184
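
Both values would make sense as counts that no longer fit in a signed 32-bit 
int; a minimal illustration (not Open MPI code), using the nbElems value from 
the dump above:

    /* Illustration only -- not Open MPI code.  A count larger than INT32_MAX
     * shows up as a negative number once it is stored in (or printed as) a
     * signed 32-bit integer. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t real_count = 3385032704u;          /* > INT32_MAX            */
        int32_t  as_signed  = (int32_t)real_count;  /* wraps on common ABIs   */
        printf("unsigned: %llu  signed: %d\n",
               (unsigned long long)real_count, as_signed);
        /* typically prints: unsigned: 3385032704  signed: -909934592 */
        return 0;
    }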

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov




> On Nov 1, 2018, at 10:34 PM, Ben Menadue  wrote:
> 
> Hi,
> 
> I haven’t heard back from the user yet, but I just put this example together 
> which works on 1, 2, and 3 ranks but fails for 4. Unfortunately it needs a 
> fair amount of memory, about 14.3GB per process, so I was running it with 
> -map-by ppr:1:node.
> 
> It doesn’t fail with the segfault as the user’s code does, but it does 
> SIGABRT:
> 
> 16:12 bjm900@r4320 MPI_TESTS > mpirun -mca pml ob1 -mca coll ^fca,hcoll -map-by ppr:1:node -np 4 ./a.out
> [r4450:11544] ../../../../../opal/datatype/opal_datatype_pack.h:53
>   Pointer 0x2bb7ceedb010 size 131040 is outside [0x2b9ec63cb010,0x2bad1458b010] for
>   base ptr 0x2b9ec63cb010 count 1 and data 
> [r4450:11544] Datatype 0x145fe90[] size 3072000 align 4 id 0 length 7 used 6
> true_lb 0 true_ub 6144000 (true_extent 6144000) lb 0 ub 6144000 (extent 6144000)
> nbElems -909934592 loops 4 flags 104 (committed )-c-GD--[---][---]
>    contain OPAL_FLOAT4:* 
> --C[---][---]OPAL_LOOP_S 192 times the next 2 elements extent 8000
> --C---P-D--[---][---]OPAL_FLOAT4 count 2000 disp 0xaba95 (4608000) blen 0 extent 4 (size 8000)
> --C[---][---]OPAL_LOOP_E prev 2 elements first elem displacement 4608000 size of data 8000
> --C[---][---]OPAL_LOOP_S 192 times the next 2 elements extent 8000
> --C---P-D--[---][---]OPAL_FLOAT4 count 2000 disp 0x0 (0) blen 0 extent 4 (size 8000)
> --C[---][---]OPAL_LOOP_E prev 2 elements first elem displacement 0 size of data 8000
> ---G---[---][---]OPAL_LOOP_E prev 6 elements first elem displacement 4608000 size of data 655228928
> Optimized description 
> -cC---P-DB-[---][---] OPAL_UINT1 count -1819869184 disp 0xaba95 (4608000) blen 1 extent 1 (size 1536000)
> -cC---P-DB-[---][---] OPAL_UINT1 count -1819869184 disp 0x0 (0) blen 1 extent 1 (size 1536000)
> ---G---[---][---]OPAL_LOOP_E prev 2 elements first elem displacement 4608000 
> [r4450:11544] *** Process received signal ***
> [r4450:11544] Signal: Aborted (6)
> [r4450:11544] Signal code:  (-6)
> 
> Cheers,
> Ben
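
The dump above shows element counts that have gone negative (nbElems 
-909934592, count -1819869184).  A hypothetical sketch of the kind of 
Allgatherv call that pushes the total gathered element count past INT32_MAX at 
exactly 4 ranks (the sizes below are invented, not Ben's actual test):

    /* Hypothetical sketch -- not Ben's reproducer.  The per-rank count is
     * invented and chosen only so that the total number of gathered elements
     * stays under INT32_MAX at 3 ranks but exceeds it at 4, the point where
     * 32-bit internal counts can wrap negative. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int per_rank = 600000000;   /* 600M floats, ~2.4 GB per rank   */
        float *sendbuf = calloc((size_t)per_rank, sizeof(float));
        float *recvbuf = malloc((size_t)per_rank * size * sizeof(float));
        int   *counts  = malloc(size * sizeof(int));
        int   *displs  = malloc(size * sizeof(int));
        for (int i = 0; i < size; i++) {
            counts[i] = per_rank;
            displs[i] = i * per_rank;     /* last displ + count > INT32_MAX
                                             once size == 4                  */
        }

        MPI_Allgatherv(sendbuf, per_rank, MPI_FLOAT,
                       recvbuf, counts, displs, MPI_FLOAT, MPI_COMM_WORLD);

        free(sendbuf); free(recvbuf); free(counts); free(displs);
        MPI_Finalize();
        return 0;
    }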
> 