I seem to have had a classic deadlock: A was being assembled while
some threads lingered elsewhere. Adding some barriers seems to
fix the problem, at least for the cases I currently have.

What I still don't understand is what the advantage of
MPI_Barrier(((PetscObject)A)->comm) would be over
MPI_Barrier(PETSC_COMM_WORLD).
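For what it's worth, here is a minimal sketch of the situation where the two differ, as I understand it: ((PetscObject)A)->comm is whatever communicator A was created on, so the two barriers coincide only when A lives on PETSC_COMM_WORLD. The sub-communicator split and matrix sizes below are hypothetical, just to make the example self-contained:

```c
/* sketch.c -- why barrier on A's communicator rather than the world:
 * if A lives on a sub-communicator, MPI_Barrier(PETSC_COMM_WORLD)
 * also blocks ranks that never touch A, and can itself deadlock
 * when those ranks are busy elsewhere.  Hypothetical example. */
#include <petscmat.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscMPIInt    rank;
  MPI_Comm       subcomm;
  Mat            A;

  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);

  /* Put A on the even ranks only (arbitrary choice for illustration). */
  ierr = MPI_Comm_split(PETSC_COMM_WORLD, rank % 2, rank, &subcomm);CHKERRQ(ierr);

  if (rank % 2 == 0) {
    ierr = MatCreate(subcomm, &A);CHKERRQ(ierr);
    ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 10, 10);CHKERRQ(ierr);
    ierr = MatSetUp(A);CHKERRQ(ierr);

    /* Correct: synchronizes exactly the ranks that own A. */
    ierr = MPI_Barrier(((PetscObject)A)->comm);CHKERRQ(ierr);
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    /* Wrong here: MPI_Barrier(PETSC_COMM_WORLD) would also wait for
     * the odd ranks, which never enter this branch -> deadlock. */
    ierr = MatDestroy(&A);CHKERRQ(ierr);
  }
  ierr = MPI_Comm_free(&subcomm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}
```

So if A really is on PETSC_COMM_WORLD the two calls are equivalent; using A's own communicator is just the form that stays correct when it isn't.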

Many thanks
Dominik

On Fri, Aug 26, 2011 at 11:01 AM, Matthew Knepley <knepley at gmail.com> wrote:
> On Fri, Aug 26, 2011 at 8:37 AM, Dominik Szczerba <dominik at itis.ethz.ch>
> wrote:
>>
>> > When you run in the debugger and break after it has obviously hung, are
>> > all
>> > processes stopped at the same place?
>>
>> Of course not, they are stuck at barriers elsewhere. Thanks for the
>> valuable question.
>>
>> > If you see an error condition, you can
>> > run
>> > CHKMEMQ;
>> > MPI_Barrier(((PetscObject)A)->comm);
>> > MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
>> > MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);
>> > If it hangs, check where every process is stuck.
>>
>> I obviously seem to be missing some barriers. But why would I need
>> MPI_Barrier(((PetscObject)A)->comm) and not just
>> MPI_Barrier(PETSC_COMM_WORLD)? Would that only force a barrier for
>> A-related traffic?
>
> The idea here is the following:
>   1) We would like to isolate the mismatch in synchronizations
>   2) We can place barriers in the code to delimit the sections which contain
> the offending code, and also eliminate bugs in MatAssembly as a possible
> source of problems.
>   3) Do you have any MPI code you wrote yourself in here?
>
>     Matt
>
>>
>> Dominik
>
> --
> What most experimenters take for granted before they begin their experiments
> is infinitely more interesting than any results to which their experiments
> lead.
> -- Norbert Wiener
>
