On Oct 7, 2009, at 9:34 AM, Stephan Kramer wrote:
> I think I found a nasty bug in the tfs preconditioner/solver. The
> first symptoms were MPI complaints about corrupted messages in
> MPI_WAIT on line 1552 of ksp/pc/impls/tfs/gs.c (the file is the same
> in petsc-dev and 3.0.0). Valgrind suggested a buffer overrun (in
> reading) in MPI_Isend of line 1533:
>
>
> ierr = MPI_Isend(dptr3, *msg_size++, MPIU_SCALAR, *list++,
> MSGTAG1+my_id, gs->gs_comm, msg_ids_out++);CHKERRQ(ierr);
>
> Stepping through with a debugger, however, it looked like everything
> going into the MPI_Isends and MPI_Irecvs was perfectly fine, until I
> realised that they were both replaced by a macro from petsclog.h:
>
> #define MPI_Isend(buf,count,datatype,dest,tag,comm,request) \
> ((isend_ct++,0) || TypeSize(&isend_len,count,datatype) || \
> MPI_Isend(buf,count,datatype,dest,tag,comm,request))
>
> Because count is used twice in that expression, the argument
> *msg_size++ is evaluated twice. TypeSize receives the right integer
> value, but by the time the actual MPI_Isend call is made the pointer
> has already been advanced, so MPI_Isend gets the next entry of the
> array as its count. The same thing is going on on line 1327 of gs.c
> btw. If someone has time to look into this, it would be much
> appreciated.
> Has the tfs solver been used/applied much, as far as people know?
Thanks for finding and reporting this bug.
I have pushed a fix into petsc-3.0.0 and petsc-dev (it will be in
the next 3.0.0 patch), please let us know if it does not resolve the
problem.
There is seemingly very little use of tfs.
Barry
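
The double evaluation described in the quoted report can be reproduced
without MPI. The following is a minimal sketch, not the PETSc code and
not necessarily the fix that was pushed: type_size, fake_isend,
LOGGED_SEND_MACRO, logged_send_func and the two count_seen_* helpers are
hypothetical names invented for illustration. The macro has the same
shape as the one quoted from petsclog.h, and the function variant shows
one possible way to make the argument be evaluated only once.

```c
/* Hypothetical stand-ins for the logging counters kept by the macro. */
static int send_ct = 0;
static int send_len = 0;
static int last_count_sent = -1;

/* Hypothetical stand-in for TypeSize: adds to a running total, returns 0. */
static int type_size(int *total, int count) { *total += count; return 0; }

/* Hypothetical stand-in for the real send: records the count it was given. */
static int fake_isend(int count) { last_count_sent = count; return 0; }

/* Same shape as the quoted macro: `count` is expanded twice. */
#define LOGGED_SEND_MACRO(count) \
  ((send_ct++, 0) || type_size(&send_len, (count)) || fake_isend((count)))

/* One possible remedy (an assumption, not the actual patch): a function
   receives its argument by value, so the caller's expression runs once. */
static int logged_send_func(int count)
{
  int err;
  send_ct++;
  err = type_size(&send_len, count);
  if (err) return err;
  return fake_isend(count);
}

/* Returns the count that reached fake_isend through the macro and stores
   how many array slots the caller's pointer advanced. */
static int count_seen_via_macro(int *slots_consumed)
{
  int sizes[] = {10, 20, 30};
  int *msg_size = sizes;
  (void)LOGGED_SEND_MACRO(*msg_size++);  /* *msg_size++ is expanded twice */
  *slots_consumed = (int)(msg_size - sizes);
  return last_count_sent;
}

/* Same call through the function wrapper. */
static int count_seen_via_function(int *slots_consumed)
{
  int sizes[] = {10, 20, 30};
  int *msg_size = sizes;
  (void)logged_send_func(*msg_size++);   /* *msg_size++ is evaluated once */
  *slots_consumed = (int)(msg_size - sizes);
  return last_count_sent;
}
```

With the macro, the stand-in for TypeSize sees the first entry while the
send sees the second, and the pointer moves two slots instead of one.
Through the function wrapper the send sees the first entry and the
pointer moves one slot, which is what the calling code in gs.c appears
to expect.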