Matthias,

I think that the patch attached to the ticket below should address your issue:
 https://svn.open-mpi.org/trac/ompi/ticket/1619

I was able to reproduce this problem fairly reliably with a particular benchmark on a particular configuration, using very frequent checkpoints. With the patch applied I could no longer reproduce it, so I think this fixes the problem.

While tracking down this bug, I found what I believe is a problem with the way the checkpoint/restart coordination component handles MPI_ANY_SOURCE and MPI_ANY_TAG. I'll pursue a fix for these cases, but it will be much more involved than the one currently attached to the ticket.

Let me know if this patch fixes the problem that you are seeing.

Thank you for your patience and the bug report,
Josh

On Oct 31, 2008, at 9:49 AM, Matthias Hovestadt wrote:

Hi!

I'll work on a patch, and let you know when it is ready. Unfortunately it probably won't be for a couple weeks. :(

Ok, thanks a lot for letting me know. In three weeks we'll
have a booth at ICT
(http://ec.europa.eu/information_society/events/ict/2008)
where we plan to showcase fault tolerance mechanisms, with
OMPI as the major checkpointing component. I think I will use the
time until ICT for finding a workaround for this issue... :-)


Best,
Matthias