On Sun, 2009-12-13 at 19:04 +0100, Gijsbert Wiesenekker wrote:
> The following routine gives a problem after some (not reproducible)
> time on Fedora Core 12. The routine is a CPU usage friendly version of
> MPI_Barrier.

There are currently some proposals for non-blocking collectives before the MPI Forum, and I believe there is a working implementation that can be used as a plug-in for OpenMPI. I would urge you to look at these rather than trying to implement your own.

> My question is: is there a problem with this routine that I overlooked
> that somehow did not show up until now

Your code does all-to-all communication and also uses probe; both of these can easily be avoided when implementing a barrier.

> Is there a way to see which messages have been sent/received/are
> pending?

Yes, there is a message queue interface that allows tools to peek inside the MPI library and see these queues. I know of three tools that use it: TotalView, DDT and my own tool, padb. TotalView and DDT are both full-featured graphical debuggers and commercial products; padb is an open-source, text-based tool.

Ashley,

--
Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
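
To illustrate the remark above that all-to-all communication can be avoided in a barrier, here is a minimal sketch of the peer schedule used by a dissemination barrier. This is one well-known pattern and not necessarily the one the reply had in mind. The function names (barrier_rounds, barrier_send_peer, barrier_recv_peer, barrier_schedule_is_complete) and the MAX_RANKS limit are hypothetical, chosen only for this example. The code deliberately contains no MPI calls: it computes which rank each process would signal and wait for in each round, then simulates the schedule to check that every rank has transitively heard from every other rank.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RANKS 64  /* hypothetical limit so one 64-bit mask can track all ranks */

/* Number of rounds needed: ceil(log2(size)); zero for a single rank. */
int barrier_rounds(int size)
{
    assert(size > 0);
    int rounds = 0;
    for (int dist = 1; dist < size; dist *= 2)
        rounds++;
    return rounds;
}

/* Rank to signal in the given round: 2^round positions ahead, wrapping. */
int barrier_send_peer(int rank, int size, int round)
{
    assert(rank >= 0 && rank < size);
    assert(round >= 0 && round < barrier_rounds(size));
    return (rank + (1 << round)) % size;
}

/* Rank to wait for in the given round: 2^round positions behind, wrapping. */
int barrier_recv_peer(int rank, int size, int round)
{
    assert(rank >= 0 && rank < size);
    assert(round >= 0 && round < barrier_rounds(size));
    return (rank - (1 << round) + size) % size;
}

/* Simulate the schedule. Returns 1 if, after the last round, every rank has
 * directly or transitively heard from every other rank (the barrier property). */
int barrier_schedule_is_complete(int size)
{
    assert(size > 0 && size <= MAX_RANKS);
    uint64_t known[MAX_RANKS], next[MAX_RANKS];

    for (int r = 0; r < size; r++)
        known[r] = (uint64_t)1 << r;  /* each rank starts knowing only itself */

    int rounds = barrier_rounds(size);
    for (int k = 0; k < rounds; k++) {
        /* A message from the receive peer carries everything that peer knew
         * at the start of the round. */
        for (int r = 0; r < size; r++)
            next[r] = known[r] | known[barrier_recv_peer(r, size, k)];
        for (int r = 0; r < size; r++)
            known[r] = next[r];
    }

    uint64_t all = (size == MAX_RANKS) ? ~(uint64_t)0
                                       : (((uint64_t)1 << size) - 1);
    for (int r = 0; r < size; r++)
        if (known[r] != all)
            return 0;
    return 1;
}
```

With this schedule each rank exchanges messages with only ceil(log2(N)) peers instead of N-1. In a real MPI program, each round would presumably post a non-blocking send and receive to those two peers and poll for completion with MPI_Test, sleeping briefly between polls to keep CPU usage low; that part is not shown here.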