On Oct 4, 2010, at 10:36 AM, Milan Hodoscek wrote:

>>>>>> "Ralph" == Ralph Castain <r...@open-mpi.org> writes:
> 
>    Ralph> I'm not sure why the group communicator would make a
>    Ralph> difference - the code area in question knows nothing about
>    Ralph> the mpi aspects of the job. It looks like you are hitting a
>    Ralph> race condition that causes a particular internal recv to
>    Ralph> not exist when we subsequently try to cancel it, which
>    Ralph> generates that error message.  How did you configure OMPI?
> 
> Thank you for the reply!
> 
> Must be some race problem, but I have no control of it, or do I?

Not really. What I don't understand is why your code would work fine when using
MPI_COMM_WORLD, but hit a race condition when using group communicators. There
shouldn't be any timing difference between the two cases.

> 
> These are the configure options that gentoo compiles openmpi-1.4.2 with:
> 
> ./configure --prefix=/usr --build=x86_64-pc-linux-gnu 
> --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info 
> --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib 
> --libdir=/usr/lib64 --sysconfdir=/etc/openmpi --without-xgrid 
> --enable-pretty-print-stacktrace --enable-orterun-prefix-by-default 
> --without-slurm --enable-contrib-no-build=vt --enable-mpi-cxx 
> --disable-io-romio --disable-heterogeneous --without-tm --enable-ipv6
> 

This looks okay.

I'll have to take a look and see if I can spot something in the code...

