On Oct 4, 2010, at 10:36 AM, Milan Hodoscek wrote:

>>>>>> "Ralph" == Ralph Castain <r...@open-mpi.org> writes:
>
> Ralph> I'm not sure why the group communicator would make a
> Ralph> difference - the code area in question knows nothing about
> Ralph> the mpi aspects of the job. It looks like you are hitting a
> Ralph> race condition that causes a particular internal recv to
> Ralph> not exist when we subsequently try to cancel it, which
> Ralph> generates that error message. How did you configure OMPI?
>
> Thank you for the reply!
>
> Must be some race problem, but I have no control of it, or do I?

Not really. What I don't understand is why your code would work fine when using comm_world, but encounter a race condition when using comm groups. There shouldn't be any timing difference between the two cases.

>
> These are the configure options that gentoo compiles openmpi-1.4.2 with:
>
> ./configure --prefix=/usr --build=x86_64-pc-linux-gnu
> --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info
> --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib
> --libdir=/usr/lib64 --sysconfdir=/etc/openmpi --without-xgrid
> --enable-pretty-print-stacktrace --enable-orterun-prefix-by-default
> --without-slurm --enable-contrib-no-build=vt --enable-mpi-cxx
> --disable-io-romio --disable-heterogeneous --without-tm --enable-ipv6
>

This looks okay. I'll have to take a look and see if I can spot something in the code...