>>>>> "Ralph" == Ralph Castain <r...@open-mpi.org> writes:
Ralph> On Oct 4, 2010, at 10:36 AM, Milan Hodoscek wrote:

>>>>>>> "Ralph" == Ralph Castain <r...@open-mpi.org> writes:
>>
Ralph> I'm not sure why the group communicator would make a
Ralph> difference - the code area in question knows nothing about
Ralph> the mpi aspects of the job. It looks like you are hitting a
Ralph> race condition that causes a particular internal recv to
Ralph> not exist when we subsequently try to cancel it, which
Ralph> generates that error message. How did you configure OMPI?
>>
>> Thank you for the reply!
>>
>> Must be some race problem, but I have no control of it, or do
>> I?

Ralph> Not really. What I don't understand is why your code would
Ralph> work fine when using comm_world, but encounter a race
Ralph> condition when using comm groups. There shouldn't be any
Ralph> timing difference between the two cases.

Fixing a race condition is sometimes as easy as putting some
variables into arrays. I just did that for one of them, but it
didn't help. I'll do some more testing in this direction, but I am
running out of ideas.

When you set ngrp=1 and uncomment the other mpi_comm_spawn line in
the program, you basically get only one spawn, so there is no
opportunity for a race condition. In my real project I usually work
with many spawn calls, all of them using mpi_comm_world but running
different programs, etc., and that always works. This time I want to
localize the mpi_comm_spawn calls with a trick similar to the one in
the program I sent, so this small test case is a good model of what I
would like to have. I studied the MPI-2 standard and I think I got it
right, but one never knows...

Ralph> I'll have to take a look and see if I can spot something in
Ralph> the code...

Thanks a lot

-- Milan