Hmm. It *shouldn't* be related to the OS version. I'm using RHEL4 for my tests; RHEL5 performs pretty much the same way with regard to spawn/connect/accept. But then again, who knows? :-\

Can you try attaching a debugger to the hung processes to see where exactly they're hung? Perhaps step/next through a bit and see if you can get a gist of where OMPI is (apparently) looping?


On Mar 30, 2009, at 9:57 AM, Lionel Gamet wrote:

Hi Jeff and all members of the list,

You were perfectly right about the wrong string lengths, but even after
correcting them, I still have the same deadlock problems with this simple
parent/child program.
Could it be a bug specifically related to the CentOS 5.2 Linux
distribution?

Best regards

Lionel

Jeff Squyres wrote:
> It does not hang for me...
>
> But I do notice one odd thing in your extended program: you send 3
> characters of the string "hi2", which does not include the trailing \0.
>
> You might want to send 4 characters to ensure the trailing \0 is included.
>
>
>
> On Mar 25, 2009, at 9:52 AM, Lionel Gamet wrote:
>
>> Dear openmpi users and developers,
>>
>> I encounter deadlock problems with spawned processes in openmpi as
>> soon as more than one Send/Recv operation is done.
>>
>> The test case I used has been extracted from the MPICH2 examples. It is
>> a simple parent/child program. The original version (see attached file
>> parent+child_from_MPICH2.tar.gz) works well under openmpi.
>> I use commands in run.cmd to compile and execute this example.
>>
>> I have tried to add one more communication by duplicating the send/recv
>> calls of the original MPICH2 source (see modified files in attached tar
>> archive parent+child_with_more_send_recv.tar.gz) and get deadlock
>> problems when executing this modified version...
>>
>> Can anybody reproduce this? I am using openmpi version 1.3 on
>> Linux CentOS 5.2 (i386), with all distribution updates applied.
>> See also the attached file ompi_info.txt.gz (output of the command
>> ompi_info --all).
>>
>> Thanks in advance for any hints,
>>
>> Best regards
>>
>> Lionel
>> <parent+child_from_MPICH2.tar.gz><parent+child_with_more_send_recv.tar.gz><ompi_info.txt.gz>
>>
>
>



--
Jeff Squyres
Cisco Systems
