Hi Lucho,
Here is the output (two nodes in the cluster):
[EMAIL PROTECTED] examples]# xmvapich -D -a ./hellow
-pmi-> 0: cmd=initack pmiid=1
<-pmi- 0: cmd=initack rc=0
<-pmi- 0: cmd=set rc=0 size=2
<-pmi- 0: cmd=set rc=0 rank=0
<-pmi- 0: cmd=set rc=0 debug=0
-pmi-> 0: cmd=init pmi_version=1 pmi_subversion=1
<-pmi- 0: cmd=response_to_init rc=0
-pmi-> 0: cmd=get_maxes
<-pmi- 0: cmd=maxes rc=0 kvsname_max=64 keylen_max=64 vallen_max=64
-pmi-> 0: cmd=get_appnum
<-pmi- 0: cmd=appnum rc=0 appnum=0
-pmi-> 1: cmd=initack pmiid=1
<-pmi- 1: cmd=initack rc=0
<-pmi- 1: cmd=set rc=0 size=2
<-pmi- 1: cmd=set rc=0 rank=1
<-pmi- 1: cmd=set rc=0 debug=0
-pmi-> 0: cmd=get_my_kvsname
<-pmi- 0: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 0: cmd=get_my_kvsname
<-pmi- 0: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 1: cmd=init pmi_version=1 pmi_subversion=1
<-pmi- 1: cmd=response_to_init rc=0
-pmi-> 1: cmd=get_maxes
<-pmi- 1: cmd=maxes rc=0 kvsname_max=64 keylen_max=64 vallen_max=64
-pmi-> 1: cmd=get_appnum
<-pmi- 1: cmd=appnum rc=0 appnum=0
-pmi-> 0: cmd=put kvsname=kvs_0 key=P0-businesscard
value=port#38675$description#(none)$
<-pmi- 0: cmd=put_result rc=0
-pmi-> 1: cmd=get_my_kvsname
<-pmi- 1: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 0: cmd=barrier_in
-pmi-> 1: cmd=get_my_kvsname
<-pmi- 1: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 1: cmd=put kvsname=kvs_0 key=P1-businesscard
value=port#38697$description#(none)$
<-pmi- 1: cmd=put_result rc=0
-pmi-> 1: cmd=barrier_in
<-pmi- 0: cmd=barrier_out rc=0
<-pmi- 1: cmd=barrier_out rc=0
-pmi-> 0: cmd=get kvsname=kvs_0 key=P1-businesscard
<-pmi- 0: cmd=get_result rc=0 value=port#38697$description#(none)$
-pmi-> 1: cmd=get kvsname=kvs_0 key=P0-businesscard
<-pmi- 1: cmd=get_result rc=0 value=port#38675$description#(none)$
Hello world from process 1 of 2
Hello world from process 0 of 2
It looks like the program ran, but then the job hangs and xmvapich never returns.
If I try to run another example (cpi), here is the output from the run
with a single process, and then with two:
[EMAIL PROTECTED] examples]# xmvapich n0001 ./cpi
Process 0 of 1 is on (none)
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000313
[EMAIL PROTECTED] examples]# xmvapich -D n0001 ./cpi
-pmi-> 0: cmd=initack pmiid=1
<-pmi- 0: cmd=initack rc=0
<-pmi- 0: cmd=set rc=0 size=1
<-pmi- 0: cmd=set rc=0 rank=0
<-pmi- 0: cmd=set rc=0 debug=0
-pmi-> 0: cmd=init pmi_version=1 pmi_subversion=1
<-pmi- 0: cmd=response_to_init rc=0
-pmi-> 0: cmd=get_maxes
<-pmi- 0: cmd=maxes rc=0 kvsname_max=64 keylen_max=64 vallen_max=64
-pmi-> 0: cmd=get_appnum
<-pmi- 0: cmd=appnum rc=0 appnum=0
-pmi-> 0: cmd=get_my_kvsname
<-pmi- 0: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 0: cmd=get_my_kvsname
<-pmi- 0: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 0: cmd=put kvsname=kvs_0 key=P0-businesscard
value=port#48513$description#(none)$
<-pmi- 0: cmd=put_result rc=0
-pmi-> 0: cmd=barrier_in
<-pmi- 0: cmd=barrier_out rc=0
-pmi-> 0: cmd=finalize
<-pmi- 0: cmd=finalize_ack rc=0
Process 0 of 1 is on (none)
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000332
[EMAIL PROTECTED] examples]
normal termination.
[EMAIL PROTECTED] examples]# xmvapich -D n0000,n0001 ./cpi
-pmi-> 0: cmd=initack pmiid=1
<-pmi- 0: cmd=initack rc=0
<-pmi- 0: cmd=set rc=0 size=2
<-pmi- 0: cmd=set rc=0 rank=0
<-pmi- 0: cmd=set rc=0 debug=0
-pmi-> 0: cmd=init pmi_version=1 pmi_subversion=1
<-pmi- 0: cmd=response_to_init rc=0
-pmi-> 0: cmd=get_maxes
<-pmi- 0: cmd=maxes rc=0 kvsname_max=64 keylen_max=64 vallen_max=64
-pmi-> 0: cmd=get_appnum
<-pmi- 0: cmd=appnum rc=0 appnum=0
-pmi-> 0: cmd=get_my_kvsname
<-pmi- 0: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 1: cmd=initack pmiid=1
<-pmi- 1: cmd=initack rc=0
<-pmi- 1: cmd=set rc=0 size=2
<-pmi- 1: cmd=set rc=0 rank=1
<-pmi- 1: cmd=set rc=0 debug=0
-pmi-> 0: cmd=get_my_kvsname
<-pmi- 0: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 1: cmd=init pmi_version=1 pmi_subversion=1
<-pmi- 1: cmd=response_to_init rc=0
-pmi-> 1: cmd=get_maxes
<-pmi- 1: cmd=maxes rc=0 kvsname_max=64 keylen_max=64 vallen_max=64
-pmi-> 0: cmd=put kvsname=kvs_0 key=P0-businesscard
value=port#45645$description#(none)$
<-pmi- 0: cmd=put_result rc=0
-pmi-> 1: cmd=get_appnum
<-pmi- 1: cmd=appnum rc=0 appnum=0
-pmi-> 0: cmd=barrier_in
-pmi-> 1: cmd=get_my_kvsname
<-pmi- 1: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 1: cmd=get_my_kvsname
<-pmi- 1: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 1: cmd=put kvsname=kvs_0 key=P1-businesscard
value=port#53467$description#(none)$
<-pmi- 1: cmd=put_result rc=0
-pmi-> 1: cmd=barrier_in
<-pmi- 0: cmd=barrier_out rc=0
<-pmi- 1: cmd=barrier_out rc=0
-pmi-> 0: cmd=get kvsname=kvs_0 key=P1-businesscard
<-pmi- 0: cmd=get_result rc=0 value=port#53467$description#(none)$
Process 0 of 2 is on (none)
Process 1 of 2 is on (none)
hung processes....
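
Comparing the traces, the single-process cpi run is the only one where
cmd=finalize / finalize_ack ever shows up; neither two-process trace reaches
finalize, and in the two-process cpi run the "pi is approximately" line never
prints either. So the ranks publish their business cards and pass the PMI
barrier, but get stuck somewhere after that. In case it helps narrow things
down, here is the kind of minimal test I could try next (just a sketch, not
one of the mpich2 examples; the file name is made up and it assumes the mpicc
from this same build). It prints and flushes around an explicit MPI_Barrier
and around MPI_Finalize, so the output should show whether two ranks can
complete a collective over the socket channel at all, or whether only the
teardown hangs:

/* barrier_test.c - sketch to localize the hang (hypothetical file name).
 * Build with the mpicc from this build, e.g.:
 *   ../bin/mpicc -o barrier_test barrier_test.c
 * Run with: xmvapich n0000,n0001 ./barrier_test
 */
#include <stdio.h>
#include "mpi.h"

int main( int argc, char *argv[] )
{
    int rank, size;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    printf( "rank %d of %d: before barrier\n", rank, size );
    fflush( stdout );

    /* If this completes with two ranks, connections over the sock
     * channel work and the hang is later, i.e. in MPI_Finalize or
     * in xmvapich reaping the processes. */
    MPI_Barrier( MPI_COMM_WORLD );

    printf( "rank %d of %d: after barrier, calling MPI_Finalize\n", rank, size );
    fflush( stdout );

    MPI_Finalize();

    printf( "rank %d of %d: after MPI_Finalize\n", rank, size );
    fflush( stdout );
    return 0;
}

If "after barrier" prints from both ranks but "after MPI_Finalize" never
does, the problem would be in the finalize/teardown path rather than in
establishing the connections.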
Daniel
On Wed, Oct 8, 2008 at 3:23 PM, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:
>
> I can't replicate it; it is working fine here :(
> Can you please try xmvapich again with the -D option and cut&paste the output?
>
> Thanks,
> Lucho
>
> On Oct 6, 2008, at 2:51 PM, Daniel Gruner wrote:
>
>>
>> I just compiled mpich2-1.1.0a1 and tested it, with the same result as
>> with mvapich. Again I had to run configure with
>> --with-device=ch3:sock, since otherwise the runtime complains that it
>> can't allocate shared memory or some such thing. When I run a single
>> process using xmvapich it completes fine; however, when running two or
>> more it hangs. This is not surprising, as it should be the same as
>> mvapich when running over regular TCP/IP on GigE rather than a special
>> interconnect.
>>
>> [EMAIL PROTECTED] examples]# ./hellow
>> Hello world from process 0 of 1
>> [EMAIL PROTECTED] examples]# xmvapich -a ./hellow
>> Hello world from process 1 of 2
>> Hello world from process 0 of 2
>> ^C
>> [EMAIL PROTECTED] examples]# xmvapich n0000 ./hellow
>> Hello world from process 0 of 1
>> [EMAIL PROTECTED] examples]# xmvapich n0001 ./hellow
>> Hello world from process 0 of 1
>> [EMAIL PROTECTED] examples]# xmvapich n0000,n0001 ./hellow
>> Hello world from process 1 of 2
>> Hello world from process 0 of 2
>> ^C
>>
>> Daniel
>>
>>
>>
>> On 10/6/08, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:
>>>
>>> I just compiled mpich2-1.1.0a1 and tried running hellow; everything
>>> looks fine:
>>>
>>> $ xmvapich m1,m2 ~/work/mpich2-1.1.0a1/build/examples/hellow
>>> Hello world from process 0 of 2
>>> Hello world from process 1 of 2
>>> $
>>>
>>> I didn't set any special parameters when compiling, just ./configure.
>>>
>>> Thanks,
>>> Lucho
>>>
>>>
>>> On Oct 3, 2008, at 9:05 AM, Daniel Gruner wrote:
>>>
>>>
>>>>
>>>> Well, I just did the same, but with NO success... The processes
>>>> apparently start and run at the beginning, but then they hang and do
>>>> not finalize. For example, running the "hellow" example from the
>>>> mvapich2 distribution:
>>>>
>>>> [EMAIL PROTECTED] examples]# cat hellow.c
>>>> /* -*- Mode: C; c-basic-offset:4 ; -*- */
>>>> /*
>>>>  * (C) 2001 by Argonne National Laboratory.
>>>>  * See COPYRIGHT in top-level directory.
>>>>  */
>>>>
>>>> #include <stdio.h>
>>>> #include "mpi.h"
>>>>
>>>> int main( int argc, char *argv[] )
>>>> {
>>>>     int rank;
>>>>     int size;
>>>>
>>>>     MPI_Init( 0, 0 );
>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>     printf( "Hello world from process %d of %d\n", rank, size );
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }
>>>>
>>>> [EMAIL PROTECTED] examples]# make hellow
>>>> ../bin/mpicc -I../src/include -I../src/include -c hellow.c
>>>> ../bin/mpicc -o hellow hellow.o
>>>> [EMAIL PROTECTED] examples]# ./hellow
>>>> Hello world from process 0 of 1
>>>>
>>>> (This was fine, just running on the master.) Running on the two nodes
>>>> requires that the xmvapich process be killed (Ctrl-C):
>>>>
>>>> [EMAIL PROTECTED] examples]# xmvapich -ap ./hellow
>>>> n0000: Hello world from process 0 of 2
>>>> n0001: Hello world from process 1 of 2
>>>> [EMAIL PROTECTED] examples]#
>>>>
>>>> I have tried other codes, both in C and Fortran, with the same
>>>> behaviour. I don't know if the issue is with xmvapich or with
>>>> mvapich2. Communication is just GigE.
>>>>
>>>> Daniel
>>>>
>>>>
>>>> On 9/30/08, Abhishek Kulkarni <[EMAIL PROTECTED]> wrote:
>>>>
>>>>>
>>>>> Just gave this a quick try, and xmvapich seems to run MPI apps compiled
>>>>> with mpich2 without any issues.
>>>>>
>>>>> $ xmvapich -a ./mpihello
>>>>> blender: Hello World from process 0 of 1
>>>>> eregion: Hello World from process 0 of 1
>>>>>
>>>>> Hope that helps,
>>>>>
>>>>>
>>>>> -- Abhishek
>>>>>
>>>>>
>>>>> On Tue, 2008-09-30 at 17:02 +0200, Stefan Boresch wrote:
>>>>>
>>>>>> Thanks for the quick reply!
>>>>>>
>>>>>> On Tue, Sep 30, 2008 at 07:34:37AM -0700, ron minnich wrote:
>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 30, 2008 at 1:57 AM, stefan <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> the state of xcpu support with MPI libraries -- either of the common
>>>>>>>> free ones is fine (e.g., openmpi, mpich2)
>>>>>>>>
>>>>>>>
>>>>>>> there is now support for mpich2. openmpi is not supported as openmpi
>>>>>>> is (once again) in flux. it has been supported numerous times and has
>>>>>>> changed out from under us numerous times. I no longer use openmpi if I
>>>>>>> have a working mvapich or mpich available.
>>>>>>>
>>>>>>
>>>>>> I am slightly confused. I guess I had inferred the openmpi issues from
>>>>>> the various mailing lists. But I just looked at the latest mpich2
>>>>>> prerelease and found no mention of (s)xcpu(2). I thought that some
>>>>>> patches/support on the side of the mpi library was necessary (as, e.g.,
>>>>>> openmpi provides for bproc ...). Or am I completely misunderstanding
>>>>>> something here, and is this somehow handled by xcpu itself ...
>>>>>> I guess there is some difference between
>>>>>>
>>>>>> xrx 192.168.19.2 /bin/date
>>>>>>
>>>>>> and
>>>>>>
>>>>>> xrx 192.168.19.2 <pathto>/mpiexec ...
>>>>>>
>>>>>> and the latter seems too magic to me to run out of the box (it sure
>>>>>> would be nice though ...)
>>>>>>
>>>>>> Sorry for making myself a nuisance -- thanks,
>>>>>>
>>>>>> Stefan Boresch
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>