xmvapich localhost runs fine here. What version of mpich are you running?

Thanks,
        Lucho

On Nov 5, 2008, at 3:26 PM, Daniel Gruner wrote:


I am stumped...

I just tried making the head node part of the xcpu set, by running
xcpufs and adding it to the statfs.conf file.  For the most part this
works, but xmvapich bombs when I try to run a process on the head
node:

[EMAIL PROTECTED] examples]# xmvapich localhost ./hellow
*** glibc detected *** hellow: double free or corruption (fasttop):
0x000000000069e810 ***
======= Backtrace: =========
/tmp/xcpu-SYGWM5/libc.so.6[0x3eca071634]
/tmp/xcpu-SYGWM5/libc.so.6(cfree+0x8c)[0x3eca074c5c]
hellow[0x423227]
hellow[0x40895d]
hellow[0x405d75]
hellow[0x405885]
hellow[0x401b96]
/tmp/xcpu-SYGWM5/libc.so.6(__libc_start_main+0xf4)[0x3eca01d8b4]
hellow[0x401ac9]
======= Memory map: ========
00400000-00477000 r-xp 00000000 09:01 5401493
 /tmp/xcpu-SYGWM5/hellow
00677000-00678000 rw-p 00077000 09:01 5401493
 /tmp/xcpu-SYGWM5/hellow
00678000-006bd000 rw-p 00678000 00:00 0 [heap]
3ec9000000-3ec901a000 r-xp 00000000 09:01 11261457
 /lib64/ld-2.5.so
3ec921a000-3ec921b000 r--p 0001a000 09:01 11261457
 /lib64/ld-2.5.so
3ec921b000-3ec921c000 rw-p 0001b000 09:01 11261457
 /lib64/ld-2.5.so
3eca000000-3eca14a000 r-xp 00000000 09:01 5401500
 /tmp/xcpu-SYGWM5/libc.so.6
3eca14a000-3eca349000 ---p 0014a000 09:01 5401500
 /tmp/xcpu-SYGWM5/libc.so.6
3eca349000-3eca34d000 r--p 00149000 09:01 5401500
 /tmp/xcpu-SYGWM5/libc.so.6
3eca34d000-3eca34e000 rw-p 0014d000 09:01 5401500
 /tmp/xcpu-SYGWM5/libc.so.6
3eca34e000-3eca353000 rw-p 3eca34e000 00:00 0
3ecac00000-3ecac15000 r-xp 00000000 09:01 5401494
 /tmp/xcpu-SYGWM5/libpthread.so.0
3ecac15000-3ecae14000 ---p 00015000 09:01 5401494
 /tmp/xcpu-SYGWM5/libpthread.so.0
3ecae14000-3ecae15000 r--p 00014000 09:01 5401494
 /tmp/xcpu-SYGWM5/libpthread.so.0
3ecae15000-3ecae16000 rw-p 00015000 09:01 5401494
 /tmp/xcpu-SYGWM5/libpthread.so.0
3ecae16000-3ecae1a000 rw-p 3ecae16000 00:00 0
3ece800000-3ece807000 r-xp 00000000 09:01 5401498
 /tmp/xcpu-SYGWM5/librt.so.1
3ece807000-3ecea07000 ---p 00007000 09:01 5401498
 /tmp/xcpu-SYGWM5/librt.so.1
3ecea07000-3ecea08000 r--p 00007000 09:01 5401498
 /tmp/xcpu-SYGWM5/librt.so.1
3ecea08000-3ecea09000 rw-p 00008000 09:01 5401498
 /tmp/xcpu-SYGWM5/librt.so.1
3ecf800000-3ecf80d000 r-xp 00000000 09:01 11261534
 /lib64/libgcc_s-4.1.2-20080102.so.1
3ecf80d000-3ecfa0d000 ---p 0000d000 09:01 11261534
 /lib64/libgcc_s-4.1.2-20080102.so.1
3ecfa0d000-3ecfa0e000 rw-p 0000d000 09:01 11261534
 /lib64/libgcc_s-4.1.2-20080102.so.1
7fe378000000-7fe378021000 rw-p 7fe378000000 00:00 0
7fe378021000-7fe37c000000 ---p 7fe378021000 00:00 0
7fe37fcd4000-7fe37fcde000 r-xp 00000000 09:01 11261212
 /lib64/libnss_files-2.5.so
7fe37fcde000-7fe37fedd000 ---p 0000a000 09:01 11261212
 /lib64/libnss_files-2.5.so
7fe37fedd000-7fe37fede000 r--p 00009000 09:01 11261212
 /lib64/libnss_files-2.5.so
7fe37fede000-7fe37fedf000 rw-p 0000a000 09:01 11261212
 /lib64/libnss_files-2.5.so
7fe37ff09000-7fe37ff0e000 rw-p 7fe37ff09000 00:00 0
7fff87ef9000-7fff87f0e000 rw-p 7ffffffea000 00:00 0 [stack]
7fff87fff000-7fff88000000 r-xp 7fff87fff000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
 [vsyscall]

I don't know how to attach gdb to the processes running on the slaves,
as you correctly assumed...
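
Would something like the following work for grabbing a backtrace from a
hung rank without an interactive session on the slave?  This is only an
untested sketch; it assumes gdb and pgrep are present in the node image
and that the spawned copy of the binary is still named hellow there:

  xrx n0000 /bin/sh -c 'gdb -p $(pgrep hellow) -batch -ex "thread apply all bt"'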



On 11/5/08, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:

I don't think there is anything different when running on the same node, at
least nothing in xmvapich. I am running a manually built diskless cluster, no
Perceus or anything like that. The kernel is 2.6.239, mpich2 1.1.0a1. I am
not sure I can make much use of it, but can you attach to each of the
processes with gdb and send me a backtrace? You can try running it on the
head node; I guess it would be harder to use gdb on the compute nodes.

Thanks,
       Lucho


On Nov 5, 2008, at 2:51 PM, Daniel Gruner wrote:



What is different when the processes run on the same node from when
they run on separate nodes? Also, what is your OS version? Any other
suggestions on how I could help debug this?

Daniel

On 11/5/08, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:


The get_result lines are OK; for some reason the processes don't send
"finalize". I can't reproduce it :(


On Nov 5, 2008, at 12:42 PM, Daniel Gruner wrote:




In my case it looks the same, except for the cmd=finalize stuff:

-pmi-> 1: cmd=get_appnum
<-pmi- 1: cmd=appnum rc=0 appnum=0
-pmi-> 0: cmd=get_my_kvsname
<-pmi- 0: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 1: cmd=get_my_kvsname
<-pmi- 1: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 0: cmd=put kvsname=kvs_0 key=P0-businesscard
value=port#50475$description#n0000$ifname#10.10.0.10$
<-pmi- 0: cmd=put_result rc=0
-pmi-> 1: cmd=get_my_kvsname
<-pmi- 1: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 0: cmd=barrier_in
-pmi-> 1: cmd=put kvsname=kvs_0 key=P1-businesscard
value=port#44398$description#n0000$ifname#10.10.0.10$
<-pmi- 1: cmd=put_result rc=0
-pmi-> 1: cmd=barrier_in
<-pmi- 0: cmd=barrier_out rc=0
<-pmi- 1: cmd=barrier_out rc=0
-pmi-> 0: cmd=get kvsname=kvs_0 key=P1-businesscard
<-pmi- 0: cmd=get_result rc=0
value=port#44398$description#n0000$ifname#10.10.0.10$
-pmi-> 1: cmd=get kvsname=kvs_0 key=P0-businesscard
<-pmi- 1: cmd=get_result rc=0
value=port#50475$description#n0000$ifname#10.10.0.10$
Hello world from process 1 of 2
Hello world from process 0 of 2
[EMAIL PROTECTED] examples]#

and here it hangs...  Actually, there are two extra cmd=get_result
rc=0 lines... ???



On 11/5/08, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:



Strange. This is what I see from cmd=barrier_out to the end of


execution:



<-pmi- 0: cmd=barrier_out rc=0
<-pmi- 1: cmd=barrier_out rc=0
-pmi-> 1: cmd=get kvsname=kvs_0 key=P0-businesscard
<-pmi- 1: cmd=get_result rc=0

value=port#58977$description#m10$ifname#192.168.1.110$
-pmi-> 0: cmd=get kvsname=kvs_0 key=P1-businesscard
<-pmi- 0: cmd=get_result rc=0

value=port#53028$description#m10$ifname#192.168.1.110$
-pmi-> 1: cmd=finalize
<-pmi- 1: cmd=finalize_ack rc=0
-pmi-> 0: cmd=finalize
<-pmi- 0: cmd=finalize_ack rc=0
Hello world from process 1 of 2
Hello world from process 0 of 2


On Nov 5, 2008, at 12:17 PM, Daniel Gruner wrote:





Ok, some progress.  I am now able to run things like:

[EMAIL PROTECTED] examples]# xmvapich n0000,n0001 ./cpi
Process 0 of 2 is on n0000
pi is approximately 3.1415926544231318, Error is
0.0000000008333387
wall clock time = 0.000757
Process 1 of 2 is on n0001

as long as the two nodes specified are different. If, however, I want
to run two processes on the same node, e.g.:

[EMAIL PROTECTED] examples]# xmvapich n0000,n0000 ./cpi
Process 1 of 2 is on n0000
Process 0 of 2 is on n0000

It hangs as before.  Here is the debugging trace:

[EMAIL PROTECTED] examples]# xmvapich -D n0000,n0000 ./cpi
-pmi-> 0: cmd=initack pmiid=1
<-pmi- 0: cmd=initack rc=0
<-pmi- 0: cmd=set rc=0 size=2
<-pmi- 0: cmd=set rc=0 rank=0
<-pmi- 0: cmd=set rc=0 debug=0
-pmi-> 1: cmd=initack pmiid=1
<-pmi- 1: cmd=initack rc=0
<-pmi- 1: cmd=set rc=0 size=2
<-pmi- 1: cmd=set rc=0 rank=1
<-pmi- 1: cmd=set rc=0 debug=0
-pmi-> 0: cmd=init pmi_version=1 pmi_subversion=1
<-pmi- 0: cmd=response_to_init rc=0
-pmi-> 0: cmd=get_maxes
<-pmi- 0: cmd=maxes rc=0 kvsname_max=64 keylen_max=64
vallen_max=64
-pmi-> 1: cmd=init pmi_version=1 pmi_subversion=1
<-pmi- 1: cmd=response_to_init rc=0
-pmi-> 1: cmd=get_maxes
<-pmi- 1: cmd=maxes rc=0 kvsname_max=64 keylen_max=64
vallen_max=64
-pmi-> 0: cmd=get_appnum
<-pmi- 0: cmd=appnum rc=0 appnum=0
-pmi-> 0: cmd=get_my_kvsname
<-pmi- 0: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 1: cmd=get_appnum
<-pmi- 1: cmd=appnum rc=0 appnum=0
-pmi-> 1: cmd=get_my_kvsname
<-pmi- 1: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 0: cmd=get_my_kvsname
<-pmi- 0: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 1: cmd=get_my_kvsname
<-pmi- 1: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 0: cmd=put kvsname=kvs_0 key=P0-businesscard
value=port#45956$description#n0000$ifname#10.10.0.10$
<-pmi- 0: cmd=put_result rc=0
-pmi-> 1: cmd=put kvsname=kvs_0 key=P1-businesscard
value=port#38363$description#n0000$ifname#10.10.0.10$
<-pmi- 1: cmd=put_result rc=0
-pmi-> 0: cmd=barrier_in
-pmi-> 1: cmd=barrier_in
<-pmi- 0: cmd=barrier_out rc=0
<-pmi- 1: cmd=barrier_out rc=0
-pmi-> 0: cmd=get kvsname=kvs_0 key=P1-businesscard
<-pmi- 0: cmd=get_result rc=0
value=port#38363$description#n0000$ifname#10.10.0.10$
Process 1 of 2 is on n0000
Process 0 of 2 is on n0000
[EMAIL PROTECTED] examples]#



On 11/5/08, Daniel Gruner <[EMAIL PROTECTED]> wrote:



That is what I was going for...  It returns (none).

I am about to run the test after explicitly setting up the hostnames
of the nodes.  Does xmvapich probe the nodes for their names?  How
does it resolve their addresses?


Daniel

On 11/5/08, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:




I guess that is the problem. What do you see if you do:

 xrx n0000 hostname

Thanks,
 Lucho


On Nov 5, 2008, at 12:02 PM, Daniel Gruner wrote:






Hi Lucho,

I am provisioning with perceus, and in order to get static node
addresses I have entries in /etc/hosts that define them, e.g.:

10.10.0.10      n0000
10.10.0.11      n0001
10.10.0.12      n0002

My /etc/nsswitch.conf is set to resolve hosts like:

hosts:      files dns

One thing I have noticed is that the nodes do not have their own
hostname defined after provisioning.  Could this be the problem?
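
If the missing hostname is the culprit, I suppose I could set it on
each node to match /etc/hosts before rerunning the test.  A minimal
sketch (assumes xrx runs the command as root on the node):

  for n in n0000 n0001 n0002; do xrx $n hostname $n; done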

Thanks,
Daniel
On 11/5/08, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:





Hi,

It looks like the MPI processes on the nodes don't send a correct IP
address to connect to. In your case, they send:

-pmi-> 0: cmd=put kvsname=kvs_0 key=P0-businesscard
value=port#38675$description#(none)$

And when I run it, I see:

-pmi-> 0: cmd=put kvsname=kvs_0 key=P0-businesscard
value=port#34283$description#m10$ifname#192.168.1.110$

I tried to figure out how mpich picks the IP address, and it looks
like it uses the hostname on the node for that. Do you have the node
names set up correctly?
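
A quick way to check (just a sketch; it assumes getent is available on
the node) is to see what each node thinks its name is and whether that
name resolves to the interface you expect:

  xrx n0000 hostname
  xrx n0000 /bin/sh -c 'getent hosts $(hostname)'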

Thanks,
Lucho

On Nov 4, 2008, at 1:31 PM, Daniel Gruner wrote:







Hi Lucho,

Did you have a chance to look at this?  Needless to say it has been
quite frustrating, and perhaps it has to do with the particular Linux
distribution you run.  I am running on a RHEL5.2 system with kernel
2.6.26, and the compilation of mpich2 or mvapich2 is totally vanilla.
My network is just GigE.  xmvapich works for a single process, but it
always hangs for more than one, regardless of whether they are on the
same node or separate nodes, and independently of the example program
(hellow, cpi, etc).  Other than some administration issues (like the
authentication stuff I have been exchanging with Abhishek about), this
is the only real obstacle to making my clusters suitable for
production...

Thanks,
Daniel

---------- Forwarded message ----------
From: Daniel Gruner <[EMAIL PROTECTED]>
Date: Oct 8, 2008 2:49 PM
Subject: Re: [xcpu] Re: (s)xcpu and MPI
To: [email protected]


Hi Lucho,

Here is the output (two nodes in the cluster):

[EMAIL PROTECTED] examples]# xmvapich -D -a ./hellow
-pmi-> 0: cmd=initack pmiid=1
<-pmi- 0: cmd=initack rc=0
<-pmi- 0: cmd=set rc=0 size=2
<-pmi- 0: cmd=set rc=0 rank=0
<-pmi- 0: cmd=set rc=0 debug=0
-pmi-> 0: cmd=init pmi_version=1 pmi_subversion=1
<-pmi- 0: cmd=response_to_init rc=0
-pmi-> 0: cmd=get_maxes
<-pmi- 0: cmd=maxes rc=0 kvsname_max=64 keylen_max=64 vallen_max=64
-pmi-> 0: cmd=get_appnum
<-pmi- 0: cmd=appnum rc=0 appnum=0
-pmi-> 1: cmd=initack pmiid=1
<-pmi- 1: cmd=initack rc=0
<-pmi- 1: cmd=set rc=0 size=2
<-pmi- 1: cmd=set rc=0 rank=1
<-pmi- 1: cmd=set rc=0 debug=0
-pmi-> 0: cmd=get_my_kvsname
<-pmi- 0: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 0: cmd=get_my_kvsname
<-pmi- 0: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 1: cmd=init pmi_version=1 pmi_subversion=1
<-pmi- 1: cmd=response_to_init rc=0
-pmi-> 1: cmd=get_maxes
<-pmi- 1: cmd=maxes rc=0 kvsname_max=64 keylen_max=64 vallen_max=64
-pmi-> 1: cmd=get_appnum
<-pmi- 1: cmd=appnum rc=0 appnum=0
-pmi-> 0: cmd=put kvsname=kvs_0 key=P0-businesscard
value=port#38675$description#(none)$
<-pmi- 0: cmd=put_result rc=0
-pmi-> 1: cmd=get_my_kvsname
<-pmi- 1: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 0: cmd=barrier_in
-pmi-> 1: cmd=get_my_kvsname
<-pmi- 1: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 1: cmd=put kvsname=kvs_0 key=P1-businesscard
value=port#38697$description#(none)$
<-pmi- 1: cmd=put_result rc=0
-pmi-> 1: cmd=barrier_in
<-pmi- 0: cmd=barrier_out rc=0
<-pmi- 1: cmd=barrier_out rc=0
-pmi-> 0: cmd=get kvsname=kvs_0 key=P1-businesscard
<-pmi- 0: cmd=get_result rc=0
value=port#38697$description#(none)$
-pmi-> 1: cmd=get kvsname=kvs_0 key=P0-businesscard
<-pmi- 1: cmd=get_result rc=0
value=port#38675$description#(none)$
Hello world from process 1 of 2
Hello world from process 0 of 2


It looks like it ran, but then it hangs and never returns.

If I try to run another example (cpi), here is the output from the run
with a single process, and then with two:

[EMAIL PROTECTED] examples]# xmvapich n0001 ./cpi
Process 0 of 1 is on (none)
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000313
[EMAIL PROTECTED] examples]# xmvapich -D n0001 ./cpi
-pmi-> 0: cmd=initack pmiid=1
<-pmi- 0: cmd=initack rc=0
<-pmi- 0: cmd=set rc=0 size=1
<-pmi- 0: cmd=set rc=0 rank=0
<-pmi- 0: cmd=set rc=0 debug=0
-pmi-> 0: cmd=init pmi_version=1 pmi_subversion=1
<-pmi- 0: cmd=response_to_init rc=0
-pmi-> 0: cmd=get_maxes
<-pmi- 0: cmd=maxes rc=0 kvsname_max=64 keylen_max=64 vallen_max=64
-pmi-> 0: cmd=get_appnum
<-pmi- 0: cmd=appnum rc=0 appnum=0
-pmi-> 0: cmd=get_my_kvsname
<-pmi- 0: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 0: cmd=get_my_kvsname
<-pmi- 0: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 0: cmd=put kvsname=kvs_0 key=P0-businesscard
value=port#48513$description#(none)$
<-pmi- 0: cmd=put_result rc=0
-pmi-> 0: cmd=barrier_in
<-pmi- 0: cmd=barrier_out rc=0
-pmi-> 0: cmd=finalize
<-pmi- 0: cmd=finalize_ack rc=0
Process 0 of 1 is on (none)
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000332
[EMAIL PROTECTED] examples]#

normal termination.

[EMAIL PROTECTED] examples]# xmvapich -D n0000,n0001 ./cpi
-pmi-> 0: cmd=initack pmiid=1
<-pmi- 0: cmd=initack rc=0
<-pmi- 0: cmd=set rc=0 size=2
<-pmi- 0: cmd=set rc=0 rank=0
<-pmi- 0: cmd=set rc=0 debug=0
-pmi-> 0: cmd=init pmi_version=1 pmi_subversion=1
<-pmi- 0: cmd=response_to_init rc=0
-pmi-> 0: cmd=get_maxes
<-pmi- 0: cmd=maxes rc=0 kvsname_max=64 keylen_max=64 vallen_max=64
-pmi-> 0: cmd=get_appnum
<-pmi- 0: cmd=appnum rc=0 appnum=0
-pmi-> 0: cmd=get_my_kvsname
<-pmi- 0: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 1: cmd=initack pmiid=1
<-pmi- 1: cmd=initack rc=0
<-pmi- 1: cmd=set rc=0 size=2
<-pmi- 1: cmd=set rc=0 rank=1
<-pmi- 1: cmd=set rc=0 debug=0
-pmi-> 0: cmd=get_my_kvsname
<-pmi- 0: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 1: cmd=init pmi_version=1 pmi_subversion=1
<-pmi- 1: cmd=response_to_init rc=0
-pmi-> 1: cmd=get_maxes
<-pmi- 1: cmd=maxes rc=0 kvsname_max=64 keylen_max=64 vallen_max=64
-pmi-> 0: cmd=put kvsname=kvs_0 key=P0-businesscard
value=port#45645$description#(none)$
<-pmi- 0: cmd=put_result rc=0
-pmi-> 1: cmd=get_appnum
<-pmi- 1: cmd=appnum rc=0 appnum=0
-pmi-> 0: cmd=barrier_in
-pmi-> 1: cmd=get_my_kvsname
<-pmi- 1: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 1: cmd=get_my_kvsname
<-pmi- 1: cmd=my_kvsname rc=0 kvsname=kvs_0
-pmi-> 1: cmd=put kvsname=kvs_0 key=P1-businesscard
value=port#53467$description#(none)$
<-pmi- 1: cmd=put_result rc=0
-pmi-> 1: cmd=barrier_in
<-pmi- 0: cmd=barrier_out rc=0
<-pmi- 1: cmd=barrier_out rc=0
-pmi-> 0: cmd=get kvsname=kvs_0 key=P1-businesscard
<-pmi- 0: cmd=get_result rc=0
value=port#53467$description#(none)$
Process 0 of 2 is on (none)
Process 1 of 2 is on (none)

hung processes....


Daniel


On Wed, Oct 8, 2008 at 3:23 PM, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:















I can't replicate it, it is working fine here :(
Can you please try xmvapich again with the -D option and cut&paste the
output?

Thanks,
Lucho

On Oct 6, 2008, at 2:51 PM, Daniel Gruner wrote:







I just compiled mpich2-1.1.0a1, and tested it, with the same result as
with mvapich.  Again I had to do the configure with
--with-device=ch3:sock, since otherwise the runtime complains that it
can't allocate shared memory or some such thing.  When I run a single
process using xmvapich it completes fine.  However when running two or
more it hangs.  This is not surprising as it should be the same as
mvapich when running over regular TCP/IP on GigE rather than a special
interconnect.
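
For reference, the build was essentially along these lines (the
install prefix here is just an example):

  ./configure --prefix=/opt/mpich2-1.1.0a1 --with-device=ch3:sock
  make
  make install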

[EMAIL PROTECTED] examples]# ./hellow
Hello world from process 0 of 1
[EMAIL PROTECTED] examples]# xmvapich -a ./hellow
Hello world from process 1 of 2
Hello world from process 0 of 2
^C
[EMAIL PROTECTED] examples]# xmvapich n0000 ./hellow
Hello world from process 0 of 1
[EMAIL PROTECTED] examples]# xmvapich n0001 ./hellow
Hello world from process 0 of 1
[EMAIL PROTECTED] examples]# xmvapich n0000,n0001 ./hellow
Hello world from process 1 of 2
Hello world from process 0 of 2
^C

Daniel



On 10/6/08, Latchesar Ionkov <[EMAIL PROTECTED]> wrote:






I just compiled mpich2-1.1.0a1 and tried running hellow, everything
looks fine:

$ xmvapich m1,m2 ~/work/mpich2-1.1.0a1/build/examples/hellow
Hello world from process 0 of 2
Hello world from process 1 of 2
$

I didn't set any special parameters when compiling, just ./configure.

Thanks,
Lucho


On Oct 3, 2008, at 9:05 AM, Daniel Gruner wrote:








Well, I just did the same, but with NO success...  The processes are
apparently started, run at the beginning, but then they hang and do
not finalize.  For example, running the "hellow" example from the
mvapich2 distribution:

[EMAIL PROTECTED] examples]# cat hellow.c
/* -*- Mode: C; c-basic-offset:4 ; -*- */
/*
 *  (C) 2001 by Argonne National Laboratory.
 *      See COPYRIGHT in top-level directory.
 */

#include <stdio.h>
#include "mpi.h"

int main( int argc, char *argv[] )
{
    int rank;
    int size;

    MPI_Init( 0, 0 );
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf( "Hello world from process %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}

[EMAIL PROTECTED] examples]# make hellow
../bin/mpicc  -I../src/include -I../src/include -c hellow.c
../bin/mpicc   -o hellow hellow.o
[EMAIL PROTECTED] examples]# ./hellow
Hello world from process 0 of 1

(this was fine, just running on the master).  Running on the two
nodes requires that the xmvapich process be killed (ctrl-C):

[EMAIL PROTECTED] examples]# xmvapich -ap ./hellow
n0000: Hello world from process 0 of 2
n0001: Hello world from process 1 of 2
[EMAIL PROTECTED] examples]#

I have tried other codes, both in C and Fortran, with the same
behaviour.  I don't know if the issue is with xmvapich or with
mvapich2.  Communication is just GigE.
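
To narrow down whether the problem is in xmvapich or in mpich2/mvapich2
itself, one option would be to launch the same binaries with mpich2's
own mpd/mpiexec over the same GigE network.  A sketch (assumes mpd is
configured on the hosts, with an mpd.hosts file listing the nodes):

  mpdboot -n 2 -f mpd.hosts
  mpiexec -n 2 ./cpi
  mpdallexit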

Daniel


On 9/30/08, Abhishek Kulkarni <[EMAIL PROTECTED]> wrote:



















Just gave this a quick try, and xmvapich seems to run MPI apps ...

[Message clipped]
