Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it work with 1 node but not more

2015-05-27 Thread Jeff Squyres (jsquyres)
A few points:

- Just to clarify: Open MPI and MPICH are entirely different code bases / 
entirely different MPI implementations.  They both implement the same C and 
Fortran APIs that can be used by applications (i.e., they're *source code 
compatible*), but they are otherwise not compatible at all.  Hence, you have to 
use entirely one MPI implementation or the other (e.g., use Open MPI or use 
MPICH -- don't use both at the same time).

--> That being said, you can build xhpl for Open MPI and rename the executable 
xhpl.openmpi, and then build xhpl again for MPICH and rename the executable 
xhpl.mpich, and then you can use the appropriate mpirun or mpiexec to launch 
the executable that you want to invoke (e.g., use Open MPI's mpirun to launch 
xhpl.openmpi and use MPICH's mpiexec to launch xhpl.mpich).

- In Open MPI, mpirun and mpiexec are symlinks to the same executable.  
Meaning: they're exactly equivalent.  I don't know offhand if the same is true 
for MPICH -- I have a dim recollection that MPICH prefers "mpiexec" -- I don't 
know if they still have "mpirun".  Check their docs.

- ldd takes a path to an executable; it does not search your PATH.  If "mpirun" 
or "mpiexec" is not in your current directory, you need to give its full path, 
e.g., ldd $(which mpirun) (which is why "ldd mpirun" failed; the error message 
indicates that there is no "mpirun" in the . directory).

- The ldd of xhpl shows that it is linked against libmpich -- which is 
definitely an MPICH library, not an Open MPI library.

- Hence, if you're using Open MPI's mpirun and an MPICH-compiled XHPL, this is 
why things are failing.  You need to use a single MPI implementation's wrapper 
compilers and mpirun/mpiexec -- you can't build with one MPI implementation and 
then launch with the other.  Open MPI and MPICH are not compatible in that way.
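
The ldd check described in the points above can be scripted. What follows is a minimal sketch, not something from the original thread: mpi_flavor is a hypothetical helper that reads ldd output on stdin and guesses the MPI implementation from the library names. It assumes an MPICH build pulls in a library whose name contains "libmpich" (as reported for xhpl here) and an Open MPI build pulls in libopen-rte or libopen-pal. Library names vary between versions and distributions, so treat "unknown" as inconclusive rather than as a pass.

```shell
#!/bin/sh
# Hypothetical helper: guess the MPI implementation from `ldd` output on stdin.
# Assumption: MPICH shows up as libmpich*, Open MPI as libopen-rte / libopen-pal.
mpi_flavor() {
    awk '
        /libmpich/                { mpich = 1 }
        /libopen-rte|libopen-pal/ { ompi = 1 }
        END {
            if (mpich && ompi) print "mixed"
            else if (mpich)    print "mpich"
            else if (ompi)     print "openmpi"
            else               print "unknown"
        }'
}

# Typical use (paths are examples only):
#   ldd ./xhpl | mpi_flavor
#   ldd "$(command -v mpirun)" | mpi_flavor
```

If xhpl reports "mpich" while the launcher you invoke belongs to Open MPI (or the other way around), that is the mismatch described above.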



> On May 27, 2015, at 12:47 PM, Heerdt, Lanze M.  wrote:
> 
> I ran
mpirun -machinefile ~/machinefile -np 4 -tag-output xhpl and, just to be sure, I 
ran the same thing with mpiexec (because I think I have it set up to use MPICH 
and not Open MPI; correct me if I am wrong, but the idea is the same?). I also 
tried ldd mpirun, but that didn't work at all.
[Attached image: "-tag-output and ldd.PNG"]

 

In the second image I got some feedback from the ldd xhpl and also have my 
HPL.dat shown with p and q equal to 2. Like I said, running with that HPL.dat 
and

mpiexec -machinefile ~/machinefile -n 4 xhpl

it just gives me the same error

 

Thank you for responding so quickly by the way :) you guys are a life saver.

 

-Lanze

 


Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it work with 1 node but not more

2015-05-26 Thread Jeff Squyres (jsquyres)
I agree with Gilles -- when you compile with one MPI implementation, but then 
accidentally use the mpirun/mpiexec from a different MPI implementation to 
launch it, it's quite a common symptom to see an MPI_COMM_WORLD size of 1 
(i.e., each MPI process is rank 0 in MPI_COMM_WORLD).
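
One way to spot this symptom mechanically is to check the output of a hello-world run. The following is a minimal sketch under stated assumptions: check_ranks is a hypothetical helper name, and it assumes every process prints one line containing "process R of N" (the wording of the hello world output quoted in this thread; adjust the pattern for other programs).

```shell
#!/bin/sh
# Hypothetical helper: read hello-world output on stdin and verify that
# ranks 0..N-1 each appear exactly once. Prints "ok", "mismatch", or
# "no-ranks-found".
check_ranks() {
    awk '
        # Pull "process R of N" out of each line; other lines are ignored.
        match($0, /process [0-9]+ of [0-9]+/) {
            split(substr($0, RSTART, RLENGTH), f, " ")
            seen[f[2]]++
            size = f[4] + 0
            lines++
        }
        END {
            if (lines == 0) { print "no-ranks-found"; exit }
            if (lines != size) { print "mismatch"; exit }
            for (r = 0; r < size; r++)
                if (seen[r] != 1) { print "mismatch"; exit }
            print "ok"
        }'
}

# Typical use (program name is an example only):
#   mpirun -machinefile ~/machinefile -np 4 ./hello | check_ranks
```

Four lines that all say "process 0 of 1" produce "mismatch", which is the pattern described above.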

Make sure that you're using the mpifort and mpirun from the same MPI 
implementation (e.g., Open MPI).  You might want to re-build HPL from scratch 
and ensure that you are using a specific mpifort, and then be absolutely 100% 
sure to re-run the resulting HPL with the mpirun from that same MPI 
implementation.
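
To double-check which implementation a launcher belongs to, its version banner is usually enough. This is a minimal sketch, not an official procedure: launcher_flavor is a hypothetical helper, and it assumes that Open MPI's mpirun prints a banner containing "Open MPI" for --version and that MPICH's Hydra mpiexec prints one containing "HYDRA". Verify both assumptions against your own installs.

```shell
#!/bin/sh
# Hypothetical helper: classify a launcher from its `--version` text on stdin.
# Assumption: Open MPI banners contain "Open MPI"; MPICH (Hydra) banners
# contain "HYDRA".
launcher_flavor() {
    awk '
        /Open MPI/ { ompi = 1 }
        /HYDRA/    { mpich = 1 }
        END {
            if (ompi && !mpich)      print "openmpi"
            else if (mpich && !ompi) print "mpich"
            else                     print "unknown"
        }'
}

# Typical use:
#   mpirun --version 2>&1 | launcher_flavor
#   mpiexec --version 2>&1 | launcher_flavor
```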


> 


-- 
Jeff Squyres
jsquy...@cisco.com

Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it work with 1 node but not more

2015-05-26 Thread Gilles Gouaillardet

First you can run
mpirun -machinefile ~/machinefile -np 4 -tag-output xhpl

if all tasks report they believe they are task 0, then this is the 
origin of the problem.


then you can run
ldd mpirun
ldd xhpl
they should use the same MPI flavor

then
mpirun -machinefile ~/machinefile -np 4 -tag-output ldd xhpl

and make sure xhpl uses the very same MPI flavor on all the nodes
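
The per-node comparison in the last step can be condensed with a small filter. This is a minimal sketch, assuming the tagged output lines look like "[job,rank]<stdout>:" followed by the usual "name => path (address)" ldd line; mpi_libs_seen is a hypothetical helper name, and the "libmpi" match is only a heuristic that covers both libmpi and libmpich.

```shell
#!/bin/sh
# Hypothetical helper: read `mpirun ... -tag-output ldd xhpl` output on stdin
# and print each distinct (library name, resolved path) pair once.
# Assumption: tags have the form "[job,rank]<stdout>:" at the start of a line.
mpi_libs_seen() {
    sed 's/^\[[0-9]*,[0-9]*]<std[a-z]*>://' |
        awk '/libmpi/ && $2 == "=>" { print $1, $3 }' |
        sort -u
}

# Typical use:
#   mpirun -machinefile ~/machinefile -np 4 -tag-output ldd xhpl | mpi_libs_seen
```

If every node resolves the same library, each library shows up on a single line; the same library name with two different paths means the nodes disagree.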


The HPL make process can be error prone, especially if you modify some 
config file / arch in the middle.

A simple option is to rebuild xhpl from scratch with Open MPI.

you can also post your HPL.dat and i will have a look

Cheers,

Gilles



Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it work with 1 node but not more

2015-05-26 Thread Heerdt, Lanze M.
I have run a hello world program for any number of processes. If I say "-n 16" 
I get 4 responses from each node saying "Hello world! I am process (0-15) of 16 
on RPI-0(1-4)" so I know the cluster can work how I want it to. I also tested 
with just plain hostname and I see the names of each of the 4 Pis as a 
response.

As a response to the illegal entry in HPL.dat, that doesn't really make much 
sense since it runs just fine with p = 1 and q = 1; it only says that when I 
change p and q to 2, which I know is not an illegal entry.



Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it work with 1 node but not more

2015-05-26 Thread Gilles Gouaillardet
At first glance, it seems all MPI tasks believe they are rank zero and 
comm world size is 1 (!)


Did you compile xhpl with Open MPI (and not a stub library for serial 
version only)?
Can you make sure there is nothing wrong with your LD_LIBRARY_PATH and 
that you do not mix MPI libraries
(e.g. Open MPI mpirun but xhpl ends up using MPICH, or the other way around)?

As already suggested by Ralph, I would start by running a hello world 
program
(just print rank and size to confirm it works).

Cheers,

Gilles






Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it work with 1 node but not more

2015-05-26 Thread Ralph Castain
I don't know enough about HPL to resolve the problem. However, I would
suggest that you first just try to run the example programs in the examples
directory to ensure you have everything working. If they work, then the
problem is clearly in the HPL arena.

I do note that your image reports that you have an illegal entry in HPL.dat
- if the examples work, you might start there.
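
When checking HPL.dat, one thing worth confirming is that the process grid fits the run: HPL needs at least P x Q processes. The following is a minimal sketch, assuming a single process grid and lines of the form "2 Ps" and "2 Qs" as in the standard HPL.dat layout; hpl_grid_size is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read HPL.dat on stdin and print P * Q.
# Assumption: one process grid, with the value first and the label
# ("Ps" or "Qs") last on its line.
hpl_grid_size() {
    awk '
        $NF == "Ps" { p = $1 }
        $NF == "Qs" { q = $1 }
        END { print p * q }'
}

# Typical use: the result should not exceed the value given to -n / -np.
#   hpl_grid_size < HPL.dat
```

A 2 x 2 grid needs 4 processes, so it can be rejected if each process believes it is alone in MPI_COMM_WORLD, even when the launcher was asked for 4.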


On Tue, May 26, 2015 at 12:26 PM, Heerdt, Lanze M. 
wrote:

>  I realize this may be a bit off topic, but since what I am doing seems
> to be a pretty commonly done thing I am hoping to find someone who has done
> it before/can help since I’ve been at my wits' end for so long they are
> calling me Mr. Whittaker.
>
>
>
> I am trying to run HPL on a Raspberry Pi cluster. I used the following
> guides to get to where I am now:
>
> http://www.tinkernut.com/2014/04/make-cluster-computer/
>
> http://www.tinkernut.com/2014/05/make-cluster-computer-part-2/
>
>
> https://www.howtoforge.com/tutorial/hpl-high-performance-linpack-benchmark-raspberry-pi/#comments
>
> and a bit of:
> https://www.raspberrypi.org/forums/viewtopic.php?p=301458#p301458 when
> the above guide wasn’t working
>
>
>
> basically when I run: “mpiexec -machinefile ~/machinefile -n 1 xhpl” it
> works just fine
>
> but when I run “mpiexec -machinefile ~/machinefile -n 4 xhpl” it errors
> with the attached image. (if I use “mpirun…” I get the exact same
> behavior)
>
> [Note: I HAVE changed the HPL.dat to have “2 Ps” and “2 Qs” from 1
> and 1 for when I try to run it with 4 processes]
>
>
>
> This is for a project of mine which I need done by the end of the week so
> if you see this after 5/29 thank you but don’t bother responding
>
>
>
> I have hpl-2.1, mpi4py-1.3.1, mpich-3.1, and openmpi-1.8.5 at my disposal
>
> In the machinefile are the 4 IP addresses of my 4 RPi nodes
>
> 10.15.106.107
>
> 10.15.101.29
>
> 10.15.106.108
>
> 10.15.101.30
>
>
>
> Any other information you need I can easily get to you so please do not
> hesitate to ask. I have nothing else to do but try and get this to work :P
>