Re: [OMPI users] Newbie Question.

2021-11-01 Thread Jeff Squyres (jsquyres) via users
Gilles' suggestion is correct; the larger point to make here is that the openib BTL is
obsolete and has been replaced by the UCX PML.  UCX is supported by the
vendor (NVIDIA); openib is not.

If you're just starting a new project, I would strongly advocate using UCX 
instead of openib.
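
For anyone starting fresh, a rough sketch of what that looks like in practice
(this assumes your Open MPI build actually includes UCX support; the process
count and program name below are placeholders):

  # check whether a UCX PML component is present in this build
  ompi_info | grep -i ucx

  # request the UCX PML explicitly instead of relying on the openib BTL
  mpirun --mca pml ucx -np 4 ./my_mpi_app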



On Nov 1, 2021, at 8:26 AM, Gilles Gouaillardet via users
<users@lists.open-mpi.org> wrote:

Hi Ben,

have you tried

export OMPI_MCA_common_ucx_opal_mem_hooks=1

Cheers,

Gilles

On Mon, Nov 1, 2021 at 9:22 PM bend linux4ms.net via
users <users@lists.open-mpi.org> wrote:
Ok, I am a newbie supporting an HPC project and learning about MPI.

I have the following portion of a shell script:

export OMPI_MCA_btl_openib_allow_ib=1
export OMPI_MCA_btl_openib_if_include="mlx5_0:1"

mpirun -machinefile ${hostlist} \
  --mca opal_common_ucx_opal_mem_hooks 1 \
  -np $NP \
  -N $rpn \
  -vv \

My question is: is there a way to take the '--mca opal_common_ucx_opal_mem_hooks 1'
option and make it into an environment variable like the others?

Thanks

Ben Duncan - Business Network Solutions, Inc. 336 Elton Road Jackson MS, 39212
"Never attribute to malice, that which can be adequately explained by stupidity"
- Hanlon's Razor




--
Jeff Squyres
jsquy...@cisco.com





Re: [OMPI users] Newbie Question.

2021-11-01 Thread Gilles Gouaillardet via users
Hi Ben,

have you tried

export OMPI_MCA_common_ucx_opal_mem_hooks=1

Cheers,

Gilles
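
More generally, any parameter you can pass as '--mca <name> <value>' on the
mpirun command line can instead be exported beforehand as an environment
variable named OMPI_MCA_<name>. A minimal sketch using a parameter already in
Ben's script (the trailing "..." just stands for the rest of his command line):

  # command-line form
  mpirun --mca btl_openib_allow_ib 1 ...

  # equivalent environment-variable form
  export OMPI_MCA_btl_openib_allow_ib=1
  mpirun ...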

On Mon, Nov 1, 2021 at 9:22 PM bend linux4ms.net via users <
users@lists.open-mpi.org> wrote:

> Ok, I am a newbie supporting an HPC project and learning about MPI.
>
> I have the following portion of a shell script:
>
> export OMPI_MCA_btl_openib_allow_ib=1
> export OMPI_MCA_btl_openib_if_include="mlx5_0:1"
>
> mpirun -machinefile ${hostlist} \
>   --mca opal_common_ucx_opal_mem_hooks 1 \
>   -np $NP \
>   -N $rpn \
>   -vv \
>
> My question is: is there a way to take the '--mca
> opal_common_ucx_opal_mem_hooks 1' option and make
> it into an environment variable like the others?
>
> Thanks
>
> Ben Duncan - Business Network Solutions, Inc. 336 Elton Road Jackson MS,
> 39212
> "Never attribute to malice, that which can be adequately explained by
> stupidity"
> - Hanlon's Razor
>
>
>


[OMPI users] Newbie Question.

2021-11-01 Thread bend linux4ms.net via users
Ok, I am a newbie supporting an HPC project and learning about MPI.

I have the following portion of a shell script:

export OMPI_MCA_btl_openib_allow_ib=1
export OMPI_MCA_btl_openib_if_include="mlx5_0:1"

mpirun -machinefile ${hostlist} \
  --mca opal_common_ucx_opal_mem_hooks 1 \
  -np $NP \
  -N $rpn \
  -vv \

My question is: is there a way to take the '--mca opal_common_ucx_opal_mem_hooks 1'
option and make it into an environment variable like the others?

Thanks

Ben Duncan - Business Network Solutions, Inc. 336 Elton Road Jackson MS, 39212
"Never attribute to malice, that which can be adequately explained by stupidity"
- Hanlon's Razor




Re: [OMPI users] Newbie question

2016-04-03 Thread dpchoudh .
Hello Gilles

Thanks again for your inputs. Since that code snippet works for you, I am
now fairly certain that my 'instrumentation' has broken something; sorry
for troubling the whole community while I climb the learning curve. The
netcat script that you mention does work correctly; that, and the fact that
the issue happens even when I use the openib BTL makes me convinced it is
not a firewall issue.

Best regards
Durga

We learn from history that we never learn from history.

On Sun, Apr 3, 2016 at 9:05 PM, Gilles Gouaillardet 
wrote:

> your program works fine on my environment.
>
> this is typical of a firewall running on your host(s), can you double
> check that ?
>
> a simple way to do that is to
> 10.10.10.11# nc -l 1024
>
> and on the other node
> echo ahah | nc 10.10.10.11 1024
>
> the first command should print "ahah" unless the host is unreachable
> and/or the tcp connection is denied by the firewall.
>
> Cheers,
>
> Gilles
>
>
>
> On 4/4/2016 9:44 AM, dpchoudh . wrote:
>
> Hello Gilles
>
> Thanks for your help.
>
> My question was more of a sanity check on myself. That little program I
> sent looked correct to me; do you see anything wrong with it?
>
> What I am running on my setup is an instrumented OMPI stack, taken from
> git HEAD, in an attempt to understand how some of the internals work. If
> you think the code is correct, it is quite possible that one of those
> 'instrumentations' is causing this.
>
> And BTW, adding -mca pml ob1 makes the code hang at MPI_Send (as opposed
> to MPI_Recv())
>
> [smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on node
> 10.10.10.11
> [smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on node
> 10.10.10.11
> [smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on node
> 10.10.10.11
> [smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on node
> 10.10.10.11
> [smallMPI:51673] btl: tcp: attempting to connect() to [[51894,1],1]
> address 10.10.10.11 on port 1024 <--- Hangs here
>
> But 10.10.10.11 is pingable:
> [durga@smallMPI ~]$ ping bigMPI
> PING bigMPI (10.10.10.11) 56(84) bytes of data.
> 64 bytes from bigMPI (10.10.10.11): icmp_seq=1 ttl=64 time=0.247 ms
>
>
> We learn from history that we never learn from history.
>
> On Sun, Apr 3, 2016 at 8:04 PM, Gilles Gouaillardet 
> wrote:
>
>> Hi,
>>
>> per a previous message, can you give a try to
>> mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp --mca pml ob1
>> ./mpitest
>>
>> if it still hangs, the issue could be that Open MPI thinks some subnets are
>> reachable when they are not.
>>
>> for diagnostics:
>> mpirun --mca btl_base_verbose 100 ...
>>
>> you can explicitly include/exclude subnets with
>> --mca btl_tcp_if_include xxx
>> or
>> --mca btl_tcp_if_exclude yyy
>>
>> for example,
>> mpirun --mca btl_tcp_if_include 192.168.0.0/24 -np 2 -hostfile
>> ~/hostfile --mca btl self,tcp --mca pml ob1 ./mpitest
>> should do the trick
>>
>> Cheers,
>>
>> Gilles
>>
>>
>>
>>
>> On 4/4/2016 8:32 AM, dpchoudh . wrote:
>>
>> Hello all
>>
>> I don't mean to be competing for the 'silliest question of the year
>> award', but I can't figure this out on my own:
>>
>> My 'cluster' has 2 machines, bigMPI and smallMPI. They are connected via
>> several (types of) networks and the connectivity is OK.
>>
>> In this setup, the following program hangs after printing
>>
>> Hello world from processor smallMPI, rank 0 out of 2 processors
>> Hello world from processor bigMPI, rank 1 out of 2 processors
>> smallMPI sent haha!
>>
>>
>> Obviously it is hanging at MPI_Recv(). But why? My command line is as
>> follows, but this happens if I try openib BTL (instead of TCP) as well.
>>
>> mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp ./mpitest
>>
>> It must be something *really* trivial, but I am drawing a blank right now.
>>
>> Please help!
>>
>> #include <mpi.h>
>> #include <stdio.h>
>> #include <string.h>
>>
>> int main(int argc, char** argv)
>> {
>> int world_size, world_rank, name_len;
>> char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];
>>
>> MPI_Init(&argc, &argv);
>> MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>> MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
>> MPI_Get_processor_name(hostname, &name_len);
>> printf("Hello world from processor %s, rank %d out of %d
>> processors\n", hostname, world_rank, world_size);
>> if (world_rank == 1)
>> {
>> MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>> printf("%s received %s\n", hostname, buf);
>> }
>> else
>> {
>> strcpy(buf, "haha!");
>> MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
>> printf("%s sent %s\n", hostname, buf);
>> }
>> MPI_Barrier(MPI_COMM_WORLD);
>> MPI_Finalize();
>> return 0;
>> }
>>
>>
>>
>> We learn from history that we never learn from history.
>>
>>

Re: [OMPI users] Newbie question

2016-04-03 Thread Gilles Gouaillardet

your program works fine on my environment.

this is typical of a firewall running on your host(s), can you double 
check that ?


a simple way to do that is to
10.10.10.11# nc -l 1024

and on the other node
echo ahah | nc 10.10.10.11 1024

the first command should print "ahah" unless the host is unreachable 
and/or the tcp connection is denied by the firewall.


Cheers,

Gilles


On 4/4/2016 9:44 AM, dpchoudh . wrote:

Hello Gilles

Thanks for your help.

My question was more of a sanity check on myself. That little program 
I sent looked correct to me; do you see anything wrong with it?


What I am running on my setup is an instrumented OMPI stack, taken 
from git HEAD, in an attempt to understand how some of the internals 
work. If you think the code is correct, it is quite possible that one 
of those 'instrumentations' is causing this.


And BTW, adding -mca pml ob1 makes the code hang at MPI_Send (as 
opposed to MPI_Recv())


[smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on 
node 10.10.10.11
[smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on 
node 10.10.10.11
[smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on 
node 10.10.10.11
[smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on 
node 10.10.10.11
[smallMPI:51673] btl: tcp: attempting to connect() to [[51894,1],1] 
address 10.10.10.11 on port 1024 <--- Hangs here


But 10.10.10.11 is pingable:
[durga@smallMPI ~]$ ping bigMPI
PING bigMPI (10.10.10.11) 56(84) bytes of data.
64 bytes from bigMPI (10.10.10.11): icmp_seq=1 ttl=64 time=0.247 ms


We learn from history that we never learn from history.

On Sun, Apr 3, 2016 at 8:04 PM, Gilles Gouaillardet wrote:


Hi,

per a previous message, can you give a try to
mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp --mca pml ob1
./mpitest

if it still hangs, the issue could be that Open MPI thinks some subnets
are reachable when they are not.

for diagnostics:
mpirun --mca btl_base_verbose 100 ...

you can explicitly include/exclude subnets with
--mca btl_tcp_if_include xxx
or
--mca btl_tcp_if_exclude yyy

for example,
mpirun --mca btl_tcp_if_include 192.168.0.0/24 -np 2 -hostfile ~/hostfile --mca btl
self,tcp --mca pml ob1 ./mpitest
should do the trick

Cheers,

Gilles




On 4/4/2016 8:32 AM, dpchoudh . wrote:

Hello all

I don't mean to be competing for the 'silliest question of the
year award', but I can't figure this out on my own:

My 'cluster' has 2 machines, bigMPI and smallMPI. They are
connected via several (types of) networks and the connectivity is OK.

In this setup, the following program hangs after printing

Hello world from processor smallMPI, rank 0 out of 2 processors
Hello world from processor bigMPI, rank 1 out of 2 processors
smallMPI sent haha!


Obviously it is hanging at MPI_Recv(). But why? My command line
is as follows, but this happens if I try openib BTL (instead of
TCP) as well.

mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp ./mpitest

It must be something *really* trivial, but I am drawing a blank
right now.

Please help!

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv)
{
int world_size, world_rank, name_len;
char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];

MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
MPI_Get_processor_name(hostname, &name_len);
printf("Hello world from processor %s, rank %d out of %d
processors\n", hostname, world_rank, world_size);
if (world_rank == 1)
{
MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD,
MPI_STATUS_IGNORE);
printf("%s received %s\n", hostname, buf);
}
else
{
strcpy(buf, "haha!");
MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
printf("%s sent %s\n", hostname, buf);
}
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
return 0;
}



We learn from history that we never learn from history.



Re: [OMPI users] Newbie question

2016-04-03 Thread dpchoudh .
Hello Gilles

Thanks for your help.

My question was more of a sanity check on myself. That little program I
sent looked correct to me; do you see anything wrong with it?

What I am running on my setup is an instrumented OMPI stack, taken from git
HEAD, in an attempt to understand how some of the internals work. If you
think the code is correct, it is quite possible that one of those
'instrumentations' is causing this.

And BTW, adding -mca pml ob1 makes the code hang at MPI_Send (as opposed to
MPI_Recv())

[smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on node
10.10.10.11
[smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on node
10.10.10.11
[smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on node
10.10.10.11
[smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on node
10.10.10.11
[smallMPI:51673] btl: tcp: attempting to connect() to [[51894,1],1] address
10.10.10.11 on port 1024 <--- Hangs here

But 10.10.10.11 is pingable:
[durga@smallMPI ~]$ ping bigMPI
PING bigMPI (10.10.10.11) 56(84) bytes of data.
64 bytes from bigMPI (10.10.10.11): icmp_seq=1 ttl=64 time=0.247 ms


We learn from history that we never learn from history.

On Sun, Apr 3, 2016 at 8:04 PM, Gilles Gouaillardet 
wrote:

> Hi,
>
> per a previous message, can you give a try to
> mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp --mca pml ob1 ./mpitest
>
> if it still hangs, the issue could be that Open MPI thinks some subnets are
> reachable when they are not.
>
> for diagnostics:
> mpirun --mca btl_base_verbose 100 ...
>
> you can explicitly include/exclude subnets with
> --mca btl_tcp_if_include xxx
> or
> --mca btl_tcp_if_exclude yyy
>
> for example,
> mpirun --mca btl_tcp_if_include 192.168.0.0/24 -np 2 -hostfile ~/hostfile
> --mca btl self,tcp --mca pml ob1 ./mpitest
> should do the trick
>
> Cheers,
>
> Gilles
>
>
>
>
> On 4/4/2016 8:32 AM, dpchoudh . wrote:
>
> Hello all
>
> I don't mean to be competing for the 'silliest question of the year
> award', but I can't figure this out on my own:
>
> My 'cluster' has 2 machines, bigMPI and smallMPI. They are connected via
> several (types of) networks and the connectivity is OK.
>
> In this setup, the following program hangs after printing
>
> Hello world from processor smallMPI, rank 0 out of 2 processors
> Hello world from processor bigMPI, rank 1 out of 2 processors
> smallMPI sent haha!
>
>
> Obviously it is hanging at MPI_Recv(). But why? My command line is as
> follows, but this happens if I try openib BTL (instead of TCP) as well.
>
> mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp ./mpitest
>
> It must be something *really* trivial, but I am drawing a blank right now.
>
> Please help!
>
> #include <mpi.h>
> #include <stdio.h>
> #include <string.h>
>
> int main(int argc, char** argv)
> {
> int world_size, world_rank, name_len;
> char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];
>
> MPI_Init(&argc, &argv);
> MPI_Comm_size(MPI_COMM_WORLD, &world_size);
> MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
> MPI_Get_processor_name(hostname, &name_len);
> printf("Hello world from processor %s, rank %d out of %d
> processors\n", hostname, world_rank, world_size);
> if (world_rank == 1)
> {
> MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
> printf("%s received %s\n", hostname, buf);
> }
> else
> {
> strcpy(buf, "haha!");
> MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
> printf("%s sent %s\n", hostname, buf);
> }
> MPI_Barrier(MPI_COMM_WORLD);
> MPI_Finalize();
> return 0;
> }
>
>
>
> We learn from history that we never learn from history.
>
>
>


Re: [OMPI users] Newbie question

2016-04-03 Thread Gilles Gouaillardet

Hi,

per a previous message, can you give a try to
mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp --mca pml ob1 ./mpitest

if it still hangs, the issue could be that Open MPI thinks some subnets are
reachable when they are not.


for diagnostics:
mpirun --mca btl_base_verbose 100 ...

you can explicitly include/exclude subnets with
--mca btl_tcp_if_include xxx
or
--mca btl_tcp_if_exclude yyy

for example,
mpirun --mca btl_tcp_if_include 192.168.0.0/24 -np 2 -hostfile
~/hostfile --mca btl self,tcp --mca pml ob1 ./mpitest

should do the trick

Cheers,

Gilles
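
As a concrete, hedged illustration, combining the verbose diagnostic with an
explicit interface restriction might look like this; "eth0" is only a
placeholder for whatever interface your nodes actually share:

  # 'eth0' is a hypothetical interface name; substitute your own
  mpirun --mca btl self,tcp --mca pml ob1 \
         --mca btl_tcp_if_include eth0 \
         --mca btl_base_verbose 100 \
         -np 2 -hostfile ~/hostfile ./mpitest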



On 4/4/2016 8:32 AM, dpchoudh . wrote:

Hello all

I don't mean to be competing for the 'silliest question of the year 
award', but I can't figure this out on my own:


My 'cluster' has 2 machines, bigMPI and smallMPI. They are connected 
via several (types of) networks and the connectivity is OK.


In this setup, the following program hangs after printing

Hello world from processor smallMPI, rank 0 out of 2 processors
Hello world from processor bigMPI, rank 1 out of 2 processors
smallMPI sent haha!


Obviously it is hanging at MPI_Recv(). But why? My command line is as 
follows, but this happens if I try openib BTL (instead of TCP) as well.


mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp ./mpitest

It must be something *really* trivial, but I am drawing a blank right now.

Please help!

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv)
{
int world_size, world_rank, name_len;
char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];

MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
MPI_Get_processor_name(hostname, &name_len);
printf("Hello world from processor %s, rank %d out of %d 
processors\n", hostname, world_rank, world_size);

if (world_rank == 1)
{
MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("%s received %s\n", hostname, buf);
}
else
{
strcpy(buf, "haha!");
MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
printf("%s sent %s\n", hostname, buf);
}
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
return 0;
}



We learn from history that we never learn from history.






[OMPI users] Newbie question

2016-04-03 Thread dpchoudh .
Hello all

I don't mean to be competing for the 'silliest question of the year award',
but I can't figure this out on my own:

My 'cluster' has 2 machines, bigMPI and smallMPI. They are connected via
several (types of) networks and the connectivity is OK.

In this setup, the following program hangs after printing

Hello world from processor smallMPI, rank 0 out of 2 processors
Hello world from processor bigMPI, rank 1 out of 2 processors
smallMPI sent haha!


Obviously it is hanging at MPI_Recv(). But why? My command line is as
follows, but this happens if I try openib BTL (instead of TCP) as well.

mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp ./mpitest

It must be something *really* trivial, but I am drawing a blank right now.

Please help!

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv)
{
int world_size, world_rank, name_len;
char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];

MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
MPI_Get_processor_name(hostname, &name_len);
printf("Hello world from processor %s, rank %d out of %d processors\n",
hostname, world_rank, world_size);
if (world_rank == 1)
{
MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("%s received %s\n", hostname, buf);
}
else
{
strcpy(buf, "haha!");
MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
printf("%s sent %s\n", hostname, buf);
}
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
return 0;
}
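
For anyone reproducing this, a minimal sketch of how the test is built and run,
assuming the source above is saved as mpitest.c and ~/hostfile lists the two nodes:

  mpicc mpitest.c -o mpitest
  mpirun -np 2 -hostfile ~/hostfile --mca btl self,tcp ./mpitest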



We learn from history that we never learn from history.


Re: [OMPI users] Newbie question?

2012-09-16 Thread Jingcha Joba

> > On a side note, do you have an RDMA supporting device ( 
> > Infiniband/RoCE/iWarp) ?
> 
> I'm just an engineer trying to get something to work on an AMD dual core 
> notebook for the powers-that-be at a small engineering concern (all MEs) in 
> Huntsville, AL - i.e., NASA work.
> 
If on a Unix box,
 lspci | grep -i infiniband
should tell you if you have an InfiniBand device, and
 lspci | grep -i eth
should list all Ethernet devices. Google them to see if one of them is an
iWarp or RoCE device.

--
Sent from my iPhone 




Re: [OMPI users] Newbie question?

2012-09-16 Thread John Chludzinski
> On a side note, do you have an RDMA supporting device (
Infiniband/RoCE/iWarp) ?

I'm just an engineer trying to get something to work on an AMD dual core
notebook for the powers-that-be at a small engineering concern (all MEs) in
Huntsville, AL - i.e., NASA work.

---John

On Sun, Sep 16, 2012 at 3:21 AM, Jingcha Joba  wrote:

> John,
>
> BTL refers to Byte Transfer Layer, a framework to send/receive point to
> point messages on different networks. It has several components
> (implementations) like openib, tcp, mx, shared mem, etc.
>
> ^openib means "not" to use openib component for p2p messages.
>
> On a side note, do you have an RDMA-capable device
> (InfiniBand/RoCE/iWarp)? If so, is OFED installed correctly and is it running?
> If you do not have one, is OFED running anyway (it should not be, in that case)?
>
> The message that you are getting could be because of this. As a
> consequence, if you do have an RDMA-capable device, you might be getting poor
> performance.
>
> A wealth of information is available in the FAQ section regarding these
> things.
>
> --
> Sent from my iPhone
>
> On Sep 15, 2012, at 9:49 PM, John Chludzinski 
> wrote:
>
> BTW, I looked up the -mca option:
>
>  -mca|--mca <arg0> <arg1>
>   Pass context-specific MCA parameters; they are
>   considered global if --gmca is not used and only
>   one context is specified (arg0 is the parameter
>   name; arg1 is the parameter value)
>
> Could you explain the args: btl and ^openib ?
>
> ---John
>
>
> On Sun, Sep 16, 2012 at 12:26 AM, John Chludzinski <
> john.chludzin...@gmail.com> wrote:
>
>> BINGO!  That did it.  Thanks.  ---John
>>
>>
>> On Sat, Sep 15, 2012 at 9:32 PM, Ralph Castain  wrote:
>>
>>> No - the mca param has to be specified *before* your executable
>>>
>>> mpiexec -mca btl ^openib -n 4 ./a.out
>>>
>>> Also, note the space between "btl" and "^openib"
>>>
>>>
>>> On Sep 15, 2012, at 5:45 PM, John Chludzinski <
>>> john.chludzin...@gmail.com> wrote:
>>>
>>> Is this what you intended(?):
>>>
>>> $ mpiexec -n 4 ./a.out -mca btl^openib
>>>
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>>
>>> --
>>> [[5991,1],0]: A high-performance Open MPI point-to-point messaging module
>>> was unable to find any relevant network interfaces:
>>>
>>> Module: OpenFabrics (openib)
>>>   Host: elzbieta
>>>
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>>  rank=1  Results:5.000   6.000
>>> 7.000   8.000
>>>  rank=0  Results:1.000   2.000
>>> 3.000   4.000
>>>  rank=2  Results:9.000   10.00
>>> 11.00   12.00
>>>  rank=3  Results:13.00   14.00
>>> 15.00   16.00
>>> [elzbieta:02374] 3 more processes have sent help message
>>> help-mpi-btl-base.txt / btl:no-nics
>>> [elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>> see all help / error messages
>>>
>>>
>>> On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain  wrote:
>>>
 Try adding "-mca btl ^openib" to your cmd line and see if that cleans
 it up.


 On Sep 15, 2012, at 12:44 PM, John Chludzinski <
 john.chludzin...@gmail.com> wrote:

 There was a bug in the code.  So now I get this, which is correct but
 how do I get rid of all these ABI, CMA, etc. messages?

 $ mpiexec -n 4 ./a.out
 librdmacm: couldn't read ABI version.
 librdmacm: couldn't read ABI version.
 librdmacm: assuming: 4
 CMA: unable to get RDMA device list
 librdmacm: assuming: 4
 CMA: unable to get RDMA device list
 CMA: unable to get RDMA device list
 librdmacm: couldn't read ABI version.
 librdmacm: assuming: 4
 librdmacm: couldn't read ABI version.
 librdmacm: assuming: 4
 CMA: unable to get RDMA device list

 --
 [[6110,1],1]: A high-performance Open MPI point-to-point messaging
 module
 was unable to find any relevant network interfaces:

 Module: OpenFabrics (openib)
   Host: elzbieta

 Another transport will be used instead, although this may result in
 lower performance.

 

Re: [OMPI users] Newbie question?

2012-09-16 Thread John Chludzinski
Thanks, I'll go to the FAQs.  ---John

On Sun, Sep 16, 2012 at 3:21 AM, Jingcha Joba  wrote:

> John,
>
> BTL refers to Byte Transfer Layer, a framework to send/receive point to
> point messages on different networks. It has several components
> (implementations) like openib, tcp, mx, shared mem, etc.
>
> ^openib means "not" to use openib component for p2p messages.
>
> On a side note, do you have an RDMA-capable device
> (InfiniBand/RoCE/iWarp)? If so, is OFED installed correctly and is it running?
> If you do not have one, is OFED running anyway (it should not be, in that case)?
>
> The message that you are getting could be because of this. As a
> consequence, if you do have an RDMA-capable device, you might be getting poor
> performance.
>
> A wealth of information is available in the FAQ section regarding these
> things.
>
> --
> Sent from my iPhone
>
> On Sep 15, 2012, at 9:49 PM, John Chludzinski 
> wrote:
>
> BTW, I looked up the -mca option:
>
>  -mca|--mca <arg0> <arg1>
>   Pass context-specific MCA parameters; they are
>   considered global if --gmca is not used and only
>   one context is specified (arg0 is the parameter
>   name; arg1 is the parameter value)
>
> Could you explain the args: btl and ^openib ?
>
> ---John
>
>
> On Sun, Sep 16, 2012 at 12:26 AM, John Chludzinski <
> john.chludzin...@gmail.com> wrote:
>
>> BINGO!  That did it.  Thanks.  ---John
>>
>>
>> On Sat, Sep 15, 2012 at 9:32 PM, Ralph Castain  wrote:
>>
>>> No - the mca param has to be specified *before* your executable
>>>
>>> mpiexec -mca btl ^openib -n 4 ./a.out
>>>
>>> Also, note the space between "btl" and "^openib"
>>>
>>>
>>> On Sep 15, 2012, at 5:45 PM, John Chludzinski <
>>> john.chludzin...@gmail.com> wrote:
>>>
>>> Is this what you intended(?):
>>>
>>> $ mpiexec -n 4 ./a.out -mca btl^openib
>>>
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>>
>>> --
>>> [[5991,1],0]: A high-performance Open MPI point-to-point messaging module
>>> was unable to find any relevant network interfaces:
>>>
>>> Module: OpenFabrics (openib)
>>>   Host: elzbieta
>>>
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>>  rank=1  Results:5.000   6.000
>>> 7.000   8.000
>>>  rank=0  Results:1.000   2.000
>>> 3.000   4.000
>>>  rank=2  Results:9.000   10.00
>>> 11.00   12.00
>>>  rank=3  Results:13.00   14.00
>>> 15.00   16.00
>>> [elzbieta:02374] 3 more processes have sent help message
>>> help-mpi-btl-base.txt / btl:no-nics
>>> [elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>> see all help / error messages
>>>
>>>
>>> On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain  wrote:
>>>
 Try adding "-mca btl ^openib" to your cmd line and see if that cleans
 it up.


 On Sep 15, 2012, at 12:44 PM, John Chludzinski <
 john.chludzin...@gmail.com> wrote:

 There was a bug in the code.  So now I get this, which is correct but
 how do I get rid of all these ABI, CMA, etc. messages?

 $ mpiexec -n 4 ./a.out
 librdmacm: couldn't read ABI version.
 librdmacm: couldn't read ABI version.
 librdmacm: assuming: 4
 CMA: unable to get RDMA device list
 librdmacm: assuming: 4
 CMA: unable to get RDMA device list
 CMA: unable to get RDMA device list
 librdmacm: couldn't read ABI version.
 librdmacm: assuming: 4
 librdmacm: couldn't read ABI version.
 librdmacm: assuming: 4
 CMA: unable to get RDMA device list

 --
 [[6110,1],1]: A high-performance Open MPI point-to-point messaging
 module
 was unable to find any relevant network interfaces:

 Module: OpenFabrics (openib)
   Host: elzbieta

 Another transport will be used instead, although this may result in
 lower performance.

 --
  rank=1  Results:5.000   6.000
 7.000   8.000
  rank=2  Results:9.000   10.00
 11.00   12.00
  rank=0  Results:

Re: [OMPI users] Newbie question?

2012-09-16 Thread Jingcha Joba
John,

BTL refers to the Byte Transfer Layer, a framework to send/receive point-to-point
messages over different networks. It has several components (implementations) such as
openib, tcp, mx, shared mem, etc.

^openib means "not" to use openib component for p2p messages.
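
As a concrete sketch of the two selection styles (the component names here are
just the common ones, not a recommendation):

  # exclusive form: use every available BTL except openib
  mpiexec -mca btl ^openib -n 4 ./a.out

  # inclusive form: use only the listed BTLs (self is needed so a rank can send to itself)
  mpiexec -mca btl self,tcp -n 4 ./a.out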

On a side note, do you have an RDMA-capable device (InfiniBand/RoCE/iWarp)?
If so, is OFED installed correctly and is it running?
If you do not have one, is OFED running anyway (it should not be, in that case)?

The message that you are getting could be because of this. As a consequence, if
you do have an RDMA-capable device, you might be getting poor performance.
 
A wealth of information is available in the FAQ section regarding these things.

--
Sent from my iPhone

On Sep 15, 2012, at 9:49 PM, John Chludzinski  
wrote:

> BTW, I looked up the -mca option:
> 
>  -mca|--mca <arg0> <arg1>
>   Pass context-specific MCA parameters; they are
>   considered global if --gmca is not used and only
>   one context is specified (arg0 is the parameter
>   name; arg1 is the parameter value)
> 
> Could you explain the args: btl and ^openib ?
> 
> ---John
> 
> 
> On Sun, Sep 16, 2012 at 12:26 AM, John Chludzinski 
>  wrote:
> BINGO!  That did it.  Thanks.  ---John
> 
> 
> On Sat, Sep 15, 2012 at 9:32 PM, Ralph Castain  wrote:
> No - the mca param has to be specified *before* your executable
> 
> mpiexec -mca btl ^openib -n 4 ./a.out
> 
> Also, note the space between "btl" and "^openib"
> 
> 
> On Sep 15, 2012, at 5:45 PM, John Chludzinski  
> wrote:
> 
>> Is this what you intended(?):
>> 
>> $ mpiexec -n 4 ./a.out -mca btl^openib
>> 
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> --
>> [[5991,1],0]: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>> 
>> Module: OpenFabrics (openib)
>>   Host: elzbieta
>> 
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>>  rank=1  Results:5.000   6.000   7.000   
>> 8.000
>>  rank=0  Results:1.000   2.000   3.000   
>> 4.000
>>  rank=2  Results:9.000   10.00   11.00   
>> 12.00
>>  rank=3  Results:13.00   14.00   15.00   
>> 16.00
>> [elzbieta:02374] 3 more processes have sent help message 
>> help-mpi-btl-base.txt / btl:no-nics
>> [elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to see 
>> all help / error messages
>> 
>> 
>> On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain  wrote:
>> Try adding "-mca btl ^openib" to your cmd line and see if that cleans it up.
>> 
>> 
>> On Sep 15, 2012, at 12:44 PM, John Chludzinski  
>> wrote:
>> 
>>> There was a bug in the code.  So now I get this, which is correct but how 
>>> do I get rid of all these ABI, CMA, etc. messages?
>>> 
>>> $ mpiexec -n 4 ./a.out 
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> CMA: unable to get RDMA device list
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> --
>>> [[6110,1],1]: A high-performance Open MPI point-to-point messaging module
>>> was unable to find any relevant network interfaces:
>>> 
>>> Module: OpenFabrics (openib)
>>>   Host: elzbieta
>>> 
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>> --
>>>  rank=1  Results:5.000   6.000   7.000  
>>>  8.000
>>>  rank=2  Results:9.000   10.00   11.00  
>>>  12.00
>>>  rank=0  Results:1.000   2.000   3.000  
>>>  4.000
>>>  rank=3  Results:13.00   14.00   15.00  
>>>  16.00
>>> [elzbieta:02559] 3 more 

Re: [OMPI users] Newbie question?

2012-09-16 Thread John Chludzinski
BTW, I looked up the -mca option:

 -mca|--mca <arg0> <arg1>
  Pass context-specific MCA parameters; they are
  considered global if --gmca is not used and only
  one context is specified (arg0 is the parameter
  name; arg1 is the parameter value)

Could you explain the args: btl and ^openib ?

---John


On Sun, Sep 16, 2012 at 12:26 AM, John Chludzinski <
john.chludzin...@gmail.com> wrote:

> BINGO!  That did it.  Thanks.  ---John
>
>
> On Sat, Sep 15, 2012 at 9:32 PM, Ralph Castain  wrote:
>
>> No - the mca param has to be specified *before* your executable
>>
>> mpiexec -mca btl ^openib -n 4 ./a.out
>>
>> Also, note the space between "btl" and "^openib"
>>
>>
>> On Sep 15, 2012, at 5:45 PM, John Chludzinski 
>> wrote:
>>
>> Is this what you intended(?):
>>
>> $ mpiexec -n 4 ./a.out -mca btl^openib
>>
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> --
>> [[5991,1],0]: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>>
>> Module: OpenFabrics (openib)
>>   Host: elzbieta
>>
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>>  rank=1  Results:5.000   6.000
>> 7.000   8.000
>>  rank=0  Results:1.000   2.000
>> 3.000   4.000
>>  rank=2  Results:9.000   10.00
>> 11.00   12.00
>>  rank=3  Results:13.00   14.00
>> 15.00   16.00
>> [elzbieta:02374] 3 more processes have sent help message
>> help-mpi-btl-base.txt / btl:no-nics
>> [elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to see
>> all help / error messages
>>
>>
>> On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain  wrote:
>>
>>> Try adding "-mca btl ^openib" to your cmd line and see if that cleans it
>>> up.
>>>
>>>
>>> On Sep 15, 2012, at 12:44 PM, John Chludzinski <
>>> john.chludzin...@gmail.com> wrote:
>>>
>>> There was a bug in the code.  So now I get this, which is correct but
>>> how do I get rid of all these ABI, CMA, etc. messages?
>>>
>>> $ mpiexec -n 4 ./a.out
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> CMA: unable to get RDMA device list
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>>
>>> --
>>> [[6110,1],1]: A high-performance Open MPI point-to-point messaging module
>>> was unable to find any relevant network interfaces:
>>>
>>> Module: OpenFabrics (openib)
>>>   Host: elzbieta
>>>
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --
>>>  rank=1  Results:5.000   6.000
>>> 7.000   8.000
>>>  rank=2  Results:9.000   10.00
>>> 11.00   12.00
>>>  rank=0  Results:1.000   2.000
>>> 3.000   4.000
>>>  rank=3  Results:13.00   14.00
>>> 15.00   16.00
>>> [elzbieta:02559] 3 more processes have sent help message
>>> help-mpi-btl-base.txt / btl:no-nics
>>> [elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>> see all help / error messages
>>>
>>>
>>> On Sat, Sep 15, 2012 at 3:34 PM, John Chludzinski <
>>> john.chludzin...@gmail.com> wrote:
>>>
 BTW, here the example code:

 program scatter
 include 'mpif.h'

 integer, parameter :: SIZE=4
 integer :: numtasks, rank, sendcount, recvcount, source, ierr
 real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)

 !  Fortran stores this array in column major order, so the
 !  scatter will actually scatter columns, not rows.
 data sendbuf /1.0, 2.0, 3.0, 4.0, &
 5.0, 6.0, 7.0, 8.0, &
 9.0, 10.0, 11.0, 12.0, &
 13.0, 14.0, 15.0, 16.0 /

 call MPI_INIT(ierr)
 call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
 call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

 if 

Re: [OMPI users] Newbie question?

2012-09-16 Thread John Chludzinski
BINGO!  That did it.  Thanks.  ---John

On Sat, Sep 15, 2012 at 9:32 PM, Ralph Castain  wrote:

> No - the mca param has to be specified *before* your executable
>
> mpiexec -mca btl ^openib -n 4 ./a.out
>
> Also, note the space between "btl" and "^openib"
>
>
> On Sep 15, 2012, at 5:45 PM, John Chludzinski 
> wrote:
>
> Is this what you intended(?):
>
> $ mpiexec -n 4 ./a.out -mca btl^openib
>
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --
> [[5991,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
>
> Module: OpenFabrics (openib)
>   Host: elzbieta
>
> Another transport will be used instead, although this may result in
> lower performance.
> --
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
>  rank=1  Results:5.000   6.000
> 7.000   8.000
>  rank=0  Results:1.000   2.000
> 3.000   4.000
>  rank=2  Results:9.000   10.00
> 11.00   12.00
>  rank=3  Results:13.00   14.00
> 15.00   16.00
> [elzbieta:02374] 3 more processes have sent help message
> help-mpi-btl-base.txt / btl:no-nics
> [elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> all help / error messages
>
>
> On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain  wrote:
>
>> Try adding "-mca btl ^openib" to your cmd line and see if that cleans it
>> up.
>>
>>
>> On Sep 15, 2012, at 12:44 PM, John Chludzinski <
>> john.chludzin...@gmail.com> wrote:
>>
>> There was a bug in the code.  So now I get this, which is correct but how
>> do I get rid of all these ABI, CMA, etc. messages?
>>
>> $ mpiexec -n 4 ./a.out
>> librdmacm: couldn't read ABI version.
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> CMA: unable to get RDMA device list
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> --
>> [[6110,1],1]: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>>
>> Module: OpenFabrics (openib)
>>   Host: elzbieta
>>
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --
>>  rank=1  Results:5.000   6.000
>> 7.000   8.000
>>  rank=2  Results:9.000   10.00
>> 11.00   12.00
>>  rank=0  Results:1.000   2.000
>> 3.000   4.000
>>  rank=3  Results:13.00   14.00
>> 15.00   16.00
>> [elzbieta:02559] 3 more processes have sent help message
>> help-mpi-btl-base.txt / btl:no-nics
>> [elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to see
>> all help / error messages
>>
>>
>> On Sat, Sep 15, 2012 at 3:34 PM, John Chludzinski <
>> john.chludzin...@gmail.com> wrote:
>>
>>> BTW, here the example code:
>>>
>>> program scatter
>>> include 'mpif.h'
>>>
>>> integer, parameter :: SIZE=4
>>> integer :: numtasks, rank, sendcount, recvcount, source, ierr
>>> real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)
>>>
>>> !  Fortran stores this array in column major order, so the
>>> !  scatter will actually scatter columns, not rows.
>>> data sendbuf /1.0, 2.0, 3.0, 4.0, &
>>> 5.0, 6.0, 7.0, 8.0, &
>>> 9.0, 10.0, 11.0, 12.0, &
>>> 13.0, 14.0, 15.0, 16.0 /
>>>
>>> call MPI_INIT(ierr)
>>> call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>>> call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
>>>
>>> if (numtasks .eq. SIZE) then
>>>   source = 1
>>>   sendcount = SIZE
>>>   recvcount = SIZE
>>>   call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
>>>recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
>>>   print *, 'rank= ',rank,' Results: ',recvbuf
>>> else
>>>print *, 'Must specify',SIZE,' processors.  Terminating.'
>>> endif
>>>
>>> call MPI_FINALIZE(ierr)
>>>
>>> end program
>>>
>>>
>>> On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski <
>>> john.chludzin...@gmail.com> wrote:
>>>
 # export LD_LIBRARY_PATH


 # mpiexec -n 

Re: [OMPI users] Newbie question?

2012-09-15 Thread Ralph Castain
No - the mca param has to be specified *before* your executable

mpiexec -mca btl ^openib -n 4 ./a.out

Also, note the space between "btl" and "^openib"
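
If the ordering keeps tripping you up, an equivalent (hedged) alternative is to
set the same parameter through the environment, where position doesn't matter:

  export OMPI_MCA_btl=^openib
  mpiexec -n 4 ./a.out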


On Sep 15, 2012, at 5:45 PM, John Chludzinski  
wrote:

> Is this what you intended(?):
> 
> $ mpiexec -n 4 ./a.out -mca btl^openib
> 
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --
> [[5991,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> 
> Module: OpenFabrics (openib)
>   Host: elzbieta
> 
> Another transport will be used instead, although this may result in
> lower performance.
> --
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
>  rank=1  Results:5.000   6.000   7.000
>8.000
>  rank=0  Results:1.000   2.000   3.000
>4.000
>  rank=2  Results:9.000   10.00   11.00
>12.00
>  rank=3  Results:13.00   14.00   15.00
>16.00
> [elzbieta:02374] 3 more processes have sent help message 
> help-mpi-btl-base.txt / btl:no-nics
> [elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
> help / error messages
> 
> 
> On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain  wrote:
> Try adding "-mca btl ^openib" to your cmd line and see if that cleans it up.
> 
> 
> On Sep 15, 2012, at 12:44 PM, John Chludzinski  
> wrote:
> 
>> There was a bug in the code.  So now I get this, which is correct but how do 
>> I get rid of all these ABI, CMA, etc. messages?
>> 
>> $ mpiexec -n 4 ./a.out 
>> librdmacm: couldn't read ABI version.
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> CMA: unable to get RDMA device list
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> --
>> [[6110,1],1]: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>> 
>> Module: OpenFabrics (openib)
>>   Host: elzbieta
>> 
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --
>>  rank=1  Results:5.000   6.000   7.000   
>> 8.000
>>  rank=2  Results:9.000   10.00   11.00   
>> 12.00
>>  rank=0  Results:1.000   2.000   3.000   
>> 4.000
>>  rank=3  Results:13.00   14.00   15.00   
>> 16.00
>> [elzbieta:02559] 3 more processes have sent help message 
>> help-mpi-btl-base.txt / btl:no-nics
>> [elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to see 
>> all help / error messages
>> 
>> 
>> On Sat, Sep 15, 2012 at 3:34 PM, John Chludzinski 
>>  wrote:
>> BTW, here the example code:
>> 
>> program scatter
>> include 'mpif.h'
>> 
>> integer, parameter :: SIZE=4
>> integer :: numtasks, rank, sendcount, recvcount, source, ierr
>> real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)
>> 
>> !  Fortran stores this array in column major order, so the 
>> !  scatter will actually scatter columns, not rows.
>> data sendbuf /1.0, 2.0, 3.0, 4.0, &
>> 5.0, 6.0, 7.0, 8.0, &
>> 9.0, 10.0, 11.0, 12.0, &
>> 13.0, 14.0, 15.0, 16.0 /
>> 
>> call MPI_INIT(ierr)
>> call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>> call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
>> 
>> if (numtasks .eq. SIZE) then
>>   source = 1
>>   sendcount = SIZE
>>   recvcount = SIZE
>>   call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
>>recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
>>   print *, 'rank= ',rank,' Results: ',recvbuf 
>> else
>>print *, 'Must specify',SIZE,' processors.  Terminating.' 
>> endif
>> 
>> call MPI_FINALIZE(ierr)
>> 
>> end program
>> 
>> 
>> On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski 
>>  wrote:
>> # export LD_LIBRARY_PATH
>> 
>> 
>> # mpiexec -n 1 printenv | grep PATH
>> LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>> 
>> 

Re: [OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
Is this what you intended(?):

$ mpiexec -n 4 ./a.out -mca btl^openib

librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--
[[5991,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: elzbieta

Another transport will be used instead, although this may result in
lower performance.
--
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
 rank=1  Results:5.000   6.000
7.000   8.000
 rank=0  Results:1.000   2.000
3.000   4.000
 rank=2  Results:9.000   10.00
11.00   12.00
 rank=3  Results:13.00   14.00
15.00   16.00
[elzbieta:02374] 3 more processes have sent help message
help-mpi-btl-base.txt / btl:no-nics
[elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages


On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain  wrote:

> Try adding "-mca btl ^openib" to your cmd line and see if that cleans it
> up.
>
>
> On Sep 15, 2012, at 12:44 PM, John Chludzinski 
> wrote:
>
> There was a bug in the code.  So now I get this, which is correct but how
> do I get rid of all these ABI, CMA, etc. messages?
>
> $ mpiexec -n 4 ./a.out
> librdmacm: couldn't read ABI version.
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --
> [[6110,1],1]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
>
> Module: OpenFabrics (openib)
>   Host: elzbieta
>
> Another transport will be used instead, although this may result in
> lower performance.
> --
>  rank=1  Results:5.000   6.000
> 7.000   8.000
>  rank=2  Results:9.000   10.00
> 11.00   12.00
>  rank=0  Results:1.000   2.000
> 3.000   4.000
>  rank=3  Results:13.00   14.00
> 15.00   16.00
> [elzbieta:02559] 3 more processes have sent help message
> help-mpi-btl-base.txt / btl:no-nics
> [elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> all help / error messages
>
>
> On Sat, Sep 15, 2012 at 3:34 PM, John Chludzinski <
> john.chludzin...@gmail.com> wrote:
>
>> BTW, here the example code:
>>
>> program scatter
>> include 'mpif.h'
>>
>> integer, parameter :: SIZE=4
>> integer :: numtasks, rank, sendcount, recvcount, source, ierr
>> real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)
>>
>> !  Fortran stores this array in column major order, so the
>> !  scatter will actually scatter columns, not rows.
>> data sendbuf /1.0, 2.0, 3.0, 4.0, &
>> 5.0, 6.0, 7.0, 8.0, &
>> 9.0, 10.0, 11.0, 12.0, &
>> 13.0, 14.0, 15.0, 16.0 /
>>
>> call MPI_INIT(ierr)
>> call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>> call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
>>
>> if (numtasks .eq. SIZE) then
>>   source = 1
>>   sendcount = SIZE
>>   recvcount = SIZE
>>   call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
>>recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
>>   print *, 'rank= ',rank,' Results: ',recvbuf
>> else
>>print *, 'Must specify',SIZE,' processors.  Terminating.'
>> endif
>>
>> call MPI_FINALIZE(ierr)
>>
>> end program
>>
>>
>> On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski <
>> john.chludzin...@gmail.com> wrote:
>>
>>> # export LD_LIBRARY_PATH
>>>
>>>
>>> # mpiexec -n 1 printenv | grep PATH
>>> LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>>>
>>>
>>> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
>>> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
>>> WINDOWPATH=1
>>>
>>> # mpiexec -n 4 ./a.out
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>>
>>> --
>>> [[3598,1],0]: 

Re: [OMPI users] Newbie question?

2012-09-15 Thread Ralph Castain
Try adding "-mca btl ^openib" to your cmd line and see if that cleans it up.
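
If that clears it up and you want the setting to persist, one commonly used
option is a per-user MCA parameter file (create it if it doesn't exist); a
minimal sketch:

  # ~/.openmpi/mca-params.conf
  btl = ^openib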


On Sep 15, 2012, at 12:44 PM, John Chludzinski  
wrote:

> There was a bug in the code.  So now I get this, which is correct but how do 
> I get rid of all these ABI, CMA, etc. messages?
> 
> $ mpiexec -n 4 ./a.out 
> librdmacm: couldn't read ABI version.
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --
> [[6110,1],1]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> 
> Module: OpenFabrics (openib)
>   Host: elzbieta
> 
> Another transport will be used instead, although this may result in
> lower performance.
> --
>  rank=1  Results:5.000   6.000   7.000
>8.000
>  rank=2  Results:9.000   10.00   11.00
>12.00
>  rank=0  Results:1.000   2.000   3.000
>4.000
>  rank=3  Results:13.00   14.00   15.00
>16.00
> [elzbieta:02559] 3 more processes have sent help message 
> help-mpi-btl-base.txt / btl:no-nics
> [elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
> help / error messages
> 
> 
> On Sat, Sep 15, 2012 at 3:34 PM, John Chludzinski 
>  wrote:
> BTW, here the example code:
> 
> program scatter
> include 'mpif.h'
> 
> integer, parameter :: SIZE=4
> integer :: numtasks, rank, sendcount, recvcount, source, ierr
> real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)
> 
> !  Fortran stores this array in column major order, so the 
> !  scatter will actually scatter columns, not rows.
> data sendbuf /1.0, 2.0, 3.0, 4.0, &
> 5.0, 6.0, 7.0, 8.0, &
> 9.0, 10.0, 11.0, 12.0, &
> 13.0, 14.0, 15.0, 16.0 /
> 
> call MPI_INIT(ierr)
> call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
> call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
> 
> if (numtasks .eq. SIZE) then
>   source = 1
>   sendcount = SIZE
>   recvcount = SIZE
>   call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
>recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
>   print *, 'rank= ',rank,' Results: ',recvbuf 
> else
>print *, 'Must specify',SIZE,' processors.  Terminating.' 
> endif
> 
> call MPI_FINALIZE(ierr)
> 
> end program
> 
> 
> On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski 
>  wrote:
> # export LD_LIBRARY_PATH
> 
> 
> # mpiexec -n 1 printenv | grep PATH
> LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
> 
> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> WINDOWPATH=1
> 
> # mpiexec -n 4 ./a.out 
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --
> [[3598,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> 
> Module: OpenFabrics (openib)
>   Host: elzbieta
> 
> Another transport will be used instead, although this may result in
> lower performance.
> --
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> [elzbieta:4145] *** An error occurred in MPI_Scatter
> [elzbieta:4145] *** on communicator MPI_COMM_WORLD
> [elzbieta:4145] *** MPI_ERR_TYPE: invalid datatype
> [elzbieta:4145] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --
> mpiexec has exited due to process rank 1 with PID 4145 on
> node elzbieta exiting improperly. There are two reasons this could occur:
> 
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
> 
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call 

Re: [OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
There was a bug in the code.  So now I get this, which is correct but how
do I get rid of all these ABI, CMA, etc. messages?

$ mpiexec -n 4 ./a.out
librdmacm: couldn't read ABI version.
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: assuming: 4
CMA: unable to get RDMA device list
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--
[[6110,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: elzbieta

Another transport will be used instead, although this may result in
lower performance.
--
 rank=1  Results:5.000   6.000
7.000   8.000
 rank=2  Results:9.000   10.00
11.00   12.00
 rank=0  Results:1.000   2.000
3.000   4.000
 rank=3  Results:13.00   14.00
15.00   16.00
[elzbieta:02559] 3 more processes have sent help message
help-mpi-btl-base.txt / btl:no-nics
[elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages


On Sat, Sep 15, 2012 at 3:34 PM, John Chludzinski <
john.chludzin...@gmail.com> wrote:

> BTW, here the example code:
>
> program scatter
> include 'mpif.h'
>
> integer, parameter :: SIZE=4
> integer :: numtasks, rank, sendcount, recvcount, source, ierr
> real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)
>
> !  Fortran stores this array in column major order, so the
> !  scatter will actually scatter columns, not rows.
> data sendbuf /1.0, 2.0, 3.0, 4.0, &
> 5.0, 6.0, 7.0, 8.0, &
> 9.0, 10.0, 11.0, 12.0, &
> 13.0, 14.0, 15.0, 16.0 /
>
> call MPI_INIT(ierr)
> call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
> call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
>
> if (numtasks .eq. SIZE) then
>   source = 1
>   sendcount = SIZE
>   recvcount = SIZE
>   call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
>recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
>   print *, 'rank= ',rank,' Results: ',recvbuf
> else
>print *, 'Must specify',SIZE,' processors.  Terminating.'
> endif
>
> call MPI_FINALIZE(ierr)
>
> end program
>
>
> On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski <
> john.chludzin...@gmail.com> wrote:
>
>> # export LD_LIBRARY_PATH
>>
>>
>> # mpiexec -n 1 printenv | grep PATH
>> LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>>
>>
>> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
>> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
>> WINDOWPATH=1
>>
>> # mpiexec -n 4 ./a.out
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> --
>> [[3598,1],0]: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>>
>> Module: OpenFabrics (openib)
>>   Host: elzbieta
>>
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> librdmacm: couldn't read ABI version.
>> CMA: unable to get RDMA device list
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> [elzbieta:4145] *** An error occurred in MPI_Scatter
>> [elzbieta:4145] *** on communicator MPI_COMM_WORLD
>> [elzbieta:4145] *** MPI_ERR_TYPE: invalid datatype
>> [elzbieta:4145] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>> --
>> mpiexec has exited due to process rank 1 with PID 4145 on
>> node elzbieta exiting improperly. There are two reasons this could occur:
>>
>> 1. this process did not call "init" before exiting, but others in
>> the job did. This can cause a job to hang indefinitely while it waits
>> for all processes to call "init". By rule, if one process calls "init",
>> then ALL processes must call "init" prior to termination.
>>
>> 2. this process called "init", but exited without calling "finalize".
>> By rule, all processes that call "init" MUST call "finalize" prior to
>> exiting or it will be considered an "abnormal termination"
>>
>> This may have caused other processes in the application to be
>> terminated by signals sent by mpiexec (as reported here).
>> 

Re: [OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
BTW, here the example code:

program scatter
include 'mpif.h'

integer, parameter :: SIZE=4
integer :: numtasks, rank, sendcount, recvcount, source, ierr
real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)

!  Fortran stores this array in column major order, so the
!  scatter will actually scatter columns, not rows.
data sendbuf /1.0, 2.0, 3.0, 4.0, &
5.0, 6.0, 7.0, 8.0, &
9.0, 10.0, 11.0, 12.0, &
13.0, 14.0, 15.0, 16.0 /

call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

if (numtasks .eq. SIZE) then
  source = 1
  sendcount = SIZE
  recvcount = SIZE
  call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
   recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
  print *, 'rank= ',rank,' Results: ',recvbuf
else
   print *, 'Must specify',SIZE,' processors.  Terminating.'
endif

call MPI_FINALIZE(ierr)

end program


On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski <
john.chludzin...@gmail.com> wrote:

> # export LD_LIBRARY_PATH
>
>
> # mpiexec -n 1 printenv | grep PATH
> LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>
>
> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> WINDOWPATH=1
>
> # mpiexec -n 4 ./a.out
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --
> [[3598,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
>
> Module: OpenFabrics (openib)
>   Host: elzbieta
>
> Another transport will be used instead, although this may result in
> lower performance.
> --
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> [elzbieta:4145] *** An error occurred in MPI_Scatter
> [elzbieta:4145] *** on communicator MPI_COMM_WORLD
> [elzbieta:4145] *** MPI_ERR_TYPE: invalid datatype
> [elzbieta:4145] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --
> mpiexec has exited due to process rank 1 with PID 4145 on
> node elzbieta exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpiexec (as reported here).
> --
>
>
>
> On Sat, Sep 15, 2012 at 2:24 PM, Ralph Castain  wrote:
>
>> Ah - note that there is no LD_LIBRARY_PATH in the environment. That's the
>> problem
>>
>> On Sep 15, 2012, at 11:19 AM, John Chludzinski <
>> john.chludzin...@gmail.com> wrote:
>>
>> $ which mpiexec
>> /usr/lib/openmpi/bin/mpiexec
>>
>> # mpiexec -n 1 printenv | grep PATH
>>
>> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
>> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
>> WINDOWPATH=1
>>
>>
>>
>> On Sat, Sep 15, 2012 at 1:11 PM, Ralph Castain  wrote:
>>
>>> Couple of things worth checking:
>>>
>>> 1. verify that you executed the "mpiexec" you think you did - a simple
>>> "which mpiexec" should suffice
>>>
>>> 2. verify that your environment is correct by "mpiexec -n 1 printenv |
>>> grep PATH". Sometimes the ld_library_path doesn't carry over like you think
>>> it should
>>>
>>>
>>>  On Sep 15, 2012, at 10:00 AM, John Chludzinski <
>>> john.chludzin...@gmail.com> wrote:
>>>
>>> I installed OpenMPI (I have a simple dual core AMD notebook with Fedora
>>> 16) via:
>>>
>>> # yum install openmpi
>>> # yum install openmpi-devel
>>> # mpirun --version
>>> mpirun (Open MPI) 1.5.4
>>>
>>> I added:
>>>
>>> $ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
>>> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>>>
>>> Then:
>>>
>>> $ mpif90 ex1.f95
>>> $ mpiexec -n 4 ./a.out
>>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
>>> open shared object file: No such file or directory
>>> ./a.out: error while loading shared libraries: 

Re: [OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
# export LD_LIBRARY_PATH

# mpiexec -n 1 printenv | grep PATH
LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
WINDOWPATH=1

# mpiexec -n 4 ./a.out
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--
[[3598,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: elzbieta

Another transport will be used instead, although this may result in
lower performance.
--
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
CMA: unable to get RDMA device list
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
[elzbieta:4145] *** An error occurred in MPI_Scatter
[elzbieta:4145] *** on communicator MPI_COMM_WORLD
[elzbieta:4145] *** MPI_ERR_TYPE: invalid datatype
[elzbieta:4145] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--
mpiexec has exited due to process rank 1 with PID 4145 on
node elzbieta exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--


On Sat, Sep 15, 2012 at 2:24 PM, Ralph Castain  wrote:

> Ah - note that there is no LD_LIBRARY_PATH in the environment. That's the
> problem
>
> On Sep 15, 2012, at 11:19 AM, John Chludzinski 
> wrote:
>
> $ which mpiexec
> /usr/lib/openmpi/bin/mpiexec
>
> # mpiexec -n 1 printenv | grep PATH
>
> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> WINDOWPATH=1
>
>
>
> On Sat, Sep 15, 2012 at 1:11 PM, Ralph Castain  wrote:
>
>> Couple of things worth checking:
>>
>> 1. verify that you executed the "mpiexec" you think you did - a simple
>> "which mpiexec" should suffice
>>
>> 2. verify that your environment is correct by "mpiexec -n 1 printenv |
>> grep PATH". Sometimes the ld_library_path doesn't carry over like you think
>> it should
>>
>>
>> On Sep 15, 2012, at 10:00 AM, John Chludzinski <
>> john.chludzin...@gmail.com> wrote:
>>
>> I installed OpenMPI (I have a simple dual core AMD notebook with Fedora
>> 16) via:
>>
>> # yum install openmpi
>> # yum install openmpi-devel
>> # mpirun --version
>> mpirun (Open MPI) 1.5.4
>>
>> I added:
>>
>> $ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
>> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>>
>> Then:
>>
>> $ mpif90 ex1.f95
>> $ mpiexec -n 4 ./a.out
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
>> open shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
>> open shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
>> open shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
>> open shared object file: No such file or directory
>> --
>> mpiexec noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --
>>
>> ls -l /usr/lib/openmpi/lib/
>> total 6788
>> lrwxrwxrwx. 1 root root  25 Sep 15 12:25 libmca_common_sm.so ->
>> libmca_common_sm.so.2.0.0
>> lrwxrwxrwx. 1 root root  25 Sep 14 16:14 libmca_common_sm.so.2 ->
>> libmca_common_sm.so.2.0.0
>> -rwxr-xr-x. 1 root root8492 Jan 20  2012 libmca_common_sm.so.2.0.0
>> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_cxx.so ->
>> libmpi_cxx.so.1.0.1
>> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_cxx.so.1 ->
>> libmpi_cxx.so.1.0.1
>> -rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1

Re: [OMPI users] Newbie question?

2012-09-15 Thread Ralph Castain
Ah - note that there is no LD_LIBRARY_PATH in the environment. That's the 
problem

On Sep 15, 2012, at 11:19 AM, John Chludzinski  
wrote:

> $ which mpiexec
> /usr/lib/openmpi/bin/mpiexec
> 
> # mpiexec -n 1 printenv | grep PATH
> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> WINDOWPATH=1
> 
> 
> 
> On Sat, Sep 15, 2012 at 1:11 PM, Ralph Castain  wrote:
> Couple of things worth checking:
> 
> 1. verify that you executed the "mpiexec" you think you did - a simple "which 
> mpiexec" should suffice
> 
> 2. verify that your environment is correct by "mpiexec -n 1 printenv | grep 
> PATH". Sometimes the ld_library_path doesn't carry over like you think it 
> should
> 
> 
> On Sep 15, 2012, at 10:00 AM, John Chludzinski  
> wrote:
> 
>> I installed OpenMPI (I have a simple dual core AMD notebook with Fedora 16) 
>> via:
>> 
>> # yum install openmpi
>> # yum install openmpi-devel
>> # mpirun --version
>> mpirun (Open MPI) 1.5.4
>> 
>> I added: 
>> 
>> $ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
>> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>> 
>> Then:
>> 
>> $ mpif90 ex1.f95
>> $ mpiexec -n 4 ./a.out 
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
>> shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
>> shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
>> shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
>> shared object file: No such file or directory
>> --
>> mpiexec noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --
>> 
>> ls -l /usr/lib/openmpi/lib/
>> total 6788
>> lrwxrwxrwx. 1 root root  25 Sep 15 12:25 libmca_common_sm.so -> 
>> libmca_common_sm.so.2.0.0
>> lrwxrwxrwx. 1 root root  25 Sep 14 16:14 libmca_common_sm.so.2 -> 
>> libmca_common_sm.so.2.0.0
>> -rwxr-xr-x. 1 root root8492 Jan 20  2012 libmca_common_sm.so.2.0.0
>> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_cxx.so -> 
>> libmpi_cxx.so.1.0.1
>> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_cxx.so.1 -> 
>> libmpi_cxx.so.1.0.1
>> -rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1
>> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f77.so -> 
>> libmpi_f77.so.1.0.2
>> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f77.so.1 -> 
>> libmpi_f77.so.1.0.2
>> -rwxr-xr-x. 1 root root  179912 Jan 20  2012 libmpi_f77.so.1.0.2
>> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f90.so -> 
>> libmpi_f90.so.1.1.0
>> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f90.so.1 -> 
>> libmpi_f90.so.1.1.0
>> -rwxr-xr-x. 1 root root   10364 Jan 20  2012 libmpi_f90.so.1.1.0
>> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libmpi.so -> libmpi.so.1.0.2
>> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libmpi.so.1 -> libmpi.so.1.0.2
>> -rwxr-xr-x. 1 root root 1383444 Jan 20  2012 libmpi.so.1.0.2
>> lrwxrwxrwx. 1 root root  21 Sep 15 12:25 libompitrace.so -> 
>> libompitrace.so.0.0.0
>> lrwxrwxrwx. 1 root root  21 Sep 14 16:14 libompitrace.so.0 -> 
>> libompitrace.so.0.0.0
>> -rwxr-xr-x. 1 root root   13572 Jan 20  2012 libompitrace.so.0.0.0
>> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-pal.so -> 
>> libopen-pal.so.3.0.0
>> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-pal.so.3 -> 
>> libopen-pal.so.3.0.0
>> -rwxr-xr-x. 1 root root  386324 Jan 20  2012 libopen-pal.so.3.0.0
>> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-rte.so -> 
>> libopen-rte.so.3.0.0
>> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-rte.so.3 -> 
>> libopen-rte.so.3.0.0
>> -rwxr-xr-x. 1 root root  790052 Jan 20  2012 libopen-rte.so.3.0.0
>> -rw-r--r--. 1 root root  301520 Jan 20  2012 libotf.a
>> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libotf.so -> libotf.so.0.0.1
>> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libotf.so.0 -> libotf.so.0.0.1
>> -rwxr-xr-x. 1 root root  206384 Jan 20  2012 libotf.so.0.0.1
>> -rw-r--r--. 1 root root  337970 Jan 20  2012 libvt.a
>> -rw-r--r--. 1 root root  591070 Jan 20  2012 libvt-hyb.a
>> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-hyb.so -> 
>> libvt-hyb.so.0.0.0
>> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-hyb.so.0 -> 
>> libvt-hyb.so.0.0.0
>> -rwxr-xr-x. 1 root root  428844 Jan 20  2012 libvt-hyb.so.0.0.0
>> -rw-r--r--. 1 root root  541004 Jan 20  2012 libvt-mpi.a
>> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-mpi.so -> 
>> libvt-mpi.so.0.0.0
>> lrwxrwxrwx. 1 root root  18 Sep 14 

Re: [OMPI users] Newbie question?

2012-09-15 Thread Reuti

Am 15.09.2012 um 19:00 schrieb John Chludzinski:

> I installed OpenMPI (I have a simple dual core AMD notebook with Fedora 16) 
> via:
> 
> # yum install openmpi
> # yum install openmpi-devel
> # mpirun --version
> mpirun (Open MPI) 1.5.4
> 
> I added: 
> 
> $ PATH=PATH=/usr/lib/openmpi/bin/:$PATH

Is this a typo - double PATH?


> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/

It needs to be exported, so that child processes can use it too.

-- Reuti
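
A minimal sketch of the combined fix, assuming the Fedora package paths shown above:

   $ export PATH=/usr/lib/openmpi/bin/:$PATH
   $ export LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
   $ mpif90 ex1.f95
   $ mpiexec -n 4 ./a.out

With export, both variables become part of the environment inherited by mpiexec and by the a.out processes it launches, so the runtime linker can resolve libmpi_f90.so.1.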


> Then:
> 
> $ mpif90 ex1.f95
> $ mpiexec -n 4 ./a.out 
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> --
> mpiexec noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --
> 
> ls -l /usr/lib/openmpi/lib/
> total 6788
> lrwxrwxrwx. 1 root root  25 Sep 15 12:25 libmca_common_sm.so -> 
> libmca_common_sm.so.2.0.0
> lrwxrwxrwx. 1 root root  25 Sep 14 16:14 libmca_common_sm.so.2 -> 
> libmca_common_sm.so.2.0.0
> -rwxr-xr-x. 1 root root8492 Jan 20  2012 libmca_common_sm.so.2.0.0
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_cxx.so -> 
> libmpi_cxx.so.1.0.1
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_cxx.so.1 -> 
> libmpi_cxx.so.1.0.1
> -rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f77.so -> 
> libmpi_f77.so.1.0.2
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f77.so.1 -> 
> libmpi_f77.so.1.0.2
> -rwxr-xr-x. 1 root root  179912 Jan 20  2012 libmpi_f77.so.1.0.2
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f90.so -> 
> libmpi_f90.so.1.1.0
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f90.so.1 -> 
> libmpi_f90.so.1.1.0
> -rwxr-xr-x. 1 root root   10364 Jan 20  2012 libmpi_f90.so.1.1.0
> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libmpi.so -> libmpi.so.1.0.2
> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libmpi.so.1 -> libmpi.so.1.0.2
> -rwxr-xr-x. 1 root root 1383444 Jan 20  2012 libmpi.so.1.0.2
> lrwxrwxrwx. 1 root root  21 Sep 15 12:25 libompitrace.so -> 
> libompitrace.so.0.0.0
> lrwxrwxrwx. 1 root root  21 Sep 14 16:14 libompitrace.so.0 -> 
> libompitrace.so.0.0.0
> -rwxr-xr-x. 1 root root   13572 Jan 20  2012 libompitrace.so.0.0.0
> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-pal.so -> 
> libopen-pal.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-pal.so.3 -> 
> libopen-pal.so.3.0.0
> -rwxr-xr-x. 1 root root  386324 Jan 20  2012 libopen-pal.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-rte.so -> 
> libopen-rte.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-rte.so.3 -> 
> libopen-rte.so.3.0.0
> -rwxr-xr-x. 1 root root  790052 Jan 20  2012 libopen-rte.so.3.0.0
> -rw-r--r--. 1 root root  301520 Jan 20  2012 libotf.a
> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libotf.so -> libotf.so.0.0.1
> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libotf.so.0 -> libotf.so.0.0.1
> -rwxr-xr-x. 1 root root  206384 Jan 20  2012 libotf.so.0.0.1
> -rw-r--r--. 1 root root  337970 Jan 20  2012 libvt.a
> -rw-r--r--. 1 root root  591070 Jan 20  2012 libvt-hyb.a
> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-hyb.so -> 
> libvt-hyb.so.0.0.0
> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-hyb.so.0 -> 
> libvt-hyb.so.0.0.0
> -rwxr-xr-x. 1 root root  428844 Jan 20  2012 libvt-hyb.so.0.0.0
> -rw-r--r--. 1 root root  541004 Jan 20  2012 libvt-mpi.a
> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-mpi.so -> 
> libvt-mpi.so.0.0.0
> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-mpi.so.0 -> 
> libvt-mpi.so.0.0.0
> -rwxr-xr-x. 1 root root  396352 Jan 20  2012 libvt-mpi.so.0.0.0
> -rw-r--r--. 1 root root  372352 Jan 20  2012 libvt-mt.a
> lrwxrwxrwx. 1 root root  17 Sep 15 12:25 libvt-mt.so -> libvt-mt.so.0.0.0
> lrwxrwxrwx. 1 root root  17 Sep 14 16:14 libvt-mt.so.0 -> 
> libvt-mt.so.0.0.0
> -rwxr-xr-x. 1 root root  266104 Jan 20  2012 libvt-mt.so.0.0.0
> -rw-r--r--. 1 root root   60390 Jan 20  2012 libvt-pomp.a
> lrwxrwxrwx. 1 root root  14 Sep 15 12:25 libvt.so -> libvt.so.0.0.0
> lrwxrwxrwx. 1 root root  14 Sep 14 16:14 libvt.so.0 -> libvt.so.0.0.0
> -rwxr-xr-x. 1 root root  242604 Jan 20  2012 libvt.so.0.0.0
> -rwxr-xr-x. 1 root root  303591 Jan 20  2012 mpi.mod
> drwxr-xr-x. 2 root root4096 Sep 14 16:14 openmpi
> 
> 
> The file (actually, a link) it claims it can't find: libmpi_f90.so.1, is 
> clearly there. And 

Re: [OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
$ which mpiexec
/usr/lib/openmpi/bin/mpiexec

# mpiexec -n 1 printenv | grep PATH
PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
WINDOWPATH=1



On Sat, Sep 15, 2012 at 1:11 PM, Ralph Castain  wrote:

> Couple of things worth checking:
>
> 1. verify that you executed the "mpiexec" you think you did - a simple
> "which mpiexec" should suffice
>
> 2. verify that your environment is correct by "mpiexec -n 1 printenv |
> grep PATH". Sometimes the ld_library_path doesn't carry over like you think
> it should
>
>
> On Sep 15, 2012, at 10:00 AM, John Chludzinski 
> wrote:
>
> I installed OpenMPI (I have a simple dual core AMD notebook with Fedora
> 16) via:
>
> # yum install openmpi
> # yum install openmpi-devel
> # mpirun --version
> mpirun (Open MPI) 1.5.4
>
> I added:
>
> $ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>
> Then:
>
> $ mpif90 ex1.f95
> $ mpiexec -n 4 ./a.out
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
> open shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
> open shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
> open shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
> open shared object file: No such file or directory
> --
> mpiexec noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --
>
> ls -l /usr/lib/openmpi/lib/
> total 6788
> lrwxrwxrwx. 1 root root  25 Sep 15 12:25 libmca_common_sm.so ->
> libmca_common_sm.so.2.0.0
> lrwxrwxrwx. 1 root root  25 Sep 14 16:14 libmca_common_sm.so.2 ->
> libmca_common_sm.so.2.0.0
> -rwxr-xr-x. 1 root root8492 Jan 20  2012 libmca_common_sm.so.2.0.0
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_cxx.so ->
> libmpi_cxx.so.1.0.1
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_cxx.so.1 ->
> libmpi_cxx.so.1.0.1
> -rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f77.so ->
> libmpi_f77.so.1.0.2
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f77.so.1 ->
> libmpi_f77.so.1.0.2
> -rwxr-xr-x. 1 root root  179912 Jan 20  2012 libmpi_f77.so.1.0.2
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f90.so ->
> libmpi_f90.so.1.1.0
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f90.so.1 ->
> libmpi_f90.so.1.1.0
> -rwxr-xr-x. 1 root root   10364 Jan 20  2012 libmpi_f90.so.1.1.0
> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libmpi.so -> libmpi.so.1.0.2
> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libmpi.so.1 -> libmpi.so.1.0.2
> -rwxr-xr-x. 1 root root 1383444 Jan 20  2012 libmpi.so.1.0.2
> lrwxrwxrwx. 1 root root  21 Sep 15 12:25 libompitrace.so ->
> libompitrace.so.0.0.0
> lrwxrwxrwx. 1 root root  21 Sep 14 16:14 libompitrace.so.0 ->
> libompitrace.so.0.0.0
> -rwxr-xr-x. 1 root root   13572 Jan 20  2012 libompitrace.so.0.0.0
> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-pal.so ->
> libopen-pal.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-pal.so.3 ->
> libopen-pal.so.3.0.0
> -rwxr-xr-x. 1 root root  386324 Jan 20  2012 libopen-pal.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-rte.so ->
> libopen-rte.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-rte.so.3 ->
> libopen-rte.so.3.0.0
> -rwxr-xr-x. 1 root root  790052 Jan 20  2012 libopen-rte.so.3.0.0
> -rw-r--r--. 1 root root  301520 Jan 20  2012 libotf.a
> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libotf.so -> libotf.so.0.0.1
> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libotf.so.0 -> libotf.so.0.0.1
> -rwxr-xr-x. 1 root root  206384 Jan 20  2012 libotf.so.0.0.1
> -rw-r--r--. 1 root root  337970 Jan 20  2012 libvt.a
> -rw-r--r--. 1 root root  591070 Jan 20  2012 libvt-hyb.a
> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-hyb.so ->
> libvt-hyb.so.0.0.0
> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-hyb.so.0 ->
> libvt-hyb.so.0.0.0
> -rwxr-xr-x. 1 root root  428844 Jan 20  2012 libvt-hyb.so.0.0.0
> -rw-r--r--. 1 root root  541004 Jan 20  2012 libvt-mpi.a
> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-mpi.so ->
> libvt-mpi.so.0.0.0
> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-mpi.so.0 ->
> libvt-mpi.so.0.0.0
> -rwxr-xr-x. 1 root root  396352 Jan 20  2012 libvt-mpi.so.0.0.0
> -rw-r--r--. 1 root root  372352 Jan 20  2012 libvt-mt.a
> lrwxrwxrwx. 1 root root  17 Sep 15 12:25 libvt-mt.so ->
> libvt-mt.so.0.0.0
> lrwxrwxrwx. 1 root root  17 Sep 14 16:14 libvt-mt.so.0 

Re: [OMPI users] Newbie question?

2012-09-15 Thread Ralph Castain
Couple of things worth checking:

1. verify that you executed the "mpiexec" you think you did - a simple "which 
mpiexec" should suffice

2. verify that your environment is correct by "mpiexec -n 1 printenv | grep 
PATH". Sometimes the ld_library_path doesn't carry over like you think it should


On Sep 15, 2012, at 10:00 AM, John Chludzinski  
wrote:

> I installed OpenMPI (I have a simple dual core AMD notebook with Fedora 16) 
> via:
> 
> # yum install openmpi
> # yum install openmpi-devel
> # mpirun --version
> mpirun (Open MPI) 1.5.4
> 
> I added: 
> 
> $ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
> 
> Then:
> 
> $ mpif90 ex1.f95
> $ mpiexec -n 4 ./a.out 
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open 
> shared object file: No such file or directory
> --
> mpiexec noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --
> 
> ls -l /usr/lib/openmpi/lib/
> total 6788
> lrwxrwxrwx. 1 root root  25 Sep 15 12:25 libmca_common_sm.so -> 
> libmca_common_sm.so.2.0.0
> lrwxrwxrwx. 1 root root  25 Sep 14 16:14 libmca_common_sm.so.2 -> 
> libmca_common_sm.so.2.0.0
> -rwxr-xr-x. 1 root root8492 Jan 20  2012 libmca_common_sm.so.2.0.0
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_cxx.so -> 
> libmpi_cxx.so.1.0.1
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_cxx.so.1 -> 
> libmpi_cxx.so.1.0.1
> -rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f77.so -> 
> libmpi_f77.so.1.0.2
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f77.so.1 -> 
> libmpi_f77.so.1.0.2
> -rwxr-xr-x. 1 root root  179912 Jan 20  2012 libmpi_f77.so.1.0.2
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f90.so -> 
> libmpi_f90.so.1.1.0
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f90.so.1 -> 
> libmpi_f90.so.1.1.0
> -rwxr-xr-x. 1 root root   10364 Jan 20  2012 libmpi_f90.so.1.1.0
> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libmpi.so -> libmpi.so.1.0.2
> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libmpi.so.1 -> libmpi.so.1.0.2
> -rwxr-xr-x. 1 root root 1383444 Jan 20  2012 libmpi.so.1.0.2
> lrwxrwxrwx. 1 root root  21 Sep 15 12:25 libompitrace.so -> 
> libompitrace.so.0.0.0
> lrwxrwxrwx. 1 root root  21 Sep 14 16:14 libompitrace.so.0 -> 
> libompitrace.so.0.0.0
> -rwxr-xr-x. 1 root root   13572 Jan 20  2012 libompitrace.so.0.0.0
> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-pal.so -> 
> libopen-pal.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-pal.so.3 -> 
> libopen-pal.so.3.0.0
> -rwxr-xr-x. 1 root root  386324 Jan 20  2012 libopen-pal.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-rte.so -> 
> libopen-rte.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-rte.so.3 -> 
> libopen-rte.so.3.0.0
> -rwxr-xr-x. 1 root root  790052 Jan 20  2012 libopen-rte.so.3.0.0
> -rw-r--r--. 1 root root  301520 Jan 20  2012 libotf.a
> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libotf.so -> libotf.so.0.0.1
> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libotf.so.0 -> libotf.so.0.0.1
> -rwxr-xr-x. 1 root root  206384 Jan 20  2012 libotf.so.0.0.1
> -rw-r--r--. 1 root root  337970 Jan 20  2012 libvt.a
> -rw-r--r--. 1 root root  591070 Jan 20  2012 libvt-hyb.a
> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-hyb.so -> 
> libvt-hyb.so.0.0.0
> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-hyb.so.0 -> 
> libvt-hyb.so.0.0.0
> -rwxr-xr-x. 1 root root  428844 Jan 20  2012 libvt-hyb.so.0.0.0
> -rw-r--r--. 1 root root  541004 Jan 20  2012 libvt-mpi.a
> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-mpi.so -> 
> libvt-mpi.so.0.0.0
> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-mpi.so.0 -> 
> libvt-mpi.so.0.0.0
> -rwxr-xr-x. 1 root root  396352 Jan 20  2012 libvt-mpi.so.0.0.0
> -rw-r--r--. 1 root root  372352 Jan 20  2012 libvt-mt.a
> lrwxrwxrwx. 1 root root  17 Sep 15 12:25 libvt-mt.so -> libvt-mt.so.0.0.0
> lrwxrwxrwx. 1 root root  17 Sep 14 16:14 libvt-mt.so.0 -> 
> libvt-mt.so.0.0.0
> -rwxr-xr-x. 1 root root  266104 Jan 20  2012 libvt-mt.so.0.0.0
> -rw-r--r--. 1 root root   60390 Jan 20  2012 libvt-pomp.a
> lrwxrwxrwx. 1 root root  14 Sep 15 12:25 libvt.so -> libvt.so.0.0.0
> lrwxrwxrwx. 1 root root  14 Sep 14 16:14 libvt.so.0 -> libvt.so.0.0.0
> -rwxr-xr-x. 1 root root  242604 Jan 20  2012 libvt.so.0.0.0
> 

[OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
I installed OpenMPI (I have a simple dual core AMD notebook with Fedora 16)
via:

# yum install openmpi
# yum install openmpi-devel
# mpirun --version
mpirun (Open MPI) 1.5.4

I added:

$ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
$ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/

Then:

$ mpif90 ex1.f95
$ mpiexec -n 4 ./a.out
./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open
shared object file: No such file or directory
./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open
shared object file: No such file or directory
./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open
shared object file: No such file or directory
./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open
shared object file: No such file or directory
--
mpiexec noticed that the job aborted, but has no info as to the process
that caused that situation.
--

ls -l /usr/lib/openmpi/lib/
total 6788
lrwxrwxrwx. 1 root root  25 Sep 15 12:25 libmca_common_sm.so ->
libmca_common_sm.so.2.0.0
lrwxrwxrwx. 1 root root  25 Sep 14 16:14 libmca_common_sm.so.2 ->
libmca_common_sm.so.2.0.0
-rwxr-xr-x. 1 root root8492 Jan 20  2012 libmca_common_sm.so.2.0.0
lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_cxx.so ->
libmpi_cxx.so.1.0.1
lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_cxx.so.1 ->
libmpi_cxx.so.1.0.1
-rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1
lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f77.so ->
libmpi_f77.so.1.0.2
lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f77.so.1 ->
libmpi_f77.so.1.0.2
-rwxr-xr-x. 1 root root  179912 Jan 20  2012 libmpi_f77.so.1.0.2
lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f90.so ->
libmpi_f90.so.1.1.0
lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f90.so.1 ->
libmpi_f90.so.1.1.0
-rwxr-xr-x. 1 root root   10364 Jan 20  2012 libmpi_f90.so.1.1.0
lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libmpi.so -> libmpi.so.1.0.2
lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libmpi.so.1 -> libmpi.so.1.0.2
-rwxr-xr-x. 1 root root 1383444 Jan 20  2012 libmpi.so.1.0.2
lrwxrwxrwx. 1 root root  21 Sep 15 12:25 libompitrace.so ->
libompitrace.so.0.0.0
lrwxrwxrwx. 1 root root  21 Sep 14 16:14 libompitrace.so.0 ->
libompitrace.so.0.0.0
-rwxr-xr-x. 1 root root   13572 Jan 20  2012 libompitrace.so.0.0.0
lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-pal.so ->
libopen-pal.so.3.0.0
lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-pal.so.3 ->
libopen-pal.so.3.0.0
-rwxr-xr-x. 1 root root  386324 Jan 20  2012 libopen-pal.so.3.0.0
lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-rte.so ->
libopen-rte.so.3.0.0
lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-rte.so.3 ->
libopen-rte.so.3.0.0
-rwxr-xr-x. 1 root root  790052 Jan 20  2012 libopen-rte.so.3.0.0
-rw-r--r--. 1 root root  301520 Jan 20  2012 libotf.a
lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libotf.so -> libotf.so.0.0.1
lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libotf.so.0 -> libotf.so.0.0.1
-rwxr-xr-x. 1 root root  206384 Jan 20  2012 libotf.so.0.0.1
-rw-r--r--. 1 root root  337970 Jan 20  2012 libvt.a
-rw-r--r--. 1 root root  591070 Jan 20  2012 libvt-hyb.a
lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-hyb.so ->
libvt-hyb.so.0.0.0
lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-hyb.so.0 ->
libvt-hyb.so.0.0.0
-rwxr-xr-x. 1 root root  428844 Jan 20  2012 libvt-hyb.so.0.0.0
-rw-r--r--. 1 root root  541004 Jan 20  2012 libvt-mpi.a
lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-mpi.so ->
libvt-mpi.so.0.0.0
lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-mpi.so.0 ->
libvt-mpi.so.0.0.0
-rwxr-xr-x. 1 root root  396352 Jan 20  2012 libvt-mpi.so.0.0.0
-rw-r--r--. 1 root root  372352 Jan 20  2012 libvt-mt.a
lrwxrwxrwx. 1 root root  17 Sep 15 12:25 libvt-mt.so ->
libvt-mt.so.0.0.0
lrwxrwxrwx. 1 root root  17 Sep 14 16:14 libvt-mt.so.0 ->
libvt-mt.so.0.0.0
-rwxr-xr-x. 1 root root  266104 Jan 20  2012 libvt-mt.so.0.0.0
-rw-r--r--. 1 root root   60390 Jan 20  2012 libvt-pomp.a
lrwxrwxrwx. 1 root root  14 Sep 15 12:25 libvt.so -> libvt.so.0.0.0
lrwxrwxrwx. 1 root root  14 Sep 14 16:14 libvt.so.0 -> libvt.so.0.0.0
-rwxr-xr-x. 1 root root  242604 Jan 20  2012 libvt.so.0.0.0
-rwxr-xr-x. 1 root root  303591 Jan 20  2012 mpi.mod
drwxr-xr-x. 2 root root4096 Sep 14 16:14 openmpi


The file (actually, a link) it claims it can't find: libmpi_f90.so.1, is
clearly there. And LD_LIBRARY_PATH=/usr/lib/openmpi/lib/.

What's the problem?

---John


Re: [OMPI users] Newbie question continues, a step toward real app

2011-01-14 Thread Martin Siegert
On Thu, Jan 13, 2011 at 05:34:48PM -0800, Tena Sakai wrote:
> Hi Gus,
> 
> > Did you speak to the Rmpi author about this?
> 
> No, I haven't, but here's what the author wrote:
> https://stat.ethz.ch/pipermail/r-sig-hpc/2009-February/000104.html
> in which he states:
>...The way of spawning R slaves under LAM is not working
>any more under OpenMPI. Under LAM, one just uses
>  R -> library(Rmpi) ->  mpi.spawn.Rslaves()
>as long as host file is set. Under OpenMPI this leads only one R slave on
>the master host no matter how many remote hosts are specified in OpenMPI
>hostfile. ...
> His README file doesn't tell what I need to know.  In the light of
> LAM MPI being "absorbed" into openMPI, I find this unfortunate.

Hmm. It has been a while since I last had to compile Rmpi, but the following
works with openmpi-1.3.3, R-2.10.1:

# mpiexec -n 1 -hostfile mfile R --vanilla < Rmpi-hello.R

with a script Rmpi-hello.R like

library(Rmpi)
mpi.spawn.Rslaves()
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
mpi.close.Rslaves()
mpi.quit()

The only unfortunate effect is that by default mpi.spawn.Rslaves()
spawns as many slaves as there are lines in the hostfile, hence you
end up with one too many processes: 1 master + N slaves. You can repair
that by using

Nprocs <- mpi.universe.size()
mpi.spawn.Rslaves(nslaves=Nprocs-1)

instead of the simple mpi.spawn.Rslaves() call.

BTW: the whole script works in the same way when submitting under torque
using the TM interface and without specifying -hostfile ... on the
mpiexec command line.

Cheers,
Martin

-- 
Martin Siegert
Head, Research Computing
WestGrid/ComputeCanada Site Lead
IT Servicesphone: 778 782-4691
Simon Fraser Universityfax:   778 782-4242
Burnaby, British Columbia  email: sieg...@sfu.ca
Canada  V5A 1S6


Re: [OMPI users] Newbie question continues, a step toward real app

2011-01-13 Thread Tena Sakai
Hi Gus,

> Did you speak to the Rmpi author about this?

No, I haven't, but here's what the author wrote:
https://stat.ethz.ch/pipermail/r-sig-hpc/2009-February/000104.html
in which he states:
   ...The way of spawning R slaves under LAM is not working
   any more under OpenMPI. Under LAM, one just uses
 R -> library(Rmpi) ->  mpi.spawn.Rslaves()
   as long as host file is set. Under OpenMPI this leads only one R slave on
   the master host no matter how many remote hosts are specified in OpenMPI
   hostfile. ...
His README file doesn't tell what I need to know.  In the light of
LAM MPI being "absorbed" into openMPI, I find this unfortunate.

There are other ways to achieve parallelism from R.  The most recent
offering is from Revolution Analytics:
  http://www.revolutionanalytics.com/products/revolution-r.php
They have a package called foreach, which can use different parallel back ends,
doSNOW, doRedis, etc., but not openMPI (or any other MPI variant).  In fact,
I saw someone posting:

  Just wanted to share a working example of doSNOW and foreach for an
  openMPI cluster.  The function eddcmp() is just an example and returns
  some inocuous warnings.  The example first has each node return its
  nodename, then runs an example comparing dopar, do and a for loop.  In the
  directory containing rtest.R it is run from the command line with:
  "mpirun -n --hostfile /home/hostfile --no-save -f rtest.R"
  ...
  ...
  

What I discovered was that on my version of openMPI (v 1.4.3), this command
line doesn't work.  I need to add 1 after -n and get rid of --no-save and
-f then it runs, but generates something a bit traumatic:
  [compute-0-0.local:16448] [[42316,0],1]->[[42316,0],0]
mca_oob_tcp_msg_send_handler: writev failed: Bad file descriptor (9) [sd =
9]
  [compute-0-0.local:16448] [[42316,0],1] routed:binomial: Connection to
lifeline [[42316,0],0] lost

The long and short of it is that the mechanism you showed me works for
me and (while I want to keep my eyes open for other mechanism/methods)
I want to get on to solve my science.  (And I haven't forgotten to look
into Torque.)

Regards,

Tena Sakai
tsa...@gallo.ucsf.edu


On 1/13/11 4:18 PM, "Gus Correa"  wrote:

> Tena Sakai wrote:
>> Fantastic, Gus!  Now I think I got framework pretty much done.
>> The rest is to work on 'problem solving' end with R.
>> 
>> Many thanks for your insight and kindness.  I really appreciate it.
>> 
>> Regards,
>> 
>> Tena Sakai
>> tsa...@gallo.ucsf.edu
>> 
> Hi Tena
> 
> I'm glad that it helped somebody at the other side of the country,
> but solving a problem (MIMD) so close to ours here at home.
> 
> Still thinking of what could one do to fix the Rmpi guts,
> to work nicely with OpenMPI, MPICH2, etc.
> The hint I took from your postings was that the whole
> issue revolves around the mechanism to launch MPI jobs
> (the whole mumbo jumbo of starting LAM boot, and stuff like that,
> that is no longer there).
> I think typically this is where the MPIs differ,
> and the difficulties in portability appear.
> Did you speak to the Rmpi author about this?
> If I only had the time to learn some R and take a look at Rmpi
> I might give it a try.
> The MIMD trick will do for the embarrassingly parallel problem you
> mentioned, but it would be nice to have Rmpi working for when
> parallelism is essential.
> Nobody uses R here (but do in the Statistics Department),
> probably because they're used to other tools (Matlab, etc).
> However, there is plenty of statistics of climate and other
> Earth Science data that goes on here,
> hence R might be used also.
> 
> Good luck with your research and with "R on the cloud"!
> 
> Regards,
> Gus Correa
> -
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> -
> 
>> 
>> On 1/13/11 2:40 PM, "Gus Correa"  wrote:
>> 
>>> Tena Sakai wrote:
 Hi,
 
 I have a script I call fib.r.  It looks like:
 
#!/usr/bin/env r

fib <- function( n ) {
 a <- 0
 b <- 1
 for ( i in 1:n ) {
 t <- b
 b <- a
 a <- a + t
 }
 a
 }

print( fib(argv[1]) )
 
 When I run this script with a parameter, it generates a Fibonacci number:
 
$ fib.r 5
5
$ fib.r 6
8
 
 and if I stick this into  part of MIMD example I have used
 previously:
 
$ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 8 fib.r 7
 
 I get:
 
vixen.egcrc.org
[1] 13
[1] 13
[1] 13
[1] 13
[1] 13
[1] 13
[1] 13
[1] 13
 
 This is good 

Re: [OMPI users] Newbie question continues, a step toward real app

2011-01-13 Thread Gus Correa

Tena Sakai wrote:

Fantastic, Gus!  Now I think I got framework pretty much done.
The rest is to work on 'problem solving' end with R.

Many thanks for your insight and kindness.  I really appreciate it.

Regards,

Tena Sakai
tsa...@gallo.ucsf.edu


Hi Tena

I'm glad that it helped somebody at the other side of the country,
but solving a problem (MIMD) so close to ours here at home.

Still thinking of what could one do to fix the Rmpi guts,
to work nicely with OpenMPI, MPICH2, etc.
The hint I took from your postings was that the whole
issue revolves around the mechanism to launch MPI jobs
(the whole mumbo jumbo of starting LAM boot, and stuff like that,
that is no longer there).
I think typically this is where the MPIs differ,
and the difficulties in portability appear.
Did you speak to the Rmpi author about this?
If I only had the time to learn some R and take a look at Rmpi
I might give it a try.
The MIMD trick will do for the embarrassingly parallel problem you 
mentioned, but it would be nice to have Rmpi working for when 
parallelism is essential.

Nobody uses R here (but do in the Statistics Department),
probably because they're used to other tools (Matlab, etc).
However, there is plenty of statistics of climate and other
Earth Science data that goes on here,
hence R might be used also.

Good luck with your research and with "R on the cloud"!

Regards,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-



On 1/13/11 2:40 PM, "Gus Correa"  wrote:


Tena Sakai wrote:

Hi,

I have a script I call fib.r.  It looks like:

   #!/usr/bin/env r
   
   fib <- function( n ) {

a <- 0
b <- 1
for ( i in 1:n ) {
t <- b
b <- a
a <- a + t
}
a
}
   
   print( fib(argv[1]) )


When I run this script with a parameter, it generates a Fibonacci number:

   $ fib.r 5
   5
   $ fib.r 6
   8

and if I stick this into  part of MIMD example I have used
previously:

   $ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 8 fib.r 7

I get:

   vixen.egcrc.org
   [1] 13
   [1] 13
   [1] 13
   [1] 13
   [1] 13
   [1] 13
   [1] 13
   [1] 13

This is good as proof of concept, but what I really want to do is to
have that 7
different for each (slave) process.  Ie., I want to run “rfib 5” on node
0, “rfib 6”
on node 1, “rfib 7” on node 2, and so on.  Is there any way to give a
different
parameter(s) to different process/slot?

I thought maybe I can use -rf option to do this, but I am leaning toward
-app
option.  Unfortunately, I see no example for the application context
file.  Would
someone kindly explain how I can do what I describe?

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu


Hi Tena

We ran MPMD/MIMD programs here in the past.
Coupled climate models: atmosphere, ocean, sea ice, etc., each one an
executable, communicating via MPI.
Actually this was with MPICH1, somewhat different syntax than OpenMPI,
the flag/file was called '-pgfile' not '-app',
but I see no reason why it shouldn't work in your case with OpenMPI.

I think if you create a 'appfile' with this content:

-H node0 -np 1 rfib 5
-H node0 -np 1 rfib 6
...

and launch mpirun with

mpirun -app appfile

it is likely to work.

Under Torque I cannot test this very easily,
because I need to parse the Torque file that gives me the nodes,
then write down the 'appfile' on the fly (which is what I used to
do for the coupled climate models).

However, I tried on a standalone machine (where the -H nodename didn't
make sense, and was not used) and it worked.
My appfile test was like this:
-np 1 ls appfile
-np 1 hostname
-np 2 date
-np 4 who

You can add your -H nodename to each line.

I hope this helps,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-





Re: [OMPI users] Newbie question continues, a step toward real app

2011-01-13 Thread Tena Sakai
Fantastic, Gus!  Now I think I got framework pretty much done.
The rest is to work on 'problem solving' end with R.

Many thanks for your insight and kindness.  I really appreciate it.

Regards,

Tena Sakai
tsa...@gallo.ucsf.edu


On 1/13/11 2:40 PM, "Gus Correa"  wrote:

> Tena Sakai wrote:
>> Hi,
>> 
>> I have a script I call fib.r.  It looks like:
>> 
>>#!/usr/bin/env r
>>
>>fib <- function( n ) {
>> a <- 0
>> b <- 1
>> for ( i in 1:n ) {
>> t <- b
>> b <- a
>> a <- a + t
>> }
>> a
>> }
>>
>>print( fib(argv[1]) )
>> 
>> When I run this script with a parameter, it generates a Fibonacci number:
>> 
>>$ fib.r 5
>>5
>>$ fib.r 6
>>8
>> 
>> and if I stick this into  part of MIMD example I have used
>> previously:
>> 
>>$ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 8 fib.r 7
>> 
>> I get:
>> 
>>vixen.egcrc.org
>>[1] 13
>>[1] 13
>>[1] 13
>>[1] 13
>>[1] 13
>>[1] 13
>>[1] 13
>>[1] 13
>> 
>> This is good as proof of concept, but what I really want to do is to
>> have that 7
>> different for each (slave) process.  Ie., I want to run “rfib 5” on node
>> 0, “rfib 6”
>> on node 1, “rfib 7” on node 2, and so on.  Is there any way to give a
>> different
>> parameter(s) to different process/slot?
>> 
>> I thought maybe I can use -rf option to do this, but I am leaning toward
>> -app
>> option.  Unfortunately, I see no example for the application context
>> file.  Would
>> someone kindly explain how I can do what I describe?
>> 
>> Thank you.
>> 
>> Tena Sakai
>> tsa...@gallo.ucsf.edu
>> 
> 
> Hi Tena
> 
> We ran MPMD/MIMD programs here in the past.
> Coupled climate models: atmosphere, ocean, sea ice, etc., each one an
> executable, communicating via MPI.
> Actually this was with MPICH1, somewhat different syntax than OpenMPI,
> the flag/file was called '-pgfile' not '-app',
> but I see no reason why it shouldn't work in your case with OpenMPI.
> 
> I think if you create a 'appfile' with this content:
> 
> -H node0 -np 1 rfib 5
> -H node0 -np 1 rfib 6
> ...
> 
> and launch mpirun with
> 
> mpirun -app appfile
> 
> it is likely to work.
> 
> Under Torque I cannot test this very easily,
> because I need to parse the Torque file that gives me the nodes,
> then write down the 'appfile' on the fly (which is what I used to
> do for the coupled climate models).
> 
> However, I tried on a standalone machine (where the -H nodename didn't
> make sense, and was not used) and it worked.
> My appfile test was like this:
> -np 1 ls appfile
> -np 1 hostname
> -np 2 date
> -np 4 who
> 
> You can add your -H nodename to each line.
> 
> I hope this helps,
> Gus Correa
> -
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> -
> 




Re: [OMPI users] Newbie question continues, a step toward real app

2011-01-13 Thread Gus Correa

Tena Sakai wrote:

Hi,

I have a script I call fib.r.  It looks like:

   #!/usr/bin/env r
   
   fib <- function( n ) {

a <- 0
b <- 1
for ( i in 1:n ) {
t <- b
b <- a
a <- a + t
}
a
}
   
   print( fib(argv[1]) )


When I run this script with a parameter, it generates a Fibonacci number:

   $ fib.r 5
   5
   $ fib.r 6
   8

and if I stick this into  part of MIMD example I have used 
previously:


   $ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 8 fib.r 7

I get:

   vixen.egcrc.org
   [1] 13
   [1] 13
   [1] 13
   [1] 13
   [1] 13
   [1] 13
   [1] 13
   [1] 13

This is good as proof of concept, but what I really want to do is to 
have that 7
different for each (slave) process.  Ie., I want to run “rfib 5” on node 
0, “rfib 6”
on node 1, “rfib 7” on node 2, and so on.  Is there any way to give a 
different

parameter(s) to different process/slot?

I thought maybe I can use -rf option to do this, but I am leaning toward
-app
option.  Unfortunately, I see no example for the application context 
file.  Would

someone kindly explain how I can do what I describe?

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu



Hi Tena

We ran MPMD/MIMD programs here in the past.
Coupled climate models: atmosphere, ocean, sea ice, etc., each one an
executable, communicating via MPI.
Actually this was with MPICH1, somewhat different syntax than OpenMPI, 
the flag/file was called '-pgfile' not '-app',

but I see no reason why it shouldn't work in your case with OpenMPI.

I think if you create a 'appfile' with this content:

-H node0 -np 1 rfib 5
-H node0 -np 1 rfib 6
...

and launch mpirun with

mpirun -app appfile

it is likely to work.

Under Torque I cannot test this very easily,
because I need to parse the Torque file that gives me the nodes,
then write down the 'appfile' on the fly (which is what I used to
do for the coupled climate models).

However, I tried on a standalone machine (where the -H nodename didn't
make sense, and was not used) and it worked.
My appfile test was like this:
-np 1 ls appfile
-np 1 hostname
-np 2 date
-np 4 who

You can add your -H nodename to each line.

I hope this helps,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-
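
For the fib.r case above, a hypothetical appfile (the host names are placeholders) could look like:

   -H node0 -np 1 fib.r 5
   -H node1 -np 1 fib.r 6
   -H node2 -np 1 fib.r 7

and be launched with:

   mpirun -app appfile

Each line defines one application context with its own arguments, so every slot can get a different parameter. The same thing can also be written on a single command line by separating the contexts with ':', as in the MIMD example earlier in this thread.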



[OMPI users] Newbie question continues, a step toward real app

2011-01-13 Thread Tena Sakai
Hi,

I have a script I call fib.r.  It looks like:

   #!/usr/bin/env r

   fib <- function( n ) {
a <- 0
b <- 1
for ( i in 1:n ) {
t <- b
b <- a
a <- a + t
}
a
}

   print( fib(argv[1]) )

When I run this script with a parameter, it generates a Fibonacci number:

   $ fib.r 5
   5
   $ fib.r 6
   8

and if I stick this into  part of MIMD example I have used previously:

   $ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 8 fib.r 7

I get:

   vixen.egcrc.org
   [1] 13
   [1] 13
   [1] 13
   [1] 13
   [1] 13
   [1] 13
   [1] 13
   [1] 13

This is good as proof of concept, but what I really want to do is to have that 7
different for each (slave) process.  Ie., I want to run “rfib 5” on node 0, 
“rfib 6”
on node 1, “rfib 7” on node 2, and so on.  Is there any way to give a different
parameter(s) to different process/slot?

I thought maybe I can use -rf option to do this, but I am leaning toward -app
option.  Unfortunately, I see no example for the application context file.  
Would
someone kindly explain how I can do what I describe?

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu


Re: [OMPI users] Newbie question

2011-01-12 Thread Gus Correa

Ralph Castain wrote:

On Jan 12, 2011, at 12:54 PM, Tena Sakai wrote:


Hi Siegmar,

Many thanks for your reply.

I have tried man pages you mention, but one hurdle I am running into
is orte_hosts page.  I don't find the specification of fields for
the file.  I see an example:

  dummy1 slots=4
  dummy2 slots=4
  dummy3 slots=4
  dummy4 slots=4
  dummy5 slots=4

Is the first field (dummyX) machine/node name?  


Yes


What is the definition
of slots?  (Max number of processes to spawn?)


Yes


Here we don't let 'slots' exceed the number of physical cores.
(Actually, Torque does this for us.)
I suppose this prevents the cores from being oversubscribed,
at least by default, right?

Gus Correa
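
As a concrete sketch (the host names and core counts here are assumptions), a hostfile for two quad-core nodes and a run that fills all eight slots might look like:

   # file: myhosts
   node01 slots=4
   node02 slots=4

   $ mpirun --hostfile myhosts -np 8 ./a.out

The first field is the node name; slots is the number of processes mpirun will place on that node before it treats the node as full.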





Am I missing a different man page?  Can you please shed some light?

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu




On 1/10/11 11:38 PM, "Siegmar Gross" 
wrote:


Hi,


What I want is to spawn a bunch of R slaves to other machines on
the network. I can spawn R slaves, as many as I like, to the local
machine, but I don t know how to do this with machines on the
network.  That s what hosts parameter of mpi.spawn.Rslaves()
enables me to do, I think.  If I can do that, then Rmpi has
function(s) to send command to each of the spawned slaves.

My question is how can I get open MPI to give me those hosts
parameters.

I am not quite sure if I understood your question, but when you
read "man MPI_Comm_spawn" you can find the parameter "MPI_Info info"
which allows you to specify where and how to start processes. "man
MPI_Info_create" shows you how to create an info object and "man
MPI_Info_set" how to add a key/value pair. "man orte_hosts" shows
you how you can build a hostfile. I do not know how to do these
things in your language R but hopefully the information of the
manual pages helps to solve your problem.

Kind regards

Siegmar





Re: [OMPI users] Newbie question

2011-01-12 Thread Ralph Castain

On Jan 12, 2011, at 12:54 PM, Tena Sakai wrote:

> Hi Siegmar,
> 
> Many thanks for your reply.
> 
> I have tried man pages you mention, but one hurdle I am running into
> is orte_hosts page.  I don't find the specification of fields for
> the file.  I see an example:
> 
>   dummy1 slots=4
>   dummy2 slots=4
>   dummy3 slots=4
>   dummy4 slots=4
>   dummy5 slots=4
> 
> Is the first field (dummyX) machine/node name?  

Yes

> What is the definition
> of slots?  (Max number of processes to spawn?)

Yes


> 
> Am I missing a different man page?  Can you please shed some light?
> 
> Thank you.
> 
> Tena Sakai
> tsa...@gallo.ucsf.edu
> 
> 
> 
> 
> On 1/10/11 11:38 PM, "Siegmar Gross" 
> wrote:
> 
>> Hi,
>> 
>>> What I want is to spawn a bunch of R slaves to other machines on
>>> the network. I can spawn R slaves, as many as I like, to the local
>>> machine, but I don t know how to do this with machines on the
>>> network.  That s what hosts parameter of mpi.spawn.Rslaves()
>>> enables me to do, I think.  If I can do that, then Rmpi has
>>> function(s) to send command to each of the spawned slaves.
>>> 
>>> My question is how can I get open MPI to give me those hosts
>>> parameters.
>> 
>> I am not quite sure if I understood your question, but when you
>> read "man MPI_Comm_spawn" you can find the parameter "MPI_Info info"
>> which allows you to specify where and how to start processes. "man
>> MPI_Info_create" shows you how to create an info object and "man
>> MPI_Info_set" how to add a key/value pair. "man orte_hosts" shows
>> you how you can build a hostfile. I do not know how to do these
>> things in your language R but hopefully the information of the
>> manual pages helps to solve your problem.
>> 
>> Kind regards
>> 
>> Siegmar
>> 



Re: [OMPI users] Newbie question

2011-01-12 Thread Tena Sakai
Hi Siegmar,

Many thanks for your reply.

I have tried man pages you mention, but one hurdle I am running into
is orte_hosts page.  I don't find the specification of fields for
the file.  I see an example:

   dummy1 slots=4
   dummy2 slots=4
   dummy3 slots=4
   dummy4 slots=4
   dummy5 slots=4

Is the first field (dummyX) machine/node name?  What is the definition
of slots?  (Max number of processes to spawn?)

Am I missing a different man page?  Can you please shed some light?

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu




On 1/10/11 11:38 PM, "Siegmar Gross" 
wrote:

> Hi,
> 
>> What I want is to spawn a bunch of R slaves to other machines on
>> the network. I can spawn R slaves, as many as I like, to the local
>> machine, but I don't know how to do this with machines on the
>> network.  That's what the "hosts" parameter of mpi.spawn.Rslaves()
>> enables me to do, I think.  If I can do that, then Rmpi has
>> function(s) to send command to each of the spawned slaves.
>> 
>> My question is how can I get open MPI to give me those hosts
>> parameters.
> 
> I am not quite sure if I understood your question, but when you
> read "man MPI_Comm_spawn" you can find the parameter "MPI_Info info"
> which allows to specify where and how to start processes. "man
> MPI_Info_create" shows you how to create an info object and "man
> MPI_Info_set" how to add a key/value pair. "man orte_hosts" shows
> you how you can build a hostfile. I do not know how to do these
> things in your language R but hopefully the information of the
> manual pages helps to solve your problem.
> 
> Kind regards
> 
> Siegmar
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Newbie question

2011-01-11 Thread Jeff Squyres
You might be confusing LAM/MPI with Open MPI -- they're two different software 
code bases.  They both implement the MPI standard, but they're entirely 
different software projects.  Indeed, all the LAM/MPI developers (including me) 
abandoned LAM/MPI several years ago and went to work on Open MPI -- think of 
Open MPI as a "next generation" LAM/MPI.

To be clear, LAM node numbers and lamnodes() (and all the other LAM references) 
will not work with Open MPI.  Those are details from an entirely different 
implementation of MPI.  Hence, if your R MPI has instructions specific to 
LAM/MPI, they are likely not relevant at all for Open MPI.  

You probably need to check back with the R MPI folks and see how to do what you 
need to do in Open MPI.  Most of us here aren't familiar with R (which is why I 
suspect you're getting a motley assortment of answers); checking with the 
maintainers might be best.

I know that there has been a bunch of work with R MPI on Open MPI, so I'd be 
surprised if there wasn't a way to do what you need.


On Jan 10, 2011, at 8:04 PM, Tena Sakai wrote:

> Hi,
> 
> I am an mpi newbie.  My open MPI is v 1.4.3, which I compiled
> on a linux machine.
> 
> I am using a language called R, which has an mpi interface/package.
> It appears that it is happy, on the surface, with the open MPI I installed.
> 
> There is an R function called mpi.spawn.Rslaves().  An argument to
> this function is nslaves.  I can issue, for example,
>   mpi.spawn.Rslaves( nslaves=20 )
> And it spawns 20 slave processes.  The trouble is that it is all on the
> same node as that of the master.  I want, instead, these 20 (or more)
> slaves spawned on other machines on the network.
> 
> It so happens the mpi.spawn.Rslaves() has an extra argument called
> hosts.  Here’s the definition of hosts from the api document: “NULL or
> LAM node numbers to specify where R slaves to be spawned.”  I have
> no idea what a LAM node is, but there is a function called lamhosts(),
> which returns a somewhat verbose message:
> 
>   It seems that there is no lamd running on the host compute-0-0.local.
> 
>   This indicates that the LAM/MPI runtime environment is not operating.
>   The LAM/MPI runtime environment is necessary for the "lamnodes" command.
> 
>   Please run the "lamboot" command the start the LAM/MPI runtime
>   environment.  See the LAM/MPI documentation for how to invoke
>   "lamboot" across multiple machines.
> 
> Here’s my question.  Is there such a command as lamboot in Open MPI 1.4.3?
> Or am I using the wrong MPI software?  In a FAQ I read that there are other
> MPI software packages (FT-MPI, LA-MPI, LAM-MPI), but I had the notion that Open MPI
> is supposed to combine the functionality of all of them.  Is this a wrong impression?
> 
> Thank you for your help.
> 
> Tena Sakai
> tsa...@gallo.ucsf.edu
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Newbie question

2011-01-11 Thread Terry Dontje
So are you trying to start an MPI job in which one process is one executable 
and the other process(es) are something else?  If so, you probably want 
to use a multiple-app context.  If you look at FAQ question 7, "How do I 
run an MPMD MPI Job?", at http://www.open-mpi.org/faq/?category=running, 
this should answer your question below, I believe.


--td

On 01/11/2011 01:06 AM, Tena Sakai wrote:

Hi,

Thanks for your reply.

I am afraid your terse response doesn’t shed much light.  What I need 
is the “hosts”
parameter I can pass to the mpi.spawn.Rslaves() function.  Can you explain 
or better

yet give an example as to how I can get this via mpirun?

Looking at mpirun man page, I found an example:
  mpirun -H aa,aa,bb  ./a.out
and similar ones.  But they all execute a program (like a.out above). 
 That’s not
what I want.  What I want is to spawn a bunch of R slaves to other 
machines on
the network.  I can spawn R slaves, as many as I like, to the local 
machine, but
I don’t know how to do this with machines on the network.  That’s what 
“hosts”
parameter of mpi.spawn.Rslaves() enables me to do, I think.  If I can 
do that, then

Rmpi has function(s) to send command to each of the spawned slaves.

My question is how can I get open MPI to give me those “hosts” parameters.

Can you please help me?

Thank you in advance.

Tena Sakai
tsa...@gallo.ucsf.edu


On 1/10/11 8:14 PM, "pooja varshneya"  wrote:

You can use mpirun.

On Mon, Jan 10, 2011 at 8:04 PM, Tena Sakai
 wrote:

Hi,

I am an mpi newbie.  My open MPI is v 1.4.3, which I compiled
on a linux machine.

I am using a language called R, which has an mpi
interface/package.
It appears that it is happy, on the surface, with the open MPI
I installed.

There is an R function called mpi.spawn.Rslaves().  An argument to
this function is nslaves.  I can issue, for example,
  mpi.spawn.Rslaves( nslaves=20 )
And it spawns 20 slave processes.  The trouble is that it is
all on the
same node as that of the master.  I want, instead, these 20
(or more)
slaves spawned on other machines on the network.

It so happens the mpi.spawn.Rslaves() has an extra argument called
hosts.  Here’s the definition of hosts from the api document:
“NULL or
LAM node numbers to specify where R slaves to be spawned.”  I have
no idea what a LAM node is, but there is a function called
lamhosts(),
which returns a somewhat verbose message:

  It seems that there is no lamd running on the host
compute-0-0.local.

  This indicates that the LAM/MPI runtime environment is not
operating.
  The LAM/MPI runtime environment is necessary for the
"lamnodes" command.

  Please run the "lamboot" command the start the LAM/MPI runtime
  environment.  See the LAM/MPI documentation for how to invoke
  "lamboot" across multiple machines.

Here’s my question.  Is there such a command as lamboot in Open
MPI 1.4.3?
Or am I using the wrong MPI software?  In a FAQ I read that
there are other
MPI software packages (FT-MPI, LA-MPI, LAM-MPI), but I had the notion that
Open MPI
is supposed to combine the functionality of all of them.  Is this a wrong impression?

Thank you for your help.

Tena Sakai
tsa...@gallo.ucsf.edu 

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Newbie question

2011-01-11 Thread Siegmar Gross
Hi,

> What I want is to spawn a bunch of R slaves to other machines on
> the network. I can spawn R slaves, as many as I like, to the local
> machine, but I don't know how to do this with machines on the
> network.  That's what the "hosts" parameter of mpi.spawn.Rslaves()
> enables me to do, I think.  If I can do that, then Rmpi has
> function(s) to send command to each of the spawned slaves.
> 
> My question is how can I get open MPI to give me those hosts
> parameters.

I am not quite sure if I understood your question, but when you
read "man MPI_Comm_spawn" you can find the parameter "MPI_Info info"
which allows to specify where and how to start processes. "man
MPI_Info_create" shows you how to create an info object and "man
MPI_Info_set" how to add a key/value pair. "man orte_hosts" shows
you how you can build a hostfile. I do not know how to do these
things in your language R but hopefully the information of the
manual pages helps to solve your problem.

Kind regards

Siegmar
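
To make the man-page pointers above concrete, here is a minimal C sketch of
spawning workers on named hosts through an info object.  The executable name
./worker, the host names, and the choice of the "host" info key are
placeholders; "man MPI_Comm_spawn" lists the keys a given Open MPI version
actually accepts.

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_Comm intercomm;
      MPI_Info info;

      MPI_Init(&argc, &argv);

      /* Tell the runtime where the children may be started. */
      MPI_Info_create(&info);
      MPI_Info_set(info, "host", "dummy1,dummy2,dummy3,dummy4");

      /* Start 4 copies of ./worker on those hosts; intercomm connects
         the parent to the spawned children. */
      MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, info, 0,
                     MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

      MPI_Info_free(&info);
      MPI_Finalize();
      return 0;
  }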



Re: [OMPI users] Newbie question

2011-01-11 Thread Tena Sakai
Hi,

Thanks for your reply.

I am afraid your terse response doesn’t shed much light.  What I need is the “hosts”
parameter I can pass to the mpi.spawn.Rslaves() function.  Can you explain or better
yet give an example as to how I can get this via mpirun?

Looking at mpirun man page, I found an example:
  mpirun -H aa,aa,bb  ./a.out
and similar ones.  But they all execute a program (like a.out above).  That’s 
not
what I want.  What I want is to spawn a bunch of R slaves to other machines on
the network.  I can spawn R slaves, as many as I like, to the local machine, but
I don’t know how to do this with machines on the network.  That’s what “hosts”
parameter of mpi.spawn.Rslaves() enables me to do, I think.  If I can do that, 
then
Rmpi has function(s) to send command to each of the spawned slaves.

My question is how can I get open MPI to give me those “hosts” parameters.

Can you please help me?

Thank you in advance.

Tena Sakai
tsa...@gallo.ucsf.edu


On 1/10/11 8:14 PM, "pooja varshneya"  wrote:

You can use mpirun.

On Mon, Jan 10, 2011 at 8:04 PM, Tena Sakai  wrote:
Hi,

I am an mpi newbie.  My open MPI is v 1.4.3, which I compiled
on a linux machine.

I am using a language called R, which has an mpi interface/package.
It appears that it is happy, on the surface, with the open MPI I installed.

There is an R function called mpi.spawn.Rslaves().  An argument to
this function is nslaves.  I can issue, for example,
  mpi.spawn.Rslaves( nslaves=20 )
And it spawns 20 slave processes.  The trouble is that it is all on the
same node as that of the master.  I want, instead, these 20 (or more)
slaves spawned on other machines on the network.

It so happens the mpi.spawn.Rslaves() has an extra argument called
hosts.  Here’s the definition of hosts from the api document: “NULL or
LAM node numbers to specify where R slaves to be spawned.”  I have
no idea what a LAM node is, but there is a function called lamhosts(),
which returns a somewhat verbose message:

  It seems that there is no lamd running on the host compute-0-0.local.

  This indicates that the LAM/MPI runtime environment is not operating.
  The LAM/MPI runtime environment is necessary for the "lamnodes" command.

  Please run the "lamboot" command the start the LAM/MPI runtime
  environment.  See the LAM/MPI documentation for how to invoke
  "lamboot" across multiple machines.

Here’s my question.  Is there such a command as lamboot in Open MPI 1.4.3?
Or am I using the wrong MPI software?  In a FAQ I read that there are other
MPI software packages (FT-MPI, LA-MPI, LAM-MPI), but I had the notion that Open MPI
is supposed to combine the functionality of all of them.  Is this a wrong impression?

Thank you for your help.

Tena Sakai
tsa...@gallo.ucsf.edu 

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Newbie question

2011-01-10 Thread pooja varshneya
You can use mpirun.

On Mon, Jan 10, 2011 at 8:04 PM, Tena Sakai  wrote:

>  Hi,
>
> I am an mpi newbie.  My open MPI is v 1.4.3, which I compiled
> on a linux machine.
>
> I am using a language called R, which has an mpi interface/package.
> It appears that it is happy, on the surface, with the open MPI I installed.
>
> There is an R function called mpi.spawn.Rslaves().  An argument to
> this function is nslaves.  I can issue, for example,
>   mpi.spawn.Rslaves( nslaves=20 )
> And it spawns 20 slave processes.  The trouble is that it is all on the
> same node as that of the master.  I want, instead, these 20 (or more)
> slaves spawned on other machines on the network.
>
> It so happens the mpi.spawn.Rslaves() has an extra argument called
> hosts.  Here’s the definition of hosts from the api document: “NULL or
> LAM node numbers to specify where R slaves to be spawned.”  I have
> no idea what a LAM node is, but there is a function called lamhosts(),
> which returns a somewhat verbose message:
>
>   It seems that there is no lamd running on the host compute-0-0.local.
>
>   This indicates that the LAM/MPI runtime environment is not operating.
>   The LAM/MPI runtime environment is necessary for the "lamnodes" command.
>
>   Please run the "lamboot" command the start the LAM/MPI runtime
>   environment.  See the LAM/MPI documentation for how to invoke
>   "lamboot" across multiple machines.
>
> Here’s my question.  Is there such a command as lamboot in Open MPI 1.4.3?
> Or am I using the wrong MPI software?  In a FAQ I read that there are other
> MPI software packages (FT-MPI, LA-MPI, LAM-MPI), but I had the notion that Open MPI
> is supposed to combine the functionality of all of them.  Is this a wrong impression?
>
> Thank you for your help.
>
> Tena Sakai
> tsa...@gallo.ucsf.edu
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


[OMPI users] Newbie question

2011-01-10 Thread Tena Sakai
Hi,

I am an mpi newbie.  My open MPI is v 1.4.3, which I compiled
on a linux machine.

I am using a language called R, which has an mpi interface/package.
It appears that it is happy, on the surface, with the open MPI I installed.

There is an R function called mpi.spawn.Rslaves().  An argument to
this function is nslaves.  I can issue, for example,
  mpi.spawn.Rslaves( nslaves=20 )
And it spawns 20 slave processes.  The trouble is that it is all on the
same node as that of the master.  I want, instead, these 20 (or more)
slaves spawned on other machines on the network.

It so happens the mpi.spawn.Rslaves() has an extra argument called
hosts.  Here’s the definition of hosts from the api document: “NULL or
LAM node numbers to specify where R slaves to be spawned.”  I have
no idea what a LAM node is, but there is a function called lamhosts(),
which returns a somewhat verbose message:

  It seems that there is no lamd running on the host compute-0-0.local.

  This indicates that the LAM/MPI runtime environment is not operating.
  The LAM/MPI runtime environment is necessary for the "lamnodes" command.

  Please run the "lamboot" command the start the LAM/MPI runtime
  environment.  See the LAM/MPI documentation for how to invoke
  "lamboot" across multiple machines.

Here’s my question.  Is there such a command as lamboot in Open MPI 1.4.3?
Or am I using the wrong MPI software?  In a FAQ I read that there are other
MPI software packages (FT-MPI, LA-MPI, LAM-MPI), but I had the notion that Open MPI
is supposed to combine the functionality of all of them.  Is this a wrong impression?

Thank you for your help.

Tena Sakai
tsa...@gallo.ucsf.edu


Re: [OMPI users] newbie question

2007-05-14 Thread Brian Barrett
I fixed the OOB.  I also mucked some things up with it, interface-wise,  
that I need to undo :).  Anyway, I'll have a look at fixing up the  
TCP component in the next day or two.


Brian

On May 10, 2007, at 6:07 PM, Jeff Squyres wrote:


Brian --

Didn't you add something to fix exactly this problem recently?  I
have a dim recollection of seeing a commit go by about this...?

(I advised Steve in IM to use --disable-ipv6 in the meantime)


On May 10, 2007, at 1:25 PM, Steve Wise wrote:


I'm trying to run a job specifically over tcp and the eth1 interface.
It seems to be barfing on trying to listen via ipv6.  I don't want
ipv6.
How can I disable it?

Here's my mpirun line:

[root@vic12-10g ~]# mpirun --n 2 --host vic12,vic20 --mca btl
self,tcp -mca btl_tcp_if_include eth1 /root/IMB_2.3/src/IMB-MPI1
sendrecv
[vic12][0,1,0][btl_tcp_component.c:
489:mca_btl_tcp_component_create_listen] socket() failed: Address
family not supported by protocol (97)
[vic12-10g:15771] mca_btl_tcp_component: IPv6 listening socket failed
[vic20][0,1,1][btl_tcp_component.c:
489:mca_btl_tcp_component_create_listen] socket() failed: Address
family not supported by protocol (97)
[vic20-10g:23977] mca_btl_tcp_component: IPv6 listening socket failed


Thanks,

Steve.

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Newbie question. Please help.

2007-05-10 Thread Jeff Squyres
Good to know.  This suggests that VASP, built properly with Open  
MPI, should work; perhaps there's some secret sauce in the  
Makefile somewhere...?  Off list, someone cited the following to me:


-
Also VASP has a forum for things like this too.
http://cms.mpi.univie.ac.at/vasp-forum/forum.php

From there it looks like people have been having problems with  
ifort 9.1.043 with vasp.


and from this post it looks like I'm not the only one to use openMPI  
and VASP


http://cms.mpi.univie.ac.at/vasp-forum/forum_viewtopic.php?2.550
-

I have not received a reply from the VASP author yet.



On May 10, 2007, at 8:52 AM, Terry Frankcombe wrote:



I have previously been running parallel VASP happily with an old,
prerelease version of OpenMPI:


[terry@nocona Vasp.4.6-OpenMPI]$
head /home/terry/Install_trees/OpenMPI-1.0rc6/config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.0rc6, which was
generated by GNU Autoconf 2.59.  Invocation command line was

  $ ./configure --enable-static --disable-shared
--prefix=/home/terry/bin/Local --enable-picky --disable-heterogeneous
--without-libnuma --without-slurm --without-tm F77=ifort



In my VASP makefile:

FC=/home/terry/bin/Local/bin/mpif90

OFLAG= -O3 -xP -tpp7

CPP = $(CPP_) -DMPI  -DHOST=\"LinuxIFC\" -DIFC -Dkind8 -DNGZhalf
-DCACHE_SIZE=12000 -DPGF90 -Davoidalloc -DMPI_BLOCK=500 -DRPROMU_DGEMV
-DRACCMU_DGEMV

FFLAGS =  -FR -lowercase -assume byterecl

As far as I can see (it was a long time ago!) I didn't use BLACS or
SCALAPACK libraries.  I used ATLAS.



Maybe this will help.


--
Dr Terry Frankcombe
Physical Chemistry, Department of Chemistry
Göteborgs Universitet
SE-412 96 Göteborg Sweden
Ph: +46 76 224 0887   Skype: terry.frankcombe


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems




Re: [OMPI users] newbie question

2007-05-10 Thread Steve Wise
On Thu, 2007-05-10 at 20:07 -0400, Jeff Squyres wrote:
> Brian --
> 
> Didn't you add something to fix exactly this problem recently?  I  
> have a dim recollection of seeing a commit go by about this...?
> 
> (I advised Steve in IM to use --disable-ipv6 in the meantime)
> 

Yes, disabling it worked. ;-)





Re: [OMPI users] newbie question

2007-05-10 Thread Jeff Squyres

Brian --

Didn't you add something to fix exactly this problem recently?  I  
have a dim recollection of seeing a commit go by about this...?


(I advised Steve in IM to use --disable-ipv6 in the meantime)


On May 10, 2007, at 1:25 PM, Steve Wise wrote:


I'm trying to run a job specifically over tcp and the eth1 interface.
It seems to be barfing on trying to listen via ipv6.  I don't want  
ipv6.

How can I disable it?

Here's my mpirun line:

[root@vic12-10g ~]# mpirun --n 2 --host vic12,vic20 --mca btl  
self,tcp -mca btl_tcp_if_include eth1 /root/IMB_2.3/src/IMB-MPI1  
sendrecv
[vic12][0,1,0][btl_tcp_component.c: 
489:mca_btl_tcp_component_create_listen] socket() failed: Address  
family not supported by protocol (97)

[vic12-10g:15771] mca_btl_tcp_component: IPv6 listening socket failed
[vic20][0,1,1][btl_tcp_component.c: 
489:mca_btl_tcp_component_create_listen] socket() failed: Address  
family not supported by protocol (97)

[vic20-10g:23977] mca_btl_tcp_component: IPv6 listening socket failed


Thanks,

Steve.

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



[OMPI users] newbie question

2007-05-10 Thread Steve Wise
I'm trying to run a job specifically over tcp and the eth1 interface.
It seems to be barfing on trying to listen via ipv6.  I don't want ipv6.
How can I disable it?

Here's my mpirun line:

[root@vic12-10g ~]# mpirun --n 2 --host vic12,vic20 --mca btl self,tcp -mca 
btl_tcp_if_include eth1 /root/IMB_2.3/src/IMB-MPI1 sendrecv
[vic12][0,1,0][btl_tcp_component.c:489:mca_btl_tcp_component_create_listen] 
socket() failed: Address family not supported by protocol (97)
[vic12-10g:15771] mca_btl_tcp_component: IPv6 listening socket failed
[vic20][0,1,1][btl_tcp_component.c:489:mca_btl_tcp_component_create_listen] 
socket() failed: Address family not supported by protocol (97)
[vic20-10g:23977] mca_btl_tcp_component: IPv6 listening socket failed


Thanks,

Steve.



Re: [OMPI users] Newbie question. Please help.

2007-05-10 Thread Terry Frankcombe

I have previously been running parallel VASP happily with an old,
prerelease version of OpenMPI:


[terry@nocona Vasp.4.6-OpenMPI]$
head /home/terry/Install_trees/OpenMPI-1.0rc6/config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.0rc6, which was
generated by GNU Autoconf 2.59.  Invocation command line was

  $ ./configure --enable-static --disable-shared
--prefix=/home/terry/bin/Local --enable-picky --disable-heterogeneous
--without-libnuma --without-slurm --without-tm F77=ifort



In my VASP makefile:

FC=/home/terry/bin/Local/bin/mpif90

OFLAG= -O3 -xP -tpp7

CPP = $(CPP_) -DMPI  -DHOST=\"LinuxIFC\" -DIFC -Dkind8 -DNGZhalf
-DCACHE_SIZE=12000 -DPGF90 -Davoidalloc -DMPI_BLOCK=500 -DRPROMU_DGEMV
-DRACCMU_DGEMV

FFLAGS =  -FR -lowercase -assume byterecl

As far as I can see (it was a long time ago!) I didn't use BLACS or
SCALAPACK libraries.  I used ATLAS.



Maybe this will help.


-- 
Dr Terry Frankcombe
Physical Chemistry, Department of Chemistry
Göteborgs Universitet
SE-412 96 Göteborg Sweden
Ph: +46 76 224 0887   Skype: terry.frankcombe




Re: [OMPI users] Newbie question. Please help.

2007-05-09 Thread Steven Truong

Thank you very much, Jeff, for your efforts and help.

On 5/9/07, Jeff Squyres  wrote:

I have mailed the VASP maintainer asking for a copy of the code.
Let's see what happens.

On May 9, 2007, at 2:44 PM, Steven Truong wrote:

> Hi, Jeff.   Thank you very much for looking into this issue.   I am
> afraid that I cannot give you the application/package because it is
> commercial software.  I believe that a lot of people are using this
> VASP software package http://cms.mpi.univie.ac.at/vasp/.
>
> My current environment uses MPICH 1.2.7p1, however, because a new set
> of dual core machines has posed a new set of challenges and I am
> looking into replacing MPICH with openmpi on these machines.
>
> Could Mr. Radican, who wrote that he was able to run VASP with
> openMPI, provide a lot more detail regarding how he configured openmpi,
> how he compiles and runs VASP jobs, and anything relating to this issue?
>
> Thank you very much for all your help.
> Steven.
>
> On 5/9/07, Jeff Squyres  wrote:
>> Can you send a simple test that reproduces these errors?
>>
>> I.e., if there's a single, simple package that you can send
>> instructions on how to build, it would be most helpful if we could
>> reproduce the error (and therefore figure out how to fix it).
>>
>> Thanks!
>>
>>
>> On May 9, 2007, at 2:19 PM, Steven Truong wrote:
>>
>>> Oh, no.  I tried with ACML and had the same set of errors.
>>>
>>> Steven.
>>>
>>> On 5/9/07, Steven Truong  wrote:
 Hi, Kevin and all.  I tried with the following:

 ./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6
 --with-tm=/usr/local/pbs  --enable-mpirun-prefix-by-default
 --enable-mpi-f90 --with-threads=posix  --enable-static

 and added the mpi.o rule to my VASP makefile, but I still got the error.

 I forgot to mention that our environment has Intel MKL 9.0 or
 8.1 and
 my machines are dual proc dual core Xeon 5130 .

  Well, I am going to try acml too.

 Attached is my makefile for VASP and I am not sure if I missed
 anything again.

 Thank you very much for all your help.

 On 5/9/07, Steven Truong  wrote:
> Thank Kevin and Brook for replying to my question.  I am going to
> try
> out what Kevin suggested.
>
> Steven.
>
> On 5/9/07, Kevin Radican  wrote:
>> Hi,
>>
>> We use VASP 4.6 in parallel with Open MPI 1.1.2 without any
>> problems on
>> x86_64 with opensuse and compiled with gcc and Intel fortran and
>> use
>> torque PBS.
>>
>> I used standard configure to build openmpi something like
>>
>> ./configure --prefix=/usr/local --enable-static --with-threads
>> --with-tm=/usr/local --with-libnuma
>>
>> I used the ACML math lapack libs and built Blacs and Scalapack
>> with them
>> too.
>>
>> I attached my VASP makefile; I might have added
>>
>> mpi.o : mpi.F
>> $(CPP)
>> $(FC) -FR -lowercase -O0 -c $*$(SUFFIX)
>>
>> to the end of the makefile.  It doesn't look like it is in the
>> example
>> makefiles they give, but I compiled this a while ago.
>>
>> Hope this helps.
>>
>> Cheers,
>> Kevin
>>
>>
>>
>>
>>
>> On Tue, 2007-05-08 at 19:18 -0700, Steven Truong wrote:
>>> Hi, all.  I am new to OpenMPI and after initial setup I tried
>>> to run
>>> my app but got the following errors:
>>>
>>> [node07.my.com:16673] *** An error occurred in MPI_Comm_rank
>>> [node07.my.com:16673] *** on communicator MPI_COMM_WORLD
>>> [node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
>>> [node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> [node07.my.com:16674] *** An error occurred in MPI_Comm_rank
>>> [node07.my.com:16674] *** on communicator MPI_COMM_WORLD
>>> [node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
>>> [node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> [node07.my.com:16675] *** An error occurred in MPI_Comm_rank
>>> [node07.my.com:16675] *** on communicator MPI_COMM_WORLD
>>> [node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
>>> [node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> [node07.my.com:16676] *** An error occurred in MPI_Comm_rank
>>> [node07.my.com:16676] *** on communicator MPI_COMM_WORLD
>>> [node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
>>> [node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> mpiexec noticed that job rank 2 with PID 16675 on node node07
>>> exited
>>> on signal 60 (Real-time signal 26).
>>>
>>>  /usr/local/openmpi-1.2.1/bin/ompi_info
>>> Open MPI: 1.2.1
>>>Open MPI SVN revision: r14481
>>> Open RTE: 1.2.1
>>>Open RTE SVN revision: r14481
>>> 

Re: [OMPI users] Newbie question. Please help.

2007-05-09 Thread Jeff Squyres

Can you send a simple test that reproduces these errors?

I.e., if there's a single, simple package that you can send  
instructions on how to build, it would be most helpful if we could  
reproduce the error (and therefore figure out how to fix it).


Thanks!
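
A standalone test of the kind being asked for can be as small as the sketch
below; it exercises the same MPI_Comm_rank call that the reported VASP runs
abort in.  If it runs cleanly under mpirun across the same nodes, the problem
is more likely in how VASP was compiled and linked than in the Open MPI
installation itself.  (This is a generic sketch, not taken from VASP.)

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, size;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* the call that fails in the VASP runs */
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      printf("Hello from rank %d of %d\n", rank, size);
      MPI_Finalize();
      return 0;
  }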


On May 9, 2007, at 2:19 PM, Steven Truong wrote:


Oh, no.  I tried with ACML and had the same set of errors.

Steven.

On 5/9/07, Steven Truong  wrote:

Hi, Kevin and all.  I tried with the following:

./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6
--with-tm=/usr/local/pbs  --enable-mpirun-prefix-by-default
--enable-mpi-f90 --with-threads=posix  --enable-static

and added the mpi.o rule to my VASP makefile, but I still got the error.

I forgot to mention that our environment has Intel MKL 9.0 or 8.1 and
my machines are dual proc dual core Xeon 5130 .

 Well, I am going to try acml too.

Attached is my makefile for VASP and I am not sure if I missed  
anything again.


Thank you very much for all your help.

On 5/9/07, Steven Truong  wrote:
Thank Kevin and Brook for replying to my question.  I am going to  
try

out what Kevin suggested.

Steven.

On 5/9/07, Kevin Radican  wrote:

Hi,

We use VASP 4.6 in parallel with Open MPI 1.1.2 without any  
problems on
x86_64 with opensuse and compiled with gcc and Intel fortran and  
use

torque PBS.

I used standard configure to build openmpi something like

./configure --prefix=/usr/local --enable-static --with-threads
--with-tm=/usr/local --with-libnuma

I used the ACML math lapack libs and built Blacs and Scalapack  
with them

too.

I attached my VASP makefile; I might have added

mpi.o : mpi.F
$(CPP)
$(FC) -FR -lowercase -O0 -c $*$(SUFFIX)

to the end of the makefile.  It doesn't look like it is in the  
example

makefiles they give, but I compiled this a while ago.

Hope this helps.

Cheers,
Kevin





On Tue, 2007-05-08 at 19:18 -0700, Steven Truong wrote:
Hi, all.  I am new to OpenMPI and after initial setup I tried  
to run

my app but got the following errors:

[node07.my.com:16673] *** An error occurred in MPI_Comm_rank
[node07.my.com:16673] *** on communicator MPI_COMM_WORLD
[node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16674] *** An error occurred in MPI_Comm_rank
[node07.my.com:16674] *** on communicator MPI_COMM_WORLD
[node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16675] *** An error occurred in MPI_Comm_rank
[node07.my.com:16675] *** on communicator MPI_COMM_WORLD
[node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16676] *** An error occurred in MPI_Comm_rank
[node07.my.com:16676] *** on communicator MPI_COMM_WORLD
[node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpiexec noticed that job rank 2 with PID 16675 on node node07  
exited

on signal 60 (Real-time signal 26).

 /usr/local/openmpi-1.2.1/bin/ompi_info
Open MPI: 1.2.1
   Open MPI SVN revision: r14481
Open RTE: 1.2.1
   Open RTE SVN revision: r14481
OPAL: 1.2.1
   OPAL SVN revision: r14481
  Prefix: /usr/local/openmpi-1.2.1
 Configured architecture: x86_64-unknown-linux-gnu
   Configured by: root
   Configured on: Mon May  7 18:32:56 PDT 2007
  Configure host: neptune.nanostellar.com
Built by: root
Built on: Mon May  7 18:40:28 PDT 2007
  Built host: neptune.my.com
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: gcc
 C compiler absolute: /usr/bin/gcc
C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: /opt/intel/fce/9.1.043/bin/ifort
  Fortran77 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
  Fortran90 compiler: /opt/intel/fce/9.1.043/bin/ifort
  Fortran90 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: yes
   MCA backtrace: execinfo (MCA v1.0, API v1.0,  
Component v1.2.1)
  MCA memory: ptmalloc2 (MCA v1.0, API v1.0,  
Component v1.2.1)
   MCA paffinity: linux (MCA v1.0, API v1.0, Component  
v1.2.1)
   MCA maffinity: first_use (MCA v1.0, API v1.0,  
Component v1.2.1)
   

Re: [OMPI users] Newbie question. Please help.

2007-05-09 Thread Steven Truong

Oh, no.  I tried with ACML and had the same set of errors.

Steven.

On 5/9/07, Steven Truong  wrote:

Hi, Kevin and all.  I tried with the following:

./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6
--with-tm=/usr/local/pbs  --enable-mpirun-prefix-by-default
--enable-mpi-f90 --with-threads=posix  --enable-static

and added the mpi.o rule to my VASP makefile, but I still got the error.

I forgot to mention that our environment has Intel MKL 9.0 or 8.1 and
my machines are dual proc dual core Xeon 5130 .

 Well, I am going to try acml too.

Attached is my makefile for VASP and I am not sure if I missed anything again.

Thank you very much for all your help.

On 5/9/07, Steven Truong  wrote:
> Thank Kevin and Brook for replying to my question.  I am going to try
> out what Kevin suggested.
>
> Steven.
>
> On 5/9/07, Kevin Radican  wrote:
> > Hi,
> >
> > We use VASP 4.6 in parallel with Open MPI 1.1.2 without any problems on
> > x86_64 with opensuse and compiled with gcc and Intel fortran and use
> > torque PBS.
> >
> > I used standard configure to build openmpi something like
> >
> > ./configure --prefix=/usr/local --enable-static --with-threads
> > --with-tm=/usr/local --with-libnuma
> >
> > I used the ACML math lapack libs and built Blacs and Scalapack with them
> > too.
> >
> > I attached my VASP makefile; I might have added
> >
> > mpi.o : mpi.F
> > $(CPP)
> > $(FC) -FR -lowercase -O0 -c $*$(SUFFIX)
> >
> > to the end of the makefile.  It doesn't look like it is in the example
> > makefiles they give, but I compiled this a while ago.
> >
> > Hope this helps.
> >
> > Cheers,
> > Kevin
> >
> >
> >
> >
> >
> > On Tue, 2007-05-08 at 19:18 -0700, Steven Truong wrote:
> > > Hi, all.  I am new to OpenMPI and after initial setup I tried to run
> > > my app but got the following errors:
> > >
> > > [node07.my.com:16673] *** An error occurred in MPI_Comm_rank
> > > [node07.my.com:16673] *** on communicator MPI_COMM_WORLD
> > > [node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
> > > [node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
> > > [node07.my.com:16674] *** An error occurred in MPI_Comm_rank
> > > [node07.my.com:16674] *** on communicator MPI_COMM_WORLD
> > > [node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
> > > [node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
> > > [node07.my.com:16675] *** An error occurred in MPI_Comm_rank
> > > [node07.my.com:16675] *** on communicator MPI_COMM_WORLD
> > > [node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
> > > [node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
> > > [node07.my.com:16676] *** An error occurred in MPI_Comm_rank
> > > [node07.my.com:16676] *** on communicator MPI_COMM_WORLD
> > > [node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
> > > [node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
> > > mpiexec noticed that job rank 2 with PID 16675 on node node07 exited
> > > on signal 60 (Real-time signal 26).
> > >
> > >  /usr/local/openmpi-1.2.1/bin/ompi_info
> > > Open MPI: 1.2.1
> > >Open MPI SVN revision: r14481
> > > Open RTE: 1.2.1
> > >Open RTE SVN revision: r14481
> > > OPAL: 1.2.1
> > >OPAL SVN revision: r14481
> > >   Prefix: /usr/local/openmpi-1.2.1
> > >  Configured architecture: x86_64-unknown-linux-gnu
> > >Configured by: root
> > >Configured on: Mon May  7 18:32:56 PDT 2007
> > >   Configure host: neptune.nanostellar.com
> > > Built by: root
> > > Built on: Mon May  7 18:40:28 PDT 2007
> > >   Built host: neptune.my.com
> > >   C bindings: yes
> > > C++ bindings: yes
> > >   Fortran77 bindings: yes (all)
> > >   Fortran90 bindings: yes
> > >  Fortran90 bindings size: small
> > >   C compiler: gcc
> > >  C compiler absolute: /usr/bin/gcc
> > > C++ compiler: g++
> > >C++ compiler absolute: /usr/bin/g++
> > >   Fortran77 compiler: /opt/intel/fce/9.1.043/bin/ifort
> > >   Fortran77 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
> > >   Fortran90 compiler: /opt/intel/fce/9.1.043/bin/ifort
> > >   Fortran90 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
> > >  C profiling: yes
> > >C++ profiling: yes
> > >  Fortran77 profiling: yes
> > >  Fortran90 profiling: yes
> > >   C++ exceptions: no
> > >   Thread support: posix (mpi: no, progress: no)
> > >   Internal debug support: no
> > >  MPI parameter check: runtime
> > > Memory profiling support: no
> > > Memory debugging support: no
> > >  libltdl support: yes
> > >Heterogeneous support: yes
> > >  mpirun default --prefix: yes
> > >MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.1)
> > >   MCA memory: ptmalloc2 (MCA v1.0, API v1.0, 

Re: [OMPI users] Newbie question. Please help.

2007-05-09 Thread Steven Truong

Hi, Kevin and all.  I tried with the following:

./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6
--with-tm=/usr/local/pbs  --enable-mpirun-prefix-by-default
--enable-mpi-f90 --with-threads=posix  --enable-static

and added the mpi.o rule to my VASP makefile, but I still got the error.

I forgot to mention that our environment has Intel MKL 9.0 or 8.1 and
my machines are dual proc dual core Xeon 5130 .

Well, I am going to try acml too.

Attached is my makefile for VASP and I am not sure if I missed anything again.

Thank you very much for all your help.

On 5/9/07, Steven Truong  wrote:

Thank Kevin and Brook for replying to my question.  I am going to try
out what Kevin suggested.

Steven.

On 5/9/07, Kevin Radican  wrote:
> Hi,
>
> We use VASP 4.6 in parallel with Open MPI 1.1.2 without any problems on
> x86_64 with opensuse and compiled with gcc and Intel fortran and use
> torque PBS.
>
> I used standard configure to build openmpi something like
>
> ./configure --prefix=/usr/local --enable-static --with-threads
> --with-tm=/usr/local --with-libnuma
>
> I used the ACML math lapack libs and built Blacs and Scalapack with them
> too.
>
> I attached my VASP makefile; I might have added
>
> mpi.o : mpi.F
> $(CPP)
> $(FC) -FR -lowercase -O0 -c $*$(SUFFIX)
>
> to the end of the makefile.  It doesn't look like it is in the example
> makefiles they give, but I compiled this a while ago.
>
> Hope this helps.
>
> Cheers,
> Kevin
>
>
>
>
>
> On Tue, 2007-05-08 at 19:18 -0700, Steven Truong wrote:
> > Hi, all.  I am new to OpenMPI and after initial setup I tried to run
> > my app but got the following errors:
> >
> > [node07.my.com:16673] *** An error occurred in MPI_Comm_rank
> > [node07.my.com:16673] *** on communicator MPI_COMM_WORLD
> > [node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
> > [node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
> > [node07.my.com:16674] *** An error occurred in MPI_Comm_rank
> > [node07.my.com:16674] *** on communicator MPI_COMM_WORLD
> > [node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
> > [node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
> > [node07.my.com:16675] *** An error occurred in MPI_Comm_rank
> > [node07.my.com:16675] *** on communicator MPI_COMM_WORLD
> > [node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
> > [node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
> > [node07.my.com:16676] *** An error occurred in MPI_Comm_rank
> > [node07.my.com:16676] *** on communicator MPI_COMM_WORLD
> > [node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
> > [node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
> > mpiexec noticed that job rank 2 with PID 16675 on node node07 exited
> > on signal 60 (Real-time signal 26).
> >
> >  /usr/local/openmpi-1.2.1/bin/ompi_info
> > Open MPI: 1.2.1
> >Open MPI SVN revision: r14481
> > Open RTE: 1.2.1
> >Open RTE SVN revision: r14481
> > OPAL: 1.2.1
> >OPAL SVN revision: r14481
> >   Prefix: /usr/local/openmpi-1.2.1
> >  Configured architecture: x86_64-unknown-linux-gnu
> >Configured by: root
> >Configured on: Mon May  7 18:32:56 PDT 2007
> >   Configure host: neptune.nanostellar.com
> > Built by: root
> > Built on: Mon May  7 18:40:28 PDT 2007
> >   Built host: neptune.my.com
> >   C bindings: yes
> > C++ bindings: yes
> >   Fortran77 bindings: yes (all)
> >   Fortran90 bindings: yes
> >  Fortran90 bindings size: small
> >   C compiler: gcc
> >  C compiler absolute: /usr/bin/gcc
> > C++ compiler: g++
> >C++ compiler absolute: /usr/bin/g++
> >   Fortran77 compiler: /opt/intel/fce/9.1.043/bin/ifort
> >   Fortran77 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
> >   Fortran90 compiler: /opt/intel/fce/9.1.043/bin/ifort
> >   Fortran90 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
> >  C profiling: yes
> >C++ profiling: yes
> >  Fortran77 profiling: yes
> >  Fortran90 profiling: yes
> >   C++ exceptions: no
> >   Thread support: posix (mpi: no, progress: no)
> >   Internal debug support: no
> >  MPI parameter check: runtime
> > Memory profiling support: no
> > Memory debugging support: no
> >  libltdl support: yes
> >Heterogeneous support: yes
> >  mpirun default --prefix: yes
> >MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.1)
> >   MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.1)
> >MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.1)
> >MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
> >MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.1)
> >MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.1)
> >