Re: [OMPI users] random problems with a ring communication example

2014-03-15 Thread Ralph Castain

On Mar 15, 2014, at 6:21 PM, christophe petit  
wrote:

> Ok, so from what you say, from an "execution system" point of view, the ring 
> communication is achieved correctly (i.e. respecting the expected order, with 
> rank 0 receiving from rank 6 in last position), but the stdout doesn't 
> reflect what really happened, does it?

Well, it reflects what you printed, but not the order in which things happened.

> 
> Is there a way to make stdout respect the expected order?

In your program, have each rank!=0 proc recv the message from the previous 
rank, print the message, sleep(1), and then send.
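A minimal sketch of that change to the posted program (note: sleep() is a 
non-standard but widely supported compiler extension, e.g. in gfortran; other 
compilers may need an equivalent routine):

   if (rank == 0) then
      call MPI_SEND (1000,1, MPI_INTEGER ,num_proc_next,tag, &
                     MPI_COMM_WORLD ,code)
      call MPI_RECV (value,1, MPI_INTEGER ,num_proc_previous,tag, &
                     MPI_COMM_WORLD ,status,code)
      print *,'Me, process ',rank,', I have received ',value, &
              ' from process ',num_proc_previous
   else
      call MPI_RECV (value,1, MPI_INTEGER ,num_proc_previous,tag, &
                     MPI_COMM_WORLD ,status,code)
      print *,'Me, process ',rank,', I have received ',value, &
              ' from process ',num_proc_previous
      call sleep(1)   ! give the daemons time to forward this rank's stdout
      call MPI_SEND (rank+1000,1, MPI_INTEGER ,num_proc_next,tag, &
                     MPI_COMM_WORLD ,code)
   end if

This still doesn't guarantee ordering - nothing does - but the delay makes the 
expected order far more likely. If you just need to know which rank printed 
each line, mpirun's --tag-output option prefixes every output line with the 
rank that emitted it.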

> 
> Thanks
> 
> 
> 2014-03-16 0:42 GMT+01:00 Ralph Castain :
> The explanation is simple: there is no rule about ordering of stdout. So even 
> though your rank0 may receive its MPI message last, its stdout may well be 
> printed before one generated on a remote node. Reason is that rank 0 may well 
> be local to mpirun, and thus the stdout can be handled immediately. However, 
> your rank6 may well be on a remote node, and that daemon has to forward the 
> stdout to mpirun for printing.
> 
> Like I said - no guarantee about ordering of stdout.
> 
> 
> On Mar 15, 2014, at 2:43 PM, christophe petit  
> wrote:
> 
>> Hello,
>> 
>> I followed a simple MPI example to do a ring communication.
>> 
>> Here's the figure that illustrates this example with 7 processes:
>> 
>> http://i.imgur.com/Wrd6acv.png
>> 
>> Here is the code:
>> 
>> --
>>  program ring
>> 
>>  implicit none
>>  include 'mpif.h'
>> 
>>  integer, dimension( MPI_STATUS_SIZE ) :: status
>>  integer, parameter:: tag=100
>>  integer :: nb_procs, rank, value, &
>> num_proc_previous,num_proc_next,code
>> 
>>  call MPI_INIT (code)
>>  call MPI_COMM_SIZE ( MPI_COMM_WORLD ,nb_procs,code)
>>  call MPI_COMM_RANK ( MPI_COMM_WORLD ,rank,code)
>>  
>>  num_proc_next=mod(rank+1,nb_procs) 
>>  num_proc_previous=mod(nb_procs+rank-1,nb_procs)
>>  
>>  if (rank == 0) then
>> call MPI_SEND (1000,1, MPI_INTEGER ,num_proc_next,tag, &
>>MPI_COMM_WORLD ,code)
>> call MPI_RECV (value,1, MPI_INTEGER ,num_proc_previous,tag, &
>>MPI_COMM_WORLD ,status,code)
>>  else
>> call MPI_RECV (value,1, MPI_INTEGER ,num_proc_previous,tag, &
>>MPI_COMM_WORLD ,status,code)
>> call MPI_SEND (rank+1000,1, MPI_INTEGER ,num_proc_next,tag, &
>>MPI_COMM_WORLD ,code)
>>  end if
>>  print *,'Me, process ',rank,', I have received ',value, &
>> ' from process ',num_proc_previous
>>  
>>  call MPI_FINALIZE (code)
>> end program ring
>> 
>> --
>> 
>> At execution, I expect to always see:
>> 
>> Me, process 1, I have received 1000 from process 0
>> Me, process 2, I have received 1001 from process 1
>> Me, process 3, I have received 1002 from process 2
>> Me, process 4, I have received 1003 from process 3
>> Me, process 5, I have received 1004 from process 4
>> Me, process 6, I have received 1005 from process 5
>> Me, process 0, I have received 1006 from process 6
>> 
>> But sometimes the reception by process 0 from process 6 is not the last 
>> one, like this:
>> 
>> Me, process 1, I have received 1000 from process 0
>> Me, process 2, I have received 1001 from process 1
>> Me, process 3, I have received 1002 from process 2
>> Me, process 4, I have received 1003 from process 3
>> Me, process 5, I have received 1004 from process 4
>> Me, process 0, I have received 1006 from process 6
>> Me, process 6, I have received 1005 from process 5
>> 
>> where the reception by process 0 from process 6 happens before the 
>> reception by process 6 from process 5
>> 
>> or like in this result:
>> 
>> Me, process 1, I have received 1000 from process 0
>> Me, process 2, I have received 1001 from process 1
>> Me, process 3, I have received 1002 from process 2
>> Me, process 4, I have received 1003 from process 3
>> Me, process 0, I have received 1006 from process 6

Re: [OMPI users] random problems with a ring communication example

2014-03-15 Thread christophe petit
Ok, so from what you say, from an "execution system" point of view, the ring
communication is achieved correctly (i.e. respecting the expected order, with
rank 0 receiving from rank 6 in last position), but the stdout doesn't reflect
what really happened, does it?

Is there a way to make stdout respect the expected order?

Thanks


2014-03-16 0:42 GMT+01:00 Ralph Castain :

> The explanation is simple: there is no rule about ordering of stdout. So
> even though your rank0 may receive its MPI message last, its stdout may
> well be printed before one generated on a remote node. Reason is that rank
> 0 may well be local to mpirun, and thus the stdout can be handled
> immediately. However, your rank6 may well be on a remote node, and that
> daemon has to forward the stdout to mpirun for printing.
>
> Like I said - no guarantee about ordering of stdout.
>
>
> On Mar 15, 2014, at 2:43 PM, christophe petit <
> christophe.peti...@gmail.com> wrote:
>
> Hello,
>
> I followed a simple MPI example to do a ring communication.
>
> Here's the figure that illustrates this example with 7 processes:
>
> http://i.imgur.com/Wrd6acv.png
>
> Here is the code:
>
>
> --
>  program ring
>
>  implicit none
>  include 'mpif.h'
>
>  integer, dimension( MPI_STATUS_SIZE ) :: status
>  integer, parameter:: tag=100
>  integer :: nb_procs, rank, value, &
> num_proc_previous,num_proc_next,code
>
>  call MPI_INIT (code)
>  call MPI_COMM_SIZE ( MPI_COMM_WORLD ,nb_procs,code)
>  call MPI_COMM_RANK ( MPI_COMM_WORLD ,rank,code)
>
>  num_proc_next=mod(rank+1,nb_procs)
>  num_proc_previous=mod(nb_procs+rank-1,nb_procs)
>
>  if (rank == 0) then
> call MPI_SEND (1000,1, MPI_INTEGER ,num_proc_next,tag, &
>MPI_COMM_WORLD ,code)
> call MPI_RECV (value,1, MPI_INTEGER ,num_proc_previous,tag, &
>MPI_COMM_WORLD ,status,code)
>  else
> call MPI_RECV (value,1, MPI_INTEGER ,num_proc_previous,tag, &
>MPI_COMM_WORLD ,status,code)
> call MPI_SEND (rank+1000,1, MPI_INTEGER ,num_proc_next,tag, &
>MPI_COMM_WORLD ,code)
>  end if
>  print *,'Me, process ',rank,', I have received ',value, &
> ' from process ',num_proc_previous
>
>  call MPI_FINALIZE (code)
> end program ring
>
>
> --
>
> At execution, I expect to always see:
>
> Me, process 1, I have received 1000 from process 0
> Me, process 2, I have received 1001 from process 1
> Me, process 3, I have received 1002 from process 2
> Me, process 4, I have received 1003 from process 3
> Me, process 5, I have received 1004 from process 4
> Me, process 6, I have received 1005 from process 5
> Me, process 0, I have received 1006 from process 6
>
> But sometimes the reception by process 0 from process 6 is not the last 
> one, like this:
>
> Me, process 1, I have received 1000 from process 0
> Me, process 2, I have received 1001 from process 1
> Me, process 3, I have received 1002 from process 2
> Me, process 4, I have received 1003 from process 3
> Me, process 5, I have received 1004 from process 4
> Me, process 0, I have received 1006 from process 6
> Me, process 6, I have received 1005 from process 5
>
> where the reception by process 0 from process 6 happens before the 
> reception by process 6 from process 5
>
> or like in this result:
>
> Me, process 1, I have received 1000 from process 0
> Me, process 2, I have received 1001 from process 1
> Me, process 3, I have received 1002 from process 2
> Me, process 4, I have received 1003 from process 3
> Me, process 0, I have received 1006 from process 6
> Me, process 5, I have received 1004 from process 4
> Me, process 6, I have received 1005 from process 5
>
> where process 0's reception falls between those of processes 4 and 5.
>
> How can we explain this strange result? I thought standard MPI_SEND and
> MPI_RECV were blocking by default, but this result suggests they are not.
>
> I tested this example on Debian 7.0 with the open-mpi package.

Re: [OMPI users] random problems with a ring communication example

2014-03-15 Thread Ralph Castain
The explanation is simple: there is no rule about ordering of stdout. So even 
though your rank0 may receive its MPI message last, its stdout may well be 
printed before one generated on a remote node. Reason is that rank 0 may well 
be local to mpirun, and thus the stdout can be handled immediately. However, 
your rank6 may well be on a remote node, and that daemon has to forward the 
stdout to mpirun for printing.

Like I said - no guarantee about ordering of stdout.


On Mar 15, 2014, at 2:43 PM, christophe petit  
wrote:

> Hello,
> 
> I followed a simple MPI example to do a ring communication.
> 
> Here's the figure that illustrates this example with 7 processes:
> 
> http://i.imgur.com/Wrd6acv.png
> 
> Here is the code:
> 
> --
>  program ring
> 
>  implicit none
>  include 'mpif.h'
> 
>  integer, dimension( MPI_STATUS_SIZE ) :: status
>  integer, parameter:: tag=100
>  integer :: nb_procs, rank, value, &
> num_proc_previous,num_proc_next,code
> 
>  call MPI_INIT (code)
>  call MPI_COMM_SIZE ( MPI_COMM_WORLD ,nb_procs,code)
>  call MPI_COMM_RANK ( MPI_COMM_WORLD ,rank,code)
>  
>  num_proc_next=mod(rank+1,nb_procs) 
>  num_proc_previous=mod(nb_procs+rank-1,nb_procs)
>  
>  if (rank == 0) then
> call MPI_SEND (1000,1, MPI_INTEGER ,num_proc_next,tag, &
>MPI_COMM_WORLD ,code)
> call MPI_RECV (value,1, MPI_INTEGER ,num_proc_previous,tag, &
>MPI_COMM_WORLD ,status,code)
>  else
> call MPI_RECV (value,1, MPI_INTEGER ,num_proc_previous,tag, &
>MPI_COMM_WORLD ,status,code)
> call MPI_SEND (rank+1000,1, MPI_INTEGER ,num_proc_next,tag, &
>MPI_COMM_WORLD ,code)
>  end if
>  print *,'Me, process ',rank,', I have received ',value, &
> ' from process ',num_proc_previous
>  
>  call MPI_FINALIZE (code)
> end program ring
> 
> --
> 
> At execution, I expect to always see:
> 
> Me, process 1, I have received 1000 from process 0
> Me, process 2, I have received 1001 from process 1
> Me, process 3, I have received 1002 from process 2
> Me, process 4, I have received 1003 from process 3
> Me, process 5, I have received 1004 from process 4
> Me, process 6, I have received 1005 from process 5
> Me, process 0, I have received 1006 from process 6
> 
> But sometimes the reception by process 0 from process 6 is not the last 
> one, like this:
> 
> Me, process 1, I have received 1000 from process 0
> Me, process 2, I have received 1001 from process 1
> Me, process 3, I have received 1002 from process 2
> Me, process 4, I have received 1003 from process 3
> Me, process 5, I have received 1004 from process 4
> Me, process 0, I have received 1006 from process 6
> Me, process 6, I have received 1005 from process 5
> 
> where the reception by process 0 from process 6 happens before the 
> reception by process 6 from process 5
> 
> or like in this result:
> 
> Me, process 1, I have received 1000 from process 0
> Me, process 2, I have received 1001 from process 1
> Me, process 3, I have received 1002 from process 2
> Me, process 4, I have received 1003 from process 3
> Me, process 0, I have received 1006 from process 6
> Me, process 5, I have received 1004 from process 4
> Me, process 6, I have received 1005 from process 5
> 
> where process 0's reception falls between those of processes 4 and 5.
> 
> How can we explain this strange result? I thought standard MPI_SEND and 
> MPI_RECV were blocking by default, but this result suggests they are not.
> 
> I tested this example on Debian 7.0 with the open-mpi package.
> 
> Thanks for your help
> 
> 



[OMPI users] random problems with a ring communication example

2014-03-15 Thread christophe petit
Hello,

I followed a simple MPI example to do a ring communication.

Here's the figure that illustrates this example with 7 processes:

http://i.imgur.com/Wrd6acv.png

Here is the code:

--
 program ring

 implicit none
 include 'mpif.h'

 integer, dimension( MPI_STATUS_SIZE ) :: status
 integer, parameter:: tag=100
 integer :: nb_procs, rank, value, &
num_proc_previous,num_proc_next,code

 call MPI_INIT (code)
 call MPI_COMM_SIZE ( MPI_COMM_WORLD ,nb_procs,code)
 call MPI_COMM_RANK ( MPI_COMM_WORLD ,rank,code)

 ! ring neighbours: mod() wraps both ends of the ring around
 num_proc_next=mod(rank+1,nb_procs)
 num_proc_previous=mod(nb_procs+rank-1,nb_procs)

 if (rank == 0) then
    ! rank 0 starts the ring: send first, then wait for the
    ! message to come back around from the last rank
    call MPI_SEND (1000,1, MPI_INTEGER ,num_proc_next,tag, &
       MPI_COMM_WORLD ,code)
    call MPI_RECV (value,1, MPI_INTEGER ,num_proc_previous,tag, &
       MPI_COMM_WORLD ,status,code)
 else
    ! every other rank: receive from the previous rank, then
    ! send its own rank+1000 on to the next rank
    call MPI_RECV (value,1, MPI_INTEGER ,num_proc_previous,tag, &
       MPI_COMM_WORLD ,status,code)
    call MPI_SEND (rank+1000,1, MPI_INTEGER ,num_proc_next,tag, &
       MPI_COMM_WORLD ,code)
 end if
 print *,'Me, process ',rank,', I have received ',value, &
    ' from process ',num_proc_previous

 call MPI_FINALIZE (code)
end program ring

--
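For reference, a typical way to build and launch this example with Open MPI's
Fortran wrapper compiler (the file name ring.f90 is assumed):

mpif90 ring.f90 -o ring
mpirun -np 7 ./ring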

At execution, I expect to always see:

 Me, process 1, I have received 1000 from process 0
 Me, process 2, I have received 1001 from process 1
 Me, process 3, I have received 1002 from process 2
 Me, process 4, I have received 1003 from process 3
 Me, process 5, I have received 1004 from process 4
 Me, process 6, I have received 1005 from process 5
 Me, process 0, I have received 1006 from process 6

But sometimes the reception by process 0 from process 6 is not the last one,
like this:

 Me, process 1, I have received 1000 from process 0
 Me, process 2, I have received 1001 from process 1
 Me, process 3, I have received 1002 from process 2
 Me, process 4, I have received 1003 from process 3
 Me, process 5, I have received 1004 from process 4
 Me, process 0, I have received 1006 from process 6
 Me, process 6, I have received 1005 from process 5

where the reception by process 0 from process 6 happens before the reception
by process 6 from process 5

or like in this result:

 Me, process 1, I have received 1000 from process 0
 Me, process 2, I have received 1001 from process 1
 Me, process 3, I have received 1002 from process 2
 Me, process 4, I have received 1003 from process 3
 Me, process 0, I have received 1006 from process 6
 Me, process 5, I have received 1004 from process 4
 Me, process 6, I have received 1005 from process 5

where process 0's reception falls between those of processes 4 and 5.

How can we explain this strange result? I thought standard MPI_SEND and
MPI_RECV were blocking by default, but this result suggests they are not.

I tested this example on Debian 7.0 with the open-mpi package.

Thanks for your help


Re: [OMPI users] Question about '--mca btl tcp,self'

2014-03-15 Thread Ralph Castain

On Mar 14, 2014, at 10:18 PM, Jianyu Liu  wrote:

>> On Mar 14, 2014, at 10:16:34 AM, Jeff Squyres  wrote: 
>> 
>>> On Mar 14, 2014, at 10:11 AM, Ralph Castain  wrote: 
>>> 
 1. If specified '--mca btl tcp,self', which interface will the application 
 run on - the GigE adapter OR the OpenFabrics interface in IP over IB mode 
 (just like a high performance GigE adapter)? 
>>> 
>>> Both - ip over ib looks just like an Ethernet adaptor 
>> 
>> 
>> To be clear: the TCP BTL will use all TCP interfaces (regardless of 
>> underlying physical transport). Your GigE adapter and your IB adapter both 
>> present IP interfaces to the OS, and both support TCP. So the TCP BTL will 
>> use them, because it just sees the TCP/IP interfaces. 
> 
> Thanks for your kind input.
> 
> Please see if I have understood correctly
> 
> Assume there are two networks:
>   Gigabit Ethernet
> 
> eth0-renamed : 192.168.[1-22].[1-14] / 255.255.192.0
> 
>   InfiniBand network
> 
> ib0 :  172.20.[1-22].[1-4] / 255.255.0.0
> 
> 
> 1. If specified '--mca btl tcp,self'
> 
> The control information (such as setup and teardown) is routed to and 
> passed over Gigabit Ethernet in TCP/IP mode

Not necessarily - the out-of-band (OOB) system will pick up one of the TCP 
interfaces, but which one depends on the ordering in the kernel.

> The MPI messages are routed to and passed over the InfiniBand network in 
> IP over IB mode

Not necessarily - could use either device

> On the same machine, the TCP loopback device will be used for passing 
> control and MPI messages 

I believe the TCP BTL would use the selected device for loopback, ignoring the 
loopback device

> 
> 2. If specified '--mca btl tcp,self --mca btl_tcp_if_include ib0'
> 
> Both the control information (such as setup and teardown) and the MPI 
> messages are routed to and passed over the InfiniBand network in IP over IB mode

No - control info is sent by the OOB, not the BTL. To get what you describe, 
you would have to add "-mca oob_tcp_if_include ib0"
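For example (a sketch; ./my_app stands in for your executable and -np 16 for
your actual process count):

mpirun -np 16 --mca btl tcp,self \
   --mca btl_tcp_if_include ib0 \
   --mca oob_tcp_if_include ib0 ./my_app

With both parameters set, both the MPI traffic and the OOB control traffic
are restricted to the ib0 interface.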

> On the same machine, the TCP loopback device will be used for passing 
> control and MPI messages

No - the TCP MPI messages would loop back via the ib0 device

> 
> 
> 3. If specified '--mca btl openib,self'
> 
> The control information (such as setup and teardown) is routed to and 
> passed over the InfiniBand network in IP over IB mode

Not necessarily - same answer as #1

> The MPI messages are routed to and passed over the InfiniBand network in 
> RDMA mode

Well, it will use IB, but may not use RDMA. That decision is made internally, 
per-message, based on a variety of factors

> On the same machine, the TCP loopback device will be used for passing 
> control and MPI messages

No - you excluded TCP for MPI messages, and so they would have to loop back within 
the IB stack. Control messages would loop back via TCP

> 
> 
> 4. If no 'mca btl' parameters are specified
> 
> The control information (such as setup and teardown) is routed to and 
> passed over Gigabit Ethernet in TCP/IP mode

Not necessarily - same answer as #1

> The MPI messages are routed and passed over the InfiniBand network in RDMA mode

Same as #3

> On the same machine, the shared memory (sm) BTL will be used for passing 
> control and MPI messages

Not for control - just for MPI

> 
> 
> Appreciating your kind input
> 
> Jianyu  



Re: [OMPI users] ssh error

2014-03-15 Thread Ralph Castain
Well, for one thing - that output clearly shows you are running MPICH, not Open 
MPI. You might ask them about the errors.
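A quick way to check which MPI implementation your mpirun actually resolves to
(standard shell commands; the version banner differs between Open MPI and
MPICH/Hydra):

which mpirun
mpirun --version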

On Mar 15, 2014, at 6:36 AM, raha khalili  wrote:

> Dear all
> 
> I am trying to run a program based on other posts in this topic. I ran the 
> command as Mehdi suggested, but I get an error:
> 
> [client3@master 92.12.23]$ mpirun --hostfile texthost -np 2 
> /home/client3/espresso-5.0.2/bin/pw.x -in AdnAu.rx.in | tee AdnAu.rx.out
> [mpie...@master.cluster.umz] HYDU_process_mfile_token 
> (./utils/args/args.c:299): token slots not supported at this time
> [mpie...@master.cluster.umz] HYDU_parse_hostfile (./utils/args/args.c:347): 
> unable to process token
> [mpie...@master.cluster.umz] mfile_fn (./ui/mpich/utils.c:341): error parsing 
> hostfile
> [mpie...@master.cluster.umz] match_arg (./utils/args/args.c:153): match 
> handler returned error
> [mpie...@master.cluster.umz] HYDU_parse_array (./utils/args/args.c:175): 
> argument matching returned error
> [mpie...@master.cluster.umz] parse_args (./ui/mpich/utils.c:1609): error 
> parsing input array
> [mpie...@master.cluster.umz] HYD_uii_mpx_get_parameters 
> (./ui/mpich/utils.c:1660): unable to parse user arguments
> [mpie...@master.cluster.umz] main (./ui/mpich/mpiexec.c:153): error parsing 
> parameters
> 
> hostfile:
> 
> # This is a hostfile.
> #
> # The following nodes are used for calculations
> #
> #master.cluster.umz slots=4 max-slots=2
> khalili@192.168.196.2 slots=4 max-slots=4
> khalili@192.168.196.3 slots=4 max-slots=4
> #khal...@client3.cluster.umz slots=8
> 
> Any help is really appreciated. 
> Khadije Khalili
> 
> 
> On Tue, Mar 11, 2014 at 9:01 PM, raha khalili  
> wrote:
> Many thanks to Mehdi and Reuti for your help.
> 
> 
> On Tue, Mar 11, 2014 at 3:46 PM, Mehdi Rahmani  wrote:
> Hi
> use --hostfile or --machinefile in your command
> mpirun --hostfile texthost -np 2 /home/client3/espresso-5.0.2/bin/pw.x -in 
> AdnAu.rx.in | tee AdnAu.rx.out
> 
> 
> On Tue, Mar 11, 2014 at 1:35 PM, raha khalili  
> wrote:
> Dear users
> 
> I want to run a quantum espresso program (with passwordless ssh). I prepared 
> a hostfile named 'texthost' in my input directory. I get this error when I 
> run the program:
> 
> texthost:
> # This is a hostfile. 
> # I have 4 systems parallelized by mpich2
> # The following nodes are that machines I want to use:
> #clie...@master.cluster.umz slots=4 
> khal...@client1.cluster.umz slots=4 max-slots=4
> #khal...@client2.cluster.umz slots=4 max-slots=4
> khal...@client3.cluster.umz slots=8 max-slots=8
> 
> command: 
> mpirun --host texthost -np 2 /home/client3/espresso-5.0.2/bin/pw.x -in 
> AdnAu.rx.in | tee AdnAu.rx.out
> 
> error:
> ssh: Could not resolve hostname texthost: Name or service not known
> 
> after press ctrl+c:
> ^C[mpie...@master.cluster.umz] HYDU_sock_write (./utils/sock/sock.c:291): 
> write error (Bad file descriptor)
> [mpie...@master.cluster.umz] HYD_pmcd_pmiserv_send_signal 
> (./pm/pmiserv/pmiserv_cb.c:170): unable to write data to proxy
> [mpie...@master.cluster.umz] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:79): 
> unable to send signal downstream
> [mpie...@master.cluster.umz] HYDT_dmxu_poll_wait_for_event 
> (./tools/demux/demux_poll.c:77): callback returned error status
> [mpie...@master.cluster.umz] HYD_pmci_wait_for_completion 
> (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
> [mpie...@master.cluster.umz] main (./ui/mpich/mpiexec.c:331): process manager 
> error waiting for completion
> 
> Could you help me please?
> Thank you very much
> -- 
> Khadije Khalili
> Ph.D Student of Solid-State Physics
> Department of Physics
> University of Mazandaran
> Babolsar, Iran
> kh.khal...@stu.umz.ac.ir
>  
> 
> 
> 
> 
> -- 
> Khadije Khalili
> Ph.D Student of Solid-State Physics
> Department of Physics
> University of Mazandaran
> Babolsar, Iran
> kh.khal...@stu.umz.ac.ir
>  
> 
> 
> 
> -- 
> Khadije Khalili
> Ph.D Student of Solid-State Physics
> Department of Physics
> University of Mazandaran
> Babolsar, Iran
> kh.khal...@stu.umz.ac.ir
>  



Re: [OMPI users] ssh error

2014-03-15 Thread raha khalili
Dear all

I am trying to run a program based on other posts in this topic. I ran the
command as Mehdi suggested, but I get an error:

[client3@master 92.12.23]$ mpirun --hostfile texthost -np 2
/home/client3/espresso-5.0.2/bin/pw.x -in AdnAu.rx.in | tee AdnAu.rx.out
[mpie...@master.cluster.umz] HYDU_process_mfile_token
(./utils/args/args.c:299): token slots not supported at this time
[mpie...@master.cluster.umz] HYDU_parse_hostfile (./utils/args/args.c:347):
unable to process token
[mpie...@master.cluster.umz] mfile_fn (./ui/mpich/utils.c:341): error
parsing hostfile
[mpie...@master.cluster.umz] match_arg (./utils/args/args.c:153): match
handler returned error
[mpie...@master.cluster.umz] HYDU_parse_array (./utils/args/args.c:175):
argument matching returned error
[mpie...@master.cluster.umz] parse_args (./ui/mpich/utils.c:1609): error
parsing input array
[mpie...@master.cluster.umz] HYD_uii_mpx_get_parameters
(./ui/mpich/utils.c:1660): unable to parse user arguments
[mpie...@master.cluster.umz] main (./ui/mpich/mpiexec.c:153): error parsing
parameters

hostfile:

# This is a hostfile.
#
# The following nodes are used for calculations
#
#master.cluster.umz slots=4 max-slots=2
khalili@192.168.196.2 slots=4 max-slots=4
khalili@192.168.196.3 slots=4 max-slots=4
#khal...@client3.cluster.umz slots=8

Any help is really appreciated.
Khadije Khalili


On Tue, Mar 11, 2014 at 9:01 PM, raha khalili wrote:

> Many thanks to Mehdi and Reuti for your help.
>
>
> On Tue, Mar 11, 2014 at 3:46 PM, Mehdi Rahmani wrote:
>
>> Hi
>> use --hostfile or --machinefile in your command
>> mpirun *--hostfile* texthost -np 2 /home/client3/espresso-5.0.2/bin/pw.x
>> -in AdnAu.rx.in | tee AdnAu.rx.out
>>
>>
>> On Tue, Mar 11, 2014 at 1:35 PM, raha khalili 
>> wrote:
>>
>>> Dear users
>>>
>>> I want to run a quantum espresso program (with passwordless ssh). I
>>> prepared a hostfile named 'texthost' in my input directory. I get this
>>> error when I run the program:
>>>
>>> texthost:
>>> # This is a hostfile.
>>> # I have 4 systems parallelized by mpich2
>>> # The following nodes are that machines I want to use:
>>> #clie...@master.cluster.umz slots=4
>>> khal...@client1.cluster.umz slots=4 max-slots=4
>>> #khal...@client2.cluster.umz slots=4 max-slots=4
>>> khal...@client3.cluster.umz slots=8 max-slots=8
>>>
>>> command:
>>> mpirun --host texthost -np 2 /home/client3/espresso-5.0.2/bin/pw.x -in
>>> AdnAu.rx.in | tee AdnAu.rx.out
>>>
>>> error:
>>> ssh: Could not resolve hostname texthost: Name or service not known
>>>
>>> after press ctrl+c:
>>> ^C[mpie...@master.cluster.umz] HYDU_sock_write
>>> (./utils/sock/sock.c:291): write error (Bad file descriptor)
>>> [mpie...@master.cluster.umz] HYD_pmcd_pmiserv_send_signal
>>> (./pm/pmiserv/pmiserv_cb.c:170): unable to write data to proxy
>>> [mpie...@master.cluster.umz] ui_cmd_cb
>>> (./pm/pmiserv/pmiserv_pmci.c:79): unable to send signal downstream
>>> [mpie...@master.cluster.umz] HYDT_dmxu_poll_wait_for_event
>>> (./tools/demux/demux_poll.c:77): callback returned error status
>>> [mpie...@master.cluster.umz] HYD_pmci_wait_for_completion
>>> (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
>>> [mpie...@master.cluster.umz] main (./ui/mpich/mpiexec.c:331): process
>>> manager error waiting for completion
>>>
>>> Could you help me please?
>>> Thank you very much
>>> --
>>> Khadije Khalili
>>> Ph.D Student of Solid-State Physics
>>> Department of Physics
>>> University of Mazandaran
>>> Babolsar, Iran
>>> kh.khal...@stu.umz.ac.ir
>>>
>>>
>>>
>>
>>
>>
>
>
>
> --
> Khadije Khalili
> Ph.D Student of Solid-State Physics
> Department of Physics
> University of Mazandaran
> Babolsar, Iran
> kh.khal...@stu.umz.ac.ir
>
>



-- 
Khadije Khalili
Ph.D Student of Solid-State Physics
Department of Physics
University of Mazandaran
Babolsar, Iran
kh.khal...@stu.umz.ac.ir


Re: [OMPI users] Question about '--mca btl tcp,self'

2014-03-15 Thread Jianyu Liu
>On Mar 14, 2014, at 10:16:34 AM, Jeff Squyres  wrote: 
>
>>On Mar 14, 2014, at 10:11 AM, Ralph Castain  wrote: 
>>
>>> 1. If specified '--mca btl tcp,self', which interface will the application 
>>> run on - the GigE adapter OR the OpenFabrics interface in IP over IB mode 
>>> (just like a high performance GigE adapter)? 
>> 
>> Both - ip over ib looks just like an Ethernet adaptor 
>
>
>To be clear: the TCP BTL will use all TCP interfaces (regardless of underlying 
>physical transport). Your GigE adapter and your IB adapter both present IP 
>interfaces to the OS, and both support TCP. So the TCP BTL will use them, 
>because it just sees the TCP/IP interfaces. 

Thanks for your kind input.

Please see if I have understood correctly

Assume there are two networks:
   Gigabit Ethernet

 eth0-renamed : 192.168.[1-22].[1-14] / 255.255.192.0

   InfiniBand network

 ib0 :  172.20.[1-22].[1-4] / 255.255.0.0
 

1. If specified '--mca btl tcp,self'

 The control information (such as setup and teardown) is routed to and 
passed over Gigabit Ethernet in TCP/IP mode
 The MPI messages are routed to and passed over the InfiniBand network in 
IP over IB mode
 On the same machine, the TCP loopback device will be used for passing 
control and MPI messages 

2. If specified '--mca btl tcp,self --mca btl_tcp_if_include ib0'

 Both the control information (such as setup and teardown) and the MPI 
messages are routed to and passed over the InfiniBand network in IP over IB mode
 On the same machine, the TCP loopback device will be used for passing 
control and MPI messages

 
3. If specified '--mca btl openib,self'

 The control information (such as setup and teardown) is routed to and 
passed over the InfiniBand network in IP over IB mode
 The MPI messages are routed to and passed over the InfiniBand network in 
RDMA mode
 On the same machine, the TCP loopback device will be used for passing 
control and MPI messages


4. If no 'mca btl' parameters are specified

 The control information (such as setup and teardown) is routed to and 
passed over Gigabit Ethernet in TCP/IP mode
 The MPI messages are routed and passed over the InfiniBand network in RDMA mode
 On the same machine, the shared memory (sm) BTL will be used for passing 
control and MPI messages


Appreciating your kind input

Jianyu