Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Yes, that's fine. Thx!

On Aug 24, 2010, at 9:02 AM, Philippe wrote:
> awesome, I'll give it a spin! with the parameters as below?
>
> p.
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
awesome, I'll give it a spin! with the parameters as below?

p.

On Tue, Aug 24, 2010 at 10:47 AM, Ralph Castain wrote:
> I think I have this working now - try anything on or after r23647
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
I think I have this working now - try anything on or after r23647

On Aug 23, 2010, at 1:36 PM, Philippe wrote:
> sure. I took a guess at ppn and nodes for the case where 2 processes
> are on the same node... I don't claim these are the right values ;-)
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
sure. I took a guess at ppn and nodes for the case where 2 processes
are on the same node... I don't claim these are the right values ;-)

c0301b10e1 ~/mpi> env|grep OMPI
OMPI_MCA_orte_nodes=c0301b10e1
OMPI_MCA_orte_rank=0
OMPI_MCA_orte_ppn=2
OMPI_MCA_orte_num_procs=2
OMPI_MCA_oob_tcp_static_ports_v6=1-11000
OMPI_MCA_ess=generic
OMPI_MCA_orte_jobid=
OMPI_MCA_oob_tcp_static_ports=1-11000
c0301b10e1 ~/hpa/benchmark/mpi> ./ben1 1 1 1
[c0301b10e1:22827] [[0,],0] assigned port 10001
[c0301b10e1:22827] [[0,],0] accepting connections via event library
minsize=1 maxsize=1 delay=1.00

c0301b10e1 ~/mpi> env|grep OMPI
OMPI_MCA_orte_nodes=c0301b10e1
OMPI_MCA_orte_rank=1
OMPI_MCA_orte_ppn=2
OMPI_MCA_orte_num_procs=2
OMPI_MCA_oob_tcp_static_ports_v6=1-11000
OMPI_MCA_ess=generic
OMPI_MCA_orte_jobid=
OMPI_MCA_oob_tcp_static_ports=1-11000
c0301b10e1 ~/hpa/benchmark/mpi> ./ben1 1 1 1
[c0301b10e1:22830] [[0,],1] assigned port 10002
[c0301b10e1:22830] [[0,],1] accepting connections via event library
[c0301b10e1:22830] [[0,],1]-[[0,0],0] mca_oob_tcp_send_nb: tag 15 size 189
[c0301b10e1:22830] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_try_connect: connecting port 10002 to: 10.4.72.110:1
[c0301b10e1:22830] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_complete_connect: connection failed: Connection refused (111) - retrying
[c0301b10e1:22830] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_try_connect: connecting port 10002 to: 10.4.72.110:1
[c0301b10e1:22830] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_complete_connect: connection failed: Connection refused (111) - retrying
[c0301b10e1:22830] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_try_connect: connecting port 10002 to: 10.4.72.110:1
[c0301b10e1:22830] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_complete_connect: connection failed: Connection refused (111) - retrying

Thanks!
p.

On Mon, Aug 23, 2010 at 3:24 PM, Ralph Castain wrote:
> Can you send me the values you are using for the relevant envars? That way
> I can try to replicate here
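The environment shown in the message above can be scripted. The sketch below is only an illustration of those envars, not a supported launcher: `launch_rank` is a hypothetical helper, `./ben1` is the poster's own benchmark (its exec line is left commented out), and all variable names and values are copied from the transcript.

```shell
#!/bin/sh
# Hypothetical helper: set the "generic" ess environment for one of two
# processes sharing a node, using the envars from the message above.
launch_rank() {
    export OMPI_MCA_ess=generic
    export OMPI_MCA_orte_nodes=$(hostname)   # single node in this example
    export OMPI_MCA_orte_num_procs=2
    export OMPI_MCA_orte_ppn=2               # both processes on this node
    export OMPI_MCA_orte_rank=$1
    export OMPI_MCA_orte_jobid=
    export OMPI_MCA_oob_tcp_static_ports=1-11000
    echo "rank $OMPI_MCA_orte_rank ready (static ports $OMPI_MCA_oob_tcp_static_ports)"
    # exec ./ben1 1 1 1                      # the poster's test program
}

launch_rank 0
launch_rank 1
```

Each process would be started in its own shell with only `OMPI_MCA_orte_rank` differing, which is exactly what the two `env|grep OMPI` dumps above show.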
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Can you send me the values you are using for the relevant envars? That way
I can try to replicate here

On Aug 23, 2010, at 1:15 PM, Philippe wrote:
> I took a look at the code but I'm afraid I don't see anything wrong.
>
> p.
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
I took a look at the code but I'm afraid I don't see anything wrong.

p.

On Thu, Aug 19, 2010 at 2:32 PM, Ralph Castain wrote:
> Yes, that is correct - we reserve the first port in the range for a daemon,
> should one exist.
> The problem is clearly that get_node_rank is returning the wrong value for
> the second process (your rank=1). If you want to dig deeper, look at the
> orte/mca/ess/generic code where it generates the nidmap and pidmap. There is
> a bug down there somewhere that gives the wrong answer when ppn > 1.
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Yes, that is correct - we reserve the first port in the range for a daemon,
should one exist. The problem is clearly that get_node_rank is returning the
wrong value for the second process (your rank=1). If you want to dig deeper,
look at the orte/mca/ess/generic code where it generates the nidmap and
pidmap. There is a bug down there somewhere that gives the wrong answer when
ppn > 1.

On Thu, Aug 19, 2010 at 12:12 PM, Philippe wrote:
> Ralph,
>
> somewhere in ./orte/mca/oob/tcp/oob_tcp.c, there is this code:
>
>     orte_node_rank_t nrank;
>     /* do I know my node_local_rank yet? */
>     if (ORTE_NODE_RANK_INVALID != (nrank = orte_ess.get_node_rank(ORTE_PROC_MY_NAME)) &&
>         (nrank+1) < opal_argv_count(mca_oob_tcp_component.tcp4_static_ports)) {
>         /* any daemon takes the first entry, so we start with the second */
>
> which seems consistent with process #0 listening on 10001. The question
> would be why process #1 attempts to connect to port 1 then? Or maybe it's
> totally unrelated :-)
>
> btw, if I trick process #1 to open the connection to 10001 by shifting
> the range, I now get this error and the process terminates immediately:
>
> [c0301b10e1:03919] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_recv_connect_ack:
> received unexpected process identifier [[0,],0]
>
> good luck with the surgery and wishing you a prompt recovery!
>
> p.
>
> On Thu, Aug 19, 2010 at 2:02 PM, Ralph Castain wrote:
>> Something doesn't look right - here is what the algo attempts to do:
>> given a port range of 1-12000, the lowest ranked process on the node
>> should open port 1. The next lowest rank on the node will open 10001,
>> etc.
>> So it looks to me like there is some confusion in the local rank algo. I'll
>> have to look at the generic module - must be a bug in it somewhere.
>> This might take a couple of days as I have surgery tomorrow morning, so
>> please forgive the delay.
>>
>> On Thu, Aug 19, 2010 at 11:13 AM, Philippe wrote:
>>> Ralph,
>>>
>>> I'm able to use the generic module when the processes are on different
>>> machines.
>>>
>>> what would be the values of the EV when two processes are on the same
>>> machine (hopefully talking over SHM)?
>>>
>>> i've played with combinations of nodelist and ppn but no luck. I get
>>> errors like:
>>>
>>> [c0301b10e1:03172] [[0,],1] -> [[0,0],0] (node: c0301b10e1)
>>> oob-tcp: Number of attempts to create TCP connection has been
>>> exceeded. Can not communicate with peer
>>> [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file
>>> grpcomm_hier_module.c at line 303
>>> [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file
>>> base/grpcomm_base_modex.c at line 470
>>> [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file
>>> grpcomm_hier_module.c at line 484
>>>
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> orte_grpcomm_modex failed
>>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>>>
>>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>>> *** This is disallowed by the MPI standard.
>>> *** Your MPI job will now abort.
>>> [c0301b10e1:3172] Abort before MPI_INIT completed successfully; not
>>> able to guarantee that all other processes were killed!
>>>
>>> maybe a related question is how to assign the TCP port range and how
>>> is it used? when the processes are on different machines, I use the
>>> same range and that's ok as long as the range is free. but when the
>>> processes are on the same node, what value should the range be for
>>> each process? My range is 1-12000 (for both processes) and I see
>>> that the process with rank #0 listens on port 10001 while the process
>>> with rank #1 tries to establish a connection to port 1.
>>>
>>> Thanks so much!
>>> p. still here... still trying... ;-)
>>>
>>> On Tue, Jul 27, 2010 at 12:58 AM, Ralph Castain wrote:
>>>> Use what hostname returns - don't worry about IP addresses as we'll
>>>> discover them.
>>>>
>>>> On Jul 26, 2010, at 10:45 PM, Philippe wrote:
>>>>
>>>>> Thanks a lot!
>>>>>
>>>>> now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our
>>>>> nodes have a short/long name (it's rhel 5.x, so the command hostname
>>>>> returns the long name) and at least 2 IP addresses.
>>>>>
>>>>> p.
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Ralph, somewhere in ./orte/mca/oob/tcp/oob_tcp.c, there is this comment:

orte_node_rank_t nrank;
/* do I know my node_local_rank yet? */
if (ORTE_NODE_RANK_INVALID != (nrank = orte_ess.get_node_rank(ORTE_PROC_MY_NAME)) &&
    (nrank+1) < opal_argv_count(mca_oob_tcp_component.tcp4_static_ports)) {
    /* any daemon takes the first entry, so we start with the second */

which seems consistent with process #0 listening on 10001. the question would be why process #1 attempts to connect to port 1 then? or maybe totally unrelated :-) btw, if I trick process #1 to open the connection to 10001 by shifting the range, I now get this error and the process terminates immediately: [c0301b10e1:03919] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_recv_connect_ack: received unexpected process identifier [[0,],0] good luck with the surgery and wishing you a prompt recovery! p. On Thu, Aug 19, 2010 at 2:02 PM, Ralph Castain wrote: > Something doesn't look right - here is what the algo attempts to do: > given a port range of 1-12000, the lowest rank'd process on the node > should open port 1. The next lowest rank on the node will open 10001, > etc. > So it looks to me like there is some confusion in the local rank algo. I'll > have to look at the generic module - must be a bug in it somewhere. > This might take a couple of days as I have surgery tomorrow morning, so > please forgive the delay. > > On Thu, Aug 19, 2010 at 11:13 AM, Philippe wrote: >> >> Ralph, >> >> I'm able to use the generic module when the processes are on different >> machines. >> >> what would be the values of the EV when two processes are on the same >> machine (hopefully talking over SHM). >> >> i've played with combination of nodelist and ppn but no luck. I get errors >> like: >> >> >> >> [c0301b10e1:03172] [[0,],1] -> [[0,0],0] (node: c0301b10e1) >> oob-tcp: Number of attempts to create TCP connection has been >> exceeded. 
Can not communicate with peer >> [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file >> grpcomm_hier_module.c at line 303 >> [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file >> base/grpcomm_base_modex.c at line 470 >> [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file >> grpcomm_hier_module.c at line 484 >> -- >> It looks like MPI_INIT failed for some reason; your parallel process is >> likely to abort. There are many reasons that a parallel process can >> fail during MPI_INIT; some of which are due to configuration or >> environment >> problems. This failure appears to be an internal failure; here's some >> additional information (which may only be relevant to an Open MPI >> developer): >> >> orte_grpcomm_modex failed >> --> Returned "Unreachable" (-12) instead of "Success" (0) >> -- >> *** The MPI_Init() function was called before MPI_INIT was invoked. >> *** This is disallowed by the MPI standard. >> *** Your MPI job will now abort. >> [c0301b10e1:3172] Abort before MPI_INIT completed successfully; not >> able to guarantee that all other processes were killed! >> >> >> maybe a related question is how to assign the TCP port range and how >> is it used? when the processes are on different machines, I use the >> same range and that's ok as long as the range is free. but when the >> processes are on the same node, what value should the range be for >> each process? My range is 1-12000 (for both processes) and I see >> that process with rank #0 listen on port 10001 while process with rank >> #1 try to establish a connect to port 1. >> >> Thanks so much! >> p. still here... still trying... ;-) >> >> On Tue, Jul 27, 2010 at 12:58 AM, Ralph Castain wrote: >> > Use what hostname returns - don't worry about IP addresses as we'll >> > discover them. >> > >> > On Jul 26, 2010, at 10:45 PM, Philippe wrote: >> > >> >> Thanks a lot! >> >> >> >> now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? 
our >> >> nodes have a short/long name (it's rhel 5.x, so the command hostname >> >> returns the long name) and at least 2 IP addresses. >> >> >> >> p. >> >> >> >> On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain >> >> wrote: >> >>> Okay, fixed in r23499. Thanks again... >> >>> >> >>> >> >>> On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote: >> >>> >> Doh - yes it should! I'll fix it right now. >> >> Thanks! >> >> On Jul 26, 2010, at 9:28 PM, Philippe wrote: >> >> > Ralph, >> > >> > i was able to test the generic module and it seems to be working. >> > >> > one question tho, the function orte_ess_generic_component_query in >> > "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the >> > argument "OMPI_MCA_enc", which seems to cause the module to fail to >> > load. s
Something doesn't look right - here is what the algo attempts to do: given a port range of 1-12000, the lowest rank'd process on the node should open port 1. The next lowest rank on the node will open 10001, etc. So it looks to me like there is some confusion in the local rank algo. I'll have to look at the generic module - must be a bug in it somewhere. This might take a couple of days as I have surgery tomorrow morning, so please forgive the delay. On Thu, Aug 19, 2010 at 11:13 AM, Philippe wrote: > Ralph, > > I'm able to use the generic module when the processes are on different > machines. > > what would be the values of the EV when two processes are on the same > machine (hopefully talking over SHM). > > i've played with combination of nodelist and ppn but no luck. I get errors > like: > > > > [c0301b10e1:03172] [[0,],1] -> [[0,0],0] (node: c0301b10e1) > oob-tcp: Number of attempts to create TCP connection has been > exceeded. Can not communicate with peer > [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file > grpcomm_hier_module.c at line 303 > [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file > base/grpcomm_base_modex.c at line 470 > [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file > grpcomm_hier_module.c at line 484 > -- > It looks like MPI_INIT failed for some reason; your parallel process is > likely to abort. There are many reasons that a parallel process can > fail during MPI_INIT; some of which are due to configuration or environment > problems. This failure appears to be an internal failure; here's some > additional information (which may only be relevant to an Open MPI > developer): > > orte_grpcomm_modex failed > --> Returned "Unreachable" (-12) instead of "Success" (0) > -- > *** The MPI_Init() function was called before MPI_INIT was invoked. > *** This is disallowed by the MPI standard. > *** Your MPI job will now abort. 
> [c0301b10e1:3172] Abort before MPI_INIT completed successfully; not > able to guarantee that all other processes were killed! > > > maybe a related question is how to assign the TCP port range and how > is it used? when the processes are on different machines, I use the > same range and that's ok as long as the range is free. but when the > processes are on the same node, what value should the range be for > each process? My range is 1-12000 (for both processes) and I see > that process with rank #0 listen on port 10001 while process with rank > #1 try to establish a connect to port 1. > > Thanks so much! > p. still here... still trying... ;-) > > On Tue, Jul 27, 2010 at 12:58 AM, Ralph Castain wrote: > > Use what hostname returns - don't worry about IP addresses as we'll > discover them. > > > > On Jul 26, 2010, at 10:45 PM, Philippe wrote: > > > >> Thanks a lot! > >> > >> now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our > >> nodes have a short/long name (it's rhel 5.x, so the command hostname > >> returns the long name) and at least 2 IP addresses. > >> > >> p. > >> > >> On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain > wrote: > >>> Okay, fixed in r23499. Thanks again... > >>> > >>> > >>> On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote: > >>> > Doh - yes it should! I'll fix it right now. > > Thanks! > > On Jul 26, 2010, at 9:28 PM, Philippe wrote: > > > Ralph, > > > > i was able to test the generic module and it seems to be working. > > > > one question tho, the function orte_ess_generic_component_query in > > "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the > > argument "OMPI_MCA_enc", which seems to cause the module to fail to > > load. shouldnt it be "OMPI_MCA_ess" ? > > > > . > > > > /* only pick us if directed to do so */ > > if (NULL != (pick = getenv("OMPI_MCA_env")) && > >0 == strcmp(pick, "generic")) { > > *priority = 1000; > > *module = (mca_base_module_t *)&orte_ess_generic_module; > > > > ... > > > > p. 
> > > > On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain > wrote: > >> Dev trunk looks okay right now - I think you'll be fine using it. My > new component -might- work with 1.5, but probably not with 1.4. I haven't > checked either of them. > >> > >> Anything at r23478 or above will have the new module. Let me know > how it works for you. I haven't tested it myself, but am pretty sure it > should work. > >> > >> > >> On Jul 22, 2010, at 3:22 PM, Philippe wrote: > >> > >>> Ralph, > >>> > >>> Thank you so much!! > >>> > >>> I'll give it a try and let you know. > >>> > >>> I know it's a tough question, but how stable is the dev trunk? Can > I > >>> just grab the latest and run, or am I
Ralph, I'm able to use the generic module when the processes are on different machines. what would be the values of the EVs when two processes are on the same machine (hopefully talking over SHM)? i've played with combinations of nodelist and ppn but no luck. I get errors like: [c0301b10e1:03172] [[0,],1] -> [[0,0],0] (node: c0301b10e1) oob-tcp: Number of attempts to create TCP connection has been exceeded. Can not communicate with peer [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file grpcomm_hier_module.c at line 303 [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file base/grpcomm_base_modex.c at line 470 [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file grpcomm_hier_module.c at line 484 -- It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): orte_grpcomm_modex failed --> Returned "Unreachable" (-12) instead of "Success" (0) -- *** The MPI_Init() function was called before MPI_INIT was invoked. *** This is disallowed by the MPI standard. *** Your MPI job will now abort. [c0301b10e1:3172] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed! maybe a related question is how to assign the TCP port range and how is it used? when the processes are on different machines, I use the same range and that's ok as long as the range is free. but when the processes are on the same node, what value should the range be for each process? My range is 1-12000 (for both processes) and I see that the process with rank #0 listens on port 10001 while the process with rank #1 tries to establish a connection to port 1. Thanks so much! p. still here... still trying... 
;-) On Tue, Jul 27, 2010 at 12:58 AM, Ralph Castain wrote: > Use what hostname returns - don't worry about IP addresses as we'll discover > them. > > On Jul 26, 2010, at 10:45 PM, Philippe wrote: > >> Thanks a lot! >> >> now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our >> nodes have a short/long name (it's rhel 5.x, so the command hostname >> returns the long name) and at least 2 IP addresses. >> >> p. >> >> On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain wrote: >>> Okay, fixed in r23499. Thanks again... >>> >>> >>> On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote: >>> Doh - yes it should! I'll fix it right now. Thanks! On Jul 26, 2010, at 9:28 PM, Philippe wrote: > Ralph, > > i was able to test the generic module and it seems to be working. > > one question tho, the function orte_ess_generic_component_query in > "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the > argument "OMPI_MCA_enc", which seems to cause the module to fail to > load. shouldnt it be "OMPI_MCA_ess" ? > > . > > /* only pick us if directed to do so */ > if (NULL != (pick = getenv("OMPI_MCA_env")) && > 0 == strcmp(pick, "generic")) { > *priority = 1000; > *module = (mca_base_module_t *)&orte_ess_generic_module; > > ... > > p. > > On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain wrote: >> Dev trunk looks okay right now - I think you'll be fine using it. My new >> component -might- work with 1.5, but probably not with 1.4. I haven't >> checked either of them. >> >> Anything at r23478 or above will have the new module. Let me know how it >> works for you. I haven't tested it myself, but am pretty sure it should >> work. >> >> >> On Jul 22, 2010, at 3:22 PM, Philippe wrote: >> >>> Ralph, >>> >>> Thank you so much!! >>> >>> I'll give it a try and let you know. >>> >>> I know it's a tough question, but how stable is the dev trunk? Can I >>> just grab the latest and run, or am I better off taking your changes >>> and copy them back in a stable release? (if so, which one? 
1.4? 1.5?) >>> >>> p. >>> >>> On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain >>> wrote: It was easier for me to just construct this module than to explain how to do so :-) I will commit it this evening (couple of hours from now) as that is our standard practice. You'll need to use the developer's trunk, though, to use it. Here are the envars you'll need to provide: Each process needs to get the same following values: * OMPI_MCA_ess=generic * OMPI_MCA_orte_num_procs= >>
Use what hostname returns - don't worry about IP addresses as we'll discover them. On Jul 26, 2010, at 10:45 PM, Philippe wrote: > Thanks a lot! > > now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our > nodes have a short/long name (it's rhel 5.x, so the command hostname > returns the long name) and at least 2 IP addresses. > > p. > > On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain wrote: >> Okay, fixed in r23499. Thanks again... >> >> >> On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote: >> >>> Doh - yes it should! I'll fix it right now. >>> >>> Thanks! >>> >>> On Jul 26, 2010, at 9:28 PM, Philippe wrote: >>> Ralph, i was able to test the generic module and it seems to be working. one question tho, the function orte_ess_generic_component_query in "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the argument "OMPI_MCA_enc", which seems to cause the module to fail to load. shouldnt it be "OMPI_MCA_ess" ? . /* only pick us if directed to do so */ if (NULL != (pick = getenv("OMPI_MCA_env")) && 0 == strcmp(pick, "generic")) { *priority = 1000; *module = (mca_base_module_t *)&orte_ess_generic_module; ... p. On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain wrote: > Dev trunk looks okay right now - I think you'll be fine using it. My new > component -might- work with 1.5, but probably not with 1.4. I haven't > checked either of them. > > Anything at r23478 or above will have the new module. Let me know how it > works for you. I haven't tested it myself, but am pretty sure it should > work. > > > On Jul 22, 2010, at 3:22 PM, Philippe wrote: > >> Ralph, >> >> Thank you so much!! >> >> I'll give it a try and let you know. >> >> I know it's a tough question, but how stable is the dev trunk? Can I >> just grab the latest and run, or am I better off taking your changes >> and copy them back in a stable release? (if so, which one? 1.4? 1.5?) >> >> p. 
>> >> On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain >>> wrote: >>> It was easier for me to just construct this module than to explain how >>> to do so :-) >>> >>> I will commit it this evening (couple of hours from now) as that is our >>> standard practice. You'll need to use the developer's trunk, though, to >>> use it. >>> >>> Here are the envars you'll need to provide: >>> >>> Each process needs to get the same following values: >>> >>> * OMPI_MCA_ess=generic >>> * OMPI_MCA_orte_num_procs=<number of procs in the job> >>> * OMPI_MCA_orte_nodes=<comma-separated list of nodes where the >>> procs reside> >>> * OMPI_MCA_orte_ppn=<procs per node> >>> >>> Note that I have assumed this last value is a constant for simplicity. >>> If that isn't the case, let me know - you could instead provide it as a >>> comma-separated list of values with an entry for each node. >>> >>> In addition, you need to provide the following value that will be >>> unique to each process: >>> >>> * OMPI_MCA_orte_rank=<this proc's rank> >>> >>> Finally, you have to provide a range of static TCP ports for use by the >>> processes. Pick any range that you know will be available across all >>> the nodes. You then need to ensure that each process sees the following >>> envar: >>> >>> * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this >>> with your range >>> >>> You will need a port range that is at least equal to the ppn for the >>> job (each proc on a node will take one of the provided ports). >>> >>> That should do it. I compute everything else I need from those values. >>> >>> Does that work for you? >>> Ralph >>> >>> > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Thanks a lot! now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our nodes have a short/long name (it's rhel 5.x, so the command hostname returns the long name) and at least 2 IP addresses. p. On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain wrote: > Okay, fixed in r23499. Thanks again... > > > On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote: > >> Doh - yes it should! I'll fix it right now. >> >> Thanks! >> >> On Jul 26, 2010, at 9:28 PM, Philippe wrote: >> >>> Ralph, >>> >>> i was able to test the generic module and it seems to be working. >>> >>> one question tho, the function orte_ess_generic_component_query in >>> "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the >>> argument "OMPI_MCA_enc", which seems to cause the module to fail to >>> load. shouldnt it be "OMPI_MCA_ess" ? >>> >>> . >>> >>> /* only pick us if directed to do so */ >>> if (NULL != (pick = getenv("OMPI_MCA_env")) && >>> 0 == strcmp(pick, "generic")) { >>> *priority = 1000; >>> *module = (mca_base_module_t *)&orte_ess_generic_module; >>> >>> ... >>> >>> p. >>> >>> On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain wrote: Dev trunk looks okay right now - I think you'll be fine using it. My new component -might- work with 1.5, but probably not with 1.4. I haven't checked either of them. Anything at r23478 or above will have the new module. Let me know how it works for you. I haven't tested it myself, but am pretty sure it should work. On Jul 22, 2010, at 3:22 PM, Philippe wrote: > Ralph, > > Thank you so much!! > > I'll give it a try and let you know. > > I know it's a tough question, but how stable is the dev trunk? Can I > just grab the latest and run, or am I better off taking your changes > and copy them back in a stable release? (if so, which one? 1.4? 1.5?) > > p. 
> > On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain wrote: >> It was easier for me to just construct this module than to explain how >> to do so :-) >> >> I will commit it this evening (couple of hours from now) as that is our >> standard practice. You'll need to use the developer's trunk, though, to >> use it. >> >> Here are the envars you'll need to provide: >> >> Each process needs to get the same following values: >> >> * OMPI_MCA_ess=generic >> * OMPI_MCA_orte_num_procs= >> * OMPI_MCA_orte_nodes=> procs reside> >> * OMPI_MCA_orte_ppn= >> >> Note that I have assumed this last value is a constant for simplicity. >> If that isn't the case, let me know - you could instead provide it as a >> comma-separated list of values with an entry for each node. >> >> In addition, you need to provide the following value that will be unique >> to each process: >> >> * OMPI_MCA_orte_rank= >> >> Finally, you have to provide a range of static TCP ports for use by the >> processes. Pick any range that you know will be available across all the >> nodes. You then need to ensure that each process sees the following >> envar: >> >> * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this >> with your range >> >> You will need a port range that is at least equal to the ppn for the job >> (each proc on a node will take one of the provided ports). >> >> That should do it. I compute everything else I need from those values. >> >> Does that work for you? >> Ralph >> >>
Okay, fixed in r23499. Thanks again... On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote: > Doh - yes it should! I'll fix it right now. > > Thanks! > > On Jul 26, 2010, at 9:28 PM, Philippe wrote: > >> Ralph, >> >> i was able to test the generic module and it seems to be working. >> >> one question tho, the function orte_ess_generic_component_query in >> "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the >> argument "OMPI_MCA_enc", which seems to cause the module to fail to >> load. shouldnt it be "OMPI_MCA_ess" ? >> >> . >> >> /* only pick us if directed to do so */ >> if (NULL != (pick = getenv("OMPI_MCA_env")) && >>0 == strcmp(pick, "generic")) { >> *priority = 1000; >> *module = (mca_base_module_t *)&orte_ess_generic_module; >> >> ... >> >> p. >> >> On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain wrote: >>> Dev trunk looks okay right now - I think you'll be fine using it. My new >>> component -might- work with 1.5, but probably not with 1.4. I haven't >>> checked either of them. >>> >>> Anything at r23478 or above will have the new module. Let me know how it >>> works for you. I haven't tested it myself, but am pretty sure it should >>> work. >>> >>> >>> On Jul 22, 2010, at 3:22 PM, Philippe wrote: >>> Ralph, Thank you so much!! I'll give it a try and let you know. I know it's a tough question, but how stable is the dev trunk? Can I just grab the latest and run, or am I better off taking your changes and copy them back in a stable release? (if so, which one? 1.4? 1.5?) p. On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain wrote: > It was easier for me to just construct this module than to explain how to > do so :-) > > I will commit it this evening (couple of hours from now) as that is our > standard practice. You'll need to use the developer's trunk, though, to > use it. 
> > Here are the envars you'll need to provide: > > Each process needs to get the same following values: > > * OMPI_MCA_ess=generic > * OMPI_MCA_orte_num_procs=<number of procs in the job> > * OMPI_MCA_orte_nodes=<comma-separated list of nodes where the procs reside> > * OMPI_MCA_orte_ppn=<procs per node> > > Note that I have assumed this last value is a constant for simplicity. If > that isn't the case, let me know - you could instead provide it as a > comma-separated list of values with an entry for each node. > > In addition, you need to provide the following value that will be unique > to each process: > > * OMPI_MCA_orte_rank=<this proc's rank> > > Finally, you have to provide a range of static TCP ports for use by the > processes. Pick any range that you know will be available across all the > nodes. You then need to ensure that each process sees the following envar: > > * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this > with your range > > You will need a port range that is at least equal to the ppn for the job > (each proc on a node will take one of the provided ports). > > That should do it. I compute everything else I need from those values. > > Does that work for you? > Ralph > > > On Jul 22, 2010, at 6:48 AM, Philippe wrote: > >> On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain >> wrote: >>> >>> On Jul 21, 2010, at 7:44 AM, Philippe wrote: >>> Ralph, Sorry for the late reply -- I was away on vacation. >>> >>> no problem at all! >>> regarding your earlier question about how many processes were involved when the memory was entirely allocated, it was only two, a sender and a receiver. I'm still trying to pinpoint what can be different between the standalone case and the "integrated" case. I will try to find out what part of the code is allocating memory in a loop. >>> >>> hmmm... that sounds like a bug in your program. let me know what you >>> find >>> On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain wrote: > Well, I finally managed to make this work without the required > ompi-server rendezvous point. 
The fix is only in the devel trunk > right now - I'll have to ask the release managers for 1.5 and 1.4 if > they want it ported to those series. > great -- i'll give it a try > On the notion of integrating OMPI to your launch environment: > remember that we don't necessarily require that you use mpiexec for > that purpose. If your launch environment provides just a little info > in the environment of the launched procs, we can usually devise a > method that allows the procs to perform an MPI_Init as a single job > without all this work you are doing. > >
Doh - yes it should! I'll fix it right now. Thanks! On Jul 26, 2010, at 9:28 PM, Philippe wrote: > Ralph, > > i was able to test the generic module and it seems to be working. > > one question tho, the function orte_ess_generic_component_query in > "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the > argument "OMPI_MCA_enc", which seems to cause the module to fail to > load. shouldnt it be "OMPI_MCA_ess" ? > > . > >/* only pick us if directed to do so */ >if (NULL != (pick = getenv("OMPI_MCA_env")) && > 0 == strcmp(pick, "generic")) { >*priority = 1000; >*module = (mca_base_module_t *)&orte_ess_generic_module; > > ... > > p. > > On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain wrote: >> Dev trunk looks okay right now - I think you'll be fine using it. My new >> component -might- work with 1.5, but probably not with 1.4. I haven't >> checked either of them. >> >> Anything at r23478 or above will have the new module. Let me know how it >> works for you. I haven't tested it myself, but am pretty sure it should work. >> >> >> On Jul 22, 2010, at 3:22 PM, Philippe wrote: >> >>> Ralph, >>> >>> Thank you so much!! >>> >>> I'll give it a try and let you know. >>> >>> I know it's a tough question, but how stable is the dev trunk? Can I >>> just grab the latest and run, or am I better off taking your changes >>> and copy them back in a stable release? (if so, which one? 1.4? 1.5?) >>> >>> p. >>> >>> On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain wrote: It was easier for me to just construct this module than to explain how to do so :-) I will commit it this evening (couple of hours from now) as that is our standard practice. You'll need to use the developer's trunk, though, to use it. Here are the envars you'll need to provide: Each process needs to get the same following values: * OMPI_MCA_ess=generic * OMPI_MCA_orte_num_procs= * OMPI_MCA_orte_nodes=>>> reside> * OMPI_MCA_orte_ppn= Note that I have assumed this last value is a constant for simplicity. 
If that isn't the case, let me know - you could instead provide it as a comma-separated list of values with an entry for each node. In addition, you need to provide the following value that will be unique to each process: * OMPI_MCA_orte_rank= Finally, you have to provide a range of static TCP ports for use by the processes. Pick any range that you know will be available across all the nodes. You then need to ensure that each process sees the following envar: * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this with your range You will need a port range that is at least equal to the ppn for the job (each proc on a node will take one of the provided ports). That should do it. I compute everything else I need from those values. Does that work for you? Ralph On Jul 22, 2010, at 6:48 AM, Philippe wrote: > On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain wrote: >> >> On Jul 21, 2010, at 7:44 AM, Philippe wrote: >> >>> Ralph, >>> >>> Sorry for the late reply -- I was away on vacation. >> >> no problem at all! >> >>> >>> regarding your earlier question about how many processes where >>> involved when the memory was entirely allocated, it was only two, a >>> sender and a receiver. I'm still trying to pinpoint what can be >>> different between the standalone case and the "integrated" case. I >>> will try to find out what part of the code is allocating memory in a >>> loop. >> >> hmmmthat sounds like a bug in your program. let me know what you find >> >>> >>> >>> On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain >>> wrote: Well, I finally managed to make this work without the required ompi-server rendezvous point. The fix is only in the devel trunk right now - I'll have to ask the release managers for 1.5 and 1.4 if they want it ported to those series. >>> >>> great -- i'll give it a try >>> On the notion of integrating OMPI to your launch environment: remember that we don't necessarily require that you use mpiexec for that purpose. 
If your launch environment provides just a little info in the environment of the launched procs, we can usually devise a method that allows the procs to perform an MPI_Init as a single job without all this work you are doing. >>> >>> I'm working on creating operators using MPI for the IBM product >>> "InfoSphere Streams". It has its own launching mechanism to start the >>> processes. However I can pass some information to the processes that >>> belong to th
Ralph, i was able to test the generic module and it seems to be working. one question tho, the function orte_ess_generic_component_query in "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the argument "OMPI_MCA_enc", which seems to cause the module to fail to load. shouldn't it be "OMPI_MCA_ess"?

/* only pick us if directed to do so */
if (NULL != (pick = getenv("OMPI_MCA_env")) &&
    0 == strcmp(pick, "generic")) {
    *priority = 1000;
    *module = (mca_base_module_t *)&orte_ess_generic_module;

... p. On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain wrote: > Dev trunk looks okay right now - I think you'll be fine using it. My new > component -might- work with 1.5, but probably not with 1.4. I haven't checked > either of them. > > Anything at r23478 or above will have the new module. Let me know how it > works for you. I haven't tested it myself, but am pretty sure it should work. > > > On Jul 22, 2010, at 3:22 PM, Philippe wrote: > >> Ralph, >> >> Thank you so much!! >> >> I'll give it a try and let you know. >> >> I know it's a tough question, but how stable is the dev trunk? Can I >> just grab the latest and run, or am I better off taking your changes >> and copying them back in a stable release? (if so, which one? 1.4? 1.5?) >> >> p. >> >> On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain wrote: >>> It was easier for me to just construct this module than to explain how to >>> do so :-) >>> >>> I will commit it this evening (couple of hours from now) as that is our >>> standard practice. You'll need to use the developer's trunk, though, to use >>> it. >>> >>> Here are the envars you'll need to provide: >>> >>> Each process needs to get the same following values: >>> >>> * OMPI_MCA_ess=generic >>> * OMPI_MCA_orte_num_procs=<number of procs in the job> >>> * OMPI_MCA_orte_nodes=<comma-separated list of nodes where the procs >>> reside> >>> * OMPI_MCA_orte_ppn=<procs per node> >>> >>> Note that I have assumed this last value is a constant for simplicity. 
If >>> that isn't the case, let me know - you could instead provide it as a >>> comma-separated list of values with an entry for each node. >>> >>> In addition, you need to provide the following value that will be unique to >>> each process: >>> >>> * OMPI_MCA_orte_rank= >>> >>> Finally, you have to provide a range of static TCP ports for use by the >>> processes. Pick any range that you know will be available across all the >>> nodes. You then need to ensure that each process sees the following envar: >>> >>> * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this with >>> your range >>> >>> You will need a port range that is at least equal to the ppn for the job >>> (each proc on a node will take one of the provided ports). >>> >>> That should do it. I compute everything else I need from those values. >>> >>> Does that work for you? >>> Ralph >>> >>> >>> On Jul 22, 2010, at 6:48 AM, Philippe wrote: >>> On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain wrote: > > On Jul 21, 2010, at 7:44 AM, Philippe wrote: > >> Ralph, >> >> Sorry for the late reply -- I was away on vacation. > > no problem at all! > >> >> regarding your earlier question about how many processes where >> involved when the memory was entirely allocated, it was only two, a >> sender and a receiver. I'm still trying to pinpoint what can be >> different between the standalone case and the "integrated" case. I >> will try to find out what part of the code is allocating memory in a >> loop. > > hmmmthat sounds like a bug in your program. let me know what you find > >> >> >> On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain >> wrote: >>> Well, I finally managed to make this work without the required >>> ompi-server rendezvous point. The fix is only in the devel trunk right >>> now - I'll have to ask the release managers for 1.5 and 1.4 if they >>> want it ported to those series. 
>>> >> >> great -- i'll give it a try >> >>> On the notion of integrating OMPI to your launch environment: remember >>> that we don't necessarily require that you use mpiexec for that >>> purpose. If your launch environment provides just a little info in the >>> environment of the launched procs, we can usually devise a method that >>> allows the procs to perform an MPI_Init as a single job without all >>> this work you are doing. >>> >> >> I'm working on creating operators using MPI for the IBM product >> "InfoSphere Streams". It has its own launching mechanism to start the >> processes. However I can pass some information to the processes that >> belong to the same job (Streams job -- which should neatly map to MPI >> job). >> >>> Only difference is that your procs will all block in MPI_Init until >>> they -all- have executed that function. If that isn't a problem, this >>> would be a much more scalable and reliable meth
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Dev trunk looks okay right now - I think you'll be fine using it. My new component -might- work with 1.5, but probably not with 1.4. I haven't checked either of them. Anything at r23478 or above will have the new module. Let me know how it works for you. I haven't tested it myself, but am pretty sure it should work.

On Jul 22, 2010, at 3:22 PM, Philippe wrote: > Ralph, > > Thank you so much!! > > I'll give it a try and let you know. > > I know it's a tough question, but how stable is the dev trunk? Can I > just grab the latest and run, or am I better off taking your changes > and copying them back into a stable release? (if so, which one? 1.4? 1.5?) > > p. > > On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain wrote: >> It was easier for me to just construct this module than to explain how to do >> so :-) >> >> I will commit it this evening (a couple of hours from now) as that is our >> standard practice. You'll need to use the developer's trunk, though, to use >> it. >> >> Here are the envars you'll need to provide: >> >> Each process needs to get the same following values: >> >> * OMPI_MCA_ess=generic >> * OMPI_MCA_orte_num_procs=<total number of procs> >> * OMPI_MCA_orte_nodes=<comma-separated list of nodes where the procs reside> >> * OMPI_MCA_orte_ppn=<procs per node> >> >> Note that I have assumed this last value is a constant for simplicity. If >> that isn't the case, let me know - you could instead provide it as a >> comma-separated list of values with an entry for each node. >> >> In addition, you need to provide the following value that will be unique to >> each process: >> >> * OMPI_MCA_orte_rank=<rank of this process> >> >> Finally, you have to provide a range of static TCP ports for use by the >> processes. Pick any range that you know will be available across all the >> nodes. You then need to ensure that each process sees the following envar: >> >> * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this with >> your range >> >> You will need a port range that is at least equal to the ppn for the job >> (each proc on a node will take one of the provided ports). >> >> That should do it. I compute everything else I need from those values. >> >> Does that work for you? >> Ralph >> >> >> On Jul 22, 2010, at 6:48 AM, Philippe wrote: >> >>> On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain wrote: On Jul 21, 2010, at 7:44 AM, Philippe wrote: > Ralph, > > Sorry for the late reply -- I was away on vacation. no problem at all! > > regarding your earlier question about how many processes were > involved when the memory was entirely allocated, it was only two, a > sender and a receiver. I'm still trying to pinpoint what can be > different between the standalone case and the "integrated" case. I > will try to find out what part of the code is allocating memory in a > loop. hmmm... that sounds like a bug in your program. let me know what you find > > > On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain wrote: >> Well, I finally managed to make this work without the required >> ompi-server rendezvous point. The fix is only in the devel trunk right >> now - I'll have to ask the release managers for 1.5 and 1.4 if they want >> it ported to those series. >> > > great -- I'll give it a try > >> On the notion of integrating OMPI to your launch environment: remember >> that we don't necessarily require that you use mpiexec for that purpose. >> If your launch environment provides just a little info in the >> environment of the launched procs, we can usually devise a method that >> allows the procs to perform an MPI_Init as a single job without all this >> work you are doing. >> > > I'm working on creating operators using MPI for the IBM product > "InfoSphere Streams". It has its own launching mechanism to start the > processes. However I can pass some information to the processes that > belong to the same job (Streams job -- which should neatly map to an MPI > job). > >> Only difference is that your procs will all block in MPI_Init until they >> -all- have executed that function. If that isn't a problem, this would >> be a much more scalable and reliable method than doing it thru massive >> calls to MPI_Port_connect. >> > > in the general case, that would be a problem, but for my prototype, > this is acceptable. > > In general, each process is composed of operators; some may be MPI > related and some may not. But in my case, I know ahead of time which > processes will be part of the MPI job, so I can easily deal with the > fact that they would block on MPI_Init (actually -- MPI_Init_thread, > since it's using a lot of threads). We have talked in the past about creating a non-blocking MPI_Init as an extension to the standard. It would lock you to Open MPI, though... Regardless, at some point you would have to know how many processes are going to be part of the job so you can know when MPI_Init is complete.
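The envar list Ralph gives above can be captured in a small launcher-side sketch. The hostname, rank, and port range here are illustrative placeholders (not values taken from the thread); the launch environment would export these before exec'ing each process, with only OMPI_MCA_orte_rank differing per process.

```shell
#!/bin/sh
# Sketch of the environment one process needs for the "generic" ESS
# module, following the list in the message above. Values are
# placeholders for illustration.

export OMPI_MCA_ess=generic                     # select the generic module
export OMPI_MCA_orte_num_procs=2                # total procs in the MPI job
export OMPI_MCA_orte_nodes=node01               # comma-separated node list
export OMPI_MCA_orte_ppn=2                      # procs per node (constant)
export OMPI_MCA_orte_rank=0                     # unique per process
export OMPI_MCA_oob_tcp_static_ports=6000-6010  # at least ppn ports per node

echo "rank=$OMPI_MCA_orte_rank of $OMPI_MCA_orte_num_procs"
```

Note the constraint Ralph states: the static port range must contain at least ppn ports, since each proc on a node claims one.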
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Ralph, Thank you so much!! I'll give it a try and let you know. I know it's a tough question, but how stable is the dev trunk? Can I just grab the latest and run, or am I better off taking your changes and copying them back into a stable release? (if so, which one? 1.4? 1.5?) p.

On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain wrote: > It was easier for me to just construct this module than to explain how to do > so :-) > > I will commit it this evening (a couple of hours from now) as that is our > standard practice. You'll need to use the developer's trunk, though, to use > it. > > Here are the envars you'll need to provide: > > Each process needs to get the same following values: > > * OMPI_MCA_ess=generic > * OMPI_MCA_orte_num_procs=<total number of procs> > * OMPI_MCA_orte_nodes=<comma-separated list of nodes where the procs reside> > * OMPI_MCA_orte_ppn=<procs per node> > > Note that I have assumed this last value is a constant for simplicity. If > that isn't the case, let me know - you could instead provide it as a > comma-separated list of values with an entry for each node. > > In addition, you need to provide the following value that will be unique to > each process: > > * OMPI_MCA_orte_rank=<rank of this process> > > Finally, you have to provide a range of static TCP ports for use by the > processes. Pick any range that you know will be available across all the > nodes. You then need to ensure that each process sees the following envar: > > * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this with > your range > > You will need a port range that is at least equal to the ppn for the job > (each proc on a node will take one of the provided ports). > > That should do it. I compute everything else I need from those values. > > Does that work for you? > Ralph > > > On Jul 22, 2010, at 6:48 AM, Philippe wrote: > >> On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain wrote: >>> >>> On Jul 21, 2010, at 7:44 AM, Philippe wrote: >>> Ralph, Sorry for the late reply -- I was away on vacation. >>> >>> no problem at all! >>> regarding your earlier question about how many processes were involved when the memory was entirely allocated, it was only two, a sender and a receiver. I'm still trying to pinpoint what can be different between the standalone case and the "integrated" case. I will try to find out what part of the code is allocating memory in a loop. >>> >>> hmmm... that sounds like a bug in your program. let me know what you find >>> On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain wrote: > Well, I finally managed to make this work without the required > ompi-server rendezvous point. The fix is only in the devel trunk right > now - I'll have to ask the release managers for 1.5 and 1.4 if they want > it ported to those series. > great -- I'll give it a try > On the notion of integrating OMPI to your launch environment: remember > that we don't necessarily require that you use mpiexec for that purpose. > If your launch environment provides just a little info in the environment > of the launched procs, we can usually devise a method that allows the > procs to perform an MPI_Init as a single job without all this work you > are doing. > I'm working on creating operators using MPI for the IBM product "InfoSphere Streams". It has its own launching mechanism to start the processes. However I can pass some information to the processes that belong to the same job (Streams job -- which should neatly map to an MPI job). > Only difference is that your procs will all block in MPI_Init until they > -all- have executed that function. If that isn't a problem, this would be > a much more scalable and reliable method than doing it thru massive calls > to MPI_Port_connect. > in the general case, that would be a problem, but for my prototype, this is acceptable. In general, each process is composed of operators; some may be MPI related and some may not. But in my case, I know ahead of time which processes will be part of the MPI job, so I can easily deal with the fact that they would block on MPI_Init (actually -- MPI_Init_thread, since it's using a lot of threads). >>> >>> We have talked in the past about creating a non-blocking MPI_Init as an >>> extension to the standard. It would lock you to Open MPI, though... >>> >>> Regardless, at some point you would have to know how many processes are >>> going to be part of the job so you can know when MPI_Init is complete. I >>> would think you would require that info for the singleton wireup anyway - >>> yes? Otherwise, how would you know when to quit running connect-accept? >>> >> >> the short answer is yes... although, the longer answer is a bit more >> complicated. currently I do know the number of connects I need to do on >> a per-port basis. a job can contain an arbitrary number of MPI >> processes, each opening one or more ports.
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
It was easier for me to just construct this module than to explain how to do so :-) I will commit it this evening (a couple of hours from now) as that is our standard practice. You'll need to use the developer's trunk, though, to use it.

Here are the envars you'll need to provide. Each process needs to get the same following values: * OMPI_MCA_ess=generic * OMPI_MCA_orte_num_procs=<total number of procs> * OMPI_MCA_orte_nodes=<comma-separated list of nodes where the procs reside> * OMPI_MCA_orte_ppn=<procs per node> Note that I have assumed this last value is a constant for simplicity. If that isn't the case, let me know - you could instead provide it as a comma-separated list of values with an entry for each node. In addition, you need to provide the following value that will be unique to each process: * OMPI_MCA_orte_rank=<rank of this process> Finally, you have to provide a range of static TCP ports for use by the processes. Pick any range that you know will be available across all the nodes. You then need to ensure that each process sees the following envar: * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this with your range You will need a port range that is at least equal to the ppn for the job (each proc on a node will take one of the provided ports). That should do it. I compute everything else I need from those values. Does that work for you? Ralph

On Jul 22, 2010, at 6:48 AM, Philippe wrote: > On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain wrote: >> >> On Jul 21, 2010, at 7:44 AM, Philippe wrote: >> >>> Ralph, >>> >>> Sorry for the late reply -- I was away on vacation. >> >> no problem at all! >> >>> >>> regarding your earlier question about how many processes were >>> involved when the memory was entirely allocated, it was only two, a >>> sender and a receiver. I'm still trying to pinpoint what can be >>> different between the standalone case and the "integrated" case. I >>> will try to find out what part of the code is allocating memory in a >>> loop. >> >> hmmm... that sounds like a bug in your program. let me know what you find >> >>> >>> >>> On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain wrote: Well, I finally managed to make this work without the required ompi-server rendezvous point. The fix is only in the devel trunk right now - I'll have to ask the release managers for 1.5 and 1.4 if they want it ported to those series. >>> >>> great -- I'll give it a try >>> On the notion of integrating OMPI to your launch environment: remember that we don't necessarily require that you use mpiexec for that purpose. If your launch environment provides just a little info in the environment of the launched procs, we can usually devise a method that allows the procs to perform an MPI_Init as a single job without all this work you are doing. >>> >>> I'm working on creating operators using MPI for the IBM product >>> "InfoSphere Streams". It has its own launching mechanism to start the >>> processes. However I can pass some information to the processes that >>> belong to the same job (Streams job -- which should neatly map to an MPI >>> job). >>> Only difference is that your procs will all block in MPI_Init until they -all- have executed that function. If that isn't a problem, this would be a much more scalable and reliable method than doing it thru massive calls to MPI_Port_connect. >>> >>> in the general case, that would be a problem, but for my prototype, >>> this is acceptable. >>> >>> In general, each process is composed of operators; some may be MPI >>> related and some may not. But in my case, I know ahead of time which >>> processes will be part of the MPI job, so I can easily deal with the >>> fact that they would block on MPI_Init (actually -- MPI_Init_thread, >>> since it's using a lot of threads). >> >> We have talked in the past about creating a non-blocking MPI_Init as an >> extension to the standard. It would lock you to Open MPI, though... >> >> Regardless, at some point you would have to know how many processes are >> going to be part of the job so you can know when MPI_Init is complete. I >> would think you would require that info for the singleton wireup anyway - >> yes? Otherwise, how would you know when to quit running connect-accept? >> > > the short answer is yes... although, the longer answer is a bit more > complicated. currently I do know the number of connects I need to do on > a per-port basis. a job can contain an arbitrary number of MPI > processes, each opening one or more ports. so I know the count port by > port, but I don't need to worry about how many MPI processes there are > globally. to make things a bit more complicated, each MPI operator can > be "fused" with other operators to make a process. each fused operator > may or may not require MPI. the bottom line is, to get the total > number of processes to calculate rank and size, I need to reverse-engineer > the fusing that the compiler may do. > > but that's ok, I'm willing to do that for our prototype :-)
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain wrote: > > On Jul 21, 2010, at 7:44 AM, Philippe wrote: > >> Ralph, >> >> Sorry for the late reply -- I was away on vacation. > > no problem at all! > >> >> regarding your earlier question about how many processes were >> involved when the memory was entirely allocated, it was only two, a >> sender and a receiver. I'm still trying to pinpoint what can be >> different between the standalone case and the "integrated" case. I >> will try to find out what part of the code is allocating memory in a >> loop. > > hmmm... that sounds like a bug in your program. let me know what you find > >> >> >> On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain wrote: >>> Well, I finally managed to make this work without the required ompi-server >>> rendezvous point. The fix is only in the devel trunk right now - I'll have >>> to ask the release managers for 1.5 and 1.4 if they want it ported to those >>> series. >>> >> >> great -- I'll give it a try >> >>> On the notion of integrating OMPI to your launch environment: remember that >>> we don't necessarily require that you use mpiexec for that purpose. If your >>> launch environment provides just a little info in the environment of the >>> launched procs, we can usually devise a method that allows the procs to >>> perform an MPI_Init as a single job without all this work you are doing. >>> >> >> I'm working on creating operators using MPI for the IBM product >> "InfoSphere Streams". It has its own launching mechanism to start the >> processes. However I can pass some information to the processes that >> belong to the same job (Streams job -- which should neatly map to an MPI >> job). >> >>> Only difference is that your procs will all block in MPI_Init until they >>> -all- have executed that function. If that isn't a problem, this would be a >>> much more scalable and reliable method than doing it thru massive calls to >>> MPI_Port_connect. >>> >> >> in the general case, that would be a problem, but for my prototype, >> this is acceptable. >> >> In general, each process is composed of operators; some may be MPI >> related and some may not. But in my case, I know ahead of time which >> processes will be part of the MPI job, so I can easily deal with the >> fact that they would block on MPI_Init (actually -- MPI_Init_thread, >> since it's using a lot of threads). > > We have talked in the past about creating a non-blocking MPI_Init as an > extension to the standard. It would lock you to Open MPI, though... > > Regardless, at some point you would have to know how many processes are going > to be part of the job so you can know when MPI_Init is complete. I would > think you would require that info for the singleton wireup anyway - yes? > Otherwise, how would you know when to quit running connect-accept? >

the short answer is yes... although, the longer answer is a bit more complicated. currently I do know the number of connects I need to do on a per-port basis. a job can contain an arbitrary number of MPI processes, each opening one or more ports. so I know the count port by port, but I don't need to worry about how many MPI processes there are globally. to make things a bit more complicated, each MPI operator can be "fused" with other operators to make a process. each fused operator may or may not require MPI. the bottom line is, to get the total number of processes to calculate rank and size, I need to reverse-engineer the fusing that the compiler may do. but that's ok, I'm willing to do that for our prototype :-)

>> >> Is there documentation or an example I can use to see what information >> I can pass to the processes to enable that? Is it just environment >> variables? > > No real documentation - a lack I should probably fill. At the moment, we > don't have a "generic" module for standalone launch, but I can create one as > it is pretty trivial. I would then need you to pass each process envars > telling it the total number of processes in the MPI job, its rank within that > job, and a file where some rendezvous process (can be rank=0) has provided > that port string. Armed with that info, I can wireup the job. > > Won't be as scalable as an mpirun-initiated startup, but will be much better > than doing it from singletons. that would be great. I can definitely pass environment variables to each process. > > Or if you prefer, we could set up an "infosphere" module that we could > customize for this system. Main thing here would be to provide us with some > kind of regex (or access to a file containing the info) that describes the > map of rank to node so we can construct the wireup communication pattern. > I think for our prototype we are fine with the first method. I'd leave the cleaner implementation as a task for the product team ;-) regarding the "generic" module, is that something you can put together quickly? can I help in any way? Thanks! p > Either way would work. The second is more scalable, but I don't know if you have (or can construct) the map info.
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
On Jul 21, 2010, at 7:44 AM, Philippe wrote: > Ralph, > > Sorry for the late reply -- I was away on vacation. no problem at all! > > regarding your earlier question about how many processes were > involved when the memory was entirely allocated, it was only two, a > sender and a receiver. I'm still trying to pinpoint what can be > different between the standalone case and the "integrated" case. I > will try to find out what part of the code is allocating memory in a > loop. hmmm... that sounds like a bug in your program. let me know what you find > > > On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain wrote: >> Well, I finally managed to make this work without the required ompi-server >> rendezvous point. The fix is only in the devel trunk right now - I'll have >> to ask the release managers for 1.5 and 1.4 if they want it ported to those >> series. >> > > great -- I'll give it a try > >> On the notion of integrating OMPI to your launch environment: remember that >> we don't necessarily require that you use mpiexec for that purpose. If your >> launch environment provides just a little info in the environment of the >> launched procs, we can usually devise a method that allows the procs to >> perform an MPI_Init as a single job without all this work you are doing. >> > > I'm working on creating operators using MPI for the IBM product > "InfoSphere Streams". It has its own launching mechanism to start the > processes. However I can pass some information to the processes that > belong to the same job (Streams job -- which should neatly map to an MPI > job). > >> Only difference is that your procs will all block in MPI_Init until they >> -all- have executed that function. If that isn't a problem, this would be a >> much more scalable and reliable method than doing it thru massive calls to >> MPI_Port_connect. >> > > in the general case, that would be a problem, but for my prototype, > this is acceptable. > > In general, each process is composed of operators; some may be MPI > related and some may not. But in my case, I know ahead of time which > processes will be part of the MPI job, so I can easily deal with the > fact that they would block on MPI_Init (actually -- MPI_Init_thread, > since it's using a lot of threads).

We have talked in the past about creating a non-blocking MPI_Init as an extension to the standard. It would lock you to Open MPI, though... Regardless, at some point you would have to know how many processes are going to be part of the job so you can know when MPI_Init is complete. I would think you would require that info for the singleton wireup anyway - yes? Otherwise, how would you know when to quit running connect-accept?

> > Is there documentation or an example I can use to see what information > I can pass to the processes to enable that? Is it just environment > variables? No real documentation - a lack I should probably fill. At the moment, we don't have a "generic" module for standalone launch, but I can create one as it is pretty trivial. I would then need you to pass each process envars telling it the total number of processes in the MPI job, its rank within that job, and a file where some rendezvous process (can be rank=0) has provided that port string. Armed with that info, I can wireup the job. Won't be as scalable as an mpirun-initiated startup, but will be much better than doing it from singletons. Or if you prefer, we could set up an "infosphere" module that we could customize for this system. Main thing here would be to provide us with some kind of regex (or access to a file containing the info) that describes the map of rank to node so we can construct the wireup communication pattern. Either way would work. The second is more scalable, but I don't know if you have (or can construct) the map info.

> > Many thanks! > p. > >> >> On Jul 18, 2010, at 4:09 PM, Philippe wrote: >> >>> Ralph, >>> >>> thanks for investigating. >>> >>> I've applied the two patches you mentioned earlier and ran with the >>> ompi server. Although I was able to run our standalone test, when I >>> integrated the changes to our code, the processes entered a crazy loop >>> and allocated all the memory available when calling MPI_Port_Connect. >>> I was not able to identify why it works standalone but not integrated >>> with our code. If I find out why, I'll let you know. >>> >>> looking forward to your findings. We'll be happy to test any patches >>> if you have some! >>> >>> p. >>> >>> On Sat, Jul 17, 2010 at 9:47 PM, Ralph Castain wrote: Okay, I can reproduce this problem. Frankly, I don't think this ever worked with OMPI, and I'm not sure how the choice of BTL makes a difference. The program is crashing in the communicator definition, which involves a communication over our internal out-of-band messaging system. That system has zero connection to any BTL, so it should crash either way. Regardless, I will play with this a little as time allows.
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Ralph, Sorry for the late reply -- I was away on vacation. regarding your earlier question about how many processes were involved when the memory was entirely allocated, it was only two, a sender and a receiver. I'm still trying to pinpoint what can be different between the standalone case and the "integrated" case. I will try to find out what part of the code is allocating memory in a loop.

On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain wrote: > Well, I finally managed to make this work without the required ompi-server > rendezvous point. The fix is only in the devel trunk right now - I'll have to > ask the release managers for 1.5 and 1.4 if they want it ported to those > series. > great -- I'll give it a try > On the notion of integrating OMPI to your launch environment: remember that > we don't necessarily require that you use mpiexec for that purpose. If your > launch environment provides just a little info in the environment of the > launched procs, we can usually devise a method that allows the procs to > perform an MPI_Init as a single job without all this work you are doing. > I'm working on creating operators using MPI for the IBM product "InfoSphere Streams". It has its own launching mechanism to start the processes. However I can pass some information to the processes that belong to the same job (Streams job -- which should neatly map to an MPI job). > Only difference is that your procs will all block in MPI_Init until they > -all- have executed that function. If that isn't a problem, this would be a > much more scalable and reliable method than doing it thru massive calls to > MPI_Port_connect. > in the general case, that would be a problem, but for my prototype, this is acceptable. In general, each process is composed of operators; some may be MPI related and some may not. But in my case, I know ahead of time which processes will be part of the MPI job, so I can easily deal with the fact that they would block on MPI_Init (actually -- MPI_Init_thread, since it's using a lot of threads).

Is there documentation or an example I can use to see what information I can pass to the processes to enable that? Is it just environment variables? Many thanks! p.

> > On Jul 18, 2010, at 4:09 PM, Philippe wrote: > >> Ralph, >> >> thanks for investigating. >> >> I've applied the two patches you mentioned earlier and ran with the >> ompi server. Although I was able to run our standalone test, when I >> integrated the changes to our code, the processes entered a crazy loop >> and allocated all the memory available when calling MPI_Port_Connect. >> I was not able to identify why it works standalone but not integrated >> with our code. If I find out why, I'll let you know. >> >> looking forward to your findings. We'll be happy to test any patches >> if you have some! >> >> p. >> >> On Sat, Jul 17, 2010 at 9:47 PM, Ralph Castain wrote: >>> Okay, I can reproduce this problem. Frankly, I don't think this ever worked >>> with OMPI, and I'm not sure how the choice of BTL makes a difference. >>> >>> The program is crashing in the communicator definition, which involves a >>> communication over our internal out-of-band messaging system. That system >>> has zero connection to any BTL, so it should crash either way. >>> >>> Regardless, I will play with this a little as time allows. Thanks for the >>> reproducer! >>> >>> >>> On Jun 25, 2010, at 7:23 AM, Philippe wrote: >>> Hi, I'm trying to run a test program which consists of a server creating a port using MPI_Open_port and N clients using MPI_Comm_connect to connect to the server. I'm able to do so with 1 server and 2 clients, but with 1 server + 3 clients, I get the following error message:

    [node003:32274] [[37084,0],0]:route_callback tried routing message
    from [[37084,1],0] to [[40912,1],0]:102, can't find route

This is only happening with the openib BTL. With the tcp BTL it works perfectly fine (ofud also works, as a matter of fact...). This has been tested on two completely different clusters, with identical results. In either case, the IB fabric works normally. Any help would be greatly appreciated! Several people in my team looked at the problem. Google and the mailing list archive did not provide any clue. I believe that from an MPI standpoint, my test program is valid (and it works with TCP, which makes me feel better about the sequence of MPI calls). Regards, Philippe. Background: I intend to use Open MPI to transport data inside a much larger application. Because of that, I cannot use mpiexec. Each process is started by our own "job management" and uses a name server to find out about the others. Once all the clients are connected, I would like the server to do MPI_Recv to get the data from all the clients. I don't care about the order or which clients are sending data, as
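The failing scenario Philippe describes follows the standard MPI dynamic-process pattern. A minimal sketch of that pattern is below; it is an illustration of the test setup, not Philippe's actual program. It assumes an MPI installation and some out-of-band way to hand the port string to the clients (here, simply the command line), with error handling omitted.

```c
/* Sketch: one server publishes a port; each client connects to it.
 * Run the server as "./prog server" and each client as
 * "./prog client <port-string>", where <port-string> is the string
 * the server prints. Requires an MPI library (e.g. Open MPI). */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    MPI_Init(&argc, &argv);

    if (argc > 1 && 0 == strcmp(argv[1], "server")) {
        /* server: open a port and accept one client; a real test would
         * loop, calling MPI_Comm_accept once per expected client */
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("port: %s\n", port);   /* hand this string to the clients */
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        MPI_Close_port(port);
    } else {
        /* client: port string passed on the command line */
        MPI_Comm_connect(argv[2], MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    }

    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}
```

Each accept/connect pair yields a separate intercommunicator on the server, which matches the per-connection bookkeeping Philippe describes; the route error in the thread appears only once a third client runs this connect step over the openib BTL.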
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Well, I finally managed to make this work without the required ompi-server rendezvous point. The fix is only in the devel trunk right now - I'll have to ask the release managers for 1.5 and 1.4 if they want it ported to those series.

On the notion of integrating OMPI into your launch environment: remember that we don't necessarily require that you use mpiexec for that purpose. If your launch environment provides just a little info in the environment of the launched procs, we can usually devise a method that allows the procs to perform an MPI_Init as a single job without all this work you are doing. The only difference is that your procs will all block in MPI_Init until they -all- have executed that function. If that isn't a problem, this would be a much more scalable and reliable method than doing it through massive calls to MPI_Comm_connect.

On Jul 18, 2010, at 4:09 PM, Philippe wrote:
> [quoted thread snipped]
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
I'm wondering if we can't make this simpler. What launch environment are you operating under? I know you said you can't use mpiexec, but I'm wondering if we could add support for your environment to mpiexec so you could.

On Jul 18, 2010, at 4:09 PM, Philippe wrote:
> [quoted thread snipped]
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
On Jul 18, 2010, at 4:09 PM, Philippe wrote:
> Although I was able to run our standalone test, when I integrated the changes to our code, the processes entered a crazy loop and allocated all the memory available when calling MPI_Comm_connect.

How many processes are we talking about?

> [rest of quoted thread snipped]
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Ralph,

thanks for investigating.

I've applied the two patches you mentioned earlier and ran with the ompi-server. Although I was able to run our standalone test, when I integrated the changes to our code, the processes entered a crazy loop and allocated all the memory available when calling MPI_Comm_connect. I was not able to identify why it works standalone but not integrated with our code. If I find why, I'll let you know.

Looking forward to your findings. We'll be happy to test any patches if you have some!

p.

On Sat, Jul 17, 2010 at 9:47 PM, Ralph Castain wrote:
> [quoted thread snipped]
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Okay, I can reproduce this problem. Frankly, I don't think this ever worked with OMPI, and I'm not sure how the choice of BTL makes a difference.

The program is crashing in the communicator definition, which involves a communication over our internal out-of-band messaging system. That system has zero connection to any BTL, so it should crash either way.

Regardless, I will play with this a little as time allows. Thanks for the reproducer!

On Jun 25, 2010, at 7:23 AM, Philippe wrote:

> Hi,
>
> I'm trying to run a test program which consists of a server creating a port using MPI_Open_port and N clients using MPI_Comm_connect to connect to the server.
>
> I'm able to do so with 1 server and 2 clients, but with 1 server + 3 clients, I get the following error message:
>
> [node003:32274] [[37084,0],0]:route_callback tried routing message from [[37084,1],0] to [[40912,1],0]:102, can't find route
>
> This is only happening with the openib BTL. With the tcp BTL it works perfectly fine (ofud also works, as a matter of fact...). This has been tested on two completely different clusters, with identical results. In either case, the IB fabric works normally.
>
> Any help would be greatly appreciated! Several people on my team looked at the problem. Google and the mailing list archive did not provide any clue. I believe that from an MPI standpoint, my test program is valid (and it works with TCP, which makes me feel better about the sequence of MPI calls).
>
> Regards,
> Philippe.
>
> Background:
>
> I intend to use openMPI to transport data inside a much larger application. Because of that, I cannot use mpiexec. Each process is started by our own "job management" and uses a name server to find out about the others. Once all the clients are connected, I would like the server to do MPI_Recv to get the data from all the clients. I don't care about the order or which client is sending data, as long as I can receive it with one call. To do that, the clients and the server go through a series of Comm_accept/Comm_connect/Intercomm_merge so that at the end, all the clients and the server are inside the same intracomm.
>
> Steps:
>
> I have a sample program that shows the issue. I tried to make it as short as possible. It needs to be executed on a shared file system like NFS because the server writes the port info to a file that the client will read. To reproduce the issue, the following steps should be performed:
>
> 0. compile the test with "mpicc -o ben12 ben12.c"
> 1. ssh to the machine that will be the server
> 2. run ./ben12 3 1
> 3. ssh to the machine that will be the client #1
> 4. run ./ben12 3 0
> 5. repeat steps 3-4 for clients #2 and #3
>
> The server accepts the connection from client #1 and merges it into a new intracomm. It then accepts the connection from client #2 and merges it. When client #3 arrives, the server accepts the connection, but that causes clients #1 and #2 to die with the error above (see the complete trace in the tarball).
>
> The exact steps are:
>
> - server opens port
> - server does accept
> - client #1 does connect
> - server and client #1 do merge
> - server does accept
> - client #2 does connect
> - server, client #1 and client #2 do merge
> - server does accept
> - client #3 does connect
> - server, client #1, client #2 and client #3 do merge
>
> My InfiniBand network works normally with other test programs or applications (MPI or others, like Verbs).
>
> Info about my setup:
>
> openMPI version = 1.4.1 (I also tried 1.4.2, nightly snapshot of 1.4.3, nightly snapshot of 1.5 --- all show the same error)
> config.log in the tarball
> "ompi_info --all" in the tarball
> OFED version = 1.3 installed from RHEL 5.3
> Distro = RedHat Enterprise Linux 5.3
> Kernel = 2.6.18-128.4.1.el5 x86_64
> subnet manager = built-in SM from the cisco/topspin switch
> output of ibv_devinfo included in the tarball (there are no "bad" nodes)
> "ulimit -l" says "unlimited"
>
> The tarball contains:
>
> - ben12.c: my test program showing the behavior
> - config.log / config.out / make.out / make-install.out / ifconfig.txt / ibv-devinfo.txt / ompi_info.txt
> - trace-tcp.txt: output of the server and each client when it works with TCP (I added "btl = tcp,self" in ~/.openmpi/mca-params.conf)
> - trace-ib.txt: output of the server and each client when it fails with IB (I added "btl = openib,self" in ~/.openmpi/mca-params.conf)
>
> I hope I provided enough info for somebody to reproduce the problem...
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
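The accept/connect/merge sequence described in the quoted report can be sketched in MPI-style pseudocode (a reconstruction from the description above, not the original ben12.c, which was only attached as a tarball):

```
# server (rank 0 of the growing intracomm)
MPI_Open_port(MPI_INFO_NULL, port)
write port string to the shared file
intracomm = MPI_COMM_SELF
for each expected client:
    # MPI_Comm_accept is collective over intracomm, so every
    # already-merged client also participates in accepting the next one
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, intracomm, &intercomm)
    MPI_Intercomm_merge(intercomm, 0 /* low group */, &intracomm)

# each client
read port string from the shared file
MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm)
MPI_Intercomm_merge(intercomm, 1 /* high group */, &intracomm)
# then join the remaining accepts as a member of the merged intracomm
for each later client:
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, intracomm, &intercomm)
    MPI_Intercomm_merge(intercomm, 0, &intracomm)
```

The collective nature of the accept is why clients #1 and #2 are involved (and die with the routing error) at the moment client #3 connects.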
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Reopening this thread. In searching for another problem, I ran across this one in a different context. It turns out there really is a bug here that needs to be addressed. I'll try to tackle it this weekend - will update you when done.

On Jun 25, 2010, at 7:23 AM, Philippe wrote:
> [original report quoted in full earlier in this thread - snipped]
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Sorry for the delayed response - Brad asked if I could comment on this. I'm afraid your application, as written, isn't going to work because the rendezvous protocol isn't correct. You cannot just write a port to a file and have the other side of a connect/accept read it. The reason for this is that OMPI needs to route its out-of-band communications, and needs some handshake to get that setup. If we don't route those communications, we consume way too many ports on nodes of large machines, and thus cannot run large jobs. If you want to do this, you need three things: 1. you have to run our "ompi-server" program on a node where all MPI processes can reach it. This program serves as the central rendezvous point. See "man ompi-server" for info. 2. you'll need a patch I provided to some other users that allows singletons to connect to ompi-server without first spawning their own daemon. Otherwise, you get an OMPI daemon ("orted") started for every one of your clients. 3. you'll need the patch I'm just completing that allows you to have more than 64 singletons connecting together, otherwise you'll just segfault. Each of your clients looks like a singleton to us because it wasn't started with mpiexec. I suspect your test works because (a) TCP interconnects differently than IB and doesn't talk via OOB to do it, and thus you made it further (but would still fail at some point when OOB was required), and (b) you were running fewer than 64 clients. HTH Ralph On Jun 25, 2010, at 7:23 AM, Philippe wrote: > Hi, > > I'm trying to run a test program which consists of a server creating a > port using MPI_Open_port and N clients using MPI_Comm_connect to > connect to the server. > > I'm able to do so with 1 server and 2 clients, but with 1 server + 3 > clients, I get the following error message: > > [node003:32274] [[37084,0],0]:route_callback tried routing message > from [[37084,1],0] to [[40912,1],0]:102, can't find route > > This is only happening with the openib BTL. 
With tcp BTL it works > perfectly fine (ofud also works as a matter of fact...). This has been > tested on two completely different clusters, with identical results. > In either cases, the IB frabic works normally. > > Any help would be greatly appreciated! Several people in my team > looked at the problem. Google and the mailing list archive did not > provide any clue. I believe that from an MPI standpoint, my test > program is valid (and it works with TCP, which make me feel better > about the sequence of MPI calls) > > Regards, > Philippe. > > > > Background: > > I intend to use openMPI to transport data inside a much larger > application. Because of that, I cannot used mpiexec. Each process is > started by our own "job management" and use a name server to find > about each others. Once all the clients are connected, I would like > the server to do MPI_Recv to get the data from all the client. I dont > care about the order or which client are sending data, as long as I > can receive it with on call. Do do that, the clients and the server > are going through a series of Comm_accept/Conn_connect/Intercomm_merge > so that at the end, all the clients and the server are inside the same > intracomm. > > Steps: > > I have a sample program that show the issue. I tried to make it as > short as possible. It needs to be executed on a shared file system > like NFS because the server write the port info to a file that the > client will read. To reproduce the issue, the following steps should > be performed: > > 0. compile the test with "mpicc -o ben12 ben12.c" > 1. ssh to the machine that will be the server > 2. run ./ben12 3 1 > 3. ssh to the machine that will be the client #1 > 4. run ./ben12 3 0 > 5. repeat step 3-4 for client #2 and #3 > > the server accept the connection from client #1 and merge it in a new > intracomm. It then accept connection from client #2 and merge it. 
> When client #3 arrives, the server accepts the connection, but that
> causes clients #1 and #2 to die with the error above (see the complete
> trace in the tarball).
>
> The exact steps are:
>
> - server opens port
> - server does accept
> - client #1 does connect
> - server and client #1 do merge
> - server does accept
> - client #2 does connect
> - server, client #1 and client #2 do merge
> - server does accept
> - client #3 does connect
> - server, client #1, client #2 and client #3 do merge
>
> My InfiniBand network works normally with other test programs or
> applications (MPI or others, like Verbs).
>
> Info about my setup:
>
> Open MPI version = 1.4.1 (I also tried 1.4.2, nightly snapshots of
> 1.4.3 and 1.5 --- all show the same error)
> config.log in the tarball
> "ompi_info --all" in the tarball
> OFED version = 1.3 installed from RHEL 5.3
> Distro = Red Hat Enterprise Linux 5.3
> Kernel = 2.6.18-128.4.1.el5 x86_64
> subnet manager = built-in SM from the cisco
> /topspin switch
> output of ibv_devinfo included in the tarball (there are no "bad" nodes)
> "ulimit -l" says "unlimited"
>
> The tarball contains:
> - ben12.c: my test program showing the behavior
> - config.log / config.out / make.out / make-install.out / ifconfig.txt / ibv-devinfo.txt / ompi_info.txt
> - trace-tcp.txt: output of the server and each client when it works with TCP (I added "btl = tcp,self" in ~/.openmpi/mca-params.conf)
> - trace-ib.txt: output of the server and each client when it fails with IB (I added "btl = openib,self" in ~/.openmpi/mca-params.conf)
>
> I hope I provided enough info for somebody to reproduce the problem...

ompi-output.tar.bz2
Description: BZip2 compressed data
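For readers of this archived thread, the accept/connect/merge sequence Philippe describes can be sketched as below. This is a reconstruction for illustration only, not the actual ben12.c from the tarball: the port-file name ("ben12.port"), the argument handling, and the collective accept loop are assumptions. Note that, per Ralph's reply, launching these as singletons still requires the ompi-server rendezvous to work reliably.

```c
/*
 * Sketch of the rendezvous: one server opens a port and repeatedly
 * accepts clients, merging each intercomm into a growing intracomm.
 * Build: mpicc -o ben12 ben12.c
 * Run:   ./ben12 <nclients> <is_server>
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME] = "";
    MPI_Comm intracomm = MPI_COMM_WORLD, intercomm, newintra;
    int nclients, is_server, size;

    MPI_Init(&argc, &argv);
    nclients  = atoi(argv[1]);
    is_server = atoi(argv[2]);

    if (is_server) {
        /* Open a port and publish it via a file on the shared FS. */
        MPI_Open_port(MPI_INFO_NULL, port);
        FILE *f = fopen("ben12.port", "w");
        fprintf(f, "%s\n", port);
        fclose(f);
    } else {
        /* Wait for the server's port file, then connect and merge.
         * The connecting side passes high=1 so the server is rank 0
         * in every merged intracomm. */
        FILE *f;
        while ((f = fopen("ben12.port", "r")) == NULL)
            sleep(1);
        fgets(port, MPI_MAX_PORT_NAME, f);
        fclose(f);
        port[strcspn(port, "\n")] = '\0';

        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
        MPI_Intercomm_merge(intercomm, 1, &intracomm);
        MPI_Comm_free(&intercomm);
    }

    /* MPI_Comm_accept and MPI_Intercomm_merge are collective over the
     * intracomm, so the server AND every already-merged client must
     * take part in each subsequent accept/merge round. Root is rank 0
     * (the server); the port argument is ignored on non-root ranks. */
    MPI_Comm_size(intracomm, &size);
    while (size < nclients + 1) {
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, intracomm, &intercomm);
        MPI_Intercomm_merge(intercomm, 0, &newintra);
        MPI_Comm_free(&intercomm);
        if (intracomm != MPI_COMM_WORLD)
            MPI_Comm_free(&intracomm);
        intracomm = newintra;
        MPI_Comm_size(intracomm, &size);
    }

    /* At this point all clients and the server share one intracomm,
     * and the server could MPI_Recv from any of them. */
    MPI_Finalize();
    return 0;
}
```

The reported failure occurs in the third accept/merge round over the openib BTL, which is what the route error in the trace corresponds to.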