Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Yes, that's fine. Thx!

On Aug 24, 2010, at 9:02 AM, Philippe wrote:
> awesome, I'll give it a spin! with the parameters as below?
>
> p.
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
awesome, I'll give it a spin! with the parameters as below?

p.

On Tue, Aug 24, 2010 at 10:47 AM, Ralph Castain wrote:
> I think I have this working now - try anything on or after r23647
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
I think I have this working now - try anything on or after r23647

On Aug 23, 2010, at 1:36 PM, Philippe wrote:
> sure. I took a guess at ppn and nodes for the case where 2 processes
> are on the same node... I don't claim these are the right values ;-)
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
sure. I took a guess at ppn and nodes for the case where 2 processes
are on the same node... I don't claim these are the right values ;-)

c0301b10e1 ~/mpi> env|grep OMPI
OMPI_MCA_orte_nodes=c0301b10e1
OMPI_MCA_orte_rank=0
OMPI_MCA_orte_ppn=2
OMPI_MCA_orte_num_procs=2
OMPI_MCA_oob_tcp_static_ports_v6=1-11000
OMPI_MCA_ess=generic
OMPI_MCA_orte_jobid=
OMPI_MCA_oob_tcp_static_ports=1-11000
c0301b10e1 ~/hpa/benchmark/mpi> ./ben1 1 1 1
[c0301b10e1:22827] [[0,],0] assigned port 10001
[c0301b10e1:22827] [[0,],0] accepting connections via event library
minsize=1 maxsize=1 delay=1.00

c0301b10e1 ~/mpi> env|grep OMPI
OMPI_MCA_orte_nodes=c0301b10e1
OMPI_MCA_orte_rank=1
OMPI_MCA_orte_ppn=2
OMPI_MCA_orte_num_procs=2
OMPI_MCA_oob_tcp_static_ports_v6=1-11000
OMPI_MCA_ess=generic
OMPI_MCA_orte_jobid=
OMPI_MCA_oob_tcp_static_ports=1-11000
c0301b10e1 ~/hpa/benchmark/mpi> ./ben1 1 1 1
[c0301b10e1:22830] [[0,],1] assigned port 10002
[c0301b10e1:22830] [[0,],1] accepting connections via event library
[c0301b10e1:22830] [[0,],1]-[[0,0],0] mca_oob_tcp_send_nb: tag 15 size 189
[c0301b10e1:22830] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_try_connect: connecting port 10002 to: 10.4.72.110:1
[c0301b10e1:22830] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_complete_connect: connection failed: Connection refused (111) - retrying
[c0301b10e1:22830] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_try_connect: connecting port 10002 to: 10.4.72.110:1
[c0301b10e1:22830] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_complete_connect: connection failed: Connection refused (111) - retrying
[c0301b10e1:22830] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_try_connect: connecting port 10002 to: 10.4.72.110:1
[c0301b10e1:22830] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_complete_connect: connection failed: Connection refused (111) - retrying

Thanks!
p.

On Mon, Aug 23, 2010 at 3:24 PM, Ralph Castain wrote:
> Can you send me the values you are using for the relevant envars? That way
> I can try to replicate here
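The environment shown in the message above can be scripted. The sketch below is only an illustration of those envars, not a supported launcher: `launch_rank` is a hypothetical helper, `./ben1` is the poster's own benchmark (its exec line is left commented out), and all variable names and values are copied from the transcript.

```shell
#!/bin/sh
# Hypothetical helper: set the "generic" ess environment for one of two
# processes sharing a node, using the envars from the message above.
launch_rank() {
    export OMPI_MCA_ess=generic
    export OMPI_MCA_orte_nodes=$(hostname)   # single node in this example
    export OMPI_MCA_orte_num_procs=2
    export OMPI_MCA_orte_ppn=2               # both processes on this node
    export OMPI_MCA_orte_rank=$1
    export OMPI_MCA_orte_jobid=
    export OMPI_MCA_oob_tcp_static_ports=1-11000
    echo "rank $OMPI_MCA_orte_rank ready (static ports $OMPI_MCA_oob_tcp_static_ports)"
    # exec ./ben1 1 1 1                      # the poster's test program
}

launch_rank 0
launch_rank 1
```

Each process would be started in its own shell with only `OMPI_MCA_orte_rank` differing, which is exactly what the two `env|grep OMPI` dumps above show.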
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Can you send me the values you are using for the relevant envars? That way
I can try to replicate here

On Aug 23, 2010, at 1:15 PM, Philippe wrote:
> I took a look at the code but I'm afraid I don't see anything wrong.
>
> p.
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
I took a look at the code but I'm afraid I don't see anything wrong.

p.

On Thu, Aug 19, 2010 at 2:32 PM, Ralph Castain wrote:
> Yes, that is correct - we reserve the first port in the range for a daemon,
> should one exist.
> The problem is clearly that get_node_rank is returning the wrong value for
> the second process (your rank=1). If you want to dig deeper, look at the
> orte/mca/ess/generic code where it generates the nidmap and pidmap. There is
> a bug down there somewhere that gives the wrong answer when ppn > 1.
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Yes, that is correct - we reserve the first port in the range for a daemon,
should one exist. The problem is clearly that get_node_rank is returning the
wrong value for the second process (your rank=1). If you want to dig deeper,
look at the orte/mca/ess/generic code where it generates the nidmap and
pidmap. There is a bug down there somewhere that gives the wrong answer when
ppn > 1.

On Thu, Aug 19, 2010 at 12:12 PM, Philippe wrote:
> Ralph,
>
> somewhere in ./orte/mca/oob/tcp/oob_tcp.c, there is this code:
>
>     orte_node_rank_t nrank;
>     /* do I know my node_local_rank yet? */
>     if (ORTE_NODE_RANK_INVALID != (nrank = orte_ess.get_node_rank(ORTE_PROC_MY_NAME)) &&
>         (nrank+1) < opal_argv_count(mca_oob_tcp_component.tcp4_static_ports)) {
>         /* any daemon takes the first entry, so we start with the second */
>
> which seems consistent with process #0 listening on 10001. The question
> would be why process #1 attempts to connect to port 1 then? Or maybe it's
> totally unrelated :-)
>
> btw, if I trick process #1 to open the connection to 10001 by shifting
> the range, I now get this error and the process terminates immediately:
>
> [c0301b10e1:03919] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_recv_connect_ack:
> received unexpected process identifier [[0,],0]
>
> good luck with the surgery and wishing you a prompt recovery!
>
> p.
>
> On Thu, Aug 19, 2010 at 2:02 PM, Ralph Castain wrote:
>> Something doesn't look right - here is what the algo attempts to do:
>> given a port range of 1-12000, the lowest ranked process on the node
>> should open port 1. The next lowest rank on the node will open 10001,
>> etc.
>> So it looks to me like there is some confusion in the local rank algo. I'll
>> have to look at the generic module - must be a bug in it somewhere.
>> This might take a couple of days as I have surgery tomorrow morning, so
>> please forgive the delay.
>>
>> On Thu, Aug 19, 2010 at 11:13 AM, Philippe wrote:
>>> Ralph,
>>>
>>> I'm able to use the generic module when the processes are on different
>>> machines.
>>>
>>> what would be the values of the EV when two processes are on the same
>>> machine (hopefully talking over SHM)?
>>>
>>> i've played with combinations of nodelist and ppn but no luck. I get
>>> errors like:
>>>
>>> [c0301b10e1:03172] [[0,],1] -> [[0,0],0] (node: c0301b10e1)
>>> oob-tcp: Number of attempts to create TCP connection has been
>>> exceeded. Can not communicate with peer
>>> [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file
>>> grpcomm_hier_module.c at line 303
>>> [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file
>>> base/grpcomm_base_modex.c at line 470
>>> [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file
>>> grpcomm_hier_module.c at line 484
>>>
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> orte_grpcomm_modex failed
>>> --> Returned "Unreachable" (-12) instead of "Success" (0)
>>>
>>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>>> *** This is disallowed by the MPI standard.
>>> *** Your MPI job will now abort.
>>> [c0301b10e1:3172] Abort before MPI_INIT completed successfully; not
>>> able to guarantee that all other processes were killed!
>>>
>>> maybe a related question is how to assign the TCP port range and how
>>> is it used? when the processes are on different machines, I use the
>>> same range and that's ok as long as the range is free. but when the
>>> processes are on the same node, what value should the range be for
>>> each process? My range is 1-12000 (for both processes) and I see
>>> that the process with rank #0 listens on port 10001 while the process
>>> with rank #1 tries to establish a connection to port 1.
>>>
>>> Thanks so much!
>>> p. still here... still trying... ;-)
>>>
>>> On Tue, Jul 27, 2010 at 12:58 AM, Ralph Castain wrote:
>>>> Use what hostname returns - don't worry about IP addresses as we'll
>>>> discover them.
>>>>
>>>> On Jul 26, 2010, at 10:45 PM, Philippe wrote:
>>>>
>>>>> Thanks a lot!
>>>>>
>>>>> now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our
>>>>> nodes have a short/long name (it's rhel 5.x, so the command hostname
>>>>> returns the long name) and at least 2 IP addresses.
>>>>>
>>>>> p.
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Ralph, somewhere in ./orte/mca/oob/tcp/oob_tcp.c, there is this comment:

orte_node_rank_t nrank;
/* do I know my node_local_rank yet? */
if (ORTE_NODE_RANK_INVALID != (nrank = orte_ess.get_node_rank(ORTE_PROC_MY_NAME)) &&
    (nrank+1) < opal_argv_count(mca_oob_tcp_component.tcp4_static_ports)) {
    /* any daemon takes the first entry, so we start with the second */

which seems consistent with process #0 listening on 10001. the question would be why process #1 attempts to connect to port 1 then? or maybe totally unrelated :-) btw, if I trick process #1 to open the connection to 10001 by shifting the range, I now get this error and the process terminates immediately: [c0301b10e1:03919] [[0,],1]-[[0,0],0] mca_oob_tcp_peer_recv_connect_ack: received unexpected process identifier [[0,],0] good luck with the surgery and wishing you a prompt recovery! p. On Thu, Aug 19, 2010 at 2:02 PM, Ralph Castain wrote: > Something doesn't look right - here is what the algo attempts to do: > given a port range of 1-12000, the lowest rank'd process on the node > should open port 1. The next lowest rank on the node will open 10001, > etc. > So it looks to me like there is some confusion in the local rank algo. I'll > have to look at the generic module - must be a bug in it somewhere. > This might take a couple of days as I have surgery tomorrow morning, so > please forgive the delay. > > On Thu, Aug 19, 2010 at 11:13 AM, Philippe wrote: >> >> Ralph, >> >> I'm able to use the generic module when the processes are on different >> machines. >> >> what would be the values of the EV when two processes are on the same >> machine (hopefully talking over SHM). >> >> i've played with combination of nodelist and ppn but no luck. I get errors >> like: >> >> >> >> [c0301b10e1:03172] [[0,],1] -> [[0,0],0] (node: c0301b10e1) >> oob-tcp: Number of attempts to create TCP connection has been >> exceeded. 
Can not communicate with peer >> [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file >> grpcomm_hier_module.c at line 303 >> [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file >> base/grpcomm_base_modex.c at line 470 >> [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file >> grpcomm_hier_module.c at line 484 >> -- >> It looks like MPI_INIT failed for some reason; your parallel process is >> likely to abort. There are many reasons that a parallel process can >> fail during MPI_INIT; some of which are due to configuration or >> environment >> problems. This failure appears to be an internal failure; here's some >> additional information (which may only be relevant to an Open MPI >> developer): >> >> orte_grpcomm_modex failed >> --> Returned "Unreachable" (-12) instead of "Success" (0) >> -- >> *** The MPI_Init() function was called before MPI_INIT was invoked. >> *** This is disallowed by the MPI standard. >> *** Your MPI job will now abort. >> [c0301b10e1:3172] Abort before MPI_INIT completed successfully; not >> able to guarantee that all other processes were killed! >> >> >> maybe a related question is how to assign the TCP port range and how >> is it used? when the processes are on different machines, I use the >> same range and that's ok as long as the range is free. but when the >> processes are on the same node, what value should the range be for >> each process? My range is 1-12000 (for both processes) and I see >> that process with rank #0 listen on port 10001 while process with rank >> #1 try to establish a connect to port 1. >> >> Thanks so much! >> p. still here... still trying... ;-) >> >> On Tue, Jul 27, 2010 at 12:58 AM, Ralph Castain wrote: >> > Use what hostname returns - don't worry about IP addresses as we'll >> > discover them. >> > >> > On Jul 26, 2010, at 10:45 PM, Philippe wrote: >> > >> >> Thanks a lot! >> >> >> >> now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? 
our >> >> nodes have a short/long name (it's rhel 5.x, so the command hostname >> >> returns the long name) and at least 2 IP addresses. >> >> >> >> p. >> >> >> >> On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain >> >> wrote: >> >>> Okay, fixed in r23499. Thanks again... >> >>> >> >>> >> >>> On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote: >> >>> >> Doh - yes it should! I'll fix it right now. >> >> Thanks! >> >> On Jul 26, 2010, at 9:28 PM, Philippe wrote: >> >> > Ralph, >> > >> > i was able to test the generic module and it seems to be working. >> > >> > one question tho, the function orte_ess_generic_component_query in >> > "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the >> > argument "OMPI_MCA_enc", which seems to cause the module to fail to >> > load. s
Something doesn't look right - here is what the algo attempts to do: given a port range of 1-12000, the lowest rank'd process on the node should open port 1. The next lowest rank on the node will open 10001, etc. So it looks to me like there is some confusion in the local rank algo. I'll have to look at the generic module - must be a bug in it somewhere. This might take a couple of days as I have surgery tomorrow morning, so please forgive the delay. On Thu, Aug 19, 2010 at 11:13 AM, Philippe wrote: > Ralph, > > I'm able to use the generic module when the processes are on different > machines. > > what would be the values of the EV when two processes are on the same > machine (hopefully talking over SHM). > > i've played with combination of nodelist and ppn but no luck. I get errors > like: > > > > [c0301b10e1:03172] [[0,],1] -> [[0,0],0] (node: c0301b10e1) > oob-tcp: Number of attempts to create TCP connection has been > exceeded. Can not communicate with peer > [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file > grpcomm_hier_module.c at line 303 > [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file > base/grpcomm_base_modex.c at line 470 > [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file > grpcomm_hier_module.c at line 484 > -- > It looks like MPI_INIT failed for some reason; your parallel process is > likely to abort. There are many reasons that a parallel process can > fail during MPI_INIT; some of which are due to configuration or environment > problems. This failure appears to be an internal failure; here's some > additional information (which may only be relevant to an Open MPI > developer): > > orte_grpcomm_modex failed > --> Returned "Unreachable" (-12) instead of "Success" (0) > -- > *** The MPI_Init() function was called before MPI_INIT was invoked. > *** This is disallowed by the MPI standard. > *** Your MPI job will now abort. 
> [c0301b10e1:3172] Abort before MPI_INIT completed successfully; not > able to guarantee that all other processes were killed! > > > maybe a related question is how to assign the TCP port range and how > is it used? when the processes are on different machines, I use the > same range and that's ok as long as the range is free. but when the > processes are on the same node, what value should the range be for > each process? My range is 1-12000 (for both processes) and I see > that process with rank #0 listen on port 10001 while process with rank > #1 try to establish a connect to port 1. > > Thanks so much! > p. still here... still trying... ;-) > > On Tue, Jul 27, 2010 at 12:58 AM, Ralph Castain wrote: > > Use what hostname returns - don't worry about IP addresses as we'll > discover them. > > > > On Jul 26, 2010, at 10:45 PM, Philippe wrote: > > > >> Thanks a lot! > >> > >> now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our > >> nodes have a short/long name (it's rhel 5.x, so the command hostname > >> returns the long name) and at least 2 IP addresses. > >> > >> p. > >> > >> On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain > wrote: > >>> Okay, fixed in r23499. Thanks again... > >>> > >>> > >>> On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote: > >>> > Doh - yes it should! I'll fix it right now. > > Thanks! > > On Jul 26, 2010, at 9:28 PM, Philippe wrote: > > > Ralph, > > > > i was able to test the generic module and it seems to be working. > > > > one question tho, the function orte_ess_generic_component_query in > > "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the > > argument "OMPI_MCA_enc", which seems to cause the module to fail to > > load. shouldnt it be "OMPI_MCA_ess" ? > > > > . > > > > /* only pick us if directed to do so */ > > if (NULL != (pick = getenv("OMPI_MCA_env")) && > >0 == strcmp(pick, "generic")) { > > *priority = 1000; > > *module = (mca_base_module_t *)&orte_ess_generic_module; > > > > ... > > > > p. 
> > > > On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain > wrote: > >> Dev trunk looks okay right now - I think you'll be fine using it. My > new component -might- work with 1.5, but probably not with 1.4. I haven't > checked either of them. > >> > >> Anything at r23478 or above will have the new module. Let me know > how it works for you. I haven't tested it myself, but am pretty sure it > should work. > >> > >> > >> On Jul 22, 2010, at 3:22 PM, Philippe wrote: > >> > >>> Ralph, > >>> > >>> Thank you so much!! > >>> > >>> I'll give it a try and let you know. > >>> > >>> I know it's a tough question, but how stable is the dev trunk? Can > I > >>> just grab the latest and run, or am I
Ralph, I'm able to use the generic module when the processes are on different machines. what would be the values of the EVs when two processes are on the same machine (hopefully talking over SHM)? i've played with combinations of nodelist and ppn but no luck. I get errors like: [c0301b10e1:03172] [[0,],1] -> [[0,0],0] (node: c0301b10e1) oob-tcp: Number of attempts to create TCP connection has been exceeded. Can not communicate with peer [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file grpcomm_hier_module.c at line 303 [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file base/grpcomm_base_modex.c at line 470 [c0301b10e1:03172] [[0,],1] ORTE_ERROR_LOG: Unreachable in file grpcomm_hier_module.c at line 484 -- It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): orte_grpcomm_modex failed --> Returned "Unreachable" (-12) instead of "Success" (0) -- *** The MPI_Init() function was called before MPI_INIT was invoked. *** This is disallowed by the MPI standard. *** Your MPI job will now abort. [c0301b10e1:3172] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed! maybe a related question is how to assign the TCP port range and how is it used? when the processes are on different machines, I use the same range and that's ok as long as the range is free. but when the processes are on the same node, what value should the range be for each process? My range is 1-12000 (for both processes) and I see that the process with rank #0 listens on port 10001 while the process with rank #1 tries to establish a connection to port 1. Thanks so much! p. still here... still trying... 
;-) On Tue, Jul 27, 2010 at 12:58 AM, Ralph Castain wrote: > Use what hostname returns - don't worry about IP addresses as we'll discover > them. > > On Jul 26, 2010, at 10:45 PM, Philippe wrote: > >> Thanks a lot! >> >> now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our >> nodes have a short/long name (it's rhel 5.x, so the command hostname >> returns the long name) and at least 2 IP addresses. >> >> p. >> >> On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain wrote: >>> Okay, fixed in r23499. Thanks again... >>> >>> >>> On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote: >>> Doh - yes it should! I'll fix it right now. Thanks! On Jul 26, 2010, at 9:28 PM, Philippe wrote: > Ralph, > > i was able to test the generic module and it seems to be working. > > one question tho, the function orte_ess_generic_component_query in > "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the > argument "OMPI_MCA_enc", which seems to cause the module to fail to > load. shouldnt it be "OMPI_MCA_ess" ? > > . > > /* only pick us if directed to do so */ > if (NULL != (pick = getenv("OMPI_MCA_env")) && > 0 == strcmp(pick, "generic")) { > *priority = 1000; > *module = (mca_base_module_t *)&orte_ess_generic_module; > > ... > > p. > > On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain wrote: >> Dev trunk looks okay right now - I think you'll be fine using it. My new >> component -might- work with 1.5, but probably not with 1.4. I haven't >> checked either of them. >> >> Anything at r23478 or above will have the new module. Let me know how it >> works for you. I haven't tested it myself, but am pretty sure it should >> work. >> >> >> On Jul 22, 2010, at 3:22 PM, Philippe wrote: >> >>> Ralph, >>> >>> Thank you so much!! >>> >>> I'll give it a try and let you know. >>> >>> I know it's a tough question, but how stable is the dev trunk? Can I >>> just grab the latest and run, or am I better off taking your changes >>> and copy them back in a stable release? (if so, which one? 
1.4? 1.5?) >>> >>> p. >>> >>> On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain >>> wrote: It was easier for me to just construct this module than to explain how to do so :-) I will commit it this evening (couple of hours from now) as that is our standard practice. You'll need to use the developer's trunk, though, to use it. Here are the envars you'll need to provide: Each process needs to get the same following values: * OMPI_MCA_ess=generic * OMPI_MCA_orte_num_procs= >>
Use what hostname returns - don't worry about IP addresses as we'll discover them. On Jul 26, 2010, at 10:45 PM, Philippe wrote: > Thanks a lot! > > now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our > nodes have a short/long name (it's rhel 5.x, so the command hostname > returns the long name) and at least 2 IP addresses. > > p. > > On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain wrote: >> Okay, fixed in r23499. Thanks again... >> >> >> On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote: >> >>> Doh - yes it should! I'll fix it right now. >>> >>> Thanks! >>> >>> On Jul 26, 2010, at 9:28 PM, Philippe wrote: >>> Ralph, i was able to test the generic module and it seems to be working. one question tho, the function orte_ess_generic_component_query in "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the argument "OMPI_MCA_enc", which seems to cause the module to fail to load. shouldnt it be "OMPI_MCA_ess" ? . /* only pick us if directed to do so */ if (NULL != (pick = getenv("OMPI_MCA_env")) && 0 == strcmp(pick, "generic")) { *priority = 1000; *module = (mca_base_module_t *)&orte_ess_generic_module; ... p. On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain wrote: > Dev trunk looks okay right now - I think you'll be fine using it. My new > component -might- work with 1.5, but probably not with 1.4. I haven't > checked either of them. > > Anything at r23478 or above will have the new module. Let me know how it > works for you. I haven't tested it myself, but am pretty sure it should > work. > > > On Jul 22, 2010, at 3:22 PM, Philippe wrote: > >> Ralph, >> >> Thank you so much!! >> >> I'll give it a try and let you know. >> >> I know it's a tough question, but how stable is the dev trunk? Can I >> just grab the latest and run, or am I better off taking your changes >> and copy them back in a stable release? (if so, which one? 1.4? 1.5?) >> >> p. 
>> >> On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain >>> wrote: >>> It was easier for me to just construct this module than to explain how >>> to do so :-) >>> >>> I will commit it this evening (couple of hours from now) as that is our >>> standard practice. You'll need to use the developer's trunk, though, to >>> use it. >>> >>> Here are the envars you'll need to provide: >>> >>> Each process needs to get the same following values: >>> >>> * OMPI_MCA_ess=generic >>> * OMPI_MCA_orte_num_procs=<number of procs in the job> >>> * OMPI_MCA_orte_nodes=<comma-separated list of nodes where the >>> procs reside> >>> * OMPI_MCA_orte_ppn=<procs per node> >>> >>> Note that I have assumed this last value is a constant for simplicity. >>> If that isn't the case, let me know - you could instead provide it as a >>> comma-separated list of values with an entry for each node. >>> >>> In addition, you need to provide the following value that will be >>> unique to each process: >>> >>> * OMPI_MCA_orte_rank=<this proc's rank> >>> >>> Finally, you have to provide a range of static TCP ports for use by the >>> processes. Pick any range that you know will be available across all >>> the nodes. You then need to ensure that each process sees the following >>> envar: >>> >>> * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this >>> with your range >>> >>> You will need a port range that is at least equal to the ppn for the >>> job (each proc on a node will take one of the provided ports). >>> >>> That should do it. I compute everything else I need from those values. >>> >>> Does that work for you? >>> Ralph >>> >>> > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Thanks a lot! now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our nodes have a short/long name (it's rhel 5.x, so the command hostname returns the long name) and at least 2 IP addresses. p. On Tue, Jul 27, 2010 at 12:06 AM, Ralph Castain wrote: > Okay, fixed in r23499. Thanks again... > > > On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote: > >> Doh - yes it should! I'll fix it right now. >> >> Thanks! >> >> On Jul 26, 2010, at 9:28 PM, Philippe wrote: >> >>> Ralph, >>> >>> i was able to test the generic module and it seems to be working. >>> >>> one question tho, the function orte_ess_generic_component_query in >>> "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the >>> argument "OMPI_MCA_enc", which seems to cause the module to fail to >>> load. shouldnt it be "OMPI_MCA_ess" ? >>> >>> . >>> >>> /* only pick us if directed to do so */ >>> if (NULL != (pick = getenv("OMPI_MCA_env")) && >>> 0 == strcmp(pick, "generic")) { >>> *priority = 1000; >>> *module = (mca_base_module_t *)&orte_ess_generic_module; >>> >>> ... >>> >>> p. >>> >>> On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain wrote: Dev trunk looks okay right now - I think you'll be fine using it. My new component -might- work with 1.5, but probably not with 1.4. I haven't checked either of them. Anything at r23478 or above will have the new module. Let me know how it works for you. I haven't tested it myself, but am pretty sure it should work. On Jul 22, 2010, at 3:22 PM, Philippe wrote: > Ralph, > > Thank you so much!! > > I'll give it a try and let you know. > > I know it's a tough question, but how stable is the dev trunk? Can I > just grab the latest and run, or am I better off taking your changes > and copy them back in a stable release? (if so, which one? 1.4? 1.5?) > > p. 
> > On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain wrote: >> It was easier for me to just construct this module than to explain how >> to do so :-) >> >> I will commit it this evening (couple of hours from now) as that is our >> standard practice. You'll need to use the developer's trunk, though, to >> use it. >> >> Here are the envars you'll need to provide: >> >> Each process needs to get the same following values: >> >> * OMPI_MCA_ess=generic >> * OMPI_MCA_orte_num_procs= >> * OMPI_MCA_orte_nodes=> procs reside> >> * OMPI_MCA_orte_ppn= >> >> Note that I have assumed this last value is a constant for simplicity. >> If that isn't the case, let me know - you could instead provide it as a >> comma-separated list of values with an entry for each node. >> >> In addition, you need to provide the following value that will be unique >> to each process: >> >> * OMPI_MCA_orte_rank= >> >> Finally, you have to provide a range of static TCP ports for use by the >> processes. Pick any range that you know will be available across all the >> nodes. You then need to ensure that each process sees the following >> envar: >> >> * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this >> with your range >> >> You will need a port range that is at least equal to the ppn for the job >> (each proc on a node will take one of the provided ports). >> >> That should do it. I compute everything else I need from those values. >> >> Does that work for you? >> Ralph >> >>
Okay, fixed in r23499. Thanks again... On Jul 26, 2010, at 9:47 PM, Ralph Castain wrote: > Doh - yes it should! I'll fix it right now. > > Thanks! > > On Jul 26, 2010, at 9:28 PM, Philippe wrote: > >> Ralph, >> >> i was able to test the generic module and it seems to be working. >> >> one question tho, the function orte_ess_generic_component_query in >> "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the >> argument "OMPI_MCA_enc", which seems to cause the module to fail to >> load. shouldnt it be "OMPI_MCA_ess" ? >> >> . >> >> /* only pick us if directed to do so */ >> if (NULL != (pick = getenv("OMPI_MCA_env")) && >>0 == strcmp(pick, "generic")) { >> *priority = 1000; >> *module = (mca_base_module_t *)&orte_ess_generic_module; >> >> ... >> >> p. >> >> On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain wrote: >>> Dev trunk looks okay right now - I think you'll be fine using it. My new >>> component -might- work with 1.5, but probably not with 1.4. I haven't >>> checked either of them. >>> >>> Anything at r23478 or above will have the new module. Let me know how it >>> works for you. I haven't tested it myself, but am pretty sure it should >>> work. >>> >>> >>> On Jul 22, 2010, at 3:22 PM, Philippe wrote: >>> Ralph, Thank you so much!! I'll give it a try and let you know. I know it's a tough question, but how stable is the dev trunk? Can I just grab the latest and run, or am I better off taking your changes and copy them back in a stable release? (if so, which one? 1.4? 1.5?) p. On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain wrote: > It was easier for me to just construct this module than to explain how to > do so :-) > > I will commit it this evening (couple of hours from now) as that is our > standard practice. You'll need to use the developer's trunk, though, to > use it. 
> > Here are the envars you'll need to provide: > > Each process needs to get the same following values: > > * OMPI_MCA_ess=generic > * OMPI_MCA_orte_num_procs=<number of procs in the job> > * OMPI_MCA_orte_nodes=<comma-separated list of nodes where the procs reside> > * OMPI_MCA_orte_ppn=<procs per node> > > Note that I have assumed this last value is a constant for simplicity. If > that isn't the case, let me know - you could instead provide it as a > comma-separated list of values with an entry for each node. > > In addition, you need to provide the following value that will be unique > to each process: > > * OMPI_MCA_orte_rank=<this proc's rank> > > Finally, you have to provide a range of static TCP ports for use by the > processes. Pick any range that you know will be available across all the > nodes. You then need to ensure that each process sees the following envar: > > * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this > with your range > > You will need a port range that is at least equal to the ppn for the job > (each proc on a node will take one of the provided ports). > > That should do it. I compute everything else I need from those values. > > Does that work for you? > Ralph > > > On Jul 22, 2010, at 6:48 AM, Philippe wrote: > >> On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain >> wrote: >>> >>> On Jul 21, 2010, at 7:44 AM, Philippe wrote: >>> Ralph, Sorry for the late reply -- I was away on vacation. >>> >>> no problem at all! >>> regarding your earlier question about how many processes were involved when the memory was entirely allocated, it was only two, a sender and a receiver. I'm still trying to pinpoint what can be different between the standalone case and the "integrated" case. I will try to find out what part of the code is allocating memory in a loop. >>> >>> hmmm... that sounds like a bug in your program. let me know what you >>> find >>> On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain wrote: > Well, I finally managed to make this work without the required > ompi-server rendezvous point. 
The fix is only in the devel trunk > right now - I'll have to ask the release managers for 1.5 and 1.4 if > they want it ported to those series. > great -- i'll give it a try > On the notion of integrating OMPI to your launch environment: > remember that we don't necessarily require that you use mpiexec for > that purpose. If your launch environment provides just a little info > in the environment of the launched procs, we can usually devise a > method that allows the procs to perform an MPI_Init as a single job > without all this work you are doing. > >
Doh - yes it should! I'll fix it right now. Thanks! On Jul 26, 2010, at 9:28 PM, Philippe wrote: > Ralph, > > i was able to test the generic module and it seems to be working. > > one question tho, the function orte_ess_generic_component_query in > "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the > argument "OMPI_MCA_enc", which seems to cause the module to fail to > load. shouldnt it be "OMPI_MCA_ess" ? > > . > >/* only pick us if directed to do so */ >if (NULL != (pick = getenv("OMPI_MCA_env")) && > 0 == strcmp(pick, "generic")) { >*priority = 1000; >*module = (mca_base_module_t *)&orte_ess_generic_module; > > ... > > p. > > On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain wrote: >> Dev trunk looks okay right now - I think you'll be fine using it. My new >> component -might- work with 1.5, but probably not with 1.4. I haven't >> checked either of them. >> >> Anything at r23478 or above will have the new module. Let me know how it >> works for you. I haven't tested it myself, but am pretty sure it should work. >> >> >> On Jul 22, 2010, at 3:22 PM, Philippe wrote: >> >>> Ralph, >>> >>> Thank you so much!! >>> >>> I'll give it a try and let you know. >>> >>> I know it's a tough question, but how stable is the dev trunk? Can I >>> just grab the latest and run, or am I better off taking your changes >>> and copy them back in a stable release? (if so, which one? 1.4? 1.5?) >>> >>> p. >>> >>> On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain wrote: It was easier for me to just construct this module than to explain how to do so :-) I will commit it this evening (couple of hours from now) as that is our standard practice. You'll need to use the developer's trunk, though, to use it. Here are the envars you'll need to provide: Each process needs to get the same following values: * OMPI_MCA_ess=generic * OMPI_MCA_orte_num_procs= * OMPI_MCA_orte_nodes=>>> reside> * OMPI_MCA_orte_ppn= Note that I have assumed this last value is a constant for simplicity. 
If that isn't the case, let me know - you could instead provide it as a comma-separated list of values with an entry for each node. In addition, you need to provide the following value that will be unique to each process: * OMPI_MCA_orte_rank= Finally, you have to provide a range of static TCP ports for use by the processes. Pick any range that you know will be available across all the nodes. You then need to ensure that each process sees the following envar: * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this with your range You will need a port range that is at least equal to the ppn for the job (each proc on a node will take one of the provided ports). That should do it. I compute everything else I need from those values. Does that work for you? Ralph On Jul 22, 2010, at 6:48 AM, Philippe wrote: > On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain wrote: >> >> On Jul 21, 2010, at 7:44 AM, Philippe wrote: >> >>> Ralph, >>> >>> Sorry for the late reply -- I was away on vacation. >> >> no problem at all! >> >>> >>> regarding your earlier question about how many processes where >>> involved when the memory was entirely allocated, it was only two, a >>> sender and a receiver. I'm still trying to pinpoint what can be >>> different between the standalone case and the "integrated" case. I >>> will try to find out what part of the code is allocating memory in a >>> loop. >> >> hmmmthat sounds like a bug in your program. let me know what you find >> >>> >>> >>> On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain >>> wrote: Well, I finally managed to make this work without the required ompi-server rendezvous point. The fix is only in the devel trunk right now - I'll have to ask the release managers for 1.5 and 1.4 if they want it ported to those series. >>> >>> great -- i'll give it a try >>> On the notion of integrating OMPI to your launch environment: remember that we don't necessarily require that you use mpiexec for that purpose. 
If your launch environment provides just a little info in the environment of the launched procs, we can usually devise a method that allows the procs to perform an MPI_Init as a single job without all this work you are doing. >>> >>> I'm working on creating operators using MPI for the IBM product >>> "InfoSphere Streams". It has its own launching mechanism to start the >>> processes. However I can pass some information to the processes that >>> belong to th
Ralph, i was able to test the generic module and it seems to be working. one question tho, the function orte_ess_generic_component_query in "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the argument "OMPI_MCA_enc", which seems to cause the module to fail to load. shouldn't it be "OMPI_MCA_ess"?

/* only pick us if directed to do so */
if (NULL != (pick = getenv("OMPI_MCA_env")) &&
    0 == strcmp(pick, "generic")) {
    *priority = 1000;
    *module = (mca_base_module_t *)&orte_ess_generic_module;

... p. On Thu, Jul 22, 2010 at 5:53 PM, Ralph Castain wrote: > Dev trunk looks okay right now - I think you'll be fine using it. My new > component -might- work with 1.5, but probably not with 1.4. I haven't checked > either of them. > > Anything at r23478 or above will have the new module. Let me know how it > works for you. I haven't tested it myself, but am pretty sure it should work. > > > On Jul 22, 2010, at 3:22 PM, Philippe wrote: > >> Ralph, >> >> Thank you so much!! >> >> I'll give it a try and let you know. >> >> I know it's a tough question, but how stable is the dev trunk? Can I >> just grab the latest and run, or am I better off taking your changes >> and copying them back in a stable release? (if so, which one? 1.4? 1.5?) >> >> p. >> >> On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain wrote: >>> It was easier for me to just construct this module than to explain how to >>> do so :-) >>> >>> I will commit it this evening (couple of hours from now) as that is our >>> standard practice. You'll need to use the developer's trunk, though, to use >>> it. >>> >>> Here are the envars you'll need to provide: >>> >>> Each process needs to get the same following values: >>> >>> * OMPI_MCA_ess=generic >>> * OMPI_MCA_orte_num_procs=<number of procs in the job> >>> * OMPI_MCA_orte_nodes=<comma-separated list of nodes where the procs >>> reside> >>> * OMPI_MCA_orte_ppn=<procs per node> >>> >>> Note that I have assumed this last value is a constant for simplicity. 
If >>> that isn't the case, let me know - you could instead provide it as a >>> comma-separated list of values with an entry for each node. >>> >>> In addition, you need to provide the following value that will be unique to >>> each process: >>> >>> * OMPI_MCA_orte_rank= >>> >>> Finally, you have to provide a range of static TCP ports for use by the >>> processes. Pick any range that you know will be available across all the >>> nodes. You then need to ensure that each process sees the following envar: >>> >>> * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this with >>> your range >>> >>> You will need a port range that is at least equal to the ppn for the job >>> (each proc on a node will take one of the provided ports). >>> >>> That should do it. I compute everything else I need from those values. >>> >>> Does that work for you? >>> Ralph >>> >>> >>> On Jul 22, 2010, at 6:48 AM, Philippe wrote: >>> On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain wrote: > > On Jul 21, 2010, at 7:44 AM, Philippe wrote: > >> Ralph, >> >> Sorry for the late reply -- I was away on vacation. > > no problem at all! > >> >> regarding your earlier question about how many processes where >> involved when the memory was entirely allocated, it was only two, a >> sender and a receiver. I'm still trying to pinpoint what can be >> different between the standalone case and the "integrated" case. I >> will try to find out what part of the code is allocating memory in a >> loop. > > hmmmthat sounds like a bug in your program. let me know what you find > >> >> >> On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain >> wrote: >>> Well, I finally managed to make this work without the required >>> ompi-server rendezvous point. The fix is only in the devel trunk right >>> now - I'll have to ask the release managers for 1.5 and 1.4 if they >>> want it ported to those series. 
>>> >> >> great -- i'll give it a try >> >>> On the notion of integrating OMPI to your launch environment: remember >>> that we don't necessarily require that you use mpiexec for that >>> purpose. If your launch environment provides just a little info in the >>> environment of the launched procs, we can usually devise a method that >>> allows the procs to perform an MPI_Init as a single job without all >>> this work you are doing. >>> >> >> I'm working on creating operators using MPI for the IBM product >> "InfoSphere Streams". It has its own launching mechanism to start the >> processes. However I can pass some information to the processes that >> belong to the same job (Streams job -- which should neatly map to MPI >> job). >> >>> Only difference is that your procs will all block in MPI_Init until >>> they -all- have executed that function. If that isn't a problem, this >>> would be a much more scalable and reliable meth
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Dev trunk looks okay right now - I think you'll be fine using it. My new component -might- work with 1.5, but probably not with 1.4. I haven't checked either of them. Anything at r23478 or above will have the new module. Let me know how it works for you. I haven't tested it myself, but am pretty sure it should work.

On Jul 22, 2010, at 3:22 PM, Philippe wrote: > Ralph, > > Thank you so much!! > > I'll give it a try and let you know. > > I know it's a tough question, but how stable is the dev trunk? Can I > just grab the latest and run, or am I better off taking your changes > and copying them back into a stable release? (if so, which one? 1.4? 1.5?) > > p. > > On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain wrote: >> It was easier for me to just construct this module than to explain how to do >> so :-) >> >> I will commit it this evening (a couple of hours from now) as that is our >> standard practice. You'll need to use the developer's trunk, though, to use >> it. >> >> Here are the envars you'll need to provide: >> >> Each process needs to get the same following values: >> >> * OMPI_MCA_ess=generic >> * OMPI_MCA_orte_num_procs=<total number of procs> >> * OMPI_MCA_orte_nodes=<comma-separated list of nodes where the procs reside> >> * OMPI_MCA_orte_ppn=<procs per node> >> >> Note that I have assumed this last value is a constant for simplicity. If >> that isn't the case, let me know - you could instead provide it as a >> comma-separated list of values with an entry for each node. >> >> In addition, you need to provide the following value that will be unique to >> each process: >> >> * OMPI_MCA_orte_rank=<rank of this process> >> >> Finally, you have to provide a range of static TCP ports for use by the >> processes. Pick any range that you know will be available across all the >> nodes. You then need to ensure that each process sees the following envar: >> >> * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this with >> your range >> >> You will need a port range that is at least equal to the ppn for the job >> (each proc on a node will take one of the provided ports). >> >> That should do it. I compute everything else I need from those values. >> >> Does that work for you? >> Ralph >> >> >> On Jul 22, 2010, at 6:48 AM, Philippe wrote: >> >>> On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain wrote: On Jul 21, 2010, at 7:44 AM, Philippe wrote: > Ralph, > > Sorry for the late reply -- I was away on vacation. no problem at all! > > regarding your earlier question about how many processes were > involved when the memory was entirely allocated, it was only two, a > sender and a receiver. I'm still trying to pinpoint what can be > different between the standalone case and the "integrated" case. I > will try to find out what part of the code is allocating memory in a > loop. hmmm... that sounds like a bug in your program. let me know what you find > > > On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain wrote: >> Well, I finally managed to make this work without the required >> ompi-server rendezvous point. The fix is only in the devel trunk right >> now - I'll have to ask the release managers for 1.5 and 1.4 if they want >> it ported to those series. >> > > great -- I'll give it a try > >> On the notion of integrating OMPI to your launch environment: remember >> that we don't necessarily require that you use mpiexec for that purpose. >> If your launch environment provides just a little info in the >> environment of the launched procs, we can usually devise a method that >> allows the procs to perform an MPI_Init as a single job without all this >> work you are doing. >> > > I'm working on creating operators using MPI for the IBM product > "InfoSphere Streams". It has its own launching mechanism to start the > processes. However I can pass some information to the processes that > belong to the same job (Streams job -- which should neatly map to an MPI > job). > >> Only difference is that your procs will all block in MPI_Init until they >> -all- have executed that function. If that isn't a problem, this would >> be a much more scalable and reliable method than doing it thru massive >> calls to MPI_Port_connect. >> > > in the general case, that would be a problem, but for my prototype, > this is acceptable. > > In general, each process is composed of operators; some may be MPI > related and some may not. But in my case, I know ahead of time which > processes will be part of the MPI job, so I can easily deal with the > fact that they would block on MPI_Init (actually -- MPI_Init_thread, > since it's using a lot of threads). We have talked in the past about creating a non-blocking MPI_Init as an extension to the standard. It would lock you to Open MPI, though... Regardless, at some point you would have to know how many processes are going to be part of the job so you can know when MPI_Init is complete.
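The envar list Ralph gives above can be captured in a small launcher-side sketch. The hostname, rank, and port range here are illustrative placeholders (not values taken from the thread); the launch environment would export these before exec'ing each process, with only OMPI_MCA_orte_rank differing per process.

```shell
#!/bin/sh
# Sketch of the environment one process needs for the "generic" ESS
# module, following the list in the message above. Values are
# placeholders for illustration.

export OMPI_MCA_ess=generic                     # select the generic module
export OMPI_MCA_orte_num_procs=2                # total procs in the MPI job
export OMPI_MCA_orte_nodes=node01               # comma-separated node list
export OMPI_MCA_orte_ppn=2                      # procs per node (constant)
export OMPI_MCA_orte_rank=0                     # unique per process
export OMPI_MCA_oob_tcp_static_ports=6000-6010  # at least ppn ports per node

echo "rank=$OMPI_MCA_orte_rank of $OMPI_MCA_orte_num_procs"
```

Note the constraint Ralph states: the static port range must contain at least ppn ports, since each proc on a node claims one.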
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Ralph, Thank you so much!! I'll give it a try and let you know. I know it's a tough question, but how stable is the dev trunk? Can I just grab the latest and run, or am I better off taking your changes and copying them back into a stable release? (if so, which one? 1.4? 1.5?) p.

On Thu, Jul 22, 2010 at 3:50 PM, Ralph Castain wrote: > It was easier for me to just construct this module than to explain how to do > so :-) > > I will commit it this evening (a couple of hours from now) as that is our > standard practice. You'll need to use the developer's trunk, though, to use > it. > > Here are the envars you'll need to provide: > > Each process needs to get the same following values: > > * OMPI_MCA_ess=generic > * OMPI_MCA_orte_num_procs=<total number of procs> > * OMPI_MCA_orte_nodes=<comma-separated list of nodes where the procs reside> > * OMPI_MCA_orte_ppn=<procs per node> > > Note that I have assumed this last value is a constant for simplicity. If > that isn't the case, let me know - you could instead provide it as a > comma-separated list of values with an entry for each node. > > In addition, you need to provide the following value that will be unique to > each process: > > * OMPI_MCA_orte_rank=<rank of this process> > > Finally, you have to provide a range of static TCP ports for use by the > processes. Pick any range that you know will be available across all the > nodes. You then need to ensure that each process sees the following envar: > > * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this with > your range > > You will need a port range that is at least equal to the ppn for the job > (each proc on a node will take one of the provided ports). > > That should do it. I compute everything else I need from those values. > > Does that work for you? > Ralph > > > On Jul 22, 2010, at 6:48 AM, Philippe wrote: > >> On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain wrote: >>> >>> On Jul 21, 2010, at 7:44 AM, Philippe wrote: >>> Ralph, Sorry for the late reply -- I was away on vacation. >>> >>> no problem at all! >>> regarding your earlier question about how many processes were involved when the memory was entirely allocated, it was only two, a sender and a receiver. I'm still trying to pinpoint what can be different between the standalone case and the "integrated" case. I will try to find out what part of the code is allocating memory in a loop. >>> >>> hmmm... that sounds like a bug in your program. let me know what you find >>> On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain wrote: > Well, I finally managed to make this work without the required > ompi-server rendezvous point. The fix is only in the devel trunk right > now - I'll have to ask the release managers for 1.5 and 1.4 if they want > it ported to those series. > great -- I'll give it a try > On the notion of integrating OMPI to your launch environment: remember > that we don't necessarily require that you use mpiexec for that purpose. > If your launch environment provides just a little info in the environment > of the launched procs, we can usually devise a method that allows the > procs to perform an MPI_Init as a single job without all this work you > are doing. > I'm working on creating operators using MPI for the IBM product "InfoSphere Streams". It has its own launching mechanism to start the processes. However I can pass some information to the processes that belong to the same job (Streams job -- which should neatly map to an MPI job). > Only difference is that your procs will all block in MPI_Init until they > -all- have executed that function. If that isn't a problem, this would be > a much more scalable and reliable method than doing it thru massive calls > to MPI_Port_connect. > in the general case, that would be a problem, but for my prototype, this is acceptable. In general, each process is composed of operators; some may be MPI related and some may not. But in my case, I know ahead of time which processes will be part of the MPI job, so I can easily deal with the fact that they would block on MPI_Init (actually -- MPI_Init_thread, since it's using a lot of threads). >>> >>> We have talked in the past about creating a non-blocking MPI_Init as an >>> extension to the standard. It would lock you to Open MPI, though... >>> >>> Regardless, at some point you would have to know how many processes are >>> going to be part of the job so you can know when MPI_Init is complete. I >>> would think you would require that info for the singleton wireup anyway - >>> yes? Otherwise, how would you know when to quit running connect-accept? >>> >> >> the short answer is yes... although, the longer answer is a bit more >> complicated. currently I do know the number of connects I need to do on >> a per-port basis. a job can contain an arbitrary number of MPI >> processes, each opening one or more ports.
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
It was easier for me to just construct this module than to explain how to do so :-) I will commit it this evening (a couple of hours from now) as that is our standard practice. You'll need to use the developer's trunk, though, to use it.

Here are the envars you'll need to provide. Each process needs to get the same following values: * OMPI_MCA_ess=generic * OMPI_MCA_orte_num_procs=<total number of procs> * OMPI_MCA_orte_nodes=<comma-separated list of nodes where the procs reside> * OMPI_MCA_orte_ppn=<procs per node> Note that I have assumed this last value is a constant for simplicity. If that isn't the case, let me know - you could instead provide it as a comma-separated list of values with an entry for each node. In addition, you need to provide the following value that will be unique to each process: * OMPI_MCA_orte_rank=<rank of this process> Finally, you have to provide a range of static TCP ports for use by the processes. Pick any range that you know will be available across all the nodes. You then need to ensure that each process sees the following envar: * OMPI_MCA_oob_tcp_static_ports=6000-6010 <== obviously, replace this with your range You will need a port range that is at least equal to the ppn for the job (each proc on a node will take one of the provided ports). That should do it. I compute everything else I need from those values. Does that work for you? Ralph

On Jul 22, 2010, at 6:48 AM, Philippe wrote: > On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain wrote: >> >> On Jul 21, 2010, at 7:44 AM, Philippe wrote: >> >>> Ralph, >>> >>> Sorry for the late reply -- I was away on vacation. >> >> no problem at all! >> >>> >>> regarding your earlier question about how many processes were >>> involved when the memory was entirely allocated, it was only two, a >>> sender and a receiver. I'm still trying to pinpoint what can be >>> different between the standalone case and the "integrated" case. I >>> will try to find out what part of the code is allocating memory in a >>> loop. >> >> hmmm... that sounds like a bug in your program. let me know what you find >> >>> >>> >>> On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain wrote: Well, I finally managed to make this work without the required ompi-server rendezvous point. The fix is only in the devel trunk right now - I'll have to ask the release managers for 1.5 and 1.4 if they want it ported to those series. >>> >>> great -- I'll give it a try >>> On the notion of integrating OMPI to your launch environment: remember that we don't necessarily require that you use mpiexec for that purpose. If your launch environment provides just a little info in the environment of the launched procs, we can usually devise a method that allows the procs to perform an MPI_Init as a single job without all this work you are doing. >>> >>> I'm working on creating operators using MPI for the IBM product >>> "InfoSphere Streams". It has its own launching mechanism to start the >>> processes. However I can pass some information to the processes that >>> belong to the same job (Streams job -- which should neatly map to an MPI >>> job). >>> Only difference is that your procs will all block in MPI_Init until they -all- have executed that function. If that isn't a problem, this would be a much more scalable and reliable method than doing it thru massive calls to MPI_Port_connect. >>> >>> in the general case, that would be a problem, but for my prototype, >>> this is acceptable. >>> >>> In general, each process is composed of operators; some may be MPI >>> related and some may not. But in my case, I know ahead of time which >>> processes will be part of the MPI job, so I can easily deal with the >>> fact that they would block on MPI_Init (actually -- MPI_Init_thread, >>> since it's using a lot of threads). >> >> We have talked in the past about creating a non-blocking MPI_Init as an >> extension to the standard. It would lock you to Open MPI, though... >> >> Regardless, at some point you would have to know how many processes are >> going to be part of the job so you can know when MPI_Init is complete. I >> would think you would require that info for the singleton wireup anyway - >> yes? Otherwise, how would you know when to quit running connect-accept? >> > > the short answer is yes... although, the longer answer is a bit more > complicated. currently I do know the number of connects I need to do on > a per-port basis. a job can contain an arbitrary number of MPI > processes, each opening one or more ports. so I know the count port by > port, but I don't need to worry about how many MPI processes there are > globally. to make things a bit more complicated, each MPI operator can > be "fused" with other operators to make a process. each fused operator > may or may not require MPI. the bottom line is, to get the total > number of processes to calculate rank and size, I need to reverse-engineer > the fusing that the compiler may do. > > but that's ok, I'm willing to do that for our prototype :-)
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain wrote: > > On Jul 21, 2010, at 7:44 AM, Philippe wrote: > >> Ralph, >> >> Sorry for the late reply -- I was away on vacation. > > no problem at all! > >> >> regarding your earlier question about how many processes were >> involved when the memory was entirely allocated, it was only two, a >> sender and a receiver. I'm still trying to pinpoint what can be >> different between the standalone case and the "integrated" case. I >> will try to find out what part of the code is allocating memory in a >> loop. > > hmmm... that sounds like a bug in your program. let me know what you find > >> >> >> On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain wrote: >>> Well, I finally managed to make this work without the required ompi-server >>> rendezvous point. The fix is only in the devel trunk right now - I'll have >>> to ask the release managers for 1.5 and 1.4 if they want it ported to those >>> series. >>> >> >> great -- I'll give it a try >> >>> On the notion of integrating OMPI to your launch environment: remember that >>> we don't necessarily require that you use mpiexec for that purpose. If your >>> launch environment provides just a little info in the environment of the >>> launched procs, we can usually devise a method that allows the procs to >>> perform an MPI_Init as a single job without all this work you are doing. >>> >> >> I'm working on creating operators using MPI for the IBM product >> "InfoSphere Streams". It has its own launching mechanism to start the >> processes. However I can pass some information to the processes that >> belong to the same job (Streams job -- which should neatly map to an MPI >> job). >> >>> Only difference is that your procs will all block in MPI_Init until they >>> -all- have executed that function. If that isn't a problem, this would be a >>> much more scalable and reliable method than doing it thru massive calls to >>> MPI_Port_connect. >>> >> >> in the general case, that would be a problem, but for my prototype, >> this is acceptable. >> >> In general, each process is composed of operators; some may be MPI >> related and some may not. But in my case, I know ahead of time which >> processes will be part of the MPI job, so I can easily deal with the >> fact that they would block on MPI_Init (actually -- MPI_Init_thread, >> since it's using a lot of threads). > > We have talked in the past about creating a non-blocking MPI_Init as an > extension to the standard. It would lock you to Open MPI, though... > > Regardless, at some point you would have to know how many processes are going > to be part of the job so you can know when MPI_Init is complete. I would > think you would require that info for the singleton wireup anyway - yes? > Otherwise, how would you know when to quit running connect-accept? >

the short answer is yes... although, the longer answer is a bit more complicated. currently I do know the number of connects I need to do on a per-port basis. a job can contain an arbitrary number of MPI processes, each opening one or more ports. so I know the count port by port, but I don't need to worry about how many MPI processes there are globally. to make things a bit more complicated, each MPI operator can be "fused" with other operators to make a process. each fused operator may or may not require MPI. the bottom line is, to get the total number of processes to calculate rank and size, I need to reverse-engineer the fusing that the compiler may do. but that's ok, I'm willing to do that for our prototype :-)

>> >> Is there documentation or an example I can use to see what information >> I can pass to the processes to enable that? Is it just environment >> variables? > > No real documentation - a lack I should probably fill. At the moment, we > don't have a "generic" module for standalone launch, but I can create one as > it is pretty trivial. I would then need you to pass each process envars > telling it the total number of processes in the MPI job, its rank within that > job, and a file where some rendezvous process (can be rank=0) has provided > that port string. Armed with that info, I can wireup the job. > > Won't be as scalable as an mpirun-initiated startup, but will be much better > than doing it from singletons. that would be great. I can definitely pass environment variables to each process. > > Or if you prefer, we could set up an "infosphere" module that we could > customize for this system. Main thing here would be to provide us with some > kind of regex (or access to a file containing the info) that describes the > map of rank to node so we can construct the wireup communication pattern. > I think for our prototype we are fine with the first method. I'd leave the cleaner implementation as a task for the product team ;-) regarding the "generic" module, is that something you can put together quickly? can I help in any way? Thanks! p > Either way would work. The second is more scalable, but I don't know if you have (or can construct) the map info.
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
On Jul 21, 2010, at 7:44 AM, Philippe wrote: > Ralph, > > Sorry for the late reply -- I was away on vacation. no problem at all! > > regarding your earlier question about how many processes were > involved when the memory was entirely allocated, it was only two, a > sender and a receiver. I'm still trying to pinpoint what can be > different between the standalone case and the "integrated" case. I > will try to find out what part of the code is allocating memory in a > loop. hmmm... that sounds like a bug in your program. let me know what you find > > > On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain wrote: >> Well, I finally managed to make this work without the required ompi-server >> rendezvous point. The fix is only in the devel trunk right now - I'll have >> to ask the release managers for 1.5 and 1.4 if they want it ported to those >> series. >> > > great -- I'll give it a try > >> On the notion of integrating OMPI to your launch environment: remember that >> we don't necessarily require that you use mpiexec for that purpose. If your >> launch environment provides just a little info in the environment of the >> launched procs, we can usually devise a method that allows the procs to >> perform an MPI_Init as a single job without all this work you are doing. >> > > I'm working on creating operators using MPI for the IBM product > "InfoSphere Streams". It has its own launching mechanism to start the > processes. However I can pass some information to the processes that > belong to the same job (Streams job -- which should neatly map to an MPI > job). > >> Only difference is that your procs will all block in MPI_Init until they >> -all- have executed that function. If that isn't a problem, this would be a >> much more scalable and reliable method than doing it thru massive calls to >> MPI_Port_connect. >> > > in the general case, that would be a problem, but for my prototype, > this is acceptable. > > In general, each process is composed of operators; some may be MPI > related and some may not. But in my case, I know ahead of time which > processes will be part of the MPI job, so I can easily deal with the > fact that they would block on MPI_Init (actually -- MPI_Init_thread, > since it's using a lot of threads).

We have talked in the past about creating a non-blocking MPI_Init as an extension to the standard. It would lock you to Open MPI, though... Regardless, at some point you would have to know how many processes are going to be part of the job so you can know when MPI_Init is complete. I would think you would require that info for the singleton wireup anyway - yes? Otherwise, how would you know when to quit running connect-accept?

> > Is there documentation or an example I can use to see what information > I can pass to the processes to enable that? Is it just environment > variables? No real documentation - a lack I should probably fill. At the moment, we don't have a "generic" module for standalone launch, but I can create one as it is pretty trivial. I would then need you to pass each process envars telling it the total number of processes in the MPI job, its rank within that job, and a file where some rendezvous process (can be rank=0) has provided that port string. Armed with that info, I can wireup the job. Won't be as scalable as an mpirun-initiated startup, but will be much better than doing it from singletons. Or if you prefer, we could set up an "infosphere" module that we could customize for this system. Main thing here would be to provide us with some kind of regex (or access to a file containing the info) that describes the map of rank to node so we can construct the wireup communication pattern. Either way would work. The second is more scalable, but I don't know if you have (or can construct) the map info.

> > Many thanks! > p. > >> >> On Jul 18, 2010, at 4:09 PM, Philippe wrote: >> >>> Ralph, >>> >>> thanks for investigating. >>> >>> I've applied the two patches you mentioned earlier and ran with the >>> ompi server. Although I was able to run our standalone test, when I >>> integrated the changes to our code, the processes entered a crazy loop >>> and allocated all the memory available when calling MPI_Port_Connect. >>> I was not able to identify why it works standalone but not integrated >>> with our code. If I find out why, I'll let you know. >>> >>> looking forward to your findings. We'll be happy to test any patches >>> if you have some! >>> >>> p. >>> >>> On Sat, Jul 17, 2010 at 9:47 PM, Ralph Castain wrote: Okay, I can reproduce this problem. Frankly, I don't think this ever worked with OMPI, and I'm not sure how the choice of BTL makes a difference. The program is crashing in the communicator definition, which involves a communication over our internal out-of-band messaging system. That system has zero connection to any BTL, so it should crash either way. Regardless, I will play with this a little as time allows.
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Ralph, Sorry for the late reply -- I was away on vacation. regarding your earlier question about how many processes were involved when the memory was entirely allocated, it was only two, a sender and a receiver. I'm still trying to pinpoint what can be different between the standalone case and the "integrated" case. I will try to find out what part of the code is allocating memory in a loop.

On Tue, Jul 20, 2010 at 12:51 AM, Ralph Castain wrote: > Well, I finally managed to make this work without the required ompi-server > rendezvous point. The fix is only in the devel trunk right now - I'll have to > ask the release managers for 1.5 and 1.4 if they want it ported to those > series. > great -- I'll give it a try > On the notion of integrating OMPI to your launch environment: remember that > we don't necessarily require that you use mpiexec for that purpose. If your > launch environment provides just a little info in the environment of the > launched procs, we can usually devise a method that allows the procs to > perform an MPI_Init as a single job without all this work you are doing. > I'm working on creating operators using MPI for the IBM product "InfoSphere Streams". It has its own launching mechanism to start the processes. However I can pass some information to the processes that belong to the same job (Streams job -- which should neatly map to an MPI job). > Only difference is that your procs will all block in MPI_Init until they > -all- have executed that function. If that isn't a problem, this would be a > much more scalable and reliable method than doing it thru massive calls to > MPI_Port_connect. > in the general case, that would be a problem, but for my prototype, this is acceptable. In general, each process is composed of operators; some may be MPI related and some may not. But in my case, I know ahead of time which processes will be part of the MPI job, so I can easily deal with the fact that they would block on MPI_Init (actually -- MPI_Init_thread, since it's using a lot of threads).

Is there documentation or an example I can use to see what information I can pass to the processes to enable that? Is it just environment variables? Many thanks! p.

> > On Jul 18, 2010, at 4:09 PM, Philippe wrote: > >> Ralph, >> >> thanks for investigating. >> >> I've applied the two patches you mentioned earlier and ran with the >> ompi server. Although I was able to run our standalone test, when I >> integrated the changes to our code, the processes entered a crazy loop >> and allocated all the memory available when calling MPI_Port_Connect. >> I was not able to identify why it works standalone but not integrated >> with our code. If I find out why, I'll let you know. >> >> looking forward to your findings. We'll be happy to test any patches >> if you have some! >> >> p. >> >> On Sat, Jul 17, 2010 at 9:47 PM, Ralph Castain wrote: >>> Okay, I can reproduce this problem. Frankly, I don't think this ever worked >>> with OMPI, and I'm not sure how the choice of BTL makes a difference. >>> >>> The program is crashing in the communicator definition, which involves a >>> communication over our internal out-of-band messaging system. That system >>> has zero connection to any BTL, so it should crash either way. >>> >>> Regardless, I will play with this a little as time allows. Thanks for the >>> reproducer! >>> >>> >>> On Jun 25, 2010, at 7:23 AM, Philippe wrote: >>> Hi, I'm trying to run a test program which consists of a server creating a port using MPI_Open_port and N clients using MPI_Comm_connect to connect to the server. I'm able to do so with 1 server and 2 clients, but with 1 server + 3 clients, I get the following error message:

    [node003:32274] [[37084,0],0]:route_callback tried routing message
    from [[37084,1],0] to [[40912,1],0]:102, can't find route

This is only happening with the openib BTL. With the tcp BTL it works perfectly fine (ofud also works, as a matter of fact...). This has been tested on two completely different clusters, with identical results. In either case, the IB fabric works normally. Any help would be greatly appreciated! Several people in my team looked at the problem. Google and the mailing list archive did not provide any clue. I believe that from an MPI standpoint, my test program is valid (and it works with TCP, which makes me feel better about the sequence of MPI calls). Regards, Philippe. Background: I intend to use Open MPI to transport data inside a much larger application. Because of that, I cannot use mpiexec. Each process is started by our own "job management" and uses a name server to find out about the others. Once all the clients are connected, I would like the server to do MPI_Recv to get the data from all the clients. I don't care about the order or which clients are sending data, as
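The failing scenario Philippe describes follows the standard MPI dynamic-process pattern. A minimal sketch of that pattern is below; it is an illustration of the test setup, not Philippe's actual program. It assumes an MPI installation and some out-of-band way to hand the port string to the clients (here, simply the command line), with error handling omitted.

```c
/* Sketch: one server publishes a port; each client connects to it.
 * Run the server as "./prog server" and each client as
 * "./prog client <port-string>", where <port-string> is the string
 * the server prints. Requires an MPI library (e.g. Open MPI). */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    MPI_Init(&argc, &argv);

    if (argc > 1 && 0 == strcmp(argv[1], "server")) {
        /* server: open a port and accept one client; a real test would
         * loop, calling MPI_Comm_accept once per expected client */
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("port: %s\n", port);   /* hand this string to the clients */
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        MPI_Close_port(port);
    } else {
        /* client: port string passed on the command line */
        MPI_Comm_connect(argv[2], MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    }

    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}
```

Each accept/connect pair yields a separate intercommunicator on the server, which matches the per-connection bookkeeping Philippe describes; the route error in the thread appears only once a third client runs this connect step over the openib BTL.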
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Well, I finally managed to make this work without the required ompi-server rendezvous point. The fix is only in the devel trunk right now - I'll have to ask the release managers for 1.5 and 1.4 if they want it ported to those series.

On the notion of integrating OMPI into your launch environment: remember that we don't necessarily require that you use mpiexec for that purpose. If your launch environment provides just a little info in the environment of the launched procs, we can usually devise a method that allows the procs to perform an MPI_Init as a single job without all this work you are doing. The only difference is that your procs will all block in MPI_Init until they -all- have executed that function. If that isn't a problem, this would be a much more scalable and reliable method than doing it through massive calls to MPI_Comm_connect.

On Jul 18, 2010, at 4:09 PM, Philippe wrote:
> [quoted thread snipped]
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
I'm wondering if we can't make this simpler. What launch environment are you operating under? I know you said you can't use mpiexec, but I'm wondering if we could add support for your environment to mpiexec so you could.

On Jul 18, 2010, at 4:09 PM, Philippe wrote:
> [quoted thread snipped]
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
On Jul 18, 2010, at 4:09 PM, Philippe wrote:
> Although I was able to run our standalone test, when I integrated the changes to our code, the processes entered a crazy loop and allocated all the memory available when calling MPI_Comm_connect.

How many processes are we talking about?

> [rest of quoted thread snipped]
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Ralph,

thanks for investigating.

I've applied the two patches you mentioned earlier and ran with the ompi-server. Although I was able to run our standalone test, when I integrated the changes to our code, the processes entered a crazy loop and allocated all the memory available when calling MPI_Comm_connect. I was not able to identify why it works standalone but not integrated with our code. If I find why, I'll let you know.

Looking forward to your findings. We'll be happy to test any patches if you have some!

p.

On Sat, Jul 17, 2010 at 9:47 PM, Ralph Castain wrote:
> [quoted thread snipped]
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Okay, I can reproduce this problem. Frankly, I don't think this ever worked with OMPI, and I'm not sure how the choice of BTL makes a difference.

The program is crashing in the communicator definition, which involves a communication over our internal out-of-band messaging system. That system has zero connection to any BTL, so it should crash either way.

Regardless, I will play with this a little as time allows. Thanks for the reproducer!

On Jun 25, 2010, at 7:23 AM, Philippe wrote:

> Hi,
>
> I'm trying to run a test program which consists of a server creating a port using MPI_Open_port and N clients using MPI_Comm_connect to connect to the server.
>
> I'm able to do so with 1 server and 2 clients, but with 1 server + 3 clients, I get the following error message:
>
> [node003:32274] [[37084,0],0]:route_callback tried routing message from [[37084,1],0] to [[40912,1],0]:102, can't find route
>
> This is only happening with the openib BTL. With the tcp BTL it works perfectly fine (ofud also works, as a matter of fact...). This has been tested on two completely different clusters, with identical results. In either case, the IB fabric works normally.
>
> Any help would be greatly appreciated! Several people on my team looked at the problem. Google and the mailing list archive did not provide any clue. I believe that from an MPI standpoint, my test program is valid (and it works with TCP, which makes me feel better about the sequence of MPI calls).
>
> Regards,
> Philippe.
>
> Background:
>
> I intend to use openMPI to transport data inside a much larger application. Because of that, I cannot use mpiexec. Each process is started by our own "job management" and uses a name server to find out about the others. Once all the clients are connected, I would like the server to do MPI_Recv to get the data from all the clients. I don't care about the order or which client is sending data, as long as I can receive it with one call. To do that, the clients and the server go through a series of Comm_accept/Comm_connect/Intercomm_merge so that at the end, all the clients and the server are inside the same intracomm.
>
> Steps:
>
> I have a sample program that shows the issue. I tried to make it as short as possible. It needs to be executed on a shared file system like NFS because the server writes the port info to a file that the client will read. To reproduce the issue, the following steps should be performed:
>
> 0. compile the test with "mpicc -o ben12 ben12.c"
> 1. ssh to the machine that will be the server
> 2. run ./ben12 3 1
> 3. ssh to the machine that will be the client #1
> 4. run ./ben12 3 0
> 5. repeat steps 3-4 for clients #2 and #3
>
> The server accepts the connection from client #1 and merges it into a new intracomm. It then accepts the connection from client #2 and merges it. When client #3 arrives, the server accepts the connection, but that causes clients #1 and #2 to die with the error above (see the complete trace in the tarball).
>
> The exact steps are:
>
> - server opens port
> - server does accept
> - client #1 does connect
> - server and client #1 do merge
> - server does accept
> - client #2 does connect
> - server, client #1 and client #2 do merge
> - server does accept
> - client #3 does connect
> - server, client #1, client #2 and client #3 do merge
>
> My InfiniBand network works normally with other test programs or applications (MPI or others, like Verbs).
>
> Info about my setup:
>
> openMPI version = 1.4.1 (I also tried 1.4.2, nightly snapshot of 1.4.3, nightly snapshot of 1.5 --- all show the same error)
> config.log in the tarball
> "ompi_info --all" in the tarball
> OFED version = 1.3 installed from RHEL 5.3
> Distro = RedHat Enterprise Linux 5.3
> Kernel = 2.6.18-128.4.1.el5 x86_64
> subnet manager = built-in SM from the cisco/topspin switch
> output of ibv_devinfo included in the tarball (there are no "bad" nodes)
> "ulimit -l" says "unlimited"
>
> The tarball contains:
>
> - ben12.c: my test program showing the behavior
> - config.log / config.out / make.out / make-install.out / ifconfig.txt / ibv-devinfo.txt / ompi_info.txt
> - trace-tcp.txt: output of the server and each client when it works with TCP (I added "btl = tcp,self" in ~/.openmpi/mca-params.conf)
> - trace-ib.txt: output of the server and each client when it fails with IB (I added "btl = openib,self" in ~/.openmpi/mca-params.conf)
>
> I hope I provided enough info for somebody to reproduce the problem...
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
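The accept/connect/merge sequence described in the quoted report can be sketched in MPI-style pseudocode (a reconstruction from the description above, not the original ben12.c, which was only attached as a tarball):

```
# server (rank 0 of the growing intracomm)
MPI_Open_port(MPI_INFO_NULL, port)
write port string to the shared file
intracomm = MPI_COMM_SELF
for each expected client:
    # MPI_Comm_accept is collective over intracomm, so every
    # already-merged client also participates in accepting the next one
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, intracomm, &intercomm)
    MPI_Intercomm_merge(intercomm, 0 /* low group */, &intracomm)

# each client
read port string from the shared file
MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm)
MPI_Intercomm_merge(intercomm, 1 /* high group */, &intracomm)
# then join the remaining accepts as a member of the merged intracomm
for each later client:
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, intracomm, &intercomm)
    MPI_Intercomm_merge(intercomm, 0, &intracomm)
```

The collective nature of the accept is why clients #1 and #2 are involved (and die with the routing error) at the moment client #3 connects.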
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Reopening this thread. In searching for another problem, I ran across this one in a different context. It turns out there really is a bug here that needs to be addressed. I'll try to tackle it this weekend - will update you when done.

On Jun 25, 2010, at 7:23 AM, Philippe wrote:
> [original report quoted in full earlier in this thread - snipped]
Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand
Sorry for the delayed response - Brad asked if I could comment on this. I'm afraid your application, as written, isn't going to work because the rendezvous protocol isn't correct. You cannot just write a port to a file and have the other side of a connect/accept read it. The reason for this is that OMPI needs to route its out-of-band communications, and needs some handshake to get that setup. If we don't route those communications, we consume way too many ports on nodes of large machines, and thus cannot run large jobs. If you want to do this, you need three things: 1. you have to run our "ompi-server" program on a node where all MPI processes can reach it. This program serves as the central rendezvous point. See "man ompi-server" for info. 2. you'll need a patch I provided to some other users that allows singletons to connect to ompi-server without first spawning their own daemon. Otherwise, you get an OMPI daemon ("orted") started for every one of your clients. 3. you'll need the patch I'm just completing that allows you to have more than 64 singletons connecting together, otherwise you'll just segfault. Each of your clients looks like a singleton to us because it wasn't started with mpiexec. I suspect your test works because (a) TCP interconnects differently than IB and doesn't talk via OOB to do it, and thus you made it further (but would still fail at some point when OOB was required), and (b) you were running fewer than 64 clients. HTH Ralph On Jun 25, 2010, at 7:23 AM, Philippe wrote: > Hi, > > I'm trying to run a test program which consists of a server creating a > port using MPI_Open_port and N clients using MPI_Comm_connect to > connect to the server. > > I'm able to do so with 1 server and 2 clients, but with 1 server + 3 > clients, I get the following error message: > > [node003:32274] [[37084,0],0]:route_callback tried routing message > from [[37084,1],0] to [[40912,1],0]:102, can't find route > > This is only happening with the openib BTL. 
With tcp BTL it works > perfectly fine (ofud also works as a matter of fact...). This has been > tested on two completely different clusters, with identical results. > In either cases, the IB frabic works normally. > > Any help would be greatly appreciated! Several people in my team > looked at the problem. Google and the mailing list archive did not > provide any clue. I believe that from an MPI standpoint, my test > program is valid (and it works with TCP, which make me feel better > about the sequence of MPI calls) > > Regards, > Philippe. > > > > Background: > > I intend to use openMPI to transport data inside a much larger > application. Because of that, I cannot used mpiexec. Each process is > started by our own "job management" and use a name server to find > about each others. Once all the clients are connected, I would like > the server to do MPI_Recv to get the data from all the client. I dont > care about the order or which client are sending data, as long as I > can receive it with on call. Do do that, the clients and the server > are going through a series of Comm_accept/Conn_connect/Intercomm_merge > so that at the end, all the clients and the server are inside the same > intracomm. > > Steps: > > I have a sample program that show the issue. I tried to make it as > short as possible. It needs to be executed on a shared file system > like NFS because the server write the port info to a file that the > client will read. To reproduce the issue, the following steps should > be performed: > > 0. compile the test with "mpicc -o ben12 ben12.c" > 1. ssh to the machine that will be the server > 2. run ./ben12 3 1 > 3. ssh to the machine that will be the client #1 > 4. run ./ben12 3 0 > 5. repeat step 3-4 for client #2 and #3 > > the server accept the connection from client #1 and merge it in a new > intracomm. It then accept connection from client #2 and merge it. 
> When client #3 arrives, the server accepts the connection, but that
> causes clients #1 and #2 to die with the error above (see the complete
> trace in the tarball).
>
> The exact steps are:
>
> - server opens port
> - server does accept
> - client #1 does connect
> - server and client #1 do merge
> - server does accept
> - client #2 does connect
> - server, client #1 and client #2 do merge
> - server does accept
> - client #3 does connect
> - server, client #1, client #2 and client #3 do merge
>
> My InfiniBand network works normally with other test programs or
> applications (MPI or others, like Verbs).
>
> Info about my setup:
>
> Open MPI version = 1.4.1 (I also tried 1.4.2, nightly snapshots of
> 1.4.3 and 1.5 --- all show the same error)
> config.log in the tarball
> "ompi_info --all" in the tarball
> OFED version = 1.3 installed from RHEL 5.3
> Distro = Red Hat Enterprise Linux 5.3
> Kernel = 2.6.18-128.4.1.el5 x86_64
> subnet manager = built-in SM from the cisco
> /topspin switch
> output of ibv_devinfo included in the tarball (there are no "bad" nodes)
> "ulimit -l" says "unlimited"
>
> The tarball contains:
> - ben12.c: my test program showing the behavior
> - config.log / config.out / make.out / make-install.out / ifconfig.txt / ibv-devinfo.txt / ompi_info.txt
> - trace-tcp.txt: output of the server and each client when it works with TCP (I added "btl = tcp,self" in ~/.openmpi/mca-params.conf)
> - trace-ib.txt: output of the server and each client when it fails with IB (I added "btl = openib,self" in ~/.openmpi/mca-params.conf)
>
> I hope I provided enough info for somebody to reproduce the problem...

ompi-output.tar.bz2
Description: BZip2 compressed data
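For readers of this archived thread, the accept/connect/merge sequence Philippe describes can be sketched as below. This is a reconstruction for illustration only, not the actual ben12.c from the tarball: the port-file name ("ben12.port"), the argument handling, and the collective accept loop are assumptions. Note that, per Ralph's reply, launching these as singletons still requires the ompi-server rendezvous to work reliably.

```c
/*
 * Sketch of the rendezvous: one server opens a port and repeatedly
 * accepts clients, merging each intercomm into a growing intracomm.
 * Build: mpicc -o ben12 ben12.c
 * Run:   ./ben12 <nclients> <is_server>
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME] = "";
    MPI_Comm intracomm = MPI_COMM_WORLD, intercomm, newintra;
    int nclients, is_server, size;

    MPI_Init(&argc, &argv);
    nclients  = atoi(argv[1]);
    is_server = atoi(argv[2]);

    if (is_server) {
        /* Open a port and publish it via a file on the shared FS. */
        MPI_Open_port(MPI_INFO_NULL, port);
        FILE *f = fopen("ben12.port", "w");
        fprintf(f, "%s\n", port);
        fclose(f);
    } else {
        /* Wait for the server's port file, then connect and merge.
         * The connecting side passes high=1 so the server is rank 0
         * in every merged intracomm. */
        FILE *f;
        while ((f = fopen("ben12.port", "r")) == NULL)
            sleep(1);
        fgets(port, MPI_MAX_PORT_NAME, f);
        fclose(f);
        port[strcspn(port, "\n")] = '\0';

        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
        MPI_Intercomm_merge(intercomm, 1, &intracomm);
        MPI_Comm_free(&intercomm);
    }

    /* MPI_Comm_accept and MPI_Intercomm_merge are collective over the
     * intracomm, so the server AND every already-merged client must
     * take part in each subsequent accept/merge round. Root is rank 0
     * (the server); the port argument is ignored on non-root ranks. */
    MPI_Comm_size(intracomm, &size);
    while (size < nclients + 1) {
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, intracomm, &intercomm);
        MPI_Intercomm_merge(intercomm, 0, &newintra);
        MPI_Comm_free(&intercomm);
        if (intracomm != MPI_COMM_WORLD)
            MPI_Comm_free(&intracomm);
        intracomm = newintra;
        MPI_Comm_size(intracomm, &size);
    }

    /* At this point all clients and the server share one intracomm,
     * and the server could MPI_Recv from any of them. */
    MPI_Finalize();
    return 0;
}
```

The reported failure occurs in the third accept/merge round over the openib BTL, which is what the route error in the trace corresponds to.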