Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-08-24 Thread Ralph Castain
Yes, that's fine. Thx!
On Aug 24, 2010, at 9:02 AM, Philippe wrote:
> awesome, I'll give it a spin! with the parameters as below?
>
> p.
>
> On Tue, Aug 24, 2010 at 10:47 AM, Ralph Castain wrote:
>> I think I have this working now - try anything on or after r23647

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-08-24 Thread Ralph Castain
I think I have this working now - try anything on or after r23647
On Aug 23, 2010, at 1:36 PM, Philippe wrote:
> sure. I took a guess at ppn and nodes for the case where 2 processes
> are on the same node... I don't claim these are the right values ;-)
>
> c0301b10e1 ~/mpi> env|grep OMPI

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-08-23 Thread Philippe
sure. I took a guess at ppn and nodes for the case where 2 processes are on the same node... I don't claim these are the right values ;-)
c0301b10e1 ~/mpi> env|grep OMPI
OMPI_MCA_orte_nodes=c0301b10e1
OMPI_MCA_orte_rank=0
OMPI_MCA_orte_ppn=2
OMPI_MCA_orte_num_procs=2
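For illustration, here is a minimal C sketch of how a process might read and sanity-check the four environment variables listed above. It is not the Open MPI implementation: the struct `launch_env`, the helpers `read_long` and `read_launch_env`, and the consistency checks (positive counts, rank below the process count) are assumptions made for this example. Only the variable names come from the message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the values shown in the message. */
struct launch_env {
    const char *nodes;   /* OMPI_MCA_orte_nodes, kept as the raw string */
    long rank;           /* OMPI_MCA_orte_rank */
    long ppn;            /* OMPI_MCA_orte_ppn */
    long num_procs;      /* OMPI_MCA_orte_num_procs */
};

/* Parse one integer-valued variable; returns 0 on success, -1 otherwise. */
static int read_long(const char *name, long *out)
{
    const char *text = getenv(name);
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0') {
        return -1;                      /* unset or empty */
    }
    value = strtol(text, &end, 10);
    if (*end != '\0') {
        return -1;                      /* trailing non-digits */
    }
    *out = value;
    return 0;
}

/* Fill `env` from the environment; returns 0 on success, -1 on any problem. */
int read_launch_env(struct launch_env *env)
{
    assert(env != NULL);

    env->nodes = getenv("OMPI_MCA_orte_nodes");
    if (env->nodes == NULL || env->nodes[0] == '\0') {
        return -1;
    }
    if (read_long("OMPI_MCA_orte_rank", &env->rank) != 0 ||
        read_long("OMPI_MCA_orte_ppn", &env->ppn) != 0 ||
        read_long("OMPI_MCA_orte_num_procs", &env->num_procs) != 0) {
        return -1;
    }
    /* Assumed sanity checks, not taken from the Open MPI sources. */
    if (env->ppn < 1 || env->num_procs < 1 ||
        env->rank < 0 || env->rank >= env->num_procs) {
        return -1;
    }
    return 0;
}
```

With the values from the message, `read_launch_env` succeeds for rank 0. A second process on the same node would presumably need a different `OMPI_MCA_orte_rank`, but the thread does not show the settings that were finally confirmed.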

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-08-23 Thread Ralph Castain
Can you send me the values you are using for the relevant envars? That way I can try to replicate here.
On Aug 23, 2010, at 1:15 PM, Philippe wrote:
> I took a look at the code but I'm afraid I don't see anything wrong.
>
> p.
>
> On Thu, Aug 19, 2010 at 2:32 PM, Ralph Castain

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-08-23 Thread Philippe
I took a look at the code but I'm afraid I don't see anything wrong.
p.
On Thu, Aug 19, 2010 at 2:32 PM, Ralph Castain wrote:
> Yes, that is correct - we reserve the first port in the range for a daemon,
> should one exist.
> The problem is clearly that get_node_rank is

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-08-19 Thread Ralph Castain
Yes, that is correct - we reserve the first port in the range for a daemon, should one exist. The problem is clearly that get_node_rank is returning the wrong value for the second process (your rank=1). If you want to dig deeper, look at the orte/mca/ess/generic code where it generates the nidmap

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-08-19 Thread Philippe
Ralph, somewhere in ./orte/mca/oob/tcp/oob_tcp.c, there is this comment:
    orte_node_rank_t nrank;
    /* do I know my node_local_rank yet? */
    if (ORTE_NODE_RANK_INVALID != (nrank = orte_ess.get_node_rank(ORTE_PROC_MY_NAME)) &&

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-08-19 Thread Ralph Castain
Something doesn't look right - here is what the algo attempts to do: given a port range of 10000-12000, the lowest rank'd process on the node should open port 10000. The next lowest rank on the node will open 10001, etc. So it looks to me like there is some confusion in the local rank algo. I'll
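The port-selection rule described in this message can be sketched roughly as follows. This is only an illustration, not the code in oob_tcp.c: the function name `pick_static_port`, its parameters, and the `daemon_present` flag are hypothetical. It assumes each process takes the next port in the range according to its node-local rank, and that the first port is set aside for a daemon when one exists (as a later reply in the thread states).

```c
#include <assert.h>

/* Hypothetical helper, not taken from the Open MPI sources.
 * Returns the port for a process with the given node-local rank,
 * or -1 if the rank is unknown or the range is too small. */
int pick_static_port(int first_port, int last_port,
                     int node_rank, int daemon_present)
{
    int offset;

    assert(first_port <= last_port);

    if (node_rank < 0) {
        return -1;                      /* node rank not known yet */
    }
    /* Assumption: a daemon, if present, owns the first port in the range,
     * so application processes start one slot later. */
    offset = node_rank + (daemon_present ? 1 : 0);
    if (offset > last_port - first_port) {
        return -1;                      /* range exhausted */
    }
    return first_port + offset;
}
```

In this sketch, two processes on one node that are both handed the same node rank compute the same port. That is the kind of collision a wrong node-rank value would produce.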

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-08-19 Thread Philippe
Ralph, I'm able to use the generic module when the processes are on different machines. What would be the values of the EV when two processes are on the same machine (hopefully talking over SHM)? I've played with combinations of nodelist and ppn but no luck. I get errors like:

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-27 Thread Ralph Castain
Use what hostname returns - don't worry about IP addresses as we'll discover them.
On Jul 26, 2010, at 10:45 PM, Philippe wrote:
> Thanks a lot!
>
> now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our
> nodes have a short/long name (it's rhel 5.x, so the command hostname

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-27 Thread Ralph Castain
Doh - yes it should! I'll fix it right now. Thanks!
On Jul 26, 2010, at 9:28 PM, Philippe wrote:
> Ralph,
>
> I was able to test the generic module and it seems to be working.
>
> one question tho, the function orte_ess_generic_component_query in

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-27 Thread Philippe
Ralph, I was able to test the generic module and it seems to be working. One question tho: the function orte_ess_generic_component_query in "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the argument "OMPI_MCA_enc", which seems to cause the module to fail to load. shouldn't it
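The failure mode reported here, a component that never loads because it checks an environment variable nobody sets, can be illustrated with a small sketch. The function `component_selected` and its parameters are hypothetical and do not reproduce orte_ess_generic_component_query. The variable name the component should read is not shown in the truncated message, so the example takes the name as an argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a component query: returns 1 when the
 * environment variable `var` is set to `wanted`, 0 otherwise. */
int component_selected(const char *var, const char *wanted)
{
    const char *value;

    assert(var != NULL && wanted != NULL);

    value = getenv(var);
    if (value == NULL) {
        return 0;   /* variable not set, so the component declines */
    }
    return strcmp(value, wanted) == 0;
}
```

If the caller sets one variable name and the query reads a misspelled one, `getenv` returns NULL and the component is silently skipped, which matches the symptom described in the message.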

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-22 Thread Ralph Castain
Dev trunk looks okay right now - I think you'll be fine using it. My new component -might- work with 1.5, but probably not with 1.4. I haven't checked either of them. Anything at r23478 or above will have the new module. Let me know how it works for you. I haven't tested it myself, but am

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-22 Thread Philippe
Ralph, Thank you so much!! I'll give it a try and let you know. I know it's a tough question, but how stable is the dev trunk? Can I just grab the latest and run, or am I better off taking your changes and copying them back into a stable release? (if so, which one? 1.4? 1.5?) p. On Thu, Jul 22,

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-22 Thread Ralph Castain
It was easier for me to just construct this module than to explain how to do so :-) I will commit it this evening (couple of hours from now) as that is our standard practice. You'll need to use the developer's trunk, though, to use it. Here are the envars you'll need to provide: Each process

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-22 Thread Philippe
On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain wrote:
>
> On Jul 21, 2010, at 7:44 AM, Philippe wrote:
>
>> Ralph,
>>
>> Sorry for the late reply -- I was away on vacation.
>
> no problem at all!
>
>>
>> regarding your earlier question about how many processes were

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-21 Thread Ralph Castain
On Jul 21, 2010, at 7:44 AM, Philippe wrote:
> Ralph,
>
> Sorry for the late reply -- I was away on vacation.
no problem at all!
>
> regarding your earlier question about how many processes were
> involved when the memory was entirely allocated, it was only two, a
> sender and a receiver.

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-21 Thread Philippe
Ralph, Sorry for the late reply -- I was away on vacation. Regarding your earlier question about how many processes were involved when the memory was entirely allocated, it was only two, a sender and a receiver. I'm still trying to pinpoint what can be different between the standalone case and

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-20 Thread Ralph Castain
Well, I finally managed to make this work without the required ompi-server rendezvous point. The fix is only in the devel trunk right now - I'll have to ask the release managers for 1.5 and 1.4 if they want it ported to those series. On the notion of integrating OMPI to your launch environment:

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-19 Thread Ralph Castain
I'm wondering if we can't make this simpler. What launch environment are you operating under? I know you said you can't use mpiexec, but I'm wondering if we could add support for your environment to mpiexec so you could.
On Jul 18, 2010, at 4:09 PM, Philippe wrote:
> Ralph,
>
> thanks for

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-19 Thread Ralph Castain
On Jul 18, 2010, at 4:09 PM, Philippe wrote:
> Ralph,
>
> thanks for investigating.
>
> I've applied the two patches you mentioned earlier and ran with the
> ompi server. Although I was able to run our standalone test, when I
> integrated the changes to our code, the processes entered a crazy

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-18 Thread Philippe
Ralph, thanks for investigating. I've applied the two patches you mentioned earlier and ran with the ompi server. Although I was able to run our standalone test, when I integrated the changes into our code, the processes entered a crazy loop and allocated all the memory available when calling

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-18 Thread Ralph Castain
Okay, I can reproduce this problem. Frankly, I don't think this ever worked with OMPI, and I'm not sure how the choice of BTL makes a difference. The program is crashing in the communicator definition, which involves a communication over our internal out-of-band messaging system. That system

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-17 Thread Ralph Castain
Reopening this thread. While looking into another problem I ran across this one in a different context. Turns out there really is a bug here that needs to be addressed. I'll try to tackle it this weekend - will update you when done.
On Jun 25, 2010, at 7:23 AM, Philippe wrote:
> Hi,
>
> I'm

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-12 Thread Ralph Castain
Sorry for the delayed response - Brad asked if I could comment on this. I'm afraid your application, as written, isn't going to work because the rendezvous protocol isn't correct. You cannot just write a port to a file and have the other side of a connect/accept read it. The reason for this is

[OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-06-25 Thread Philippe
Hi, I'm trying to run a test program which consists of a server creating a port using MPI_Open_port and N clients using MPI_Comm_connect to connect to the server. I'm able to do so with 1 server and 2 clients, but with 1 server + 3 clients, I get the following error message: [node003:32274]