Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-14 Thread Galen Shipman


The patch applies to ib_multifrag as is, without a conflict. But the
branch doesn't compile with or without the patch, so I was not able to
test it.

Do you have some uncommitted changes that may generate a conflict? Can
you commit them so they can be resolved? If there is no conflict between
your work and this patch, maybe it is a good idea to commit it to your
branch and trunk for testing?



I have a whole pile of changes that need to be committed, and even
with these changes it still doesn't compile, as I am reworking names,
data structures, etc.
I will commit what I have now, and will work on this a bit more over
the weekend.

- Galen






Thanks,

Galen


On Jun 13, 2007, at 7:27 AM, Gleb Natapov wrote:


Hello everyone,

  I encountered a problem with the openib on-demand connection code.
Basically, it works only by pure luck if you have more than one endpoint
for the same proc, and it sometimes breaks in mysterious ways.

The algorithm works like this: A wants to connect to B, so it creates a
QP and sends it to B. B receives the QP from A, looks for an endpoint
that is not yet associated with a remote endpoint, creates a QP for it,
and sends the info back. Now A receives the QP and goes through the same
logic as B, i.e. it looks for an endpoint that is not yet connected, BUT
there is no guarantee that it will find the endpoint that initiated the
connection in the first place! And if it finds another one, it will
create a QP for it and send it back to B, and so on and so forth. In the
end I sometimes get a peculiar mesh of connections where no QP has a
connection back to it from the peer process.
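
The failure mode described above can be reproduced with a toy model.
This is only a sketch under stated assumptions, not the openib BTL code:
the Endpoint class, the receive_qp() and simulate() helpers, the QP
numbering, and the rule that an endpoint which already owns a QP sends
nothing back are all hypothetical and exist only for illustration. A and
B each have two endpoints, and A's second endpoint initiates.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Endpoint:
    name: str
    local_qp: Optional[int] = None   # QP created on this endpoint
    remote_qp: Optional[int] = None  # peer QP this endpoint was associated with


def receive_qp(endpoints, incoming_qp, qp_numbers):
    """Old rule: bind the incoming QP to the first endpoint with no remote QP."""
    for ep in endpoints:
        if ep.remote_qp is None:
            ep.remote_qp = incoming_qp
            if ep.local_qp is None:
                ep.local_qp = next(qp_numbers)
                return ep, ep.local_qp   # a new QP that must be sent to the peer
            return ep, None              # endpoint already had a QP; nothing to send
    return None, None


def simulate():
    qp_numbers = count(1)  # hypothetical QP number source
    procs = {"A": [Endpoint("A0"), Endpoint("A1")],
             "B": [Endpoint("B0"), Endpoint("B1")]}
    # A's second endpoint initiates the connection.
    procs["A"][1].local_qp = next(qp_numbers)
    in_flight, receiver, log = procs["A"][1].local_qp, "B", []
    while in_flight is not None:
        ep, reply = receive_qp(procs[receiver], in_flight, qp_numbers)
        if ep is None:
            break
        log.append((ep.name, in_flight))
        in_flight, receiver = reply, ("A" if receiver == "B" else "B")
    return procs, log


def mutual_pairs(procs):
    """Endpoint pairs whose QPs point at each other."""
    return [(x.name, y.name) for x in procs["A"] for y in procs["B"]
            if x.local_qp is not None and y.local_qp is not None
            and x.remote_qp == y.local_qp and y.remote_qp == x.local_qp]
```

In this model every endpoint ends up associated with some remote QP, yet
mutual_pairs() returns an empty list: the reply to A's second endpoint is
bound to A's first endpoint, which then starts a handshake of its own.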

To overcome this problem, B needs to send back some info that will allow
A to determine the endpoint that initiated the connection request. The
lid:qp pair will allow for this. But even then the problem will remain
if two procs initiate a connection at the same time. To deal with
simultaneous connections, an asymmetric protocol has to be used: one peer
becomes the master, the other the slave. The slave always initiates a
connection to the master. The master chooses a local endpoint to satisfy
the incoming request and sends the info back to the slave. If the master
wants to initiate a connection, it sends a message to the slave and the
slave initiates a connection back to the master.
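
A minimal sketch of the two ideas in the paragraph above, echoing the
initiator's lid:qp pair in the reply and letting only the slave initiate.
It is not taken from the patch. The Proc and Endpoint classes, the LID
values, and in particular the rule that the lower rank acts as master are
assumptions made for illustration; the message does not say how the roles
are assigned.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional, Tuple

Addr = Tuple[int, Optional[int]]  # (lid, qp_num) identifies one endpoint's QP


@dataclass
class Endpoint:
    lid: int
    qp_num: Optional[int] = None
    remote: Optional[Addr] = None  # lid:qp of the connected peer endpoint

    @property
    def addr(self) -> Addr:
        return (self.lid, self.qp_num)


class Proc:
    def __init__(self, rank, lids, qp_numbers):
        self.rank = rank
        self.endpoints = [Endpoint(lid) for lid in lids]
        self.qp_numbers = qp_numbers  # hypothetical QP number source

    def is_master_of(self, peer) -> bool:
        # ASSUMPTION: the role is chosen by comparing ranks.
        return self.rank < peer.rank

    def free_endpoint(self) -> Endpoint:
        return next(ep for ep in self.endpoints if ep.remote is None)

    def accept(self, initiator: Addr):
        """Master: choose a local endpoint and echo the initiator's lid:qp."""
        ep = self.free_endpoint()
        ep.qp_num = next(self.qp_numbers)
        ep.remote = initiator
        return ep.addr, initiator

    def initiate(self, ep: Endpoint, master: "Proc") -> Endpoint:
        """Slave: create a QP on ep, send it, and match the reply by lid:qp."""
        ep.qp_num = next(self.qp_numbers)
        master_addr, echoed = master.accept(ep.addr)
        origin = next(e for e in self.endpoints if e.addr == echoed)
        origin.remote = master_addr
        return origin


def connect(requester: Proc, peer: Proc) -> Endpoint:
    """Whichever side asks, the slave is the one that initiates."""
    if requester.is_master_of(peer):
        # The master only asks; the slave connects back.
        return peer.initiate(peer.free_endpoint(), requester)
    return requester.initiate(requester.free_endpoint(), peer)
```

Because the reply carries the initiator's own lid:qp pair, the slave
updates exactly the endpoint that started the handshake, whatever the
order of its endpoint list. A request from the master goes through the
same slave-initiated path, so the two sides never run the handshake in
opposite directions at once.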

The included patch implements the algorithm described above and works
for all scenarios in which the current code fails to create a connection.

--
Gleb.

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-14 Thread Jeff Squyres

On Jun 14, 2007, at 7:11 AM, Jeff Squyres wrote:


Now I see that my fix was in the right place, but still a little bit
wrong. I committed a fix to my fix in r15073. Can you check it?


My cluster is still running MTT from last night; I'll need to wait
for several jobs to finish.  I'll check it later today.


I got a test job to run in the middle of other MTT runs.  r15073
seems to have fixed the problem; thanks.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-14 Thread Jeff Squyres

On Jun 14, 2007, at 6:32 AM, Gleb Natapov wrote:


794:mca_btl_openib_endpoint_recv] can't find suitable endpoint for
this peer


Now I see that my fix was in the right place, but still a little bit
wrong. I committed a fix to my fix in r15073. Can you check it?


My cluster is still running MTT from last night; I'll need to wait  
for several jobs to finish.  I'll check it later today.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-14 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 07:08:51PM +0300, Gleb Natapov wrote:
> On Wed, Jun 13, 2007 at 09:38:21AM -0600, Galen Shipman wrote:
> > Hi Gleb,
> > 
> > As we have discussed before I am working on adding support for  
> > multiple QPs with either per peer resources or shared resources.
> > As a result of this I am trying to clean up a lot of the OpenIB code.  
> > It has grown up organically over the years and needs some attention.
> > Perhaps we can coordinate on commits or even work from the same temp  
> > branch to do an overall cleanup as well as addressing the issue you  
> > describe in this email.
> > 
> > I bring this up because this commit will conflict quite a bit with  
> > what I am working on, I can always merge it by hand but it may make  
> > sense for us to get this all done in one area and then bring it all  
> > over?
> 
> I am not committing this yet. I want people to review my logic and the
> patch. If the change is OK with everyone who cares, then I want this
> change to go into the 1.2 branch.
> 
> I don't care how this change will get to the trunk. I can use a patched
> version for a while. If your branch is in a working state right now, I can
> merge this change into it tomorrow.

The patch applies to ib_multifrag as is, without a conflict. But the branch
doesn't compile with or without the patch, so I was not able to test it.
Do you have some uncommitted changes that may generate a conflict? Can
you commit them so they can be resolved? If there is no conflict between
your work and this patch, maybe it is a good idea to commit it to your
branch and trunk for testing?


--
Gleb.


Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Galen Shipman


On Jun 13, 2007, at 12:07 PM, Gleb Natapov wrote:


On Wed, Jun 13, 2007 at 02:05:00PM -0400, Jeff Squyres wrote:

On Jun 13, 2007, at 1:54 PM, Jeff Squyres wrote:


With today's trunk, I still see the problem:


Same thing happens on v1.2 branch.  I'll re-open #548.


I am sure it was never tested with multiple subnets. I'll try to get
such a setup.


I tested this with multiple subnets, but it was quite some time ago.
- Galen







Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Jeff Squyres

On Jun 13, 2007, at 1:54 PM, Jeff Squyres wrote:


With today's trunk, I still see the problem:


Same thing happens on v1.2 branch.  I'll re-open #548.

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Galen Shipman


On Jun 13, 2007, at 11:33 AM, Jeff Squyres wrote:


On Jun 13, 2007, at 1:15 PM, Nysal Jan wrote:


There is a ticket (closed) here:
https://svn.open-mpi.org/trac/ompi/ticket/548
It was fixed by Galen for 1.2.


Ah -- I forgot to look at closed tickets.  I think we broke it again;
it certainly fails on the trunk (perhaps related to what Gleb
found?).  I did not test 1.2.


There is a FAQ entry also about this:
http://www.open-mpi.org/faq/?category=openfabrics#ofa-port-wireup


That's what it *should* be doing, but I wonder if that's what it
*actually* is doing.


So it has been a while, but we tested this on our local cluster with
differing numbers of ports and it worked. I was only doing simple
ping-pongs, though.
If both sides try to open a connection at the same time, however,
badness can occur, from my understanding of this.

- Galen









Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 12:45:01PM -0400, Jeff Squyres wrote:
> On Jun 13, 2007, at 12:08 PM, Gleb Natapov wrote:
> 
> > I am not committing this yet. I want people to review my logic and the
> > patch. If the change is OK with everyone who cares, then I want this
> > change to go into the 1.2 branch.
> >
> > I don't care how this change will get to the trunk. I can use a patched
> > version for a while. If your branch is in a working state right now, I can
> > merge this change into it tomorrow.
> 
> I was just bitten yesterday by a problem that I've known about for a  
> while but had never gotten around to looking into (I could have sworn  
> that there was an open trac ticket on this, but I can't find one  
> anywhere).
> 
> I have 2 hosts: one with 3 active ports and one with 2 active ports.
> If I run an MPI job between them, the openib BTL wireup goes badly and
> it aborts.  So a heterogeneous number of ports is not currently
> handled properly in the code.
Are they all in the same subnet? If not, I fixed a bug yesterday that
may help.

> 
> I don't know if Gleb's patch addresses this situation or not; I'll  
> look at his patch this afternoon.
> 
This patch addresses a different problem.

--
Gleb.


Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Jeff Squyres

On Jun 13, 2007, at 1:15 PM, Nysal Jan wrote:

There is a ticket (closed) here:
https://svn.open-mpi.org/trac/ompi/ticket/548

It was fixed by Galen for 1.2.


Ah -- I forgot to look at closed tickets.  I think we broke it again;  
it certainly fails on the trunk (perhaps related to what Gleb  
found?).  I did not test 1.2.


There is a FAQ entry also about this:
http://www.open-mpi.org/faq/?category=openfabrics#ofa-port-wireup


That's what it *should* be doing, but I wonder if that's what it  
*actually* is doing.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Nysal Jan

I was just bitten yesterday by a problem that I've known about for a
while but had never gotten around to looking into (I could have sworn
that there was an open trac ticket on this, but I can't find one
anywhere).

I have 2 hosts: one with 3 active ports and one with 2 active ports.
If I run an MPI job between them, the openib BTL wireup goes badly and
it aborts.  So a heterogeneous number of ports is not currently
handled properly in the code.

I don't know if Gleb's patch addresses this situation or not; I'll
look at his patch this afternoon.




There is a ticket (closed) here:
https://svn.open-mpi.org/trac/ompi/ticket/548
It was fixed by Galen for 1.2. There is a FAQ entry also about this
http://www.open-mpi.org/faq/?category=openfabrics#ofa-port-wireup


Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Jeff Squyres
I wonder if this is bringing up the point that there are several of  
us working in the openib code base -- I wonder if it would be  
worthwhile to have a [short] teleconference to discuss what we're all  
doing in openib, where we're doing it (trunk, branch, whatever), when  
we expect to have it done, what version we need it in, etc.  Just a  
coordination kind of teleconference.  If people think this is a good  
idea, I can set up the call.


For example, don't forget that Nysal and I have the openib btl
port-selection stuff off in /tmp/jnysal-openib-wireup (the
btl_openib_if_[in|ex]clude MCA params).  Per my prior e-mail, if no one
objects, I will be bringing that stuff in to the trunk tomorrow evening
(I'm pretty sure it won't conflict with what Galen is doing; Galen and I
discussed on the phone this morning).





On Jun 13, 2007, at 11:38 AM, Galen Shipman wrote:


Hi Gleb,

As we have discussed before I am working on adding support for
multiple QPs with either per peer resources or shared resources.
As a result of this I am trying to clean up a lot of the OpenIB code.
It has grown up organically over the years and needs some attention.
Perhaps we can coordinate on commits or even work from the same temp
branch to do an overall cleanup as well as addressing the issue you
describe in this email.

I bring this up because this commit will conflict quite a bit with
what I am working on, I can always merge it by hand but it may make
sense for us to get this all done in one area and then bring it all
over?

Thanks,

Galen





--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Jeff Squyres

On Jun 13, 2007, at 12:08 PM, Gleb Natapov wrote:


I am not committing this yet. I want people to review my logic and the
patch. If the change is OK with everyone who cares, then I want this
change to go into the 1.2 branch.

I don't care how this change will get to the trunk. I can use a patched
version for a while. If your branch is in a working state right now, I can
merge this change into it tomorrow.


I was just bitten yesterday by a problem that I've known about for a  
while but had never gotten around to looking into (I could have sworn  
that there was an open trac ticket on this, but I can't find one  
anywhere).


I have 2 hosts: one with 3 active ports and one with 2 active ports.
If I run an MPI job between them, the openib BTL wireup goes badly and
it aborts.  So a heterogeneous number of ports is not currently
handled properly in the code.


I don't know if Gleb's patch addresses this situation or not; I'll  
look at his patch this afternoon.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 09:38:21AM -0600, Galen Shipman wrote:
> Hi Gleb,
> 
> As we have discussed before I am working on adding support for  
> multiple QPs with either per peer resources or shared resources.
> As a result of this I am trying to clean up a lot of the OpenIB code.  
> It has grown up organically over the years and needs some attention.
> Perhaps we can coordinate on commits or even work from the same temp  
> branch to do an overall cleanup as well as addressing the issue you  
> describe in this email.
> 
> I bring this up because this commit will conflict quite a bit with  
> what I am working on, I can always merge it by hand but it may make  
> sense for us to get this all done in one area and then bring it all  
> over?

I am not committing this yet. I want people to review my logic and the
patch. If the change is OK with everyone who cares, then I want this
change to go into the 1.2 branch.

I don't care how this change will get to the trunk. I can use a patched
version for a while. If your branch is in a working state right now, I can
merge this change into it tomorrow.


--
Gleb.


Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Galen Shipman


On Jun 13, 2007, at 9:49 AM, Torsten Hoefler wrote:


Hi Galen, Gleb,
there is also something weird going on if I call the basic alltoall
during the module_init() of a collective module (I need to wire up my
own QPs in my coll component). It takes 7 seconds for 4 nodes and more
than 30 minutes for 120 nodes. It seems to be an OpenIB wireup issue
because if I start with -mca btl tcp,self this goes as fast as expected
(<2 seconds).

Will this issue be fixed with your patch?


No, this is a separate issue.

Try:
-mca mpi_preconnect_oob 1

then try:

-mca mpi_preconnect_all 1

and let us know what the times are.

thx,

galen




Thanks,
  Torsten

--
 bash$ :(){ :|:&};: - http://www.unixer.de/ -
Indiana University| http://www.indiana.edu
Open Systems Lab  | http://osl.iu.edu/
150 S. Woodlawn Ave.  | Bloomington, IN, 474045-7104 | USA
Lindley Hall Room 135 | +01 (812) 855-3608




Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Torsten Hoefler
Hi Galen, Gleb,
there is also something weird going on if I call the basic alltoall
during the module_init() of a collective module (I need to wire up my
own QPs in my coll component). It takes 7 seconds for 4 nodes and more
than 30 minutes for 120 nodes. It seems to be an OpenIB wireup issue
because if I start with -mca btl tcp,self this goes as fast as expected
(<2 seconds). 

Will this issue be fixed with your patch?

Thanks,
  Torsten

-- 
 bash$ :(){ :|:&};: - http://www.unixer.de/ -
Indiana University| http://www.indiana.edu
Open Systems Lab  | http://osl.iu.edu/
150 S. Woodlawn Ave.  | Bloomington, IN, 474045-7104 | USA
Lindley Hall Room 135 | +01 (812) 855-3608


Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Galen Shipman

Hi Gleb,

As we have discussed before I am working on adding support for  
multiple QPs with either per peer resources or shared resources.
As a result of this I am trying to clean up a lot of the OpenIB code.  
It has grown up organically over the years and needs some attention.
Perhaps we can coordinate on commits or even work from the same temp  
branch to do an overall cleanup as well as addressing the issue you  
describe in this email.


I bring this up because this commit will conflict quite a bit with  
what I am working on, I can always merge it by hand but it may make  
sense for us to get this all done in one area and then bring it all  
over?


Thanks,

Galen

