Re: [OMPI devel] btl tcp port to xensocket

2008-01-17 Thread Muhammad Atif
Thanks again. No, at the moment I am taking the lazy route, i.e. simply
changing the tcp code, so I have not created another btl component. I know this
is not the recommended approach, but I just wanted to try it before committing.
Apart from the xensocket-specific parts, all I have done inside the btl/tcp
code is change the structure:

 struct  mca_btl_tcp_addr_t {
struct in_addr addr_inet; /**< IPv4 address in network byte order */
in_port_t  addr_port; /**< listen port */
unsigned short addr_inuse;/**< local meaning only */
int   xs_domU_ref;   /**
To: Open MPI Developers 
Sent: Friday, January 18, 2008 1:42:41 AM
Subject: Re: [OMPI devel] btl tcp port to xensocket


On Jan 15, 2008, at 6:07 PM, Muhammad Atif wrote:

> Just for reference, I am trying to port btl/tcp to xensockets. Now
> if i want to do modex send/recv , to my understanding,
> mca_btl_tcp_addr_t is used (ref code/function is
> mca_btl_tcp_component_exchange). For xensockets, I need to send only
> one additional integer remote_domU_id across to say all the peers
> (in refined code it would be specific to each domain, but i just
> want to have clear understanding before i move any further). Now I
> have changed the struct mca_btl_tcp_addr_t present in btl_tcp_addr.h
> and have added int r_domu_id. This makes the size of structure 12.
> Upon receive mca_btl_tcp_proc_create() gives an error after
> mca_pml_base_modex_recv() and at this statement if(0 != (size %
> sizeof(mca_btl_tcp_addr_t))) that size do not match. It is still
> expecting size 8, where as i have made the size 12.  I am unable to
> pin point the exact location where the size 8 is still embedded. Any
> ideas?

Just to be clear -- you have copied the tcp btl to another new name  
and are modifying that, right?  E.g., ompi/mca/btl/xensocket?

If so, you need to modify the prefix of all the symbols to be
btl_xensocket, and be sure to change the string name of your component
in the component structure.  The modex indexes off this string name,
so it's important that it doesn't share a name with any other
component in the framework.

> Second question is regarding the receive part of openmpi. In my
> understanding, once Recv api is called, the control goes through PML
> layer and everything initializes there. However, I am unable to get
> a lock at the layer/file/function where the receive socket polling
> is done. There are callbacks, but where or how exactly the openMPI
> knows that message has in fact arrived. Any pointer will do :)

Which receive are you asking about here -- BTL receive or the modex  
receive?

>
>
> Best Regards,
> Muhammad Atif
> PS: Sorry if my questions are too basic.
>
> - Original Message 
> From: Jeff Squyres 
> To: Open MPI Developers 
> Sent: Friday, January 11, 2008 1:02:31 PM
> Subject: Re: [OMPI devel] btl tcp port to xensocket
>
>
> On Jan 10, 2008, at 8:40 PM, Muhammad Atif wrote:
>
> > Hi,
> > Thanks for such a detailed reply. You are right, we have partitioned
> > (normalized) our system with Xen and have seen that virtualization
> > overhead is not that great (for some applications) as compared to
> > potential benefits that we can get. We have executed various
> > benchmarks on different network/cluster configuration of Xen and
> > Native linux and they are really encouraging. The only known problem
> > is inter-domain communication of Xen which is quite poor (1/6 of the
> > native memory transfer and not to mention 50%CPU utilization of
> > host). We have tested out Xensocket, and these sockets give us
> > really good performance boost in all respects.
> > Now that I am having a look at the complex yet wonderful
> > architecture of openmpi, can you guys give me some guidance on
> > couple of naive questions?
> >
> > 1- How do I view the console output of an mpi process which is not
> > at headnode? Do I have to have some parallel debugger? Or is there
> > any magical technique?
>
> OMPI's run-time environment takes care of redirecting stdout/stderr
> from each MPI process to the stdout/stderr of mpirun for you (this is
> another use of the "out of band" TCP channel that is set up between
> mpirun and all the MPI processes).
>
> >
> > 2- How do i setup GPR?
>
> You don't.  The GPR is automatically instantiated in mpirun upon
> startup.
>
> > say i have a struct foo, and all processes have at least one such
> > instance of foo. From what I gather, openmpi will create a linked
> > list of foo's that were passed on (though I am unable to pass one
> > on). Where do i have to define struct foo so that it can be
> > exchanged b/w the processes? I know its a lame question, but I think
> > i am getting lost in the sea. :(
>
> I assume you're asking about the modex.
>
> Every BTL defines its own data that is passed around in the modex.  It
> is assumed that only modules of the same BTL type will be able to
> read/understand that data.  The modex just

Re: [OMPI devel] open ib btl and xrc

2008-01-17 Thread Pavel Shamis (Pasha)

Here is a paper from openib: http://www.openib.org/archives/nov2007sc/XRC.pdf
and here is an mvapich presentation:
http://mvapich.cse.ohio-state.edu/publications/ofa_nov07-mvapich-xrc.pdf


Bottom line: XRC decreases the number of QPs that ompi opens and, as a result,
decreases ompi's memory footprint.
The openib paper has more details about XRC. If you need more
details about the XRC implementation in the openib btl, please let me know.

Don Kerr wrote:

Hi,

After searching, about the only thing I can find on XRC is what it
stands for. Can someone explain the benefits of Open MPI's use of XRC,
point me to a paper, or both?


TIA
-DON

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  



--
Pavel Shamis (Pasha)
Mellanox Technologies



[OMPI devel] open ib btl and xrc

2008-01-17 Thread Don Kerr

Hi,

After searching, about the only thing I can find on XRC is what it
stands for. Can someone explain the benefits of Open MPI's use of XRC,
point me to a paper, or both?


TIA
-DON



Re: [OMPI devel] Integrating the memchecker branch

2008-01-17 Thread Jeff Squyres

On Jan 15, 2008, at 8:24 AM, Rainer Keller wrote:


- ompi_info shows whether this stuff is enabled

It shows the memchecker-valgrind mca.
Apart from showing the mca, no (so no separate line for
valgrind-kind-of-checking enabled).

If it should, we can do that of course... But I don't think it's  
necessary.


Mmm -- good point.  I forgot that this is a new framework.  So, I  
agree: if there's already a line in the ompi_info output about  
memchecker support (through the listing of a component), then nothing  
else is necessary.



- obvious user-level configure errors raise errors/abort configure
(e.g., --enable-memchecker is specified but --enable-debug is not), or
make some obvious assumptions about "what the user meant" (e.g., if
--enable-memchecker is specified but --enable-debug is not, then
automatically enable --enable-debug and output a message saying so).

Currently not done.
It is not really a requirement, though! (although it does not make much
sense without).


I doubt that this has ever been mentioned before.  It should be pretty  
easy to do, though.  It might be nice to spend the 15 minutes doing it  
so that it doesn't get forgotten (says the guy who hasn't spent 15  
minutes on it :-) ).



- I think we've said ad nauseam that there should be zero performance
penalty for when this stuff is not enabled, and I'm guessing that this
is still true.  :-)

100% ,-]
No code added in the default case.
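
One common way to get "no code added in the default case" is to compile the checks out with the preprocessor. The following is only a sketch of that general technique, not the memchecker branch's actual code: TOY_WANT_MEMCHECKER, TOY_MEMCHECK, toy_check_defined and toy_send are hypothetical names invented for this illustration.

```c
#include <stddef.h>

/* Hypothetical build flag: a configure option would define this to 1.
 * It defaults to 0, i.e. checking is disabled. */
#ifndef TOY_WANT_MEMCHECKER
#define TOY_WANT_MEMCHECKER 0
#endif

/* Counts how many checks actually ran (for demonstration only). */
static unsigned long toy_checks_run = 0;

#if TOY_WANT_MEMCHECKER
/* Stand-in for a real check (e.g. asking a tool whether buf is defined). */
static void toy_check_defined(const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    toy_checks_run++;
}
#define TOY_MEMCHECK(buf, len) toy_check_defined((buf), (len))
#else
/* Disabled: the macro expands to a no-op, so no call is emitted. */
#define TOY_MEMCHECK(buf, len) ((void)0)
#endif

/* A hypothetical send path that is instrumented only when enabled. */
static size_t toy_send(const void *buf, size_t len)
{
    TOY_MEMCHECK(buf, len);
    (void)buf;
    return len;
}
```

Built without the flag, toy_send() contains no trace of the check; built with -DTOY_WANT_MEMCHECKER=1, every call runs toy_check_defined() first.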


- some kind of documentation should be written up about how to use
this stuff, perhaps in the FAQ (e.g., pairing it with a valgrind-
enabled libibverbs for max benefit, etc.).

Yep.
I will prepare a text; do you need it in HTML or plain text?



If you have the cycles, writing it up in the FAQ format would be  
great.  We use a mini-wiki format for common stuff in the FAQ.


I just added a wiki page about it:

https://svn.open-mpi.org/trac/ompi/wiki/OMPIFAQEntries

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] btl tcp port to xensocket

2008-01-17 Thread Jeff Squyres

On Jan 15, 2008, at 6:07 PM, Muhammad Atif wrote:

Just for reference, I am trying to port btl/tcp to xensockets. Now  
if i want to do modex send/recv , to my understanding,  
mca_btl_tcp_addr_t is used (ref code/function is  
mca_btl_tcp_component_exchange). For xensockets, I need to send only  
one additional integer remote_domU_id across to say all the peers  
(in refined code it would be specific to each domain, but i just  
want to have clear understanding before i move any further). Now I  
have changed the struct mca_btl_tcp_addr_t present in btl_tcp_addr.h  
and have added int r_domu_id. This makes the size of structure 12.  
Upon receive mca_btl_tcp_proc_create() gives an error after  
mca_pml_base_modex_recv() and at this statement if(0 != (size %  
sizeof(mca_btl_tcp_addr_t))) that size do not match. It is still  
expecting size 8, where as i have made the size 12.  I am unable to  
pin point the exact location where the size 8 is still embedded. Any  
ideas?


Just to be clear -- you have copied the tcp btl to another new name  
and are modifying that, right?  E.g., ompi/mca/btl/xensocket?


If so, you need to modify the prefix of all the symbols to be
btl_xensocket, and be sure to change the string name of your component
in the component structure.  The modex indexes off this string name,
so it's important that it doesn't share a name with any other
component in the framework.
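
To illustrate why the string name matters, here is a minimal sketch in C. It assumes a toy registry that stores one blob per component name; toy_modex_send, toy_modex_recv and the table are hypothetical and are not Open MPI's API, and the overwrite-on-duplicate behaviour is an assumption of this toy model only.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8

/* One published blob, keyed by the component's string name. */
struct modex_entry {
    const char *name;   /* component string name, used as the lookup key */
    const void *blob;   /* opaque data published by that component */
    size_t      size;   /* blob length in bytes */
};

static struct modex_entry table[MAX_ENTRIES];
static size_t n_entries = 0;

/* Publish a blob under a name. In this toy model a repeated name
 * overwrites the earlier entry. Returns 0 on success, -1 if full. */
static int toy_modex_send(const char *name, const void *blob, size_t size)
{
    for (size_t i = 0; i < n_entries; i++) {
        if (strcmp(table[i].name, name) == 0) {
            table[i].blob = blob;
            table[i].size = size;
            return 0;
        }
    }
    if (n_entries == MAX_ENTRIES) {
        return -1;
    }
    table[n_entries++] = (struct modex_entry){ name, blob, size };
    return 0;
}

/* Look a blob up by name. Returns 0 and fills blob/size, or -1 if absent. */
static int toy_modex_recv(const char *name, const void **blob, size_t *size)
{
    for (size_t i = 0; i < n_entries; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *blob = table[i].blob;
            *size = table[i].size;
            return 0;
        }
    }
    return -1;
}
```

With distinct names ("tcp" and "xensocket") each component gets back its own blob. If both publish under "tcp", a receiver asking for "tcp" can be handed a blob whose size does not match the struct it expects.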


Second question is regarding the receive part of openmpi. In my  
understanding, once Recv api is called, the control goes through PML  
layer and everything initializes there. However, I am unable to get  
a lock at the layer/file/function where the receive socket polling  
is done. There are callbacks, but where or how exactly the openMPI  
knows that message has in fact arrived. Any pointer will do :)


Which receive are you asking about here -- BTL receive or the modex  
receive?





Best Regards,
Muhammad Atif
PS: Sorry if my questions are too basic.

- Original Message 
From: Jeff Squyres 
To: Open MPI Developers 
Sent: Friday, January 11, 2008 1:02:31 PM
Subject: Re: [OMPI devel] btl tcp port to xensocket


On Jan 10, 2008, at 8:40 PM, Muhammad Atif wrote:

> Hi,
> Thanks for such a detailed reply. You are right, we have partitioned
> (normalized) our system with Xen and have seen that virtualization
> overhead is not that great (for some applications) as compared to
> potential benefits that we can get. We have executed various
> benchmarks on different network/cluster configuration of Xen and
> Native linux and they are really encouraging. The only known problem
> is inter-domain communication of Xen which is quite poor (1/6 of the
> native memory transfer and not to mention 50%CPU utilization of
> host). We have tested out Xensocket, and these sockets give us
> really good performance boost in all respects.
> Now that I am having a look at the complex yet wonderful
> architecture of openmpi, can you guys give me some guidance on
> couple of naive questions?
>
> 1- How do I view the console output of an mpi process which is not
> at headnode? Do I have to have some parallel debugger? Or is there
> any magical technique?

OMPI's run-time environment takes care of redirecting stdout/stderr
from each MPI process to the stdout/stderr of mpirun for you (this is
another use of the "out of band" TCP channel that is set up between
mpirun and all the MPI processes).

>
> 2- How do i setup GPR?

You don't.  The GPR is automatically instantiated in mpirun upon
startup.

> say i have a struct foo, and all processes have at least one such
> instance of foo. From what I gather, openmpi will create a linked
> list of foo's that were passed on (though I am unable to pass one
> on). Where do i have to define struct foo so that it can be
> exchanged b/w the processes? I know its a lame question, but I think
> i am getting lost in the sea. :(

I assume you're asking about the modex.

Every BTL defines its own data that is passed around in the modex.  It
is assumed that only modules of the same BTL type will be able to
read/understand that data.  The modex just treats the data as a blob; all
the modex blobs are gathered into mpirun and then broadcast out to
every MPI process (I said scatter in my previous mail; broadcast is
more accurate).

So when you modex_send, you just pass a pointer to a chunk of memory
and a length (e.g., a pointer to a struct instance and a length).
When you modex_receive, you can just dereference the blob that you
return as the same struct type as you modex_send'ed previously
(because you can only receive blobs from BTL modules that are the same
type as you, and therefore the data they sent is the same type of data
that you sent).
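
The pointer-plus-length idea above can be sketched as follows. This is an illustration only: struct toy_addr, toy_pack and toy_unpack are hypothetical names, the field layout is an assumption loosely modelled on the structure discussed in this thread, and the real modex calls are replaced by plain memory copies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-BTL payload; field names are illustrative only. */
struct toy_addr {
    uint32_t ip;       /* IPv4 address */
    uint16_t port;     /* listen port */
    uint16_t inuse;    /* local bookkeeping */
    int32_t  domu_id;  /* the extra integer from the xensocket experiment */
};

/* "Send" side: hand over a pointer and a length. Here the struct is
 * copied into a caller-provided buffer as an opaque blob.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t toy_pack(const struct toy_addr *a, void *buf, size_t cap)
{
    if (cap < sizeof *a) {
        return 0;
    }
    memcpy(buf, a, sizeof *a);
    return sizeof *a;
}

/* "Receive" side: interpret the blob as the same struct type.
 * Returns 0 on success, -1 if the length is not what this type expects. */
static int toy_unpack(const void *blob, size_t len, struct toy_addr *out)
{
    if (len != sizeof *out) {
        return -1;
    }
    memcpy(out, blob, sizeof *out);
    return 0;
}
```

The length check in toy_unpack() is the part that fails when sender and receiver disagree about the struct definition, for example when one side was built before a field was added.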

You can do more complex things in the modex if you need to, of
course.  For example, we're changing the openib BTL to send variable
length data in the modex, but that requires a bit more setup and I
suspect you don't need to do this.

>
> Best Regards,
> Mu

Re: [OMPI devel] btl tcp port to ...

2008-01-17 Thread Muhammad Atif
Hi, 
Thanks a lot for the reply. You have understood my problem correctly, but I
cannot work out from your suggestion where to look. The
btl_size is shown as 12 and size as 8. My understanding of the
mca_btl_tcp_component_exchange function is slightly different, or perhaps wrong,
so please correct me if I am mistaken.

Once we do the exchange, i.e. mca_btl_tcp_component_exchange(), the size is
calculated as

size_t size = mca_btl_tcp_component.tcp_num_btls * sizeof(mca_btl_tcp_addr_t);

This gives me the correct size. I have only one BTL (tcp_num_btls is 1), so
size is 12. Next we allocate memory with

mca_btl_tcp_addr_t *addrs = (mca_btl_tcp_addr_t *)malloc(size);

As size is 12, this gives me the correct allocation. And lastly

rc = mca_pml_base_modex_send(&mca_btl_tcp_component.super.btl_version,
                             addrs, size);

This sends addrs with size 12. Shouldn't that work out of the box? Or is
there more going on that is not visible here?

Could you please explain the following statement in more detail? I think it
holds the key to my problem, but I cannot follow it:
"We copy the information to be sent into the addrs array and increase xfer_size 
afterwards (telling the function how many bytes to be transferred)."
Where exactly are we increasing the size? 

 
Best Regards,
Muhammad Atif

- Original Message 
From: Adrian Knoth 
To: Open MPI Developers 
Sent: Thursday, January 17, 2008 11:43:24 PM
Subject: Re: [OMPI devel] btl tcp port to xensocket


On Tue, Jan 15, 2008 at 04:07:02PM -0800, Muhammad Atif wrote:

> Just for reference, I am trying to port btl/tcp to xensockets. Now if
> i want to do modex send/recv , to my understanding, mca_btl_tcp_addr_t
> is used (ref code/function is mca_btl_tcp_component_exchange). For
> xensockets, I need to send only one additional integer remote_domU_id
> across to say all the peers (in refined code it would be specific to
> each domain, but i just want to have clear understanding before i move
> any further). Now I have changed the struct mca_btl_tcp_addr_t present
> in btl_tcp_addr.h and have added int r_domu_id. This makes the size of
> structure 12. Upon receive mca_btl_tcp_proc_create() gives an error
> after mca_pml_base_modex_recv() and at this statement if(0 != (size %
> sizeof(mca_btl_tcp_addr_t))) that size do not match. It is still
> expecting size 8, where as i have made the size 12.  I am unable to
> pin point the exact location where the size 8 is still embedded. Any
> ideas?

Just an idea: the failing check after mca_base_modex_recv prints this error:

   BTL_ERROR(("mca_base_modex_recv: invalid size %d: btl-size: %d\n",
              size, sizeof(mca_btl_tcp_addr_t)));


So what is wrong? Is btl-size shown as 12 or as 8? It should be 12. And
is size just 8? So this means you forgot to include your new socket in
your modex_send_request.

See mca_btl_tcp_component_exchange: We copy the information to be sent
into the addrs array and increase xfer_size afterwards (telling the
function how many bytes to be transferred).

Perhaps you missed something there.


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de

Re: [OMPI devel] Open IB BTL development question

2008-01-17 Thread Don Kerr
Thanks Steve, Jeff, Pasha, this is the kind of information I was looking 
for.


-DON

Pavel Shamis (Pasha) wrote:

I plan to add IB APM support  (not something specific to OFED)

Don Kerr wrote:
  
Looking at the list of new features for OFED 1.3 and seeing that support
for XRC went into the trunk, I am curious whether support for additional OFED
1.3 features will be included, or is planned to be included, in Open MPI.

I am looking at the list of features here: 
http://64.233.167.104/search?q=cache:RXXOrY36QHcJ:www.openib.org/archives/nov2007sc/OFED%25201.3%2520status.ppt+ofed+1.3+feature&hl=en&ct=clnk&cd=3&gl=us&client=firefox-a
but I do not have any specific feature in mind, just wanted to get an 
idea what others are planning.


Thanks
-DON

Re: [OMPI devel] btl tcp port to xensocket

2008-01-17 Thread Adrian Knoth
On Tue, Jan 15, 2008 at 04:07:02PM -0800, Muhammad Atif wrote:

> Just for reference, I am trying to port btl/tcp to xensockets. Now if
> i want to do modex send/recv , to my understanding, mca_btl_tcp_addr_t
> is used (ref code/function is mca_btl_tcp_component_exchange). For
> xensockets, I need to send only one additional integer remote_domU_id
> across to say all the peers (in refined code it would be specific to
> each domain, but i just want to have clear understanding before i move
> any further). Now I have changed the struct mca_btl_tcp_addr_t present
> in btl_tcp_addr.h and have added int r_domu_id. This makes the size of
> structure 12. Upon receive mca_btl_tcp_proc_create() gives an error
> after mca_pml_base_modex_recv() and at this statement if(0 != (size %
> sizeof(mca_btl_tcp_addr_t))) that size do not match. It is still
> expecting size 8, where as i have made the size 12.  I am unable to
> pin point the exact location where the size 8 is still embedded. Any
> ideas?

Just an idea: the failing check after mca_base_modex_recv prints this error:

   BTL_ERROR(("mca_base_modex_recv: invalid size %d: btl-size: %d\n",
              size, sizeof(mca_btl_tcp_addr_t)));


So what is wrong? Is btl-size shown as 12 or as 8? It should be 12. And
is size just 8? So this means you forgot to include your new socket in
your modex_send_request.

See mca_btl_tcp_component_exchange: We copy the information to be sent
into the addrs array and increase xfer_size afterwards (telling the
function how many bytes to be transferred).

Perhaps you missed something there.
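
As a rough illustration of the two sides described above, the sketch below shows a sender that grows xfer_size by one struct per entry it fills, and a receiver that applies the same modulo test quoted in this thread. It is not the actual btl/tcp code: struct toy_addr, toy_fill and toy_count are hypothetical, and the 12-byte size assumes the usual layout with no extra padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address record, 12 bytes on common platforms. */
struct toy_addr {
    uint32_t ip;
    uint16_t port;
    uint16_t inuse;
    int32_t  domu_id;
};

/* Sender: fill one entry per interface and add sizeof(entry) to
 * xfer_size each time. Returns the number of bytes to transfer. */
static size_t toy_fill(struct toy_addr *addrs, size_t max,
                       const uint32_t *ips, size_t n_ips, int32_t domu_id)
{
    size_t xfer_size = 0;
    for (size_t i = 0; i < n_ips && i < max; i++) {
        addrs[i].ip = ips[i];
        addrs[i].port = 0;
        addrs[i].inuse = 0;
        addrs[i].domu_id = domu_id;
        xfer_size += sizeof(struct toy_addr);
    }
    return xfer_size;
}

/* Receiver: the received size must be a whole number of entries.
 * Returns the entry count, or -1 if the size does not divide evenly. */
static long toy_count(size_t size)
{
    if (0 != (size % sizeof(struct toy_addr))) {
        return -1;
    }
    return (long)(size / sizeof(struct toy_addr));
}
```

Under the 12-byte assumption, a blob of 8 bytes is rejected by toy_count(), which matches the "invalid size 8: btl-size: 12" symptom: the receiver was compiled with the new struct, but the bytes that arrived were counted with the old one.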


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] Open IB BTL development question

2008-01-17 Thread Pavel Shamis (Pasha)

I plan to add IB APM support  (not something specific to OFED)

Don Kerr wrote:
Looking at the list of new features for OFED 1.3 and seeing that support
for XRC went into the trunk, I am curious whether support for additional OFED
1.3 features will be included, or is planned to be included, in Open MPI.

I am looking at the list of features here: 
http://64.233.167.104/search?q=cache:RXXOrY36QHcJ:www.openib.org/archives/nov2007sc/OFED%25201.3%2520status.ppt+ofed+1.3+feature&hl=en&ct=clnk&cd=3&gl=us&client=firefox-a
but I do not have any specific feature in mind, just wanted to get an 
idea what others are planning.


Thanks
-DON

--
Pavel Shamis (Pasha)
Mellanox Technologies