Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Ralph Castain
Fair enough - yeah, that is an issue I've been avoiding :-)

On Jul 31, 2014, at 9:14 AM, Nathan Hjelm <hje...@lanl.gov> wrote:

> 
> This approach will work now but we need to start thinking about how we
> want to support multiple simultaneous btl users. Does each user call
> add_procs with a single module (or set of modules) or does each user
> call btl_component_init and get their own module? If we do the latter
> then it might make sense to add a max_procs argument to the
> btl_component_init. Keep in mind we need to change the
> btl_component_init interface anyway because the threading arguments no
> longer make sense in their current form.
> 
> -Nathan
> 
> On Thu, Jul 31, 2014 at 09:04:09AM -0700, Ralph Castain wrote:
>> Like I said, why don't we just do the following:
>> 
>>> I'd like to suggest an alternative solution. A BTL can exploit whatever 
>>> data it wants, but should first test if the data is available. If the data 
>>> is *required*, then the BTL gracefully disqualifies itself. If the data is 
>>> *desirable* for optimization, then the BTL writer (if they choose) can 
>>> include an alternate path that doesn't do the optimization if the data 
>>> isn't available.
>> 
>> Seems like this should resolve the disagreement in a way that meets 
>> everyone's needs. It is basically an attribute approach, but without 
>> requiring modification of the BTL interface.
>> 
>> 
>> On Jul 31, 2014, at 8:26 AM, Pritchard Jr., Howard <howa...@lanl.gov> wrote:
>> 
>>> Hi  George,
>>> 
>>> The ompi_process_info.num_procs thing seems to have been an object
>>> of some contention yesterday.
>>> 
>>> The ugni use of this is cloned off of the way I designed the mpich netmod.
>>> Leveraging the size of the job was an easy way to scale the mailbox size.
>>> 
>>> If I'd been asked to have the netmod work in a context like the one in which
>>> it appears we may eventually want to use BTLs - not just within ompi but for
>>> other things - I'd have worked with Darius (if still in mpich world) on
>>> changing the netmod initialization method to allow for an optional attributes
>>> struct to be passed into the init method to give hints about how many
>>> connections may need to be established, etc.
>>> 
>>> For the GNI BTL - the way it's currently designed - if you are only expecting
>>> to use it for a limited number of connections, then you want to initialize
>>> for big mailboxes (IBers can think of many large buffers posted as RX WQEs).
>>> But for very large jobs, with a possibly highly connected communication
>>> pattern, you want very small mailboxes.
>>> 
>>> Howard
>>> 
>>> 
>>> -Original Message-
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
>>> Sent: Thursday, July 31, 2014 9:09 AM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] RFC: job size info in OPAL
>>> 
>>> What is your definition of "global job size"?
>>> 
>>> George.
>>> 
>>> On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard <howa...@lanl.gov> wrote:
>>> 
>>>> Hi Folks,
>>>> 
>>>> I think given the way we want to use the btl's in lower levels like 
>>>> opal, it is pretty disgusting for a btl to need to figure out on its 
>>>> own something like a "global job size".  That's not its business.  
>>>> Can't we add some attributes to the component's initialization method 
>>>> that provides hints for how to allocate resources it needs to provide its 
>>>> functionality?
>>>> 
>>>> I'll see if there's something clever that can be done in ugni for now.
>>>> I can always add in a hack to probe the apps placement info file and 
>>>> scale the smsg blocks by number of nodes rather than number of ranks.
>>>> 
>>>> Howard
>>>> 
>>>> 
>>>> -Original Message-
>>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan 
>>>> Hjelm
>>>> Sent: Thursday, July 31, 2014 8:58 AM
>>>> To: Open MPI Developers
>>>> Subject: Re: [OMPI devel] RFC: job size info in OPAL
>>>> 
>>>> 
>>>> +2^1000
>>>> 
>>>> This information is absolutely necessary at this point. If someone has a 
>>>> better solution they can provide it as an altern

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm

This approach will work now but we need to start thinking about how we
want to support multiple simultaneous btl users. Does each user call
add_procs with a single module (or set of modules) or does each user
call btl_component_init and get their own module? If we do the latter
then it might make sense to add a max_procs argument to the
btl_component_init. Keep in mind we need to change the
btl_component_init interface anyway because the threading arguments no
longer make sense in their current form.
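
To make the second option concrete, here is a rough sketch of what "each user calls init and gets their own module, with a max_procs hint" could look like. Everything in it is hypothetical: example_btl_init, example_btl_add_procs and example_btl_module_t are made-up names for illustration and are not the existing btl_component_init interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-user module; NOT the real BTL module structure. */
typedef struct example_btl_module {
    size_t max_procs;   /* upper bound on peers this user may add, 0 = unknown */
    size_t num_procs;   /* peers added so far */
} example_btl_module_t;

/* Hypothetical init: every caller gets its own module, sized from max_procs.
 * Returns NULL if the allocation fails. */
static example_btl_module_t *example_btl_init(size_t max_procs)
{
    example_btl_module_t *module = calloc(1, sizeof(*module));
    if (NULL == module) {
        return NULL;
    }
    module->max_procs = max_procs;
    return module;
}

/* Hypothetical add_procs: refuse to grow past the bound given at init time.
 * Returns 0 on success, -1 if the bound would be exceeded. */
static int example_btl_add_procs(example_btl_module_t *module, size_t nprocs)
{
    assert(NULL != module);
    if (0 != module->max_procs &&
        module->num_procs + nprocs > module->max_procs) {
        return -1;
    }
    module->num_procs += nprocs;
    return 0;
}
```

With this shape, two users of the same component would each call example_btl_init with their own bound, and neither would need to know the world or universe size of the other.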

-Nathan

On Thu, Jul 31, 2014 at 09:04:09AM -0700, Ralph Castain wrote:
> Like I said, why don't we just do the following:
> 
> > I'd like to suggest an alternative solution. A BTL can exploit whatever 
> > data it wants, but should first test if the data is available. If the data 
> > is *required*, then the BTL gracefully disqualifies itself. If the data is 
> > *desirable* for optimization, then the BTL writer (if they choose) can 
> > include an alternate path that doesn't do the optimization if the data 
> > isn't available.
> 
> Seems like this should resolve the disagreement in a way that meets 
> everyone's needs. It is basically an attribute approach, but without 
> requiring modification of the BTL interface.
> 
> 
> On Jul 31, 2014, at 8:26 AM, Pritchard Jr., Howard <howa...@lanl.gov> wrote:
> 
> > Hi  George,
> > 
> > The ompi_process_info.num_procs thing seems to have been an object
> > of some contention yesterday.
> > 
> > The ugni use of this is cloned off of the way I designed the mpich netmod.
> > Leveraging the size of the job was an easy way to scale the mailbox size.
> > 
> > If I'd been asked to have the netmod work in a context like the one in which
> > it appears we may eventually want to use BTLs - not just within ompi but for
> > other things - I'd have worked with Darius (if still in mpich world) on
> > changing the netmod initialization method to allow for an optional attributes
> > struct to be passed into the init method to give hints about how many
> > connections may need to be established, etc.
> > 
> > For the GNI BTL - the way it's currently designed - if you are only expecting
> > to use it for a limited number of connections, then you want to initialize
> > for big mailboxes (IBers can think of many large buffers posted as RX WQEs).
> > But for very large jobs, with a possibly highly connected communication
> > pattern, you want very small mailboxes.
> > 
> > Howard
> > 
> > 
> > -Original Message-
> > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
> > Sent: Thursday, July 31, 2014 9:09 AM
> > To: Open MPI Developers
> > Subject: Re: [OMPI devel] RFC: job size info in OPAL
> > 
> > What is your definition of "global job size"?
> > 
> >  George.
> > 
> > On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard <howa...@lanl.gov> wrote:
> > 
> >> Hi Folks,
> >> 
> >> I think given the way we want to use the btl's in lower levels like 
> >> opal, it is pretty disgusting for a btl to need to figure out on its 
> >> own something like a "global job size".  That's not its business.  
> >> Can't we add some attributes to the component's initialization method 
> >> that provides hints for how to allocate resources it needs to provide its 
> >> functionality?
> >> 
> >> I'll see if there's something clever that can be done in ugni for now.
> >> I can always add in a hack to probe the apps placement info file and 
> >> scale the smsg blocks by number of nodes rather than number of ranks.
> >> 
> >> Howard
> >> 
> >> 
> >> -Original Message-
> >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan 
> >> Hjelm
> >> Sent: Thursday, July 31, 2014 8:58 AM
> >> To: Open MPI Developers
> >> Subject: Re: [OMPI devel] RFC: job size info in OPAL
> >> 
> >> 
> >> +2^1000
> >> 
> >> This information is absolutely necessary at this point. If someone has a 
> >> better solution they can provide it as an alternative RFC. Until then this 
> >> is how it should be done... Otherwise we lose uGNI support on the trunk. 
> >> Because we ARE NOT going to remove the mailbox size optimization.
> >> 
> >> -Nathan
> >> 
> >> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
> >>> WHAT: Should we make the job size (i.e., initial number of procs) 
> >>> available

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Ralph Castain
Like I said, why don't we just do the following:

> I'd like to suggest an alternative solution. A BTL can exploit whatever data 
> it wants, but should first test if the data is available. If the data is 
> *required*, then the BTL gracefully disqualifies itself. If the data is 
> *desirable* for optimization, then the BTL writer (if they choose) can 
> include an alternate path that doesn't do the optimization if the data isn't 
> available.

Seems like this should resolve the disagreement in a way that meets everyone's 
needs. It is basically an attribute approach, but without requiring modification 
of the BTL interface.


On Jul 31, 2014, at 8:26 AM, Pritchard Jr., Howard <howa...@lanl.gov> wrote:

> Hi  George,
> 
> The ompi_process_info.num_procs thing seems to have been an object
> of some contention yesterday.
> 
> The ugni use of this is cloned off of the way I designed the mpich netmod.
> Leveraging the size of the job was an easy way to scale the mailbox size.
> 
> If I'd been asked to have the netmod work in a context like the one in which
> it appears we may eventually want to use BTLs - not just within ompi but for
> other things - I'd have worked with Darius (if still in mpich world) on
> changing the netmod initialization method to allow for an optional attributes
> struct to be passed into the init method to give hints about how many
> connections may need to be established, etc.
> 
> For the GNI BTL - the way it's currently designed - if you are only expecting
> to use it for a limited number of connections, then you want to initialize
> for big mailboxes (IBers can think of many large buffers posted as RX WQEs).
> But for very large jobs, with a possibly highly connected communication
> pattern, you want very small mailboxes.
> 
> Howard
> 
> 
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
> Sent: Thursday, July 31, 2014 9:09 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] RFC: job size info in OPAL
> 
> What is your definition of "global job size"?
> 
>  George.
> 
> On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard <howa...@lanl.gov> wrote:
> 
>> Hi Folks,
>> 
>> I think given the way we want to use the btl's in lower levels like 
>> opal, it is pretty disgusting for a btl to need to figure out on its 
>> own something like a "global job size".  That's not its business.  
>> Can't we add some attributes to the component's initialization method 
>> that provides hints for how to allocate resources it needs to provide its 
>> functionality?
>> 
>> I'll see if there's something clever that can be done in ugni for now.
>> I can always add in a hack to probe the apps placement info file and 
>> scale the smsg blocks by number of nodes rather than number of ranks.
>> 
>> Howard
>> 
>> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan 
>> Hjelm
>> Sent: Thursday, July 31, 2014 8:58 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] RFC: job size info in OPAL
>> 
>> 
>> +2^1000
>> 
>> This information is absolutely necessary at this point. If someone has a 
>> better solution they can provide it as an alternative RFC. Until then this 
>> is how it should be done... Otherwise we lose uGNI support on the trunk. 
>> Because we ARE NOT going to remove the mailbox size optimization.
>> 
>> -Nathan
>> 
>> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
>>> WHAT: Should we make the job size (i.e., initial number of procs) available 
>>> in OPAL?
>>> 
>>> WHY: At least 2 BTLs are using this info (*more below)
>>> 
>>> WHERE: usnic and ugni
>>> 
>>> TIMEOUT: there's already been some inflammatory emails about this; 
>>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>>> 
>>> MORE DETAIL:
>>> 
>>> This is an open question.  We *have* the information at the time that the 
>>> BTLs are initialized: do we allow that information to go down to OPAL?
>>> 
>>> Ralph added this info down in OPAL in r32355, but George reverted it in 
>>> r32361.
>>> 
>>> Points for: YES, WE SHOULD
>>> +++ 2 BTLs were using it (usnic, ugni)
>>> +++ Other RTE job-related info are already in OPAL (num local ranks,
>>> local rank)
>>> 
>>> Points for: NO, WE SHOULD NOT
>>> --- What exactly is this number (e.g., num currently-connected procs?), and 
>>> when is it updated?

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm
I do not like the fact that add_procs is called with every proc in the
MPI_COMM_WORLD. That needs to change, so, I will not rely on the number
of procs being added being the same as the world or universe size.

-Nathan

On Thu, Jul 31, 2014 at 09:22:00AM -0600, George Bosilca wrote:
>I definitely think you misunderstood the scope of this RFC. The
>information that is so important to you to configure the mailbox size is
>available to you when you need it. This information is made available by
>the PML through the call to add_procs, which comes with all the procs in
>the MPI_COMM_WORLD. So, ugni doesn't need anything more than what is
>available today. [This is of course under the assumption that someone
>cleans up the BTL and removes the usage of MPI_COMM_WORLD.]
> 
>The real scope of this RFC is to move this information before that in
>order to allow the BTLs to have access to some possible number of
>processes between the call to btl_open and the call to btl_all_proc (in
>other words during btl_init).
> 
>  George.
> 
>PS: here is the patch that fixes all issues in ugni.
> 
>On Jul 31, 2014, at 10:58 , Nathan Hjelm  wrote:
> 
>>
>> +2^1000
>>
>> This information is absolutely necessary at this point. If someone has a
>> better solution they can provide it as an alternative RFC. Until then
>> this is how it should be done... Otherwise we lose uGNI support on the
>> trunk. Because we ARE NOT going to remove the mailbox size optimization.
>>
>> -Nathan
>>
>> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
>>> WHAT: Should we make the job size (i.e., initial number of procs)
>>> available in OPAL?
>>>
>>> WHY: At least 2 BTLs are using this info (*more below)
>>>
>>> WHERE: usnic and ugni
>>>
>>> TIMEOUT: there's already been some inflammatory emails about this;
>>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>>>
>>> MORE DETAIL:
>>>
>>> This is an open question.  We *have* the information at the time that
>>> the BTLs are initialized: do we allow that information to go down to OPAL?
>>>
>>> Ralph added this info down in OPAL in r32355, but George reverted it in
>>> r32361.
>>>
>>> Points for: YES, WE SHOULD
>>> +++ 2 BTLs were using it (usnic, ugni)
>>> +++ Other RTE job-related info are already in OPAL (num local ranks,
>>> local rank)
>>>
>>> Points for: NO, WE SHOULD NOT
>>> --- What exactly is this number (e.g., num currently-connected procs?),
>>> and when is it updated?
>>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>>>
>>> FWIW: here's how ompi_process_info.num_procs was used before the BTL
>>> move down to OPAL:
>>>
>>> - usnic: for a minor latency optimization / sizing of a shared receive
>>> buffer queue length, and for the initial size of a peer lookup hash
>>> - ugni: to determine the size of the per-peer buffers used for
>>> send/recv communication
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>http://www.open-mpi.org/community/lists/devel/2014/07/15394.php
> 
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post:
>http://www.open-mpi.org/community/lists/devel/2014/07/15399.php






Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Pritchard Jr., Howard
Hi  George,

The ompi_process_info.num_procs thing seems to have been an object
of some contention yesterday.

The ugni use of this is cloned off of the way I designed the mpich netmod.
Leveraging the size of the job was an easy way to scale the mailbox size.

If I'd been asked to have the netmod work in a context like the one in which
it appears we may eventually want to use BTLs - not just within ompi but for
other things - I'd have worked with Darius (if still in mpich world) on
changing the netmod initialization method to allow for an optional attributes
struct to be passed into the init method to give hints about how many
connections may need to be established, etc.

For the GNI BTL - the way it's currently designed - if you are only expecting
to use it for a limited number of connections, then you want to initialize
for big mailboxes (IBers can think of many large buffers posted as RX WQEs).
But for very large jobs, with a possibly highly connected communication
pattern, you want very small mailboxes.
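
As a rough illustration of the trade-off described above (few connections: big mailboxes; many connections: small mailboxes), here is a sketch of sizing a mailbox from an optional hints struct. The struct, the function name, and all the constants are assumptions made up for this example; they are not taken from the ugni BTL or the mpich netmod.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hints passed to an init method; 0 means "no hint given". */
typedef struct example_init_hints {
    size_t expected_connections;
} example_init_hints_t;

/* Illustrative values only, in bytes. */
#define EXAMPLE_MBOX_MIN    512u
#define EXAMPLE_MBOX_MAX    65536u
#define EXAMPLE_MBOX_BUDGET (16u * 1024u * 1024u)  /* total for all mailboxes */

/* Split a fixed memory budget across the expected connections and clamp
 * the result, so small jobs get big mailboxes and large jobs get small ones. */
static size_t example_mailbox_size(const example_init_hints_t *hints)
{
    size_t size;

    assert(EXAMPLE_MBOX_MIN <= EXAMPLE_MBOX_MAX);

    if (NULL == hints || 0 == hints->expected_connections) {
        /* No hint: assume the worst case and use the smallest mailbox. */
        return EXAMPLE_MBOX_MIN;
    }

    size = EXAMPLE_MBOX_BUDGET / hints->expected_connections;
    if (size > EXAMPLE_MBOX_MAX) {
        size = EXAMPLE_MBOX_MAX;
    }
    if (size < EXAMPLE_MBOX_MIN) {
        size = EXAMPLE_MBOX_MIN;
    }
    return size;
}
```

Falling back to the smallest mailbox when no hint is given is only one possible choice; a real BTL might prefer a tunable default instead.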

Howard


-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
Sent: Thursday, July 31, 2014 9:09 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: job size info in OPAL

What is your definition of "global job size"?

  George.

On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard <howa...@lanl.gov> wrote:

> Hi Folks,
> 
> I think given the way we want to use the btl's in lower levels like 
> opal, it is pretty disgusting for a btl to need to figure out on its 
> own something like a "global job size".  That's not its business.  
> Can't we add some attributes to the component's initialization method 
> that provides hints for how to allocate resources it needs to provide its 
> functionality?
> 
> I'll see if there's something clever that can be done in ugni for now.
> I can always add in a hack to probe the apps placement info file and 
> scale the smsg blocks by number of nodes rather than number of ranks.
> 
> Howard
> 
> 
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan 
> Hjelm
> Sent: Thursday, July 31, 2014 8:58 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] RFC: job size info in OPAL
> 
> 
> +2^1000
> 
> This information is absolutely necessary at this point. If someone has a 
> better solution they can provide it as an alternative RFC. Until then this is 
> how it should be done... Otherwise we lose uGNI support on the trunk. 
> Because we ARE NOT going to remove the mailbox size optimization.
> 
> -Nathan
> 
> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
>> WHAT: Should we make the job size (i.e., initial number of procs) available 
>> in OPAL?
>> 
>> WHY: At least 2 BTLs are using this info (*more below)
>> 
>> WHERE: usnic and ugni
>> 
>> TIMEOUT: there's already been some inflammatory emails about this; 
>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>> 
>> MORE DETAIL:
>> 
>> This is an open question.  We *have* the information at the time that the 
>> BTLs are initialized: do we allow that information to go down to OPAL?
>> 
>> Ralph added this info down in OPAL in r32355, but George reverted it in 
>> r32361.
>> 
>> Points for: YES, WE SHOULD
>> +++ 2 BTLs were using it (usnic, ugni)
>> +++ Other RTE job-related info are already in OPAL (num local ranks,
>> local rank)
>> 
>> Points for: NO, WE SHOULD NOT
>> --- What exactly is this number (e.g., num currently-connected procs?), and 
>> when is it updated?
>> --- We need to precisely delineate what belongs in OPAL vs. 
>> above-OPAL
>> 
>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
>> down to OPAL:
>> 
>> - usnic: for a minor latency optimization / sizing of a shared 
>> receive buffer queue length, and for the initial size of a peer 
>> lookup hash
>> - ugni: to determine the size of the per-peer buffers used for 
>> send/recv communication
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15395.php

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/07/15396.php


Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread George Bosilca
I definitely think you misunderstood the scope of this RFC. The information 
that is so important to you to configure the mailbox size is available to you 
when you need it. This information is made available by the PML through the 
call to add_procs, which comes with all the procs in the MPI_COMM_WORLD. So, 
ugni doesn’t need anything more than what is available today. [This is of course 
under the assumption that someone cleans up the BTL and removes the usage of 
MPI_COMM_WORLD.]

The real scope of this RFC is to move this information before that in order to 
allow the BTLs to have access to some possible number of processes between the 
call to btl_open and the call to btl_all_proc (in other words during btl_init).
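
The ordering being discussed (open, then init, then add_procs) can be pictured with a toy model like the one below, in which the process count only becomes known once add_procs has run. This is a sketch with invented names (example_btl_t, example_open, and so on), not the actual BTL call sequence or data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lifecycle phases, in the order the calls are made. */
typedef enum {
    EXAMPLE_OPENED,
    EXAMPLE_INITIALIZED,
    EXAMPLE_PROCS_ADDED
} example_phase_t;

typedef struct example_btl {
    example_phase_t phase;
    size_t known_procs;   /* stays 0 until add_procs is called */
} example_btl_t;

static void example_open(example_btl_t *btl)
{
    assert(NULL != btl);
    btl->phase = EXAMPLE_OPENED;
    btl->known_procs = 0;
}

/* During init, no process count has been handed down yet. */
static void example_init(example_btl_t *btl)
{
    assert(NULL != btl);
    btl->phase = EXAMPLE_INITIALIZED;
}

/* Only here does the upper layer tell the BTL about its peers. */
static void example_add_procs(example_btl_t *btl, size_t nprocs)
{
    assert(NULL != btl);
    btl->known_procs += nprocs;
    btl->phase = EXAMPLE_PROCS_ADDED;
}

/* Returns 1 once a process count is available, 0 before that. */
static int example_proc_count_known(const example_btl_t *btl)
{
    return EXAMPLE_PROCS_ADDED == btl->phase;
}
```

In this model, anything that must be sized during init has to either get the count by some other route or be deferred until add_procs.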

  George.

PS: here is the patch that fixes all issues in ugni.



ugni.patch
Description: Binary data

On Jul 31, 2014, at 10:58 , Nathan Hjelm  wrote:

> 
> +2^1000
> 
> This information is absolutely necessary at this point. If someone has a
> better solution they can provide it as an alternative RFC. Until then
> this is how it should be done... Otherwise we lose uGNI support on the
> trunk. Because we ARE NOT going to remove the mailbox size optimization.
> 
> -Nathan
> 
> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
>> WHAT: Should we make the job size (i.e., initial number of procs) available 
>> in OPAL?
>> 
>> WHY: At least 2 BTLs are using this info (*more below)
>> 
>> WHERE: usnic and ugni
>> 
>> TIMEOUT: there's already been some inflammatory emails about this; let's 
>> discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>> 
>> MORE DETAIL:
>> 
>> This is an open question.  We *have* the information at the time that the 
>> BTLs are initialized: do we allow that information to go down to OPAL?
>> 
>> Ralph added this info down in OPAL in r32355, but George reverted it in 
>> r32361.
>> 
>> Points for: YES, WE SHOULD
>> +++ 2 BTLs were using it (usnic, ugni)
>> +++ Other RTE job-related info are already in OPAL (num local ranks, local 
>> rank)
>> 
>> Points for: NO, WE SHOULD NOT
>> --- What exactly is this number (e.g., num currently-connected procs?), and 
>> when is it updated?
>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>> 
>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
>> down to OPAL:
>> 
>> - usnic: for a minor latency optimization / sizing of a shared receive 
>> buffer queue length, and for the initial size of a peer lookup hash
>> - ugni: to determine the size of the per-peer buffers used for send/recv 
>> communication
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15394.php



Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm

The maximum number of peer processes that may be added over the course
of the job will suffice. So either the world or universe size. This is a
reasonable piece of information to expect the upper layers to provide to
the communication layer.

And providing this information is no more intrusive than
providing information like the number of local ranks.

-Nathan

On Thu, Jul 31, 2014 at 11:09:24AM -0400, George Bosilca wrote:
> What is your definition of “global job size”?
> 
>   George.
> 
> On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard <howa...@lanl.gov> wrote:
> 
> > Hi Folks,
> > 
> > I think given the way we want to use the btl's in lower levels like opal,
> > it is pretty disgusting for a btl to need to figure out on its own something
> > like a "global job size".  That's not its business.  Can't we add some
> > attributes to the component's initialization method that provide hints for
> > how to allocate resources it needs to provide its functionality?
> > 
> > I'll see if there's something clever that can be done in ugni for now.
> > I can always add in a hack to probe the apps placement info file and
> > scale the smsg blocks by number of nodes rather than number of ranks.
> > 
> > Howard
> > 
> > 
> > -Original Message-
> > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
> > Sent: Thursday, July 31, 2014 8:58 AM
> > To: Open MPI Developers
> > Subject: Re: [OMPI devel] RFC: job size info in OPAL
> > 
> > 
> > +2^1000
> > 
> > This information is absolutely necessary at this point. If someone has a 
> > better solution they can provide it as an alternative RFC. Until then this 
> > is how it should be done... Otherwise we lose uGNI support on the trunk. 
> > Because we ARE NOT going to remove the mailbox size optimization.
> > 
> > -Nathan
> > 
> > On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
> >> WHAT: Should we make the job size (i.e., initial number of procs) 
> >> available in OPAL?
> >> 
> >> WHY: At least 2 BTLs are using this info (*more below)
> >> 
> >> WHERE: usnic and ugni
> >> 
> >> TIMEOUT: there's already been some inflammatory emails about this; 
> >> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
> >> 
> >> MORE DETAIL:
> >> 
> >> This is an open question.  We *have* the information at the time that the 
> >> BTLs are initialized: do we allow that information to go down to OPAL?
> >> 
> >> Ralph added this info down in OPAL in r32355, but George reverted it in 
> >> r32361.
> >> 
> >> Points for: YES, WE SHOULD
> >> +++ 2 BTLs were using it (usnic, ugni)
> >> +++ Other RTE job-related info are already in OPAL (num local ranks,
> >> local rank)
> >> 
> >> Points for: NO, WE SHOULD NOT
> >> --- What exactly is this number (e.g., num currently-connected procs?), 
> >> and when is it updated?
> >> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
> >> 
> >> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
> >> down to OPAL:
> >> 
> >> - usnic: for a minor latency optimization / sizing of a shared receive 
> >> buffer queue length, and for the initial size of a peer lookup hash
> >> - ugni: to determine the size of the per-peer buffers used for 
> >> send/recv communication
> >> 
> >> --
> >> Jeff Squyres
> >> jsquy...@cisco.com
> >> For corporate legal information go to: 
> >> http://www.cisco.com/web/about/doing_business/legal/cri/
> >> 
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post: 
> >> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2014/07/15395.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15396.php




Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Ralph Castain
I'd like to suggest an alternative solution. A BTL can exploit whatever data it 
wants, but should first test if the data is available. If the data is 
*required*, then the BTL gracefully disqualifies itself. If the data is 
*desirable* for optimization, then the BTL writer (if they choose) can include 
an alternate path that doesn't do the optimization if the data isn't available.

This would seem to meet everyone's needs, yes?
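
A minimal sketch of that test-then-decide logic, assuming a made-up example_job_info_t that records whether the job size was provided. None of these names exist in OPAL or the BTL framework; they are placeholders for whatever mechanism would actually expose the data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of optional job data handed to a BTL. */
typedef struct example_job_info {
    bool   have_num_procs;   /* was the job size provided at all? */
    size_t num_procs;        /* only meaningful if have_num_procs is true */
} example_job_info_t;

typedef enum {
    EXAMPLE_DISQUALIFIED,    /* data required but missing: BTL opts out */
    EXAMPLE_GENERIC_PATH,    /* data desirable but missing: no optimization */
    EXAMPLE_OPTIMIZED_PATH   /* data present: use it */
} example_select_t;

/* Test for the data first, then pick a path based on whether this BTL
 * treats the job size as required or merely desirable. */
static example_select_t example_btl_select(const example_job_info_t *info,
                                           bool num_procs_required)
{
    assert(NULL != info);
    if (info->have_num_procs) {
        return EXAMPLE_OPTIMIZED_PATH;
    }
    return num_procs_required ? EXAMPLE_DISQUALIFIED : EXAMPLE_GENERIC_PATH;
}
```

A BTL that cannot run without the job size would pass true for num_procs_required and drop out cleanly; one that only uses it for tuning would pass false and keep working on the generic path.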


On Jul 31, 2014, at 8:09 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

> What is your definition of “global job size”?
> 
>  George.
> 
> On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard <howa...@lanl.gov> wrote:
> 
>> Hi Folks,
>> 
>> I think given the way we want to use the btl's in lower levels like opal,
>> it is pretty disgusting for a btl to need to figure out on its own something
>> like a "global job size".  That's not its business.  Can't we add some
>> attributes to the component's initialization method that provide hints for
>> how to allocate resources it needs to provide its functionality?
>> 
>> I'll see if there's something clever that can be done in ugni for now.
>> I can always add in a hack to probe the apps placement info file and
>> scale the smsg blocks by number of nodes rather than number of ranks.
>> 
>> Howard
>> 
>> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
>> Sent: Thursday, July 31, 2014 8:58 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] RFC: job size info in OPAL
>> 
>> 
>> +2^1000
>> 
>> This information is absolutely necessary at this point. If someone has a 
>> better solution they can provide it as an alternative RFC. Until then this 
>> is how it should be done... Otherwise we lose uGNI support on the trunk. 
>> Because we ARE NOT going to remove the mailbox size optimization.
>> 
>> -Nathan
>> 
>> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
>>> WHAT: Should we make the job size (i.e., initial number of procs) available 
>>> in OPAL?
>>> 
>>> WHY: At least 2 BTLs are using this info (*more below)
>>> 
>>> WHERE: usnic and ugni
>>> 
>>> TIMEOUT: there's already been some inflammatory emails about this; 
>>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>>> 
>>> MORE DETAIL:
>>> 
>>> This is an open question.  We *have* the information at the time that the 
>>> BTLs are initialized: do we allow that information to go down to OPAL?
>>> 
>>> Ralph added this info down in OPAL in r32355, but George reverted it in 
>>> r32361.
>>> 
>>> Points for: YES, WE SHOULD
>>> +++ 2 BTLs were using it (usnic, ugni)
>>> +++ Other RTE job-related info are already in OPAL (num local ranks,
>>> local rank)
>>> 
>>> Points for: NO, WE SHOULD NOT
>>> --- What exactly is this number (e.g., num currently-connected procs?), and 
>>> when is it updated?
>>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>>> 
>>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
>>> down to OPAL:
>>> 
>>> - usnic: for a minor latency optimization / sizing of a shared receive 
>>> buffer queue length, and for the initial size of a peer lookup hash
>>> - ugni: to determine the size of the per-peer buffers used for 
>>> send/recv communication
>>> 
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to: 
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15395.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15396.php



Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread George Bosilca
What is your definition of “global job size”?

  George.

On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard <howa...@lanl.gov> wrote:

> Hi Folks,
> 
> I think given the way we want to use the btl's in lower levels like opal,
> it is pretty disgusting for a btl to need to figure out on its own something
> like a "global job size".  That's not its business.  Can't we add some 
> attributes
> to the component's initialization method that provides hints for how to
> allocate resources it needs to provide its functionality?
> 
> I'll see if there's something clever that can be done in ugni for now.
> I can always add in a hack to probe the apps placement info file and
> scale the smsg blocks by number of nodes rather than number of ranks.
> 
> Howard
> 
> 
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
> Sent: Thursday, July 31, 2014 8:58 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] RFC: job size info in OPAL
> 
> 
> +2^1000
> 
> This information is absolutely necessary at this point. If someone has a 
> better solution they can provide it as an alternative RFC. Until then this is 
> how it should be done... Otherwise we lose uGNI support on the trunk. 
> Because we ARE NOT going to remove the mailbox size optimization.
> 
> -Nathan
> 
> On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
>> WHAT: Should we make the job size (i.e., initial number of procs) available 
>> in OPAL?
>> 
>> WHY: At least 2 BTLs are using this info (*more below)
>> 
>> WHERE: usnic and ugni
>> 
>> TIMEOUT: there's already been some inflammatory emails about this; 
>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>> 
>> MORE DETAIL:
>> 
>> This is an open question.  We *have* the information at the time that the 
>> BTLs are initialized: do we allow that information to go down to OPAL?
>> 
>> Ralph added this info down in OPAL in r32355, but George reverted it in 
>> r32361.
>> 
>> Points for: YES, WE SHOULD
>> +++ 2 BTLs were using it (usnic, ugni)
>> +++ Other RTE job-related info are already in OPAL (num local ranks, local rank)
>> 
>> Points for: NO, WE SHOULD NOT
>> --- What exactly is this number (e.g., num currently-connected procs?), and 
>> when is it updated?
>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>> 
>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
>> down to OPAL:
>> 
>> - usnic: for a minor latency optimization / sizing of a shared receive 
>> buffer queue length, and for the initial size of a peer lookup hash
>> - ugni: to determine the size of the per-peer buffers used for 
>> send/recv communication
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Pritchard Jr., Howard
Hi Folks,

I think given the way we want to use the btl's in lower levels like opal,
it is pretty disgusting for a btl to need to figure out on its own something
like a "global job size".  That's not its business.  Can't we add some 
attributes
to the component's initialization method that provides hints for how to
allocate resources it needs to provide its functionality?
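
One way to picture the "hints at initialization" idea above is the sketch below. It is only an illustration under stated assumptions: btl_init_hints_t, choose_mailbox_bytes and the size constants are hypothetical names and values, not part of the Open MPI BTL interface. The point is that the component sizes its per-peer resources from a hint when one is supplied and falls back to a default otherwise, without discovering a job size on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hint set passed to a component's init; not an Open MPI type. */
typedef struct {
    bool   have_max_procs;  /* true only if the caller knows a peer bound */
    size_t max_procs;       /* expected number of peers, valid if flag set */
} btl_init_hints_t;

/* Assumed tuning constants, chosen only for illustration. */
#define MAILBOX_DEFAULT_BYTES ((size_t)4096)
#define MAILBOX_MIN_BYTES     ((size_t)512)
#define MAILBOX_BUDGET_BYTES  ((size_t)8 * 1024 * 1024)

/* Pick a per-peer mailbox size; fall back to the default without a hint. */
size_t choose_mailbox_bytes(const btl_init_hints_t *hints)
{
    if (hints == NULL || !hints->have_max_procs || hints->max_procs == 0) {
        return MAILBOX_DEFAULT_BYTES;
    }
    /* Split a fixed memory budget across the hinted number of peers. */
    size_t per_peer = MAILBOX_BUDGET_BYTES / hints->max_procs;
    if (per_peer > MAILBOX_DEFAULT_BYTES) per_peer = MAILBOX_DEFAULT_BYTES;
    if (per_peer < MAILBOX_MIN_BYTES) per_peer = MAILBOX_MIN_BYTES;
    assert(per_peer >= MAILBOX_MIN_BYTES && per_peer <= MAILBOX_DEFAULT_BYTES);
    return per_peer;
}
```

A caller that has no idea of the job size passes NULL (or clears have_max_procs) and gets the default; a caller that knows the size, or only the node count, fills in max_procs with whichever number it considers meaningful.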

I'll see if there's something clever that can be done in ugni for now.
I can always add in a hack to probe the apps placement info file and
scale the smsg blocks by number of nodes rather than number of ranks.

Howard


-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
Sent: Thursday, July 31, 2014 8:58 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: job size info in OPAL


+2^1000

This information is absolutely necessary at this point. If someone has a better 
solution they can provide it as an alternative RFC. Until then this is how it 
should be done... Otherwise we lose uGNI support on the trunk. Because we ARE 
NOT going to remove the mailbox size optimization.

-Nathan

On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
> WHAT: Should we make the job size (i.e., initial number of procs) available 
> in OPAL?
> 
> WHY: At least 2 BTLs are using this info (*more below)
> 
> WHERE: usnic and ugni
> 
> TIMEOUT: there's already been some inflammatory emails about this; 
> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
> 
> MORE DETAIL:
> 
> This is an open question.  We *have* the information at the time that the 
> BTLs are initialized: do we allow that information to go down to OPAL?
> 
> Ralph added this info down in OPAL in r32355, but George reverted it in 
> r32361.
> 
> Points for: YES, WE SHOULD
> +++ 2 BTLs were using it (usnic, ugni)
> +++ Other RTE job-related info are already in OPAL (num local ranks, local rank)
> 
> Points for: NO, WE SHOULD NOT
> --- What exactly is this number (e.g., num currently-connected procs?), and 
> when is it updated?
> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
> 
> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
> down to OPAL:
> 
> - usnic: for a minor latency optimization / sizing of a shared receive 
> buffer queue length, and for the initial size of a peer lookup hash
> - ugni: to determine the size of the per-peer buffers used for 
> send/recv communication
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/


Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm

+2^1000

This information is absolutely necessary at this point. If someone has a
better solution they can provide it as an alternative RFC. Until then
this is how it should be done... Otherwise we lose uGNI support on the
trunk. Because we ARE NOT going to remove the mailbox size optimization.

-Nathan

On Wed, Jul 30, 2014 at 10:00:18PM +, Jeff Squyres (jsquyres) wrote:
> WHAT: Should we make the job size (i.e., initial number of procs) available 
> in OPAL?
> 
> WHY: At least 2 BTLs are using this info (*more below)
> 
> WHERE: usnic and ugni
> 
> TIMEOUT: there's already been some inflammatory emails about this; let's 
> discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
> 
> MORE DETAIL:
> 
> This is an open question.  We *have* the information at the time that the 
> BTLs are initialized: do we allow that information to go down to OPAL?
> 
> Ralph added this info down in OPAL in r32355, but George reverted it in 
> r32361.
> 
> Points for: YES, WE SHOULD
> +++ 2 BTLs were using it (usnic, ugni)
> +++ Other RTE job-related info are already in OPAL (num local ranks, local 
> rank)
> 
> Points for: NO, WE SHOULD NOT
> --- What exactly is this number (e.g., num currently-connected procs?), and 
> when is it updated?
> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
> 
> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
> down to OPAL:
> 
> - usnic: for a minor latency optimization / sizing of a shared receive buffer 
> queue length, and for the initial size of a peer lookup hash
> - ugni: to determine the size of the per-peer buffers used for send/recv 
> communication
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] RFC: job size info in OPAL

2014-07-30 Thread Ralph Castain

On Jul 30, 2014, at 5:49 PM, George Bosilca  wrote:

> 
> On Jul 30, 2014, at 20:37 , Ralph Castain  wrote:
> 
>> 
>> On Jul 30, 2014, at 5:25 PM, George Bosilca  wrote:
>> 
>>> 
>>> On Jul 30, 2014, at 18:00 , Jeff Squyres (jsquyres)  
>>> wrote:
>>> 
>>>> WHAT: Should we make the job size (i.e., initial number of procs) 
>>>> available in OPAL?
>>>> 
>>>> WHY: At least 2 BTLs are using this info (*more below)
>>>> 
>>>> WHERE: usnic and ugni
>>>> 
>>>> TIMEOUT: there's already been some inflammatory emails about this; let's 
>>>> discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>>>> 
>>>> MORE DETAIL:
>>>> 
>>>> This is an open question.  We *have* the information at the time that the 
>>>> BTLs are initialized: do we allow that information to go down to OPAL?
>>>> 
>>>> Ralph added this info down in OPAL in r32355, but George reverted it in 
>>>> r32361.
>>>> 
>>>> Points for: YES, WE SHOULD
>>>> +++ 2 BTLs were using it (usnic, ugni)
>>>> +++ Other RTE job-related info are already in OPAL (num local ranks, local 
>>>> rank)
>>>> 
>>>> Points for: NO, WE SHOULD NOT
>>>> --- What exactly is this number (e.g., num currently-connected procs?), 
>>>> and when is it updated?
>>>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>>> --- Using this information to configure the communication environment 
>>> limits the scope of the communication substrate to a static application (in 
>>> number of participants). Under this assumption, one can simply wait until 
>>> the first add_proc to compute the number of processes, a solution as 
>>> “correct” as the current one.
>> 
>> Not necessarily - it depends on how it is used, and how it is communicated. 
>> Some of us have explored other options for using this that aren’t static, 
>> but where the info is of use.
> 
> This is a little bit too much hand waving to be constructive. Some other 
> folks in the field have developed many communications libraries, and none of 
> them needed a random number of potential processes to initialize themselves 
> correctly.

That's fine - everyone innovates and does something new. I'm not about to 
divulge proprietary, competitive info to you in advance just to justify our 
needs. I'll only note that notification of change isn't the sole jurisdiction 
of the FT group, and some of us have other uses for it.


> 
>>> The other “global” pieces of information that were made available in OPAL 
>>> (num_local_peers and my_local_rank) are only used by the local BTLs (SM, 
>>> SMCUDA and VADER). Moreover, my_local_rank is only used to decide who 
>>> initializes the backend file, something that can easily be done using an 
>>> atomic operation. The number of local processes is used to prevent SM from 
>>> activating itself if we don’t have at least 2 processes per node. So, their 
>>> usage is minimally invasive, and can eventually be phased out with a little 
>>> effort.
>> 
>> FWIW: the new PMI abstraction is in OPAL because it is RTE-agnostic. So all 
>> the info being discussed will actually be captured originally in the OPAL 
>> layer,  and stored in the OPAL dstore framework. In the current code, the 
>> RTE grabs the data and exposes it to the OMPI layer, which then pushes it 
>> back down to the OPAL proc.h struct.
>> 
>>  since anyone can freely query the info from opal/pmix or 
>> opal/dstore, it is really irrelevant in some ways. The info is there, in the 
>> OPAL layer, prior to BTL's being initialized. If you don't want it in a 
>> global storage, people can just get it from the appropriate OPAL API.
>> 
>> So what are we actually debating here? Global storage vs API call?
> 
> Our goals in this project are clearly orthogonal. I put a lot of effort into 
> this move because I need to use the BTLs without PMI, without RTE.

And you are certainly free to do so. Nobody is putting a gun to your head and 
demanding that your BTLs use it

> In fact the question boils down to: Do you want to be able to use the BTL to 
> bootstrap the RTE or not? If yes, then the number of processes is out of the 
> picture, either as an API or as a global storage.

Yes, I do - and no, it isn't a black/white question. I can use the BTLs to 
bootstrap just fine, even when someone uses that info for an initial 
optimization. I can always notify them later when things change, and they can 
make adjustments if necessary.
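
As a rough illustration of "size from the initial number, then adjust when notified", here is a minimal sketch. Everything in it is an assumption made for the example: module_state_t, module_init, module_on_size_change and the slot constants are hypothetical and do not correspond to any existing BTL or notification API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-module state; not an Open MPI structure. */
typedef struct {
    size_t sized_for;  /* peer count the resources were last sized for */
    size_t slots;      /* receive slots currently provisioned */
} module_state_t;

/* Assumed tuning constants, for illustration only. */
#define SLOTS_PER_PEER ((size_t)2)
#define MIN_SLOTS      ((size_t)64)

static size_t slots_for(size_t nprocs)
{
    size_t want = nprocs * SLOTS_PER_PEER;
    return want < MIN_SLOTS ? MIN_SLOTS : want;
}

/* Initial optimization: size from whatever job size is known at startup. */
void module_init(module_state_t *m, size_t initial_nprocs)
{
    assert(m != NULL);
    m->sized_for = initial_nprocs;
    m->slots = slots_for(initial_nprocs);
}

/* Later notification: grow if the job got bigger, never shrink.
 * Returns 1 if the module resized, 0 if nothing changed. */
int module_on_size_change(module_state_t *m, size_t new_nprocs)
{
    assert(m != NULL);
    size_t want = slots_for(new_nprocs);
    if (want <= m->slots) {
        return 0;
    }
    m->sized_for = new_nprocs;
    m->slots = want;
    return 1;
}
```

In this sketch the initial number is treated as a starting guess, not a fixed property of the job, which is the distinction being argued about here.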

Again, nobody is forcing you to use any of the data in the opal dstore. It is 
just there if someone *wants* to use it. I fail to understand why you want to 
tell everyone else what they can do in their BTL. If you don't like how they 
wrote it, you are always free to write your own version of it. Nobody will stop 
you.

So what is the issue here?


> 
>   George.
> 
> 
>> 
>>> 
>>>  George.
>>> 
>>> 
>>>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
>>>> down to OPAL:
>>>> 
>>>> - usnic: for a minor latency optimization / sizing 

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-30 Thread Ralph Castain

On Jul 30, 2014, at 5:25 PM, George Bosilca  wrote:

> 
> On Jul 30, 2014, at 18:00 , Jeff Squyres (jsquyres)  
> wrote:
> 
>> WHAT: Should we make the job size (i.e., initial number of procs) available 
>> in OPAL?
>> 
>> WHY: At least 2 BTLs are using this info (*more below)
>> 
>> WHERE: usnic and ugni
>> 
>> TIMEOUT: there's already been some inflammatory emails about this; let's 
>> discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>> 
>> MORE DETAIL:
>> 
>> This is an open question.  We *have* the information at the time that the 
>> BTLs are initialized: do we allow that information to go down to OPAL?
>> 
>> Ralph added this info down in OPAL in r32355, but George reverted it in 
>> r32361.
>> 
>> Points for: YES, WE SHOULD
>> +++ 2 BTLs were using it (usnic, ugni)
>> +++ Other RTE job-related info are already in OPAL (num local ranks, local 
>> rank)
>> 
>> Points for: NO, WE SHOULD NOT
>> --- What exactly is this number (e.g., num currently-connected procs?), and 
>> when is it updated?
>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
> --- Using this information to configure the communication environment limits 
> the scope of the communication substrate to a static application (in number 
> of participants). Under this assumption, one can simply wait until the first 
> add_proc to compute the number of processes, a solution as “correct” as the 
> current one.

Not necessarily - it depends on how it is used, and how it is communicated. 
Some of us have explored other options for using this that aren't static, but 
where the info is of use.

> 
>> 
> The other “global” pieces of information that were made available in OPAL 
> (num_local_peers and my_local_rank) are only used by the local BTLs (SM, 
> SMCUDA and VADER). Moreover, my_local_rank is only used to decide who 
> initializes the backend file, something that can easily be done using an 
> atomic operation. The number of local processes is used to prevent SM from 
> activating itself if we don’t have at least 2 processes per node. So, their 
> usage is minimally invasive, and can eventually be phased out with a little 
> effort.
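
For readers unfamiliar with the "atomic operation" alternative mentioned in the quoted paragraph, a minimal sketch using C11 atomics might look like the following. It is not the SM/VADER code: shared_header_t and the three functions are hypothetical, and the sketch assumes the header lives at the start of a shared region that is zero-filled when created and that atomic_int is lock-free and usable across processes on the platform.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical header at the start of a shared region; assumed zero-filled
 * on creation, so init_state starts at 0 for every process that maps it. */
typedef struct {
    atomic_int init_state;  /* 0 = untouched, 1 = claimed, 2 = ready */
} shared_header_t;

/* Exactly one caller wins the 0 -> 1 transition and becomes the initializer,
 * with no need to know anyone's local rank. */
bool try_claim_init(shared_header_t *hdr)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&hdr->init_state, &expected, 1);
}

/* The winner calls this once the backing file is fully set up. */
void publish_ready(shared_header_t *hdr)
{
    atomic_store(&hdr->init_state, 2);
}

/* Everyone else polls this before touching the shared structures. */
bool is_ready(shared_header_t *hdr)
{
    return atomic_load(&hdr->init_state) == 2;
}
```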

FWIW: the new PMI abstraction is in OPAL because it is RTE-agnostic. So all the 
info being discussed will actually be captured originally in the OPAL layer,  
and stored in the OPAL dstore framework. In the current code, the RTE grabs the 
data and exposes it to the OMPI layer, which then pushes it back down to the 
OPAL proc.h struct.

 since anyone can freely query the info from opal/pmix or opal/dstore, 
it is really irrelevant in some ways. The info is there, in the OPAL layer, 
prior to BTL's being initialized. If you don't want it in a global storage, 
people can just get it from the appropriate OPAL API.

So what are we actually debating here? Global storage vs API call?
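
To make the "global storage vs API call" distinction concrete, here is a toy sketch of the API-call side. The store, the key name "job.size" and every identifier below are hypothetical stand-ins, not the opal/dstore or opal/pmix interfaces. What it shows is that a query can report "not available", so a consumer can take a fallback path, whereas a bare global variable cannot say that it was never set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { KV_SUCCESS = 0, KV_NOT_FOUND = -1 };

/* Toy key/value store standing in for a runtime data store. */
typedef struct {
    const char *key;
    size_t      value;
} kv_entry_t;

typedef struct {
    const kv_entry_t *entries;
    size_t            count;
} kv_store_t;

/* Query by key; the caller learns explicitly when the data is missing. */
int kv_fetch(const kv_store_t *store, const char *key, size_t *out)
{
    assert(store != NULL && key != NULL && out != NULL);
    for (size_t i = 0; i < store->count; i++) {
        if (strcmp(store->entries[i].key, key) == 0) {
            *out = store->entries[i].value;
            return KV_SUCCESS;
        }
    }
    return KV_NOT_FOUND;
}

/* Consumer side: use the job size if present, else an assumed default. */
size_t peer_table_capacity(const kv_store_t *store)
{
    size_t n = 0;
    if (kv_fetch(store, "job.size", &n) == KV_SUCCESS && n > 0) {
        return n;
    }
    return 16;  /* assumed fallback when no job size was published */
}
```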

> 
>  George.
> 
> 
>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
>> down to OPAL:
>> 
>> - usnic: for a minor latency optimization / sizing of a shared receive 
>> buffer queue length, and for the initial size of a peer lookup hash
>> - ugni: to determine the size of the per-peer buffers used for send/recv 
>> communication
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15378.php



[OMPI devel] RFC: job size info in OPAL

2014-07-30 Thread Jeff Squyres (jsquyres)
WHAT: Should we make the job size (i.e., initial number of procs) available in 
OPAL?

WHY: At least 2 BTLs are using this info (*more below)

WHERE: usnic and ugni

TIMEOUT: there's already been some inflammatory emails about this; let's 
discuss next Tuesday on the teleconf: Tue, 5 Aug 2014

MORE DETAIL:

This is an open question.  We *have* the information at the time that the BTLs 
are initialized: do we allow that information to go down to OPAL?

Ralph added this info down in OPAL in r32355, but George reverted it in r32361.

Points for: YES, WE SHOULD
+++ 2 BTLs were using it (usnic, ugni)
+++ Other RTE job-related info are already in OPAL (num local ranks, local rank)

Points for: NO, WE SHOULD NOT
--- What exactly is this number (e.g., num currently-connected procs?), and 
when is it updated?
--- We need to precisely delineate what belongs in OPAL vs. above-OPAL

FWIW: here's how ompi_process_info.num_procs was used before the BTL move down 
to OPAL:

- usnic: for a minor latency optimization / sizing of a shared receive buffer 
queue length, and for the initial size of a peer lookup hash
- ugni: to determine the size of the per-peer buffers used for send/recv 
communication
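
For illustration, the "initial size of a peer lookup hash" use could look something like the sketch below. This is not the usnic code: the function names, the power-of-two rounding and the bounds are all assumptions made for the example. A job size of 0 is used here to mean "unknown".

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed bounds and default, for illustration only. */
#define HASH_MIN_BUCKETS     ((size_t)16)
#define HASH_MAX_BUCKETS     ((size_t)1 << 20)
#define HASH_DEFAULT_BUCKETS ((size_t)64)

/* Smallest power of two that is >= n (returns 1 for n == 0). */
static size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n && p <= SIZE_MAX / 2) {
        p <<= 1;
    }
    return p;
}

/* Initial bucket count for a peer lookup table.
 * job_size == 0 means the job size is not known at init time. */
size_t peer_hash_initial_size(size_t job_size)
{
    if (job_size == 0) {
        return HASH_DEFAULT_BUCKETS;
    }
    size_t buckets = next_pow2(job_size);
    if (buckets < HASH_MIN_BUCKETS) buckets = HASH_MIN_BUCKETS;
    if (buckets > HASH_MAX_BUCKETS) buckets = HASH_MAX_BUCKETS;
    return buckets;
}
```

Since the number only sets a starting size, a wrong or missing value costs a rehash later and does not affect correctness, which is why this use was described as a minor optimization.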

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/