Re: [OMPI devel] [devel-core] Major reduction in ORTE

2007-07-13 Thread Gleb Natapov
On Fri, Jul 13, 2007 at 06:51:44AM -0600, Ralph H Castain wrote:
> I certainly expect it will be merged in before anything major happens. We
> will likely have to modify it as things progress, and we may have to change
> how it does things, but I expect the functionality will persist.
>
Great. Thanks Ralph.

> 
> On 7/13/07 12:41 AM, "Gleb Natapov" wrote:
> 
> > On Thu, Jul 12, 2007 at 03:04:01PM -0600, Ralph H Castain wrote:
> >> As always, any thoughts/suggestions are welcomed.
> >> 
> > I hope Sharon's work on process affinity will be merged into the trunk
> > before this work begins and functionality will be preserved during the
> > work.
> > 

--
Gleb.


Re: [OMPI devel] [devel-core] Major reduction in ORTE

2007-07-13 Thread Ralph H Castain
I certainly expect it will be merged in before anything major happens. We
will likely have to modify it as things progress, and we may have to change
how it does things, but I expect the functionality will persist.


On 7/13/07 12:41 AM, "Gleb Natapov" wrote:

> On Thu, Jul 12, 2007 at 03:04:01PM -0600, Ralph H Castain wrote:
>> As always, any thoughts/suggestions are welcomed.
>> 
> I hope Sharon's work on process affinity will be merged into the trunk
> before this work begins and functionality will be preserved during the
> work.
> 
> --
> Gleb.




Re: [OMPI devel] [devel-core] Major reduction in ORTE

2007-07-13 Thread Gleb Natapov
On Thu, Jul 12, 2007 at 03:04:01PM -0600, Ralph H Castain wrote:
> As always, any thoughts/suggestions are welcomed.
> 
I hope Sharon's work on process affinity will be merged into the trunk
before this work begins and functionality will be preserved during the
work.

--
Gleb.


Re: [OMPI devel] [devel-core] Major reduction in ORTE

2007-07-12 Thread Jeff Squyres

Thanks for the summary Ralph.

On Jul 12, 2007, at 5:04 PM, Ralph H Castain wrote:


Yo all

As we are discussing functional requirements for the upcoming 1.3 release, I
was asked to provide a little info about what is going to be happening to
the ORTE part of the code base over the remainder of this year.

Short answer: there will be a major code revision to reduce ORTE to the
minimum required to support Open MPI. This includes (a) a major design
change away from event-driven programming that will result in the
consolidation of several frameworks and removal of at least two others; and
(b) general cleanup to reduce memory footprint, startup message size, and
other areas.


Longer explanation:

At the beginning of the Open MPI project, it was quickly determined that
nobody (myself perhaps excepted) really wanted to build/maintain the RTE
underpinning Open MPI. We were, after all, primarily interested in MPI.
Hence, we thought it would be a good thing if we could define an RTE that
would be of adequate general interest to attract partners whose primary
focus would be extension and support of the RTE itself.

Well, after several years, it is clear that the original idea isn't going to
work (for a variety of reasons that aren't worth recounting here). We
therefore decided recently that it is time to accept the inevitable, quit
trying to support a more general RTE, and instead spend some effort reducing
the ORTE layer down to its most basic requirements. In particular, we want
to make the code easier to maintain and debug, faster and more scalable for
startup, and less vulnerable to race conditions.

In its essence, the plan consists of the following:

1. remove the cellid from the process name as the code will solely be a
single-cluster system. Other interested parties have offered to provide an
overlayer that will cross-connect Open MPI instances across clusters - we
will work with them to help facilitate the necessary hooks, but won't
duplicate that connectivity internally.

2. remove the RDS framework. All discovery and allocation will be done in a
single step in the RAS. We will revise the RAS to allow better co-existence
of resource manager specified allocations and hostfiles (more on that
later).

3. Eliminate the GPR framework or, at the very least, remove the
subscribe/trigger functionality from it. We will be moving away from the
current event-driven architecture to reduce our exposure to race conditions
and eliminate the complexity caused by recursive callbacks due to trigger
events. We will explore globalized data storage in simplified arrays as an
alternative to the GPR database - initial tests support the idea, but
further work needs to be done. We know that people like the Eclipse PTP team
need access to certain data - we will work with them to figure out the best
way to do so given the changes to/departure of the GPR.
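
As a rough illustration of the "simplified arrays" idea in item 3, the
sketch below keeps per-process data in a plain array indexed by vpid and
reads it with direct calls, with no subscriptions or trigger callbacks.
Every name in it (demo_job_t, demo_job_set_state, and so on) is
hypothetical and invented for this example; it is not the ORTE API and
says nothing about how the real replacement for the GPR will look.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types for illustration only; not the ORTE API. */
typedef unsigned int demo_vpid_t;

typedef enum {
    DEMO_PROC_INIT = 0,     /* zero so calloc() yields this state */
    DEMO_PROC_LAUNCHED,
    DEMO_PROC_TERMINATED
} demo_state_t;

typedef struct {
    demo_state_t state;
    int exit_code;
} demo_proc_t;

typedef struct {
    size_t num_procs;
    demo_proc_t *procs;     /* procs[vpid] holds the data for that process */
} demo_job_t;

/* Allocate one zero-initialized slot per process. Returns 0 on success. */
int demo_job_init(demo_job_t *job, size_t num_procs)
{
    assert(job != NULL);
    job->procs = calloc(num_procs, sizeof(*job->procs));
    if (job->procs == NULL && num_procs > 0) {
        job->num_procs = 0;
        return -1;
    }
    job->num_procs = num_procs;
    return 0;
}

/* Direct write by index; nothing is triggered as a side effect. */
int demo_job_set_state(demo_job_t *job, demo_vpid_t vpid, demo_state_t state)
{
    assert(job != NULL);
    if (vpid >= job->num_procs) {
        return -1;
    }
    job->procs[vpid].state = state;
    return 0;
}

/* The caller polls when it wants an answer instead of waiting on a trigger. */
size_t demo_job_count_in_state(const demo_job_t *job, demo_state_t state)
{
    size_t count = 0;
    assert(job != NULL);
    for (size_t i = 0; i < job->num_procs; i++) {
        if (job->procs[i].state == state) {
            count++;
        }
    }
    return count;
}

void demo_job_free(demo_job_t *job)
{
    assert(job != NULL);
    free(job->procs);
    job->procs = NULL;
    job->num_procs = 0;
}
```

In this sketch a caller that needs to know whether all processes have
launched would compare demo_job_count_in_state(job, DEMO_PROC_LAUNCHED)
against job->num_procs at a point of its own choosing, which is one way to
avoid the recursive callbacks mentioned above.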

4. Consolidate the NS, PLS, RMGR, and SMR framework functionality into a
single process lifecycle management (PLM) framework. PLM components will
still call the ERRMGR to deal with response to process failures, and will
assume responsibility for storing their own data. The SCHEMA framework will
be eliminated as part of this change. We will move some functions (e.g.,
orte_abort) that are currently in the runtime and util areas into the PLM
components as appropriate.

5. Each framework will have logic in its "open" function that
specifically prevents it from performing component_open unless we are on
the HNP. If we are not on the HNP, an #if ORTE_WANT_NO_SUPPORT will force
the use of a "no_op" module that does nothing, but whose return codes will
indicate that an error did not occur. If that is not set, then a proxy
module will be utilized that provides appropriate communications to the HNP
to support remote applications. This will reduce memory footprints (since no
components will be opened) and allow us to simply pass-through MCA params to
all processes while ensuring proper functionality is available. Note that
environments like CNOS may still require special components in some of the
frameworks as the "no_op" may not be suitable for all API functions.
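
The selection logic described in item 5 might look roughly like the sketch
below. It is only an illustration under stated assumptions: the demo_*
names, the single "launch" entry point, and the DEMO_WANT_NO_SUPPORT macro
(standing in for ORTE_WANT_NO_SUPPORT) are all made up for the example, and
the stub functions simply return success rather than doing real work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; not the actual ORTE API. */
#define DEMO_SUCCESS 0

/* Stand-in for ORTE_WANT_NO_SUPPORT; build with -DDEMO_WANT_NO_SUPPORT=1
 * to select the no_op module on non-HNP processes. */
#ifndef DEMO_WANT_NO_SUPPORT
#define DEMO_WANT_NO_SUPPORT 0
#endif

typedef enum {
    DEMO_MODULE_FULL,
    DEMO_MODULE_PROXY,
    DEMO_MODULE_NO_OP
} demo_module_kind_t;

typedef struct {
    demo_module_kind_t kind;
    int (*launch)(int jobid);
} demo_module_t;

/* Stub: on the HNP a real component would do the work here. */
static int full_launch(int jobid)  { (void)jobid; return DEMO_SUCCESS; }

/* Stub: a real proxy would send the request to the HNP and await a reply. */
static int proxy_launch(int jobid) { (void)jobid; return DEMO_SUCCESS; }

/* Does nothing, but reports that no error occurred. */
static int noop_launch(int jobid)  { (void)jobid; return DEMO_SUCCESS; }

static const demo_module_t full_module  = { DEMO_MODULE_FULL,  full_launch };
static const demo_module_t proxy_module = { DEMO_MODULE_PROXY, proxy_launch };
static const demo_module_t noop_module  = { DEMO_MODULE_NO_OP, noop_launch };

/* Framework "open": only the HNP gets the full module (where component_open
 * would run). The text describes an #if for the other case; a constant
 * expression is used here so both modules stay referenced in every build. */
const demo_module_t *demo_framework_open(bool on_hnp)
{
    if (on_hnp) {
        return &full_module;
    }
    return DEMO_WANT_NO_SUPPORT ? &noop_module : &proxy_module;
}

/* Callers use the same entry point regardless of which module was chosen. */
int demo_launch(const demo_module_t *module, int jobid)
{
    assert(module != NULL && module->launch != NULL);
    return module->launch(jobid);
}
```

The point of the sketch is that calling code never needs to know where it
is running: demo_launch(demo_framework_open(on_hnp), jobid) behaves the same
from the caller's side on the HNP and elsewhere, and only the module behind
the function pointer differs.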

6. the SDS framework will not only support name discovery, but will hold all
backend operations required during startup. For example, the contents of the
message now sent back to the new PLM by each process will be dependent upon
environment. Hence, a one-to-one correspondence will be established between
PLM and SDS components.

7. consolidate the data in the MPI startup message (currently delivered at
STG1 stagegate). For example, any data in the MPI startup message that needs
to be indexed will be sent in an array sorted by vpid (no need to send the
entire list of process name structs). Whereas before we couldn't take
advantage of our knowledge of the message contents since it was generated by
the