Re: [OMPI devel] Remove prun tool from OMPI?

2018-06-05 Thread Thomas Naughton


On Tue, 5 Jun 2018, r...@open-mpi.org wrote:





On Jun 5, 2018, at 11:59 AM, Thomas Naughton  wrote:

Hi Ralph,


All it means is that PRRTE users must be careful to have PRRTE before OMPI in 
their path values. Otherwise, they get the wrong “prun” and it fails. I suppose 
I could update the “prun” in OMPI to match the one in PRRTE, if that helps - 
there isn’t anything incompatible between ORTE and PRRTE. Would that make sense?
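
The PATH ordering concern can be illustrated with a short sketch. It assumes a POSIX system and two made-up install directories (`prrte_bin` and `ompi_bin` are hypothetical stand-ins, created as temporary directories), each holding its own executable named `prun`. It only shows that the first matching directory wins the lookup; it says nothing about how either project installs its tools.

```python
import os
import shutil
import stat
import tempfile


def make_fake_tool(directory, name):
    """Create a tiny executable placeholder script called `name` in `directory`."""
    path = os.path.join(directory, name)
    with open(path, "w") as fh:
        fh.write("#!/bin/sh\nexit 0\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def resolve(tool, search_dirs):
    """Return the path a shell-style PATH lookup would pick for `tool`."""
    return shutil.which(tool, path=os.pathsep.join(search_dirs))


def demo():
    """Report whether each ordering picks the `prun` from the first directory."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical install prefixes, not real PRRTE or OMPI locations.
        prrte_bin = os.path.join(root, "prrte", "bin")
        ompi_bin = os.path.join(root, "ompi", "bin")
        os.makedirs(prrte_bin)
        os.makedirs(ompi_bin)
        make_fake_tool(prrte_bin, "prun")
        make_fake_tool(ompi_bin, "prun")
        prrte_first = resolve("prun", [prrte_bin, ompi_bin])
        ompi_first = resolve("prun", [ompi_bin, prrte_bin])
        return (os.path.dirname(prrte_first) == prrte_bin,
                os.path.dirname(ompi_first) == ompi_bin)
```

Renaming one of the two binaries (for example to "ompi-prun") removes the dependence on ordering, since the two lookups no longer share a name.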



Yes, if updating "OMPI prun" with latest "PRRTE prun" works ok, that
seems like a reasonable way to keep DVM for OMPI usage.

I agree that it does seem likely that users could easily get the wrong
'prun' but this may be something that falls out in future (based on
discussion on call today).

I guess the main point of interest would be to have some method for
launching the DVM scenario with OMPI.  Another option could be to rename
the binary in OMPI?


Yeah, that’s what the OHPC folks did in their distro - they renamed 
it to “ompi-prun”. If that works for you, then perhaps the best 
path forward is to do the rename and update it as well.



Sounds good to me -- seems like a good way to avoid confusion.

And having the 'ompi-prun' be in sync with (prrte) prun will make sure
things run properly, i.e., easy to drop in new snapshot of the tool when
updating PRRTE snapshots in OMPI.  (Or however done in future)

Thanks, Ralph!
--tjn


 _____
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184





Thanks,
--tjn

_________
 Thomas Naughton  naught...@ornl.gov
 Research Associate   (865) 576-4184


On Tue, 5 Jun 2018, r...@open-mpi.org wrote:


I know we were headed that way - it might still work when run against the 
current ORTE. I can check that and see. If so, then I guess it might be 
advisable to retain it.

All it means is that PRRTE users must be careful to have PRRTE before OMPI in 
their path values. Otherwise, they get the wrong “prun” and it fails. I suppose 
I could update the “prun” in OMPI to match the one in PRRTE, if that helps - 
there isn’t anything incompatible between ORTE and PRRTE. Would that make sense?


FWIW: Got a similar complaint from the OpenHPC folks - I gather they also have 
a “prun” in their distribution that they use as an abstraction over all the RM 
launchers. I’m less concerned about that one, though.



On Jun 5, 2018, at 9:55 AM, Thomas Naughton  wrote:

Hi Ralph,

Is the 'prun' tool required to launch the DVM?

I know that at some point things shifted to use 'prun' and didn't require
the URI on command-line, but I've not tested in a few months.

Thanks,
--tjn

_____
Thomas Naughton  naught...@ornl.gov
Research Associate   (865) 576-4184


On Tue, 5 Jun 2018, r...@open-mpi.org wrote:


Hey folks

Does anyone have heartburn if I remove the “prun” tool from ORTE? I don’t 
believe anyone is using it, and it doesn’t look like it even works.

I ask because the name conflicts with PRRTE and can cause problems when running 
OMPI against PRRTE

Ralph

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Remove prun tool from OMPI?

2018-06-05 Thread Thomas Naughton

Hi Ralph,

All it means is that PRRTE users must be careful to have PRRTE before 
OMPI in their path values. Otherwise, they get the wrong “prun” and 
it fails. I suppose I could update the “prun” in OMPI to match the 
one in PRRTE, if that helps - there isn’t anything incompatible 
between ORTE and PRRTE. Would that make sense?



Yes, if updating "OMPI prun" with latest "PRRTE prun" works ok, that
seems like a reasonable way to keep DVM for OMPI usage.

I agree that it does seem likely that users could easily get the wrong
'prun' but this may be something that falls out in future (based on
discussion on call today).

I guess the main point of interest would be to have some method for
launching the DVM scenario with OMPI.  Another option could be to rename
the binary in OMPI?

Thanks,
--tjn

 _____
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Tue, 5 Jun 2018, r...@open-mpi.org wrote:


I know we were headed that way - it might still work when run against the 
current ORTE. I can check that and see. If so, then I guess it might be 
advisable to retain it.

All it means is that PRRTE users must be careful to have PRRTE before OMPI in 
their path values. Otherwise, they get the wrong “prun” and it fails. I suppose 
I could update the “prun” in OMPI to match the one in PRRTE, if that helps - 
there isn’t anything incompatible between ORTE and PRRTE. Would that make sense?


FWIW: Got a similar complaint from the OpenHPC folks - I gather they also have 
a “prun” in their distribution that they use as an abstraction over all the RM 
launchers. I’m less concerned about that one, though.



On Jun 5, 2018, at 9:55 AM, Thomas Naughton  wrote:

Hi Ralph,

Is the 'prun' tool required to launch the DVM?

I know that at some point things shifted to use 'prun' and didn't require
the URI on command-line, but I've not tested in a few months.

Thanks,
--tjn

_________
 Thomas Naughton  naught...@ornl.gov
 Research Associate   (865) 576-4184


On Tue, 5 Jun 2018, r...@open-mpi.org wrote:


Hey folks

Does anyone have heartburn if I remove the “prun” tool from ORTE? I don’t 
believe anyone is using it, and it doesn’t look like it even works.

I ask because the name conflicts with PRRTE and can cause problems when running 
OMPI against PRRTE

Ralph


Re: [OMPI devel] Remove prun tool from OMPI?

2018-06-05 Thread Thomas Naughton

Hi Ralph,

Is the 'prun' tool required to launch the DVM?

I know that at some point things shifted to use 'prun' and didn't require
the URI on command-line, but I've not tested in a few months.

Thanks,
--tjn

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Tue, 5 Jun 2018, r...@open-mpi.org wrote:


Hey folks

Does anyone have heartburn if I remove the “prun” tool from ORTE? I don’t 
believe anyone is using it, and it doesn’t look like it even works.

I ask because the name conflicts with PRRTE and can cause problems when running 
OMPI against PRRTE

Ralph


Re: [OMPI devel] Q: Using a hostfile in managed environment?

2017-02-24 Thread Thomas Naughton

Hi Ralph,

OK, that's pretty much what I thought but wanted to get a sanity check. :-)
I'll see if I can reproduce the issue in a more precise manner and open an
issue if I find something off in the mapping.

Thanks,
--tjn

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Fri, 24 Feb 2017, r...@open-mpi.org wrote:




On Feb 24, 2017, at 11:57 AM, Thomas Naughton <naught...@ornl.gov> wrote:

Hi,

We're trying to track down some curious behavior and decided to take a step
back and check a base assumption.

When running within a managed environment (job allocation):

   Q: Should you be able to use `--hostfile` or `--host` options to
  operate on a subset of the resources in the allocation?
  (Example: within 4 node SLURM allocation, run on just 2 nodes in
   allocation.)


Yes - those options are used to “filter” the allocation prior to launch



   Q: Additionally, should this be the same when launching the DVM in
  order to run on a subset of resources using subsequent
  'mpirun --hnp ...' commands?
  (Only 'orte-dvm' would need to have `--hostfile` or `--host` args.)


Yes - only the DVM needs to know the filter. When operating with a DVM, “mpirun 
--hnp...” only packages up the cmd line and sends it to the DVM. All the 
mapping occurs in orte-dvm.
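
The filtering described in the two answers above can be sketched as follows. This is an illustrative model only: `parse_hostfile` and `filter_allocation` are hypothetical helpers, not ORTE functions, and the handling of requested hosts that fall outside the allocation (reported separately here) is an assumption rather than documented behavior.

```python
def parse_hostfile(text):
    """Return host names from hostfile-style text, one host per line.

    Blank lines and '#' comments are skipped; anything after the host name
    (for example 'slots=4') is ignored in this sketch.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split()[0])
    return hosts


def filter_allocation(allocated, requested):
    """Keep only allocated nodes that were requested, in allocation order.

    Returns (usable, unknown): `unknown` lists requested hosts that are not
    part of the allocation, so the caller can decide how to treat them.
    """
    wanted = set(requested)
    known = set(allocated)
    usable = [node for node in allocated if node in wanted]
    unknown = [host for host in requested if host not in known]
    return usable, unknown


# A 4-node allocation narrowed to 2 nodes, as in the SLURM example above.
allocation = ["node01", "node02", "node03", "node04"]
hostfile = "node02 slots=4\n# spare\nnode04 slots=4\n"
usable, unknown = filter_allocation(allocation, parse_hostfile(hostfile))
```

In the DVM case the same filter would be applied once, when the DVM starts; jobs submitted to it afterwards would only ever see the narrowed node list.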



There are a variety of interactions with ess/ras/rmaps and the resource
manager, but the thought was that you "should" be able to use a hostfile to
operate on a subset of the allocation. Is that a flawed assumption?

Thanks,
--tjn

_________
 Thomas Naughton  naught...@ornl.gov
 Research Associate   (865) 576-4184

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel



[OMPI devel] Q: Using a hostfile in managed environment?

2017-02-24 Thread Thomas Naughton

Hi,

We're trying to track down some curious behavior and decided to take a step
back and check a base assumption.

When running within a managed environment (job allocation):

Q: Should you be able to use `--hostfile` or `--host` options to
   operate on a subset of the resources in the allocation?
   (Example: within 4 node SLURM allocation, run on just 2 nodes in
allocation.)

Q: Additionally, should this be the same when launching the DVM in
   order to run on a subset of resources using subsequent
   'mpirun --hnp ...' commands?
   (Only 'orte-dvm' would need to have `--hostfile` or `--host` args.)

There are a variety of interactions with ess/ras/rmaps and the resource
manager, but the thought was that you "should" be able to use a hostfile to
operate on a subset of the allocation. Is that a flawed assumption?

Thanks,
--tjn

 _____
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184



Re: [OMPI devel] Need to know your Github ID

2014-09-11 Thread Thomas Naughton

naughtont -> naughtont3

Thanks,
--tjn

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Wed, 10 Sep 2014, Jeff Squyres (jsquyres) wrote:


As the next step of the planned migration to Github, I need to know:

- Your Github ID (so that you can be added to the new OMPI git repo)
- Your SVN ID (so that I can map SVN->Github IDs, and therefore map Trac 
tickets to appropriate owners)

Here's the list of SVN IDs who have committed over the past year -- I'm 
guessing that most of these people will need Github IDs:

adrian
alekseys
alex
alinas
amikheev
bbenton
bosilca (done)
bouteill
brbarret
bwesarg
devendar
dgoodell (done)
edgar
eugene
ggouaillardet
hadi
hjelmn
hpcchris
hppritcha
igoru
jjhursey (done)
jladd
jroman
jsquyres (done)
jurenz
kliteyn
manjugv
miked (done)
mjbhaskar
mpiteam (done)
naughtont
osvegis
pasha
regrant
rfaucett
rhc (done)
rolfv (done)
samuel
shiqing
swise
tkordenbrock
vasily
vvenkates
vvenkatesan
yaeld
yosefe

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/09/15788.php



Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-29 Thread Thomas Naughton

Hi,

Thanks Jeff, I think that was a pretty good summary of things.

Thomas indicated there was no rush on the RFC; perhaps we can 
discuss this next-next-Tuesday (June 10)?


Phone discussion seems like a good idea and June 10 sounds good to me.

Thanks,
--tjn

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Thu, 29 May 2014, Jeff Squyres (jsquyres) wrote:


I refrained from speaking up on this thread because I was on travel, and I 
wanted to think a bit more about this before I said anything.

Let me try to summarize the arguments that have been made so far...

A. Things people seem to agree on:

1. Inclusion in trunk has no correlation to being included in a release
2. Prior examples of (effectively) single-organization components

B. Reasons to have STCI/HPX/etc. components in SVN trunk:

1. Multiple organizations are asking (ORNL, UTK, UH)
2. Easier to develop/merge the STCI/HPX/etc. components over time
3. Find all alternate RTE components in one place (vs. multiple internet repos)
4. More examples of how to use the RTE framework

C. Reasons not to have STCI/HPX/etc. components in the SVN trunk:

1. What is the (technical) gain for being in the trunk?
2. Concerns about external release schedule pressure
3. Why have something on the trunk if it's not eventually destined for a 
release?

In particular, I think B2 and C1 seem to be in conflict with each other.

I have several thoughts about this topic, but I'm hesitant to continue this 
already lengthy thread on a contentious topic.  I also don't want to spend the 
next 30 minutes writing a lengthy, carefully-worded email that will just spawn 
further lengthy, carefully-worded emails (each costing 15-30 minutes).  Prior 
history has shown that we discuss and resolve issues much more rationally on 
the phone (vs. email hell).

I would therefore like to discuss this on a weekly Tuesday call.

Next week is bad because it's the MPI Forum meeting; I suspect that some -- but 
not all -- of us will not be on the Tuesday call because we'll be at the Forum.

Thomas indicated there was no rush on the RFC; perhaps we can discuss this 
next-next-Tuesday (June 10)?




On May 27, 2014, at 12:25 PM, Thomas Naughton <naught...@ornl.gov> wrote:



WHAT:  add new component to ompi/rte framework

WHY:   because it will simplify our maintenance & provide an alt. reference

WHEN:  no rush, soon-ish? (June 12?)

This is a component we currently maintain outside of the ompi tree to
support using OMPI with an alternate runtime system.  This will also
provide an alternate component to ORTE, which was motivation for PMI
component in related RFC.   We build/test nightly and it occasionally
catches ompi-rte abstraction violations, etc.

Thomas

_____
 Thomas Naughton  naught...@ornl.gov
 Research Associate   (865) 576-4184

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/05/14852.php



--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/05/14904.php



Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Thomas Naughton

Sure, if it's helpful I can join a call.

--tjn

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Tue, 27 May 2014, Ralph Castain wrote:


Forgot to add: would it help to discuss this over the phone instead?


On May 27, 2014, at 12:56 PM, Ralph Castain <r...@open-mpi.org> wrote:



On May 27, 2014, at 12:50 PM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:




On 5/27/2014 2:46 PM, Ralph Castain wrote:


On May 27, 2014, at 12:27 PM, Edgar Gabriel <gabr...@cs.uh.edu>
wrote:


I'll let ORNL talk about the STCI component itself (which might
have additional reasons), but keeping the code in trunk vs. an
outside github/mercurial repository has two advantages in my
opinion: i) it simplifies the propagation of know-how between the
groups,


Afraid I don't understand that - this is just glue, right?



yes, but it's easier to look in one place vs. n places for every feature.


and ii) avoids having to keep a separate branch up to date. (We did
the second part with OMPIO for a couple of years, and that was
really painful).


Ah, perhaps this is the "rub" - are you saying that you expect us to
propagate any changes in the RTE interface to your component? If so,
then that violates the original agreement about this framework. It
was solely to provide a point-of-interface for *external* groups to
connect their RTE's into OMPI. We agreed that we would notify people
of changes to that interface, and give them a chance to provide input
on those changes - but under no conditions were we willing to accept
responsibility for maintaining those branch interfaces.

Given that the interface is wholly contained in the ompi/rte
component, I guess I struggle to understand the code conflict issue.
There is no change in the OMPI code base that can possibly conflict
with your component. The only things that could impact you are
changes in the OMPI layer that require modification to your
component, which is something you'd have to do regardless. We will
not test nor update that component for you.



no, not at all. My point was that we invested enormous efforts at that time
to just do the svn merge from the changes on trunk to our branch, that's
all.



If you are on a branch that contains an svn checkout of the trunk, plus one 
component directory in one framework, then I'm afraid I cannot understand how 
you get merge conflicts. I've been doing this for years and haven't hit one 
yet. The only possible source of a conflict is if I touch code that is common 
to the two repos - i.e., outside of the area that I'm adding. In this case, 
that should never happen, yes?

If it does, then you touched code outside your component, and you either (a) 
are going to encounter this no matter what because you haven't pushed it up 
yet, or (b) couldn't commit that up to the main repo anyway if it impacted the 
RTE interface.
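
The reasoning above amounts to a set intersection: a merge can only conflict on paths that were changed on both sides. A minimal sketch, with made-up path lists and hypothetical helpers (`conflict_candidates`, `outside_component`); it models the argument, not what `svn merge` does internally.

```python
def conflict_candidates(trunk_changed, branch_changed):
    """Paths modified on both sides; only these can produce merge conflicts."""
    return sorted(set(trunk_changed) & set(branch_changed))


def outside_component(paths, component_dir):
    """Paths that do not live under the branch's own component directory."""
    prefix = component_dir.rstrip("/") + "/"
    return [p for p in paths if not p.startswith(prefix)]


# Example change sets. The rte/stci file name is a hypothetical placeholder.
trunk = ["ompi/proc/proc.c", "ompi/mca/rte/rte.h"]
branch_clean = ["ompi/mca/rte/stci/rte_stci.c"]
branch_leaky = ["ompi/mca/rte/stci/rte_stci.c", "ompi/mca/rte/rte.h"]
```

With `branch_clean` the intersection is empty, which is the "should never happen" case; with `branch_leaky` the shared header shows up both as a conflict candidate and as a change outside the component.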

Sorry, but I'm really struggling to understand how adding only this one 
component, which you solely modify and control, can possibly help with 
maintaining your branch.



Thanks
Edgar





In addition, IANAL, but I was actually wondering about the
implications of using separate code repositories outside of ompi
for sharing code, and whether that is truly still covered by the
contributors agreement that we all signed.


Of course not - OMPI's license only declares that anything you push
into the main OMPI code repo (and hence, our official releases) is
covered by that agreement. Anything you add or distribute externally
is on your own. You can *choose* to license that code in accordance
with the OMPI license, but you aren't *required* to do so.



Anyway, I don't have strong feelings either way as well, just would
see a couple of advantages (for us) if the code was in the trunk.


I'm still trying to understand those - sorry to be a pain, but my
biggest fear at this point is that the perceived advantage is based
on a misunderstanding, and I'd like to head that off before it causes
problems.



Thanks Edgar

On 5/27/2014 1:45 PM, Ralph Castain wrote:

I think so long as we leave these components out of any release,
there is a limited potential for problems (probably most
importantly, we sidestep all the issues about syncing
releases!).

However, that said, I'm not sure what it gains anyone to include
a component that *isn't* going in a release. Nobody outside your
organizations is going to build against it - so what did it
accomplish to push the code into the repo?

Mind you, I'm not saying I'm staunchly opposed - just trying to
understand how it benefits anyone.


On May 27, 2014, at 11:28 AM, Edgar Gabriel <gabr...@cs.uh.edu>
wrote:


To throw in my $0.02, I would see a benefit in adding the
component to the trunk. As I mentioned in the last teleconf, we
are currently working on adding support for the HPX runtime
enviro

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Thomas Naughton

Inline comments ... way at the bottom.  ;-)

--tjn

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Tue, 27 May 2014, Ralph Castain wrote:



On May 27, 2014, at 12:50 PM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:




On 5/27/2014 2:46 PM, Ralph Castain wrote:


On May 27, 2014, at 12:27 PM, Edgar Gabriel <gabr...@cs.uh.edu>
wrote:


I'll let ORNL talk about the STCI component itself (which might
have additional reasons), but keeping the code in trunk vs. an
outside github/mercurial repository has two advantages in my
opinion: i) it simplifies the propagation of know-how between the
groups,


Afraid I don't understand that - this is just glue, right?



yes, but it's easier to look in one place vs. n places for every feature.


and ii) avoids having to keep a separate branch up to date. (We did
the second part with OMPIO for a couple of years, and that was
really painful).


Ah, perhaps this is the "rub" - are you saying that you expect us to
propagate any changes in the RTE interface to your component? If so,
then that violates the original agreement about this framework. It
was solely to provide a point-of-interface for *external* groups to
connect their RTE's into OMPI. We agreed that we would notify people
of changes to that interface, and give them a chance to provide input
on those changes - but under no conditions were we willing to accept
responsibility for maintaining those branch interfaces.

Given that the interface is wholly contained in the ompi/rte
component, I guess I struggle to understand the code conflict issue.
There is no change in the OMPI code base that can possibly conflict
with your component. The only things that could impact you are
changes in the OMPI layer that require modification to your
component, which is something you'd have to do regardless. We will
not test nor update that component for you.



no, not at all. My point was that we invested enormous efforts at that time
to just do the svn merge from the changes on trunk to our branch, that's
all.



If you are on a branch that contains an svn checkout of the trunk, plus one 
component directory in one framework, then I'm afraid I cannot understand how 
you get merge conflicts. I've been doing this for years and haven't hit one 
yet. The only possible source of a conflict is if I touch code that is common 
to the two repos - i.e., outside of the area that I'm adding. In this case, 
that should never happen, yes?

If it does, then you touched code outside your component, and you either (a) 
are going to encounter this no matter what because you haven't pushed it up 
yet, or (b) couldn't commit that up to the main repo anyway if it impacted the 
RTE interface.

Sorry, but I'm really struggling to understand how adding only this one 
component, which you solely modify and control, can possibly help with 
maintaining your branch.



I can't speak for them but know that maintaining our rte/stci component
often requires some attention for changes at different levels.  Most
notably changes to APIs related to the modex.

The "glue" code in the ompi-rte interface is generally described in
the comments of rte.h, but in my experience it generally
requires a look at rte/orte/* to know what really changes.   Having a few
different ompi-rte components in the tree seems like it offers a bit more
information about what is required.

It also helps to clarify who's maintaining components when API changes are
proposed that affect the RTE layer.  These are generally announced and
telegraphed, but it can be helpful to just see the directories as a
reminder about who's paying attention if you see "rte/stci" and "rte/hpx".

Also, to respond to earlier comment.  We will continue to maintain our code
(rte/stci component), but it does simplify the patches/processing we 
maintain for integration with Open-MPI work, i.e., OMPI-trunk +
OMPI-RTE-SCI + STCI.

I'd expect this would be the case for the HPX or other instances.

I thought this was the strength of the component infrastructure.  For
example, the ALPS code is external, but there are alps components in
different frameworks, etc.  And those who care about ALPS test that path.

--tjn




Thanks
Edgar





In addition, IANAL, but I was actually wondering about the
implications of using separate code repositories outside of ompi
for sharing code, and whether that is truly still covered by the
contributors agreement that we all signed.


Of course not - OMPI's license only declares that anything you push
into the main OMPI code repo (and hence, our official releases) is
covered by that agreement. Anything you add or distribute externally
is on your own. You can *choose* to license that code in accordance
with the OMPI license, but you aren't *required* to do so.



Anyway, I don't hav

[OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread Thomas Naughton


WHAT:  add new component to ompi/rte framework

WHY:   because it will simplify our maintenance & provide an alt. reference

WHEN:  no rush, soon-ish? (June 12?)

This is a component we currently maintain outside of the ompi tree to
support using OMPI with an alternate runtime system.  This will also
provide an alternate component to ORTE, which was motivation for PMI
component in related RFC.   We build/test nightly and it occasionally
catches ompi-rte abstraction violations, etc.

Thomas

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184



Re: [OMPI devel] RFC: remove PMI component in OMPI/RTE framework

2014-05-27 Thread Thomas Naughton

Hi Ralph,

This component does provide an alternate reference for the ompi-rte
framework.  But if it is unused (unmaintained), it seems less useful in
practice.  I'll post another RFC for related request.

--tjn

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Sun, 25 May 2014, Ralph Castain wrote:


WHAT:  remove stale and unmaintained component in ompi/rte framework

WHY:   because it is unused, unmaintained, and doesn't even compile?

WHEN:  without objections, after telecon on June 9

HOW:   svn del ompi/rte/pmi

This was a component added by Brian as a test of the ompi/rte framework while 
we developed that system. It never really had any purpose other than to provide 
an alternative to ORTE while we tested the revised integration. So far as we 
know, nobody ever used it in an actual installation.

Ralph

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/05/14838.php



Re: [OMPI devel] Q: MPI-RTE / ompi_proc_t vs. ompi_process_info_t ?

2013-12-18 Thread Thomas Naughton

Hi Ralph,

OK, thanks for clarification and code pointers. 
I'll update "rte.h" to reflect the updates.


Thanks,
--tjn

 _____
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Wed, 18 Dec 2013, Ralph Castain wrote:


There is no relation at all between ompi_proc_t and ompi_process_info_t. The 
ompi_proc_t is defined in the MPI layer and is used in that layer in various 
places very much like orte_proc_t is used in the ORTE layer.

If you look in ompi/mca/rte/orte/rte_orte.c, you'll see how we handle the 
revised function calls. Basically, we use the process name to retrieve the 
modex data via the opal_db, and then load a pointer to the hostname into the 
ompi_proc_t proc_hostname field. Thus, the definition of ompi_proc_t remains in 
the MPI layer.

So there was no need to change the ompi/mca/rte/rte.h file, nor to #define 
anything in the component .h file - just have to modify the wrapper code inside 
the RTE component itself.
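
A rough model of the flow described above, written in Python for brevity rather than in the project's C. `Proc`, `modex_db`, and `load_hostname` are stand-ins invented for this sketch; only the `proc_hostname` field name comes from the message. The point it tries to show is that the wrapper fills in a field on the MPI-layer object, so the object's definition does not have to move into the RTE header.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

ProcName = Tuple[int, int]  # (jobid, vpid): an assumed shape for a process name


@dataclass
class Proc:
    """Stand-in for the MPI-layer process object."""
    name: ProcName
    proc_hostname: Optional[str] = None


def load_hostname(proc: Proc, modex_db: Dict[ProcName, Dict[str, str]]) -> bool:
    """Look up modex data by process name and attach the hostname to `proc`.

    Returns True if a hostname was found. `modex_db` is a plain dict standing
    in for whatever key-value store the runtime provides.
    """
    entry = modex_db.get(proc.name)
    if entry is None or "hostname" not in entry:
        return False
    proc.proc_hostname = entry["hostname"]
    return True


# Hypothetical data: one known peer with a published hostname.
db = {(1, 0): {"hostname": "node01"}}
peer = Proc(name=(1, 0))
found = load_hostname(peer, db)
```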

HTH
Ralph


On Dec 18, 2013, at 1:50 PM, Thomas Naughton <naught...@ornl.gov> wrote:


Hi Ralph,

Question about the MPI-RTE interface change in r29931.  The change was not
reflected in the "ompi/mca/rte/rte.h" file.

I'm curious how the newly added "struct ompi_proc_t" relates to the "struct 
ompi_process_info_t" that is described in the "rte.h" file?

I understand the general motivation for the API change but it is less clear
to me how the information previously defined in the header changes (or does
not change)?

Thanks,
--tjn

_________
 Thomas Naughton  naught...@ornl.gov
 Research Associate   (865) 576-4184


On Mon, 16 Dec 2013, svn-commit-mai...@open-mpi.org wrote:


Author: rhc (Ralph Castain)
Date: 2013-12-16 22:26:00 EST (Mon, 16 Dec 2013)
New Revision: 29931
URL: https://svn.open-mpi.org/trac/ompi/changeset/29931

Log:
Revert r29917 and replace it with a fix that resolves the thread deadlock while 
retaining the desired debug info. In an earlier commit, we had changed the 
modex accordingly:

* automatically retrieve the hostname (and all RTE info) for all procs during 
MPI_Init if nprocs < cutoff

* if nprocs > cutoff, retrieve the hostname (and all RTE info) for a proc upon 
the first call to modex_recv for that proc. This would provide the hostname for 
debugging purposes as we only report errors on messages, and so we must have 
called modex_recv to get the endpoint info

* BTLs are not to call modex_recv until they need the endpoint info for first 
message - i.e., not during add_procs so we don't call it for every process in 
the job, but only those with whom we communicate

My understanding is that only some BTLs have been modified to meet that third 
requirement, but those include the Cray ones where jobs are big enough that 
launch times were becoming an issue. Other BTLs would hopefully be modified as 
time went on and interest in using them at scale arose. Meantime, those BTLs 
would call modex_recv on every proc, and we would therefore be no worse than 
the prior behavior.
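
The cutoff behavior in the bullets above can be modeled in a few lines. This is a sketch under stated assumptions: `HostnameCache` and `fetch` are hypothetical names, and the log does not say what happens when nprocs equals the cutoff, so the sketch arbitrarily treats that case as lazy.

```python
class HostnameCache:
    """Eager hostname retrieval below the cutoff, lazy retrieval otherwise."""

    def __init__(self, nprocs, cutoff, fetch):
        self.fetch = fetch        # callable: rank -> hostname (assumed interface)
        self.hostnames = {}
        self.fetch_count = 0
        if nprocs < cutoff:
            # Small job: retrieve everything up front, as during MPI_Init.
            for rank in range(nprocs):
                self._load(rank)

    def _load(self, rank):
        """Fetch and remember the hostname for `rank` if not already cached."""
        if rank not in self.hostnames:
            self.hostnames[rank] = self.fetch(rank)
            self.fetch_count += 1

    def modex_recv(self, rank):
        """Return the hostname for `rank`, fetching it on first use."""
        self._load(rank)
        return self.hostnames[rank]


def fake_fetch(rank):
    """Placeholder lookup that invents a hostname from the rank."""
    return "node%02d" % rank


small = HostnameCache(nprocs=4, cutoff=8, fetch=fake_fetch)
large = HostnameCache(nprocs=16, cutoff=8, fetch=fake_fetch)
```

A component that calls `modex_recv` for every peer at startup would drive `fetch_count` up to nprocs even in the lazy case, which matches the "no worse than the prior behavior" remark.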

This commit revises the MPI-RTE interface to pass the ompi_proc_t instead of 
the ompi_process_name_t for the proc so that the hostname can be easily 
inserted. I have advised the ORNL folks of the change.

cmr=v1.7.4:reviewer=jsquyres:subject=Fix thread deadlock

Text files modified:
 trunk/ompi/mca/rte/orte/rte_orte.h        |  7 ---
 trunk/ompi/mca/rte/orte/rte_orte_module.c | 27 ++-
 trunk/ompi/proc/proc.c                    | 26 ++
 trunk/ompi/runtime/ompi_module_exchange.c | 10 +-
 4 files changed, 49 insertions(+), 21 deletions(-)


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





[OMPI devel] Q: MPI-RTE / ompi_proc_t vs. ompi_process_info_t ?

2013-12-18 Thread Thomas Naughton

Hi Ralph,

Question about the MPI-RTE interface change in r29931.  The change was not
reflected in the "ompi/mca/rte/rte.h" file.

I'm curious how the newly added "struct ompi_proc_t" relates to the 
"struct ompi_process_info_t" that is described in the "rte.h" file?


I understand the general motivation for the API change but it is less clear
to me how the information previously defined in the header changes (or does
not change)?

Thanks,
--tjn

 _____
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Mon, 16 Dec 2013, svn-commit-mai...@open-mpi.org wrote:


Author: rhc (Ralph Castain)
Date: 2013-12-16 22:26:00 EST (Mon, 16 Dec 2013)
New Revision: 29931
URL: https://svn.open-mpi.org/trac/ompi/changeset/29931

Log:
Revert r29917 and replace it with a fix that resolves the thread deadlock while 
retaining the desired debug info. In an earlier commit, we had changed the 
modex accordingly:

* automatically retrieve the hostname (and all RTE info) for all procs during 
MPI_Init if nprocs < cutoff

* if nprocs > cutoff, retrieve the hostname (and all RTE info) for a proc upon 
the first call to modex_recv for that proc. This would provide the hostname for 
debugging purposes as we only report errors on messages, and so we must have 
called modex_recv to get the endpoint info

* BTLs are not to call modex_recv until they need the endpoint info for first 
message - i.e., not during add_procs so we don't call it for every process in 
the job, but only those with whom we communicate

My understanding is that only some BTLs have been modified to meet that third 
requirement, but those include the Cray ones where jobs are big enough that 
launch times were becoming an issue. Other BTLs would hopefully be modified as 
time went on and interest in using them at scale arose. Meantime, those BTLs 
would call modex_recv on every proc, and we would therefore be no worse than 
the prior behavior.

This commit revises the MPI-RTE interface to pass the ompi_proc_t instead of 
the ompi_process_name_t for the proc so that the hostname can be easily 
inserted. I have advised the ORNL folks of the change.
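
As a rough illustration of the cutoff policy described in the log above (this is not the actual OMPI code), here is a minimal C sketch. Every name in it (sketch_proc_t, sketch_init, sketch_modex_recv, sketch_fetch_hostname) is hypothetical, and the "RTE lookup" is just a counter plus a canned string. It shows eager retrieval when nprocs is below the cutoff, and retrieval on first use otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-process record; NOT the real ompi_proc_t. */
typedef struct {
    int rank;
    const char *hostname;   /* NULL until it has been retrieved */
} sketch_proc_t;

/* Counts simulated RTE lookups so the cost of each policy is visible. */
int sketch_lookups = 0;

/* Stand-in for the expensive RTE query (assumption: a real one would
 * contact the runtime instead of indexing a canned table). */
const char *sketch_fetch_hostname(int rank)
{
    static const char *hosts[] = { "node0", "node1" };
    sketch_lookups++;
    return hosts[rank % 2];
}

/* Eager path: below the cutoff, fill in every hostname during init.
 * At or above the cutoff, leave hostnames unset for later. */
void sketch_init(sketch_proc_t *procs, int nprocs, int cutoff)
{
    assert(procs != NULL);
    for (int i = 0; i < nprocs; i++) {
        procs[i].rank = i;
        procs[i].hostname = (nprocs < cutoff) ? sketch_fetch_hostname(i) : NULL;
    }
}

/* Lazy path: the first request for a peer's info triggers the lookup;
 * later requests reuse the cached value. */
const char *sketch_modex_recv(sketch_proc_t *proc)
{
    assert(proc != NULL);
    if (proc->hostname == NULL) {
        proc->hostname = sketch_fetch_hostname(proc->rank);
    }
    return proc->hostname;
}
```

With this shape, a component that only calls sketch_modex_recv() for the peers it actually talks to pays for just those lookups, while one that calls it for every proc ends up with the same cost as the eager path.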

cmr=v1.7.4:reviewer=jsquyres:subject=Fix thread deadlock

Text files modified:
  trunk/ompi/mca/rte/orte/rte_orte.h        |  7 ---
  trunk/ompi/mca/rte/orte/rte_orte_module.c | 27 ++-
  trunk/ompi/proc/proc.c                    | 26 ++
  trunk/ompi/runtime/ompi_module_exchange.c | 10 +-
  4 files changed, 49 insertions(+), 21 deletions(-)




Re: [OMPI devel] ROMIO update breaks trunk

2013-11-14 Thread Thomas Naughton

Hi Ralph,

Does the version in AM_INIT_AUTOMAKE in configure.ac also need to be
increased?  It currently shows 1.11.

Thanks,
--tjn

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Thu, 14 Nov 2013, Ralph Castain wrote:


Ha! Jeff points out that our web site says we are at AM 1.12.2 - yet our
HACKING file says 1.11.1. Sadness.
I'll leave the romio update alone and update the HACKING file to avoid
future confusion.


On Nov 14, 2013, at 12:41 PM, Ralph Castain <r...@open-mpi.org> wrote:

  Just in case others are encountering this: the recent ROMIO
  update contains a line in its configure.ac that breaks the trunk
  for automake versions less than 1.12:

"I've looked a bit around online for this, and the consensus generally seems
 to be that AM_PROG_AR should be added in libtool, not in every configure.ac
 script out there. It's especially problematic as AM_PROG_AR doesn't exist
 in automake before 1.12, which means it breaks, among others, with the
 automake we use to build our distribution tarballs :-)

 See e.g. http://debbugs.gnu.org/cgi/bugreport.cgi?bug=11401 for a discussion."

I'm going to comment that line out in ompi/mca/io/romio/romio/configure.ac
so the trunk can build until someone figures out (a) if it is really needed,
and (b) how to correctly add it.


Ralph







[OMPI devel] 'install-sh' in SVN

2013-06-07 Thread Thomas Naughton

Hi,

It looks like an auto-generated 'install-sh' was accidentally added to SVN
under libevent in OPAL:

ompi-trunk/opal/mca/event/libevent2021/libevent/install-sh

OK to remove it?
--tjn

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184



Re: [OMPI devel] [OMPI svn] svn:open-mpi r28456 - trunk

2013-05-08 Thread Thomas Naughton

Hi Ralph,

On Tue, 7 May 2013, Ralph Castain wrote:


2) Avoid adding "ignored" frameworks to the autogenerated "frameworks.h"
   header file.


This simply applies the same ignored() function that is used elsewhere in
the autogen.pl script for omitting "ignored" MCA directories from the
processing.  This just picks up the ".ompi_ignore" (and/or ".ompi_unignore")
files.  The intent being that if you ignore a component (subdir), it will
be removed from the list, but you could also remove an entire framework if
you put the ignore file in the top-level of the framework.



That is new - I would suggest not doing that as it behaves differently than
you might expect. The .ompi_ignore in a component prevents that component
from building at all, so it won't ever be opened etc. However, the
framework *must* build the base code no matter what - and that means the
framework will be opened, selected, and closed at the minimum.


I would prefer we keep ompi_ignore cleanly defined. You can ignore all 
components by simply putting --enable-mca-no-build= on your 
configure line.


The intent being that if for whatever reason you ignore a framework in the
"${project}/mca/" space, you will not have it automatically show up in the
project's "frameworks.h" file.

On Tue, 7 May 2013, Ralph Castain wrote:


We use the frameworks.h file to "discover" the frameworks in
ompi_info.  Even if no components are built for that framework,
there still are MCA params relating to the base of that framework.
Sounds silly, I know - but there may be reasons to access those
params - e.g., to set verbosity to verify that no components are
being selected.

I think we need those frameworks to be listed...



Ok, I didn't realize the 'ompi_info' aspect.  Good to know.
However, I think honouring the ignore behavior is good in this case
b/c if you drop an ignore file in a framework, you won't get any
other autogen touches (i.e., no Makefiles are autogenerated).  So
it seems that having no Makefiles but including the framework in the
"frameworks.h" would break regardless.  Again, this is more of a
safety guard.


Actually, I disagree. As stated above, the framework will *always* build the
base code and be opened, selected, and closed - so you at least need
access to the verbosity parameter so you can verify those operations.
Keeping it in ompi_info is of value.



I guess I misunderstood the scope of use for the ".ompi_ignore" file.  I
thought that it could be placed at the top of the framework and it would
ignore the entire directory.

I just did a quick test with the earlier version of autogen.pl (r28241) and
it does indeed generate the Makefiles for that directory.  So it does seem
reasonable that if autogen.pl processes the directory for Makefile stuff*,
that it should process it for the "frameworks.h" entry.

I'll revert that part of the changeset to previous functionality.

Sorry, my bad,
--tjn

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184



Re: [OMPI devel] Q: project based MCA param files

2013-05-07 Thread Thomas Naughton

Hi,

Ok, looks like this may just do the trick.  We briefly discussed this today
and probably can change our use case to make use of this mechanism instead
and avoid any further enhancements.

 Question: If you do a setenv for this MCA param, does that extend the
   default search path?  Or does it replace/override the default?

Thanks Jeff for forwarding info to devel list to get broader feedback, and
to Ralph for providing the suggestion.

--tjn

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Tue, 7 May 2013, Ralph Castain wrote:


I believe we already have a way of defining where to get the default mca params:

   ret = mca_base_var_register ("opal", "mca", "base", "param_files",
                                "Path for MCA "
                                "configuration files containing variable values",
                                MCA_BASE_VAR_TYPE_STRING, NULL, 0, 0,
                                OPAL_INFO_LVL_2,
                                MCA_BASE_VAR_SCOPE_READONLY,
                                _base_var_files);


So wouldn't it be as easy as defining an envar? It's what we did when using the 
OMPI code with ORCM a couple of years ago, and we used it again for a recent 
project in Greenplum where the default mca param was specified in a different 
location than usual.
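
To make the envar idea concrete, here is a minimal C sketch of how an environment variable can take precedence over a built-in default file list. It is not the OPAL implementation. The helper name sketch_param_files() is hypothetical, the variable name OMPI_MCA_mca_base_param_files is assumed from the usual "OMPI_MCA_" prefix convention, and the sketch assumes the environment value replaces the default rather than extending it.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed environment variable name: the "OMPI_MCA_" prefix followed by
 * the full variable name mca_base_param_files. */
#define SKETCH_PARAM_FILES_ENV "OMPI_MCA_mca_base_param_files"

/* Return the file list to parse. If the environment variable is set and
 * non-empty it wins outright (assumption: replace, not append);
 * otherwise the caller-supplied default is returned unchanged. */
const char *sketch_param_files(const char *default_files)
{
    assert(default_files != NULL);
    const char *env = getenv(SKETCH_PARAM_FILES_ENV);
    return (env != NULL && env[0] != '\0') ? env : default_files;
}
```

A project embedding OPAL could then point the variable at its own file before initialization, leaving the stock defaults untouched for everyone else.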


On May 7, 2013, at 6:28 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:


Given Ralph's questions about r28456 
(https://svn.open-mpi.org/trac/ompi/changeset/28456), ORNL's second question to 
me/Nathan about MCA params is probably worth forwarding to the list -- see 
below.

Thoughts on this proposal?


Begin forwarded message:


From: "Boehm, Swen" <bo...@ornl.gov>
Subject: Re: Q: project based MCA param files
Date: May 3, 2013 5:03:43 PM EDT
To: "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>
Cc: Nathan Hjelm <hje...@lanl.gov>, "Vallee, Geoffroy R." <valle...@ornl.gov>, "Naughton 
III, Thomas J." <naught...@ornl.gov>

Hi Jeff,

Here is a short description of the enhancement we would like to contribute.
Let us know what you think.

The purpose of the suggested improvements is to enable "projects" to read
MCA parameters from project specific locations. This enables the usage
of OPAL and the MCA Frameworks outside the OpenMPI project without
interfering with OpenMPI specific parameters and removes the need to
patch OPAL (e.g., to pick up params from different locations).

The possible scenarios would be the following:

a) add the option to pick up a project-specific mca-param.conf file
 Example:
  $HOME/.mca/${project}-mca-param.conf
  and /etc/mca/${project}-mca-param.conf

b) add the option to pick up the mca-param.conf file from a project-specific
 directory
 Example:
  $HOME/.${project}/mca-param.conf
  and /etc/${project}/mca-param.conf
  and/or /etc/${project}/${project}-mca-param.conf

c) prefixing the mca param with the project name in the existing mca-param.conf
 file and therefore following the new MCA variable system naming scheme.
 Example:
  mca_${project}_${framework}_${component}_${var_name}
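
For option (c), a minimal C sketch of composing the prefixed name might look like the following. The helper sketch_full_var_name() is hypothetical and only illustrates the naming scheme in the example line above; it is not part of the MCA variable system.

```c
#include <assert.h>
#include <stdio.h>

/* Build "mca_<project>_<framework>_<component>_<var_name>" into buf.
 * Returns the length of the name on success, or -1 if buf is too small
 * (in which case the contents of buf are truncated and should be ignored). */
int sketch_full_var_name(char *buf, size_t len, const char *project,
                         const char *framework, const char *component,
                         const char *var_name)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "mca_%s_%s_%s_%s",
                     project, framework, component, var_name);
    return (n >= 0 && (size_t)n < len) ? n : -1;
}
```

For instance, a made-up project "myproj" with framework "btl", component "tcp", and variable "if_include" would yield mca_myproj_btl_tcp_if_include.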

The implementation has to be compatible with the current system, that is,
it should work as it does today without any added burden to the user. The
suggested approach is to provide an addition to the MCA API (something like
mca_base_add_config_file_path ()) to add lookup paths to the MCA system.
This way additional files can be picked up for the MCA param parsing if
needed.
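
A minimal C sketch of what such an API addition could look like, purely as an assumption to anchor the discussion: sketch_add_config_file_path() stands in for the proposed mca_base_add_config_file_path(), and sketch_add_project_paths() registers the two option (b) locations. None of these names exist in OPAL, and the fixed-size table is only for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_PATHS 8
#define SKETCH_MAX_LEN   256

/* Extra lookup files, kept separate from whatever defaults already exist. */
char sketch_paths[SKETCH_MAX_PATHS][SKETCH_MAX_LEN];
int  sketch_npaths = 0;

/* Hypothetical analogue of the proposed mca_base_add_config_file_path():
 * append one file to the lookup list. Duplicates are silently accepted.
 * Returns 0 on success, -1 if the list is full or the path is too long. */
int sketch_add_config_file_path(const char *path)
{
    assert(path != NULL);
    for (int i = 0; i < sketch_npaths; i++) {
        if (strcmp(sketch_paths[i], path) == 0) {
            return 0;
        }
    }
    if (sketch_npaths == SKETCH_MAX_PATHS || strlen(path) >= SKETCH_MAX_LEN) {
        return -1;
    }
    strcpy(sketch_paths[sketch_npaths++], path);
    return 0;
}

/* Register the per-user and system-wide files for one project, using the
 * option (b) layout: <home>/.<project>/mca-param.conf and
 * /etc/<project>/mca-param.conf. Returns 0 on success, -1 on failure. */
int sketch_add_project_paths(const char *home, const char *project)
{
    char buf[SKETCH_MAX_LEN];
    int n;

    assert(home != NULL && project != NULL);
    n = snprintf(buf, sizeof buf, "%s/.%s/mca-param.conf", home, project);
    if (n < 0 || (size_t)n >= sizeof buf || sketch_add_config_file_path(buf) != 0) {
        return -1;
    }
    n = snprintf(buf, sizeof buf, "/etc/%s/mca-param.conf", project);
    if (n < 0 || (size_t)n >= sizeof buf || sketch_add_config_file_path(buf) != 0) {
        return -1;
    }
    return 0;
}
```

Because the sketch only appends to its own list, a caller that never uses it sees no change, which matches the compatibility requirement stated above.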

To wrap it up:
1) Is the motivation clear?
2) Is it possible to implement the desired capability within a
  reasonable time and without changing the current behavior?
3) Does it line up with the planning / future capabilities?
4) Which of the above options (A, B, C) would you prefer?

--
Swen Boehm      | Email: bo...@ornl.gov
Oak Ridge National Laboratory  | Phone: +1 865-576-6125


On Apr 26, 2013, at 7:50 PM, Thomas Naughton <naught...@ornl.gov> wrote:


Hi,

Ok, sounds good. We'll check on this next week and get back to you.

Thanks,
--tjn

_
Thomas Naughton  naught...@ornl.gov
Research Associate   (865) 576-4184


On Fri, 26 Apr 2013, Jeff Squyres (jsquyres) wrote:


Email would probably be easiest -- I will need to page in/refresh this area of 
the code, anyway, so if you guys do the initial homework and submit some ideas, 
that would probably be easiest (for me).  :-D


On Apr 26, 2013, at 6:33 PM, Thomas Naughton <naught...@ornl.gov>
wrote:


Hi Jeff,

We don't have one yet but we can code something up and submit a pa

Re: [OMPI devel] [OMPI svn] svn:open-mpi r28456 - trunk

2013-05-07 Thread Thomas Naughton

Hi,

Briefly, I'm Thomas.  I work at ORNL.  I changed autogen.pl on my
first commit to OMPI trunk. (Insert rookie joke here.  :-D)

The changes in r28456 for autogen.pl were pretty basic/minor.  I
apologize for not sending a follow-up email to devel mailing list
outlining the changes -- poor netiquette on my part. :-/

There were four changes included in the patch.  They related
mainly to the recent changes for MCA frameworks.  I'll give a little
more description below.

Ralph, I also included your feedback and a response for #2.  Let me
know if this makes sense as I think it provides the "right" behavior
but want to double check.  Thanks.



 1) Add ifdef guard to project's autogenerated "frameworks.h" header
file, e.g., "opal/include/opal/frameworks.h" would have
"OPAL_FRAMEWORKS_H".


This one simply adds an ifdef guard to the top of the auto-generated file, so
if code includes the "frameworks.h" file more than once you avoid multiple
inclusion of the same file.  This is generic to the given project, so the
"opal/" project would generate something like:

  $ cat opal/include/opal/frameworks.h
  /*
   * This file is autogenerated by autogen.pl. Do not edit this file by hand.
   */
  #ifndef OPAL_FRAMEWORKS_H
  #define OPAL_FRAMEWORKS_H

  #include 

  extern mca_base_framework_t opal_backtrace_base_framework;

 ..

  #endif /* OPAL_FRAMEWORKS_H */

This would also be done for "ompi/" and "orte/" project directories.





 2) Avoid adding "ignored" frameworks to the autogenerated "frameworks.h"
header file.


This simply applies the same ignored() function that is used elsewhere in
the autogen.pl script for omitting "ignored" MCA directories from the
processing.  This just picks up the ".ompi_ignore" (and/or ".ompi_unignore")
files.  The intent being that if you ignore a component (subdir), it will
be removed from the list, but you could also remove an entire framework if
you put the ignore file in the top-level of the framework.

The intent being that if for whatever reason you ignore a framework in the
"${project}/mca/" space, you will not have it automatically show up in the
project's "frameworks.h" file.

On Tue, 7 May 2013, Ralph Castain wrote:


We use the frameworks.h file to "discover" the frameworks in
ompi_info.  Even if no components are built for that framework,
there still are MCA params relating to the base of that framework.
Sounds silly, I know - but there may be reasons to access those
params - e.g., to set verbosity to verify that no components are
being selected.

I think we need those frameworks to be listed...



Ok, I didn't realize the 'ompi_info' aspect.  Good to know.
However, I think honouring the ignore behavior is good in this case
b/c if you drop an ignore file in a framework, you won't get any
other autogen touches (i.e., no Makefiles are autogenerated).  So
it seems that having no Makefiles but including the framework in the
"frameworks.h" would break regardless.  Again, this is more of a
safety guard.





3) Avoid adding non-MCA projects to the autogenerated
'mca_project_list', which maintains existing support for "projects" 
with new MCA framework enhancements.  Moves this down to mca_run_global().



This was just a bit of shifting code and didn't sound like there was
any discussion on this point.  This is a "do no harm" factor to
support pre-existing functionality.  The gist is that if you have a
"project" in the build directory that doesn't have an MCA directory 
structure, just avoid adding it to the list of MCA projects.





4) Add small loop at end to add projects with a "config/" subdir
   to the list of includes for 'autoreconf'.



This again is a "do no harm" factor to support pre-existing
functionality.  If you have a "${project}/config/" directory, this
appends "-I ${project}/config/" to the autoreconf list.
If you do not have a "${project}/config/" dir, there is no change.


Again, I hope that gives more context/description to the changes
included in the autogen.pl patch.  In the future, I'll try to do
a better job of sending a heads up to the devel list.

Thanks,
--tjn

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Tue, 7 May 2013, Ralph Castain wrote:


Crud - it just struck me that you don't want to do one thing in that patch.


+ Avoid adding "ignored" frameworks to the autogenerated "frameworks.h"
 header file.



We use the frameworks.h file to "discover" the frameworks in ompi_info. Even if 
no components are built for that framework, there still are MCA params relating to the 
base of that framework. Sounds si

Re: [MTT devel] MTT feature request

2013-04-11 Thread Thomas Naughton

Hi Jeff,

Cool, no worries.  I just didn't want to forget.

Thanks,
--tjn

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Thu, 11 Apr 2013, Jeff Squyres (jsquyres) wrote:


Finally done -- sorry for the delay!

On Apr 10, 2013, at 11:18 AM, Thomas Naughton <naught...@ornl.gov> wrote:


Hi Jeff,

Do you know if the 'email_detailed_report_onfail' request for MTT ever got
a green light?  I just checked and didn't see it in trunk.

Quick refresher summary:
+ Adds new 'email_detailed_report_onfail' flag to TextReporter
  that only sends detailed report if the test was not "success".
  (Similar to existing 'email_detailed_report' flag.)

Thanks,
--tjn

_________
 Thomas Naughton  naught...@ornl.gov
 Research Associate   (865) 576-4184


On Thu, 28 Mar 2013, Thomas Naughton wrote:


Hi Jeff,


Sorry for the delay in replying to this; somehow this mail slipped by me.


No problem-o.   :-)

Yes, I just updated my local checkout and the patch appears to apply just
fine against the latest trunk.

Thanks,
--tjn

_________
 Thomas Naughton  naught...@ornl.gov
 Research Associate   (865) 576-4184


On Thu, 28 Mar 2013, Jeff Squyres (jsquyres) wrote:


Sorry for the delay in replying to this; somehow this mail slipped by me.

I'm all for what you described. Does your patch still apply to the current 
trunk?  (TextReporter was just recently split up a bit)

Sent from my phone. No type good.

On Feb 28, 2013, at 10:07 PM, "Thomas Naughton" <naught...@ornl.gov> wrote:


Hey,

First, thank you for MTT!

I was curious if you would be willing to add a basic feature related to
detailed emails.  Basically, we only want the detailed test results if
something fails.

Initially I thought I could use an () funclet on the
'email_detailed_report' field, but it turns out the summary value used for
"$overall_mtt_status" is generated inside the TextFile reporter.

So my solution is to just add another field for this case:

   # If true (1), send the detailed report if there were failures
  email_detailed_report_onfail
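
The decision this field is meant to drive can be sketched in a few lines. MTT itself is Perl; the C below is only a language-neutral illustration, the function name sketch_send_detailed() is hypothetical, and it assumes the existing 'email_detailed_report' flag keeps sending the report unconditionally while the new flag adds a failure-only trigger. The attached patch may combine the two flags differently.

```c
#include <stdbool.h>

/* Decide whether the detailed report should be emailed.
 * Assumptions (not taken from the patch): the existing flag always
 * sends it; the new "onfail" flag sends it only when the overall
 * status is not success. */
bool sketch_send_detailed(bool email_detailed_report,
                          bool email_detailed_report_onfail,
                          bool overall_success)
{
    if (email_detailed_report) {
        return true;
    }
    return email_detailed_report_onfail && !overall_success;
}
```

So with only the new field set to 1, a fully passing run produces no detailed email and any failure does.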

Attached is a small patch against mtt-trunk (r1590).

Thanks,
--tjn

_____
Thomas Naughton  naught...@ornl.gov
Research Associate   (865) 576-4184








--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/