Re: [OMPI devel] MPI Reduce Without a Barrier

2019-04-15 Thread Ralph H Castain
Not exactly. The problem is that rank=0 initially falls behind because it is 
doing more work - i.e., it has to receive all the buffers and do something with 
them. As a result, it doesn’t get to post the next reduce before the 
messages from the other participants arrive - which means that rank=0 has to 
stick those messages into the “unexpected message” queue. As iterations go by, 
the memory consumed by that queue gets bigger and bigger, causing rank=0 to run 
slower and slower… until it runs out of memory and aborts.

Adding the occasional barrier resolves the problem by letting rank=0 catch up 
and release the memory in the “unexpected message” queue.
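
The backlog effect described here can be illustrated with a toy model. This is 
only a sketch and not Open MPI code: the function name simulate_queue, the 
step-based timing, and the chosen rates are all assumptions made for 
illustration. It assumes every non-root rank posts one reduce per step, the 
root needs several steps to retire one reduce, and a barrier holds the 
non-root ranks until the root has caught up.

```python
def simulate_queue(steps, senders, root_period, barrier_every=None):
    """Return the peak number of unexpected messages queued at the root.

    Hypothetical toy model: `senders` non-root ranks each post one reduce
    per step, and the root retires one reduce every `root_period` steps.
    """
    sent = 0   # iterations the non-root ranks have posted
    done = 0   # iterations the root has completed
    peak = 0
    for step in range(1, steps + 1):
        # Non-root ranks are held at a barrier until the root catches up.
        at_barrier = (
            barrier_every is not None
            and sent > done
            and sent % barrier_every == 0
        )
        if not at_barrier:
            sent += 1
        # The root finishes one outstanding reduce every `root_period` steps.
        if step % root_period == 0 and done < sent:
            done += 1
        # Each outstanding iteration holds one message per non-root rank.
        peak = max(peak, (sent - done) * senders)
    return peak


if __name__ == "__main__":
    print("no barrier:      ", simulate_queue(1000, 3, 2))
    print("barrier every 10:", simulate_queue(1000, 3, 2, barrier_every=10))
```

Under these assumed rates, the peak without a barrier keeps growing with the 
number of steps, while the peak with a periodic barrier stays bounded by the 
barrier interval.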


> On Apr 15, 2019, at 1:33 PM, Saliya Ekanayake  wrote:
> 
> Thank you, Nathan. Could you elaborate a bit on what happens internally? From 
> your answer, it seems the program will still produce the correct output at 
> the end but will use more resources. 
> 
> On Mon, Apr 15, 2019 at 9:00 AM Nathan Hjelm via devel 
> <devel@lists.open-mpi.org> wrote:
> If you do that it may run out of resources and deadlock or crash. I recommend 
> either 1) adding a barrier every 100 iterations, 2) using allreduce, or 3) 
> enabling coll/sync (which essentially does 1). Honestly, 2 is probably the 
> easiest option and, depending on the scale you run at, may not be any slower 
> than 1 or 3.
> 
> -Nathan
> 
> > On Apr 15, 2019, at 9:53 AM, Saliya Ekanayake wrote:
> > 
> > Hi Devs,
> > 
> > When doing MPI_Reduce in a loop (collecting on Rank 0), is it the correct 
> > understanding that ranks other than root (0 in this case) will pass the 
> > collective as soon as their data is written to MPI buffers without waiting 
> > for all of them to be received at the root?
> > 
> > If that's the case then what would happen (semantically) if we execute 
> > MPI_Reduce in a loop without a barrier allowing non-root ranks to hit the 
> > collective multiple times while the root will be processing an earlier 
> > reduce? For example, the root can be in the first reduce invocation, while 
> > another rank is in the second reduce invocation.
> > 
> > Thank you,
> > Saliya
> > 
> > -- 
> > Saliya Ekanayake, Ph.D
> > Postdoctoral Scholar
> > Performance and Algorithms Research (PAR) Group
> > Lawrence Berkeley National Laboratory
> > Phone: 510-486-5772
> > 

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] MPI Reduce Without a Barrier

2019-04-15 Thread Ralph H Castain
There is a coll/sync component that will automatically inject those barriers 
for you so you don’t have to add them to your code. It is controlled by two MCA 
params:

coll_sync_barrier_before: Do a synchronization before each Nth collective

coll_sync_barrier_after: Do a synchronization after each Nth collective
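
The "each Nth collective" semantics of these two parameters can be sketched 
with a small counter. This is not the coll/sync implementation, only a toy 
model of the behavior the help strings describe; the class name SyncEveryN, 
its attributes, and the convention that 0 disables a barrier are assumptions 
made for illustration.

```python
class SyncEveryN:
    """Toy model: inject a barrier before/after every Nth collective."""

    def __init__(self, barrier_before=0, barrier_after=0):
        # Assumption for this sketch: a value of 0 disables that barrier.
        self.barrier_before = barrier_before
        self.barrier_after = barrier_after
        self.count = 0   # number of collectives issued so far
        self.log = []    # order of operations, kept for inspection

    def collective(self, name):
        """Record one collective, adding barriers on every Nth call."""
        self.count += 1
        if self.barrier_before and self.count % self.barrier_before == 0:
            self.log.append("barrier")
        self.log.append(name)
        if self.barrier_after and self.count % self.barrier_after == 0:
            self.log.append("barrier")


if __name__ == "__main__":
    sync = SyncEveryN(barrier_after=3)
    for _ in range(6):
        sync.collective("reduce")
    print(sync.log)
```

The application code only calls the collective; the periodic barrier is added 
by the wrapper, which is the convenience the component is described as 
providing.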

Ralph



Re: [MTT devel] Documentation update of gh-pages failed

2019-02-28 Thread Ralph H Castain
Travis indicates they are having a “minor service outage”, so I’m guessing that 
is the cause of the problem.


> On Feb 28, 2019, at 7:47 AM, Rezanka, Deb via mtt-devel wrote:
> 
> I have a cleaned-up version of gh-pages on my repo ready to go, but I don’t 
> know how to do a pull request for a specific branch like that. It has not 
> been processed by Travis yet. 
> 
> The problem is a regular pull request that altered the documentation and 
> should have caused Travis to rebuild and deploy the changes to gh-pages. For 
> some reason the $GH_TOKEN (see scripts/deploy.sh) seems to be invalid. This 
> is run on the Travis CI build, so I don’t know how to check that. Can you help?
> deb
>  
>  

___
mtt-devel mailing list
mtt-devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/mtt-devel

Re: [MTT devel] Documentation update of gh-pages failed

2019-02-28 Thread Ralph H Castain
I apologize - I haven’t been closely following this. Did you do some kind of 
“git rm -rf” on the contents of gh-pages? I don’t see anything in the commit 
history for that branch other than a last auto-update 2 days ago.


> On Feb 28, 2019, at 7:34 AM, Rezanka, Deb via mtt-devel wrote:
> 
> Hi,  
>  
> the travis deployment of updated documentation failed with:
>  
> Cloning into 'gh-pages'...
> PUSHING CHANGES
> [gh-pages 408b555] Deploy updated open-mpi/mtt to gh-pages
> 409 files changed, 991 insertions(+), 581 deletions(-)
> rewrite pages/user_guide.md (100%)
> remote: Anonymous access to open-mpi/mtt.git denied.
> fatal: Authentication failed for 'https://@github.com/open-mpi/mtt.git/'
> Script failed with status 128
>  
> Does anyone know why the Authentication would have failed? Nothing changed in 
> the .travis.yml file.
>  
> Deb Rezanka
> LANL

Re: [OMPI devel] Gentle reminder: sign up for the face to face

2019-02-26 Thread Ralph H Castain
Done!

> On Feb 26, 2019, at 8:33 AM, Brice Goglin  wrote:
> 
> Hello Jeff
> 
> Looks like I am not allowed to modify the page but I'll be at the meeting ;)
> 
> Brice
> 
> 
> 
> On 26/02/2019 at 17:13, Jeff Squyres (jsquyres) via devel wrote:
>> Gentle reminder to please sign up for the face-to-face meeting and add your 
>> items to the wiki:
>> 
>> https://github.com/open-mpi/ompi/wiki/Meeting-2019-04
>> 

Re: [OMPI devel] rml/ofi component broken in v4.0.x and v3.1.x

2019-02-14 Thread Ralph H Castain
I would recommend just removing it - frankly, I’m surprised it is in there as 
the code was deemed non-production-ready.


> On Feb 14, 2019, at 5:11 PM, Gilles Gouaillardet  wrote:
> 
> Folks,
> 
> 
> The rml/ofi component has been removed from master.
> 
> Then common/ofi was later removed from master and mtl/ofi configury component 
> was revamped not to depend on common/ofi configury stuff.
> 
> Only the latter change was backported to the release branches.
> 
> The issue is that rml/ofi is still in v4.0.x and v3.1.x (it has never been in 
> v3.0.x) and is broken since its configury still relies on the (now removed) 
> common/ofi configury.
> 
> 
> I guess the right fix is to update the rml/ofi configury in the release 
> branches, but do we really care?
> 
> If not, can we simply remove the rml/ofi component (e.g. cherry-pick 
> https://github.com/open-mpi/ompi/commit/8794077520b4b4ae86be3a09cfc1abbf7bcab8ad)?
> 
> 
> Cheers,
> 
> 
> Gilles
> 

Re: [OMPI devel] [OMPI users] open-mpi.org is DOWN

2018-12-23 Thread Ralph H Castain
The security scanner has apologized for a false positive and fixed their system 
- the site has been restored.

Ralph



[OMPI devel] open-mpi.org is DOWN

2018-12-22 Thread Ralph H Castain
Hello all

Apologies to everyone, but I received an alert this morning that malware has 
been detected on the www.open-mpi.org site. I have tried to contact the hosting 
agency and the security scanners, but nobody is around on this pre-holiday 
weekend.

Accordingly, I have taken the site OFFLINE for the indeterminate future until 
we can get this resolved. Sadly, with the holidays upon us, I don’t know how 
long it will take to get responses from either company. Until we do, the site 
will remain offline for safety reasons.

Ralph


[OMPI devel] PMIx v3.0 Standard released

2018-12-20 Thread Ralph H Castain
The PMIx community, representing a consortium of research, academic, and 
industry partners, is pleased to announce the release of the PMIx v3.0 Standard 
document. The document can be obtained from:

* the PMIx website at 
https://pmix.org/wp-content/uploads/2018/12/pmix-standard-3.0.pdf

* the PMIx Standard repository at 
https://github.com/pmix/pmix-standard/releases/tag/v3.0

* by cloning the PMIx Standard repository and generating the document yourself. 
The source can be obtained from https://github.com/pmix/pmix-standard by 
selecting the “v3” branch. Generating the document requires installation of 
the LaTeX publishing system.

Please see below for a brief summary of the release notes. This release brings 
the Standard up to date with prior PMIx Reference Implementation (PRI) 
releases. Going forward, releases of the PRI will be timed to coincide with (or 
shortly trail) releases of the corresponding Standard document. The current 
roadmap (with target schedule) includes:

PMIx v4.0: first half of 2019
* Completion of the tool/debugger support, including a new chapter that 
specifically addresses how to create PMIx-based tools
* Description of the new PMIx Group and Process Set support
* Completion of the fabric support, including new server APIs for accessing 
fabric topology information in support of scheduling operations

PMIx v5.0: second half of 2019
* Initial work on storage integration APIs
* Introduction of Python bindings

Ralph

PMIx v3.0 Release Notes

Initial release of version 3.0 of the PMIx Standard. Additions/changes from 
version 2.1 include:

The following APIs were introduced in v3.0 of the PMIx Standard:
* Client APIs
    * PMIx_Log, PMIx_Job_control
    * PMIx_Allocation_request, PMIx_Process_monitor
    * PMIx_Get_credential, PMIx_Validate_credential
* Server APIs
    * PMIx_server_IOF_deliver
    * PMIx_server_collect_inventory, PMIx_server_deliver_inventory
* Tool APIs
    * PMIx_IOF_pull, PMIx_IOF_push, PMIx_IOF_deregister
    * PMIx_tool_connect_to_server
* Common APIs
    * PMIx_IOF_channel_string

The document added a chapter on security credentials, a new section for 
Input/Output (IO) forwarding to the Process Management chapter, and a few 
blocking forms of previously-existing non-blocking APIs. Attributes supporting 
the new APIs were introduced, as well as additional attributes for a few 
existing functions.

As always, creation of this release of the Standard required a great deal of 
work on the part of a number of people. We invite you to read the 
Acknowledgement section for a list of those who contributed to the Standard in 
terms of the included definitions, functional concepts, and/or authorship. Our 
thanks go out to all.

Please provide comments on the PMIx Standard by filing issues on the document 
repository https://github.com/pmix/pmix-standard/issues or by sending them to 
the PMIx Community mailing list at 
https://groups.google.com/forum/#!forum/pmix. Comments should include the 
version of the PMIx Standard you are commenting about, and the page, section, 
and line numbers that you are referencing. As a reminder, please note that 
messages sent to the mailing list from an unsubscribed e-mail address will be 
ignored.


Re: [OMPI devel] OMPI and PRRTE separated

2018-12-17 Thread Ralph H Castain
FYI: I have deleted all the old OMPI tags from PRRTE, so we have a clean repo 
to work with now.



[OMPI devel] OMPI and PRRTE separated

2018-12-17 Thread Ralph H Castain
Hello all

For those of you working with ORTE and/or PRRTE, GitHub has severed the 
parent/child relationship between the OMPI and PRRTE repositories. Thus, we 
will no longer be able to directly “pull” changes made to ORTE downstream into 
PRRTE.

This marks the end of direct support for ORTE except in release branches as 
people have time and inclination. We invite people to instead work in PRRTE on 
any future-facing features.

The question of what to do about OPAL remains under discussion as a 
much-reduced copy of it currently resides in PRRTE.
Ralph


[OMPI devel] PMIx v2.1 Standard released

2018-12-06 Thread Ralph H Castain
The PMIx community, representing a consortium of research, academic, and 
industry partners, is pleased to announce the release of the PMIx v2.1 Standard 
document. The document can be obtained from:

* the PMIx website at 
https://pmix.org/wp-content/uploads/2018/12/pmix-standard-2.1.pdf

* the PMIx Standard repository at 
https://github.com/pmix/pmix-standard/releases/tag/v2.1

* by cloning the PMIx Standard repository and generating the document yourself. 
The source can be obtained from https://github.com/pmix/pmix-standard by 
selecting the “v2” branch. Generating the document requires installation of 
the LaTeX publishing system.

The v2.1 update includes clarifications and corrections, plus addition of 
examples:

* Clarify description of PMIx_Connect and PMIx_Disconnect APIs.
* Explain that values for the PMIX_COLLECTIVE_ALGO are environment-dependent
* Identify the namespace/rank values required for retrieving 
attribute-associated information using the PMIx_Get API
* Provide definitions for session, job, application, and other terms used 
throughout the document
* Clarify definitions of PMIX_UNIV_SIZE versus PMIX_JOB_SIZE
* Clarify server module function return values
* Provide examples of the use of PMIx_Get for retrieval of information
* Clarify the use of PMIx_Get versus PMIx_Query_info_nb
* Clarify return values for non-blocking APIs and emphasize that callback 
functions must not be invoked prior to return from the API
* Provide detailed example for construction of the PMIx_server_register_nspace 
input information array
* Define information levels (e.g., session vs job) and associated attributes 
for both storing and retrieving values
* Clarify roles of PMIx server library and host environment for collective 
operations
* Clarify definition of PMIX_UNIV_SIZE


As always, creation of this release of the Standard required a great deal of 
work on the part of a number of people. We invite you to read the 
Acknowledgement section for a list of those who contributed to the Standard in 
terms of the included definitions, functional concepts, and/or authorship. Our 
thanks go out to all.

Please provide comments on the PMIx Standard by filing issues on the document 
repository https://github.com/pmix/pmix-standard/issues or by sending them to 
the PMIx Community mailing list at 
https://groups.google.com/forum/#!forum/pmix. Comments should include the 
version of the PMIx Standard you are commenting about, and the page, section, 
and line numbers that you are referencing. As a reminder, please note that 
messages sent to the mailing list from an unsubscribed e-mail address will be 
ignored.

Ralph

[OMPI devel] PRRTE v3.0.0rc1 available for testing

2018-11-28 Thread Ralph H Castain
Hi folks

Given a growing use of PRRTE plus OMPI’s announced plans to phase out ORTE in 
favor of PRRTE, it seems the time has come to begin generating formal releases 
of PRRTE. Accordingly, I have created a v3.0.0 release candidate for folks to 
(hopefully) test:

https://github.com/pmix/prrte/releases/tag/prrte-v3.0.0rc1

Note that the first number in the version triplet represents the PMIx level 
being supported - in this case, the release candidate supports PMIx v3 of the 
Standard. I know we haven’t yet released that formal document, but it is 
available in PR form and should be released fairly soon.

This release candidate actually includes support for some of the 
under-development PMIx v4 APIs. However, the version triplet is based on what 
we consider to be the “production” level of PRRTE, and the PMIx v4 support 
included in the current code is truly at the prototype level. Those who want to 
experiment with those interfaces are welcome to build this release against the 
PMIx master branch (which is at the cutting edge of PMIx v4 development) and 
work with them. Now that we have started formal releases, we’ll do a better job 
of branching early prior to introducing new APIs so we won’t have this mix 
again.

Note that PRRTE contains _no_ embedded code. Thus, building it requires that 
you have libevent, hwloc, and some version of PMIx installed - if not in 
standard locations, you’ll need to use the configure arguments to point to 
them. Any version of PMIx that is at least v2.0 is supported - PRRTE does _not_ 
support the PMIx v1 series.

Please give this a whirl and post any comments/bugs to the github issues.
Ralph


Re: [OMPI devel] Hints for using an own pmix server

2018-10-18 Thread Ralph H Castain


> On Oct 17, 2018, at 3:32 AM, Stephan Krempel  wrote:
> 
> 
> Hi Ralph.
> 
>>>> One point that remains open and is interesting for me is if I can
>>>> achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
>>>> possible to configure it as if there were the "--with-ompi-pmix-rte"
>>>> switch from version 4.x?
>>> 
>>> I’m afraid we didn’t backport that capability to the v3.x branches.
>>> I’ll ask the relevant release managers if they’d like us to do so.
>> 
>> I checked and we will not be backporting this to the v3.x series. It
>> will begin with v4.x.
> 
> Thanks for checking. I need to check with our users whether supporting
> OpenMPI 4 will be sufficient for them; otherwise I will certainly come back
> soon with some more questions about how to manage support for
> OpenMPI 3.

If it becomes an issue, I can probably provide a patch for OMPI v3 that you 
could install locally.

> 
> Thank you again for the assistance.
> 
> Best regards
> 
> Stephan

Re: [MTT devel] complaints about github pages being generated with every PR

2018-10-17 Thread Ralph H Castain
Tell the “certain individual” to get over it :-)

We’ve discussed this several times and it isn’t a simple fix. What we want is 
to regenerate when something actually changes in the generated files, but we 
don’t have a good way of doing it.

I’m sure we would be happy to see a PR from that “certain individual”  :-)
Ralph


> On Oct 17, 2018, at 2:50 PM, Howard Pritchard  wrote:
> 
> Hi Folks,
> 
> A certain individual is complaining about the fact that MTT repo currently
> is set to have github pages updates following every commit.
> 
> I suspect fixing this requires intervention by someone with admin rights on 
> the repo.
> 
> Could we have this feature disabled?
> 
> Howard
> 

[OMPI devel] SC'18 PMIx BoF meeting

2018-10-15 Thread Ralph H Castain
Hello all

[I’m sharing this on the OMPI mailing lists (as well as the PMIx one) as PMIx 
has become tightly integrated into the OMPI code since v2.0 was released]

The PMIx Community will once again be hosting a Birds-of-a-Feather meeting at 
SuperComputing. This year, however, will be a little different! PMIx has come a 
long, long way over the last four years, and we are starting to see 
application-level adoption of the various APIs. Accordingly, we will be 
devoting most of this year’s meeting to a tutorial-like review of several 
use-cases, including:

* fault-tolerant OpenSHMEM implementation
* interlibrary resource coordination using OpenMP and MPI
* population modeling and swarm intelligence models running natively in an HPC 
environment
* use of the PMIx_Query interface

The meeting has been shifted to Wed night, 5:15-6:45pm, in room C144. Please 
share this with others who you feel might be interested, and do plan to attend!
Ralph


Re: [OMPI devel] Hints for using an own pmix server

2018-10-14 Thread Ralph H Castain


> On Oct 12, 2018, at 6:15 AM, Ralph H Castain  wrote:
> 
>> One point that remains open and is interesting for me is if I can
>> achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
>> possible to configure it as if there were the "--with-ompi-pmix-rte"
>> switch from version 4.x?
> 
> I’m afraid we didn’t backport that capability to the v3.x branches. I’ll ask 
> the relevant release managers if they’d like us to do so.

I checked and we will not be backporting this to the v3.x series. It will begin 
with v4.x.

Ralph


Re: [OMPI devel] Hints for using an own pmix server

2018-10-12 Thread Ralph H Castain
I took a look at the following:

>> A remark on pmix at this point: pmix_bfrops_base_value_load() silently
>> does not handle the PMIX_DATA_ARRAY type, so the macros PMIX_VALUE_LOAD
>> and PMIX_INFO_LOAD do not work with that type. I think this is
>> unfortunate, and it took me a while to figure out why I got a segfault
>> when pmix tried to process my PMIX_PROC_DATA infos.

It appears that this was true in the v2.x release series, but has since been 
fixed - thus, the v3.x series is okay. I’ll backport the support to the v2.x 
series for its next releases.

Thanks for pointing it out!
Ralph


Re: [OMPI devel] Hints for using an own pmix server

2018-10-12 Thread Ralph H Castain
Hi Stephan


> On Oct 12, 2018, at 2:25 AM, Stephan Krempel  wrote:
> 
> Hallo Ralph,
> 
>> I assume this (--with-ompi-mpix-rte) is a typo as the correct option
>> is --with-ompi-pmix-rte?
> 
> You were right, this was a typo, with the correct option I now managed
> to start an MPI helloworld program using OpenMPI and our own process
> manager with pmix server.

Hooray! If you want me to show support for your PM on our web site, please send 
me a little info about it. You are welcome to send it off-list if you prefer.

> 
>> It all looks okay to me for the client, but I wonder if you
>> remembered to call register_nspace and register_client on your server
>> prior to starting the client? If not, the connection will be dropped
>> - you could add PMIX_MCA_ptl_base_verbose=100 to your environment to
>> see the detailed connection handshake.
> 
> This is a point that I could finally figure out from the prrte
> code. To make it work, you not only need to call register_nspace
> but must also pass some specific information to it that OpenMPI expects to
> be available (e.g. proc info with lrank).

My apologies - we will document this better on the PMIx web site and provide 
some link to it on the OMPI web site. We actually do publish the info OMPI is 
expecting, but it isn’t in an obvious enough place.

> 
> A remark on pmix at this point: pmix_bfrops_base_value_load() silently
> does not handle the PMIX_DATA_ARRAY type, so the macros
> PMIX_VALUE_LOAD and PMIX_INFO_LOAD do not work with that type. I think this is
> unfortunate, and it took me a while to figure out why I got a segfault
> when pmix tried to process my PMIX_PROC_DATA infos.

I’ll check that out - I don’t know why we wouldn’t handle it, so it is likely 
just an oversight. Regardless, it should return an error if it isn’t doing it.
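
To make the "should return an error" point concrete, here is a minimal sketch of a type-dispatching load routine whose default branch reports unsupported types to the caller. All names (demo_value_load, DEMO_DATA_ARRAY and so on) are hypothetical stand-ins for illustration. This is not the PMIx code and not the actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags and status codes, for illustration only;
 * these are NOT the real PMIx definitions. */
typedef enum { DEMO_INT, DEMO_STRING, DEMO_DATA_ARRAY } demo_type_t;
typedef enum { DEMO_SUCCESS = 0, DEMO_ERR_NOT_SUPPORTED = -1 } demo_status_t;

typedef struct {
    demo_type_t type;
    union {
        int integer;
        const char *string;
    } data;
} demo_value_t;

/* Load `src` into `v` according to `type`. A type without an explicit
 * case reaches the default branch and is reported to the caller instead
 * of being silently ignored; `v` is left untouched in that case. */
demo_status_t demo_value_load(demo_value_t *v, const void *src, demo_type_t type)
{
    assert(NULL != v && NULL != src);
    switch (type) {
    case DEMO_INT:
        v->type = type;
        memcpy(&v->data.integer, src, sizeof(int));
        return DEMO_SUCCESS;
    case DEMO_STRING:
        v->type = type;
        v->data.string = (const char *)src;  /* borrowed, not copied */
        return DEMO_SUCCESS;
    default:
        /* e.g. DEMO_DATA_ARRAY: not handled here, so say so */
        return DEMO_ERR_NOT_SUPPORTED;
    }
}
```

A caller that checks the return value sees the unsupported type immediately, rather than hitting a segfault later when the half-initialized value is processed.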

> 
> So thank you again for your help so far.
> 
> 
> One point that remains open and is interesting for me is whether I can
> achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
> possible to configure it as if it had the "--with-ompi-pmix-rte"
> switch from version 4.x?

I’m afraid we didn’t backport that capability to the v3.x branches. I’ll ask 
the relevant release managers if they’d like us to do so.

Ralph

> 
> Regards,
> 
> Stephan
> 
> 
>> 
>>> On Oct 9, 2018, at 3:14 PM, Stephan Krempel 
>>> wrote:
>>> 
>>> Hi Ralf,
>>> 
>>> After studying prrte a little bit, I tried something new and followed
>>> the description here using openmpi 4:
>>> https://pmix.org/code/building-the-pmix-reference-server/
>>> 
>>> I configured openmpi 4.0.0rc3:
>>> 
>>> ../configure --enable-debug --prefix [...] --with-pmix=[...] \
>>>  --with-libevent=/usr --with-ompi-mpix-rte
>>> 
>>> (I also tried to set --with-orte=no, but it then claims not to have a
>>> suitable rte and does not finish)
>>> 
>>> I then started my own PMIx and spawned a client compiled with mpicc of
>>> the new openmpi installation with this environment:
>>> PMIX_NAMESPACE=namespace_3228_0_0
>>> PMIX_RANK=0
>>> PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
>>> PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
>>> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
>>> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
>>> PMIX_SECURITY_MODE=native,none
>>> PMIX_PTL_MODULE=tcp,usock
>>> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
>>> PMIX_GDS_MODULE=ds12,hash
>>> PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234
>>> 
>>> The client is not connecting to my pmix server and its environment
>>> after MPI_Init looks like this:
>>> 
>>> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
>>> PMIX_RANK=0
>>> PMIX_PTL_MODULE=tcp,usock
>>> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
>>> PMIX_MCA_mca_base_component_show_load_errors=1
>>> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
>>> PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_dstor_3243
>>> PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
>>> PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
>>> PMIX_SECURITY_MODE=native,none
>>> PMIX_NAMESPACE=864157697
>>> PMIX_GDS_MODULE=ds12,hash
>>> ORTE_SCHIZO_DETECTION=ORTE
>>> OMPI_COMMAND=./hello_env
>>> OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08-d92c0e73869e1cfa
>>> OMPI_MCA_orte_launch=1

Re: [OMPI devel] Hints for using an own pmix server

2018-10-09 Thread Ralph H Castain
I assume this (--with-ompi-mpix-rte) is a typo as the correct option is 
--with-ompi-pmix-rte?

It all looks okay to me for the client, but I wonder if you remembered to call 
register_nspace and register_client on your server prior to starting the 
client? If not, the connection will be dropped - you could add 
PMIX_MCA_ptl_base_verbose=100 to your environment to see the detailed 
connection handshake.
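
Alongside the verbose handshake output, the symptom reported in the quoted message below (PMIX_NAMESPACE changing from namespace_3228_0_0 to 864157697 across MPI_Init) can be checked from inside the client. The following is only a sketch: the function names are hypothetical and belong to neither PMIx nor Open MPI, and it assumes the launcher's namespace string is known to the test program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if `actual` matches the namespace the launcher exported,
 * 0 if it is missing or different. */
int demo_nspace_intact(const char *expected, const char *actual)
{
    assert(NULL != expected);
    return (NULL != actual && 0 == strcmp(expected, actual)) ? 1 : 0;
}

/* Same check against the live environment; intended to be called after
 * MPI_Init() has returned. */
int demo_nspace_intact_env(const char *expected)
{
    return demo_nspace_intact(expected, getenv("PMIX_NAMESPACE"));
}
```

A result of 0 after MPI_Init would indicate that something replaced the launcher's PMIX_* settings, which matches the environment dump quoted below.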

> On Oct 9, 2018, at 3:14 PM, Stephan Krempel  wrote:
> 
> Hi Ralf,
> 
> After studying prrte a little bit, I tried something new and followed
> the description here using openmpi 4:
> https://pmix.org/code/building-the-pmix-reference-server/
> 
> I configured openmpi 4.0.0rc3:
> 
> ../configure --enable-debug --prefix [...] --with-pmix=[...] \
>  --with-libevent=/usr --with-ompi-mpix-rte
> 
> (I also tried to set --with-orte=no, but it then claims not to have a
> suitable rte and does not finish)
> 
> I then started my own PMIx and spawned a client compiled with mpicc of
> the new openmpi installation with this environment:
> 
> PMIX_NAMESPACE=namespace_3228_0_0
> PMIX_RANK=0
> PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
> PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> PMIX_SECURITY_MODE=native,none
> PMIX_PTL_MODULE=tcp,usock
> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> PMIX_GDS_MODULE=ds12,hash
> PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234
> 
> The client is not connecting to my pmix server and its environment
> after MPI_Init looks like this:
> 
> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> PMIX_RANK=0
> PMIX_PTL_MODULE=tcp,usock
> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> PMIX_MCA_mca_base_component_show_load_errors=1
> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_dstor_3243
> PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
> PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
> PMIX_SECURITY_MODE=native,none
> PMIX_NAMESPACE=864157697
> PMIX_GDS_MODULE=ds12,hash
> ORTE_SCHIZO_DETECTION=ORTE
> OMPI_COMMAND=./hello_env
> OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08-d92c0e73869e1cfa
> OMPI_MCA_orte_launch=1
> OMPI_APP_CTX_NUM_PROCS=1
> OMPI_MCA_pmix=^s1,s2,cray,isolated
> OMPI_MCA_ess=singleton
> OMPI_MCA_orte_ess_num_procs=1
> 
> So something goes wrong but I have no idea what I am missing. Do
> you have an idea what I need to change? Do I have to set an MCA
> parameter to tell OpenMPI not to start orted, or does it need another
> hint in the client environment besides the stuff coming from the PMIx
> server helper library?
> 
> 
> Stephan
> 
> 
> On Tuesday, Oct 9 2018, 08:33 -0700 Ralph H Castain wrote:
>> Hi Stephan
>> 
>> Thanks for the clarification - that helps a great deal. You are
>> correct that OMPI’s orted daemons do more than just host the PMIx
>> server library. However, they are only active if you launch the OMPI
>> processes using mpirun. This is probably the source of the trouble
>> you are seeing.
>> 
>> Since you have a process launcher and have integrated the PMIx server
>> support into your RM’s daemons, you really have no need for mpirun at
>> all. You should just be able to launch the processes directly using
>> your own launcher. The PMIx support will take care of the startup
>> requirements. The application procs will not use the orted in such
>> cases.
>> 
>> So if your system is working fine with the PMIx example programs,
>> then just launch the OMPI apps the same way and it should just work.
>> 
>> On the Slurm side: I’m surprised that it doesn’t work without the
>> --with-slurm option. An application proc doesn’t care about any of the
>> Slurm-related code if PMIx is available. I might have access to a
>> machine where I can check it…
>> 
>> Ralph
>> 
>> 
>>> On Oct 9, 2018, at 3:26 AM, Stephan Krempel 
>>> wrote:
>>> 
>>> Ralph, Gilles,
>>> 
>>> thanks for your input.
>>> 
>>> Before I answer, let me briefly explain what my general intention is.
>>> We have our own resource manager and process launcher that supports
>>> different MPI implementations in different ways. I want to adapt it to
>>> PMIx to cleanly support OpenMPI and hopefully other MPI implementations
>>> supporting PMIx in the future, too. 
>>> 
>>>> It sounds like what you really want to do is replace the orted

Re: [OMPI devel] Hints for using an own pmix server

2018-10-09 Thread Ralph H Castain
Hi Stephan

Thanks for the clarification - that helps a great deal. You are correct that 
OMPI’s orted daemons do more than just host the PMIx server library. However, 
they are only active if you launch the OMPI processes using mpirun. This is 
probably the source of the trouble you are seeing.

Since you have a process launcher and have integrated the PMIx server support 
into your RM’s daemons, you really have no need for mpirun at all. You should 
just be able to launch the processes directly using your own launcher. The PMIx 
support will take care of the startup requirements. The application procs will 
not use the orted in such cases.

So if your system is working fine with the PMIx example programs, then just 
launch the OMPI apps the same way and it should just work.

On the Slurm side: I’m surprised that it doesn’t work without the --with-slurm 
option. An application proc doesn’t care about any of the Slurm-related code if 
PMIx is available. I might have access to a machine where I can check it…

Ralph


> On Oct 9, 2018, at 3:26 AM, Stephan Krempel  wrote:
> 
> Ralph, Gilles,
> 
> thanks for your input.
> 
> Before I answer, let me briefly explain what my general intention is.
> We have our own resource manager and process launcher that supports
> different MPI implementations in different ways. I want to adapt it to
> PMIx to cleanly support OpenMPI and hopefully other MPI implementations
> supporting PMIx in the future, too. 
> 
>> It sounds like what you really want to do is replace the orted, and
>> have your orted open your PMIx server? In other words, you want to
>> use the PMIx reference library to handle all the PMIx stuff, and
>> provide your own backend functions to support the PMIx server calls? 
> 
> You are right, that was my original plan, and I have already done that.
> In my environment I can already launch processes that successfully call
> PMIx client functions like put, get, fence and so on, all handled by my
> servers using the PMIx server helper library. As far as I have implemented
> the server functions now, all the example programs coming with the pmix
> library are working fine.
> 
> Then I tried to use that with OpenMPI and stumbled.
> My first idea was to simply replace orted, but after taking a closer
> look into OpenMPI it seems to me that it uses/needs orted not only for
> spawning and exchange of process information, but also for its general
> communication and collectives. Am I wrong about that?
> 
> So replacing it completely is perhaps not what I want, since I do not
> intend to replace OpenMPI's whole communication stuff. But perhaps I am
> mixing up orte and orted here; I am not certain about that.
> 
>> If so, then your best bet would be to edit the PRRTE code in
>> orte/orted/pmix and replace it with your code. You’ll have to deal
>> with the ORTE data objects and PRRTE’s launch procedure, but that is
>> likely easier than trying to write your own version of “orted” from
>> scratch.
> 
> I think one problem here is that I do not really understand which
> purposes orted fulfills overall, especially besides implementing the PMIx
> server side. Can you please give me a short overview?
> 
>> As for Slurm: it behaves the same way as PRRTE. It has a plugin that
>> implements the server backend functions, and the Slurm daemons “host”
>> the plugin. What you would need to do is replace that plugin with
>> your own.
> 
> I understand that, but it also seems to need some special support from
> the several slurm modules on the OpenMPI side that I do not understand
> yet. At least when I tried OpenMPI without slurm support and
> `srun --mpi=pmix_v2`, it did not work but generated a message that
> slurm support in openmpi is missing.
> 
> 
> 
> Stephan
> 
> 
> 
>> 
>>> On Oct 8, 2018, at 5:36 PM, Gilles Gouaillardet 
>>> wrote:
>>> 
>>> Stephan,
>>> 
>>> 
>>> Have you already checked https://github.com/pmix/prrte ?
>>> 
>>> 
>>> This is the PMIx Reference RunTime Environment (PRRTE), which was
>>> built on top of orted.
>>> 
>>> Long story short, it deploys the PMIx server and then you start
>>> your MPI app with prun.
>>> An example is available at
>>> https://github.com/pmix/prrte/blob/master/contrib/travis/test_client.sh
>>> 
>>> 
>>> Cheers,
>>> 
>>> Gilles
>>> 
>>> 
>>> On 10/9/2018 8:45 AM, Stephan Krempel wrote:
>>>> Hello everyone,
>>>> 
>>>> I am currently implementing a PMIx server and I am trying to use it with
>>>> OpenMPI. I have my own mpiexec which starts my PMIx server and
>>>> launches the processes.
>>>> 
>>>> If I launch an executable linked against OpenMPI, during MPI_Init() the
>>>> ORTE layer starts another PMIx server and overrides my PMIX_*
>>>> environment so this new server is used instead of mine.
>>>> 
>>>> So I am looking for a method to prevent orte(d) from starting a PMIx
>>>> server.
>>>> 
>>>> I already tried to understand what the slurm support is doing, since
>>>> this is (at least in parts) what I think I need. Somehow when

Re: [OMPI devel] Hints for using an own pmix server

2018-10-08 Thread Ralph H Castain
Even PRRTE won’t allow you to stop the orted from initializing its PMIx server. 
I’m not sure I really understand your objective. Remember, PMIx is just a 
library - the orted opens it and uses it to interface to its client application 
procs. It makes no sense to have some other process perform that role as it 
won’t know any job-level information.

It sounds like what you really want to do is replace the orted, and have your 
orted open your PMIx server? In other words, you want to use the PMIx reference 
library to handle all the PMIx stuff, and provide your own backend functions to 
support the PMIx server calls? If so, then your best bet would be to edit the 
PRRTE code in orte/orted/pmix and replace it with your code. You’ll have to 
deal with the ORTE data objects and PRRTE’s launch procedure, but that is 
likely easier than trying to write your own version of “orted” from scratch.

As for Slurm: it behaves the same way as PRRTE. It has a plugin that implements 
the server backend functions, and the Slurm daemons “host” the plugin. What you 
would need to do is replace that plugin with your own.
Ralph


> On Oct 8, 2018, at 5:36 PM, Gilles Gouaillardet  wrote:
> 
> Stephan,
> 
> 
> Have you already checked https://github.com/pmix/prrte ?
> 
> 
> This is the PMIx Reference RunTime Environment (PRRTE), which was built on 
> top of orted.
> 
> Long story short, it deploys the PMIx server and then you start your MPI app 
> with prun
> An example is available at 
> https://github.com/pmix/prrte/blob/master/contrib/travis/test_client.sh
> 
> 
> Cheers,
> 
> Gilles
> 
> 
> On 10/9/2018 8:45 AM, Stephan Krempel wrote:
>> Hello everyone,
>> 
>> I am currently implementing a PMIx server and I am trying to use it with
>> OpenMPI. I have my own mpiexec which starts my PMIx server and
>> launches the processes.
>> 
>> If I launch an executable linked against OpenMPI, during MPI_Init() the
>> ORTE layer starts another PMIx server and overrides my PMIX_*
>> environment so this new server is used instead of mine.
>> 
>> So I am looking for a method to prevent orte(d) from starting a PMIx
>> server.
>> 
>> I already tried to understand what the slurm support is doing, since
>> this is (at least in parts) what I think I need. Somehow, when starting
>> a job with srun --mpi=pmix_v2, the ess module pmi is started, but I was
>> not able to enforce that manually by setting an MCA parameter (ess
>> should be the correct one?!?)
>> And I do not yet have a clue how the slurm support works.
>> 
>> So does anyone have a hint for me where I can find documentation or
>> information concerning that, or is there an easy way to achieve what I
>> am trying to do that I missed?
>> 
>> Thank you in advance.
>> 
>> Regards,
>> 
>> Stephan
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel
>> 
> 


Re: [OMPI devel] Removing ORTE code

2018-10-02 Thread Ralph H Castain
Based on silence plus today’s telecon, the stale code has been removed: 
https://github.com/open-mpi/ompi/pull/5827


> On Sep 26, 2018, at 7:00 AM, Ralph H Castain  wrote:
> 
> We are considering a “purge” of stale ORTE code and want to know if anyone is 
> using it before proceeding. With the advent of PMIx, several ORTE features 
> are no longer required by OMPI itself. However, we acknowledge that it is 
> possible that someone out there (e.g., a researcher) is using them. The 
> specific features include:
> 
> * OOB use from within an application process. We need to retain the OOB 
> itself for daemon-to-daemon communication. However, the application processes 
> no longer open a connection to their ORTE daemon, instead relying on the PMIx 
> connection to communicate their needs.
> 
> * the DFS framework - allows an application process to access a remote file 
> via ORTE. It provided essentially a function-shipping service that was used 
> by map-reduce applications we no longer support
> 
> * the notifier framework - supported output of messages to syslog and email. 
> PMIx now provides such services if someone wants to use them
> 
> * iof/tool component - we are moving to PMIx for tool support, so there are 
> no ORTE tools using this any more
> 
> We may discover additional candidates for removal as we go forward - we’ll 
> update the list as we do. First, however, we’d really like to hear back from 
> anyone who might have a need for any of the above.
> 
> Please respond by Oct 5th
> Ralph
> 


Re: [OMPI devel] btl/vader: race condition in finalize on OS X

2018-10-02 Thread Ralph H Castain
We already have the register_cleanup option in master - are you using an older 
version of PMIx that doesn’t support it?
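
For readers following the quoted analysis below, the failing unlink(2) with errno 2 can be reproduced in miniature. This sketch is a sequential stand-in for the race, not the fix discussed in this thread (the options raised are an intra-node barrier or register_cleanup). The function names and the file path used to exercise it are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Remove `path`, treating "already gone" (ENOENT) as success.
 * Returns 0 if the file no longer exists afterwards, -1 on any other
 * failure (errno is left as set by unlink). */
int demo_unlink_if_present(const char *path)
{
    assert(NULL != path);
    if (0 == unlink(path) || ENOENT == errno) {
        return 0;
    }
    return -1;
}

/* Sequential stand-in for the race: create `path`, let a first cleanup
 * remove it (as another task's session-directory cleanup would), then run
 * the tolerant unlink as the late finalizer. Returns its result, or -1 if
 * the file could not be set up. */
int demo_late_finalize(const char *path)
{
    FILE *f = fopen(path, "w");
    if (NULL == f) {
        return -1;
    }
    fclose(f);
    if (0 != unlink(path)) {   /* first cleanup wins the race */
        return -1;
    }
    return demo_unlink_if_present(path);  /* late finalizer: file is gone */
}
```

A plain unlink() in the late finalizer would fail with ENOENT at that point, which is the error message reported below.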


> On Oct 2, 2018, at 4:05 AM, Jeff Squyres (jsquyres) via devel 
>  wrote:
> 
> FYI: https://github.com/open-mpi/ompi/issues/5798 brought up what may be the 
> same issue.
> 
> 
>> On Oct 2, 2018, at 3:16 AM, Gilles Gouaillardet  wrote:
>> 
>> Folks,
>> 
>> 
>> When running a simple helloworld program on OS X, we can end up with the 
>> following error message
>> 
>> 
>> A system call failed during shared memory initialization that should
>> not have.  It is likely that your MPI job will now either abort or
>> experience performance degradation.
>> 
>>  Local host:  c7.kmc.kobe.rist.or.jp
>>  System call: unlink(2) 
>> /tmp/ompi.c7.1000/pid.23376/1/vader_segment.c7.17d80001.54
>>  Error:   No such file or directory (errno 2)
>> 
>> 
>> The error does not occur on Linux by default since the vader segment is 
>> placed in /dev/shm there.
>> 
>> The patch below can be used to reproduce the issue on Linux:
>> 
>> 
>> diff --git a/opal/mca/btl/vader/btl_vader_component.c b/opal/mca/btl/vader/btl_vader_component.c
>> index 115bceb..80fec05 100644
>> --- a/opal/mca/btl/vader/btl_vader_component.c
>> +++ b/opal/mca/btl/vader/btl_vader_component.c
>> @@ -204,7 +204,7 @@ static int mca_btl_vader_component_register (void)
>>                                             OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_GROUP, &mca_btl_vader_component.single_copy_mechanism);
>>      OBJ_RELEASE(new_enum);
>>  
>> -    if (0 == access ("/dev/shm", W_OK)) {
>> +    if (0 && 0 == access ("/dev/shm", W_OK)) {
>>          mca_btl_vader_component.backing_directory = "/dev/shm";
>>      } else {
>>          mca_btl_vader_component.backing_directory = opal_process_info.job_session_dir;
>> 
>> 
>> From my analysis, here is what happens:
>> 
>> - each rank is supposed to have its own vader_segment unlinked by btl/vader 
>> in vader_finalize().
>> 
>> - but this file might already have been destroyed by another task in 
>> orte_ess_base_app_finalize()
>> 
>>     if (NULL == opal_pmix.register_cleanup) {
>>         orte_session_dir_finalize(ORTE_PROC_MY_NAME);
>>     }
>> 
>>   *all* the tasks end up calling 
>>   opal_os_dirpath_destroy("/tmp/ompi.c7.1000/pid.23941/1")
>> 
>> 
>> I am not really sure about the best way to fix this.
>> 
>> - one option is to perform an intra-node barrier in vader_finalize()
>> 
>> - another option would be to implement an opal_pmix.register_cleanup
>> 
>> 
>> Any thoughts?
>> 
>> 
>> Cheers,
>> 
>> 
>> Gilles
>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 


[OMPI devel] Error in TCP BTL??

2018-10-01 Thread Ralph H Castain
I’m getting this error when trying to run a simple ring program on my Mac:

[Ralphs-iMac-2.local][[21423,14],0][btl_tcp_endpoint.c:742:mca_btl_tcp_endpoint_start_connect]
 bind() failed: Invalid argument (22)

Anyone recognize the problem? It causes the job to immediately abort. This is 
with current head of master this morning - it was working when I last used it, 
but it has been an unknown period of time.
Ralph


Re: [OMPI devel] Mac OS X 10.4.x users?

2018-09-28 Thread Ralph H Castain
Good lord - break away!!

> On Sep 28, 2018, at 11:11 AM, Barrett, Brian via devel 
>  wrote:
> 
> All -
> 
> In trying to clean up some warnings, I noticed one (around pack/unpack in 
> net/if.h) that is due to a workaround of a bug in MacOS X 10.4.x and earlier. 
>  The simple way to remove the warning would be to remove the workaround, 
> which would break the next major version of Open MPI on 10.4.x and earlier on 
> 64 bit systems.  10.5.x was released 11 years ago and didn’t drop support for 
> any 64 bit systems.  I posted a PR which removes support for 10.4.x and 
> earlier (through the README) and removes the warning-generating workaround 
> (https://github.com/open-mpi/ompi/pull/5803).
> 
> Does anyone object to breaking 10.4.x and earlier?
> 
> Brian


[OMPI devel] Removing ORTE code

2018-09-26 Thread Ralph H Castain
We are considering a “purge” of stale ORTE code and want to know if anyone is 
using it before proceeding. With the advent of PMIx, several ORTE features are 
no longer required by OMPI itself. However, we acknowledge that it is possible 
that someone out there (e.g., a researcher) is using them. The specific 
features include:

* OOB use from within an application process. We need to retain the OOB itself 
for daemon-to-daemon communication. However, the application processes no 
longer open a connection to their ORTE daemon, instead relying on the PMIx 
connection to communicate their needs.

* the DFS framework - allows an application process to access a remote file via 
ORTE. It provided essentially a function-shipping service that was used by 
map-reduce applications we no longer support

* the notifier framework - supported output of messages to syslog and email. 
PMIx now provides such services if someone wants to use them

* iof/tool component - we are moving to PMIx for tool support, so there are no 
ORTE tools using this any more

We may discover additional candidates for removal as we go forward - we’ll 
update the list as we do. First, however, we’d really like to hear back from 
anyone who might have a need for any of the above.

Please respond by Oct 5th
Ralph


Re: [OMPI devel] OFI issues on Open MPI v4.0.0rc1

2018-09-20 Thread Ralph H Castain
That’s why we are leaving it in master - only removing it from release branch 

Sent from my iPhone

> On Sep 20, 2018, at 7:02 PM, George Bosilca  wrote:
> 
> Why not simply ompi_ignore it? Removing a component to bring it back later 
> would force us to lose all history. I would rather add an .ompi_ignore and 
> give power users an opportunity to continue playing with it.
> 
>   George.
> 
> 
>> On Thu, Sep 20, 2018 at 8:04 PM Ralph H Castain  wrote:
>> I already suggested the configure option, but it doesn’t solve the problem. 
>> I wouldn’t be terribly surprised to find that Cray also has an undetected 
>> problem given the nature of the issue - just a question of the amount of 
>> testing, variety of environments, etc.
>> 
>> Nobody has to wait for the next major release, though that isn’t so far off 
>> anyway - there has never been an issue with bringing in a new component 
>> during a release series.
>> 
>> Let’s just fix this the right way and bring it into 4.1 or 4.2. We may want 
>> to look at fixing the osc/rdma/ofi bandaid as well while we are at it.
>> 
>> Ralph
>> 
>> 
>>> On Sep 20, 2018, at 4:45 PM, Patinyasakdikul, Thananon 
>>>  wrote:
>>> 
>>> I understand and agree with your point. My initial email was just out of 
>>> curiosity.
>>> 
>>> Howard tested this BTL for Cray in the summer as well. So this seems to 
>>> affect only OPA hardware.
>>> 
>>> I just remember that in the summer, I had to make some changes in libpsm2 
>>> to get this BTL to work for OPA.  Maybe this is the problem, as the default 
>>> libpsm2 won't work.
>>> 
>>> So maybe we can fix this in the configure step: detect the version of 
>>> libpsm2 and don't build if we are not satisfied.
>>> 
>>> Another idea is that we don't build this BTL by default, so users with 
>>> Cray hardware can still use it if they want (just rebuild with the btl). 
>>> We just need to verify that it still works on Cray.  This way, OFI 
>>> stakeholders do not have to wait until the next major release to get this in.
>>> 
>>> 
>>> Arm
>>> 
>>> 
>>>> On Thu, Sep 20, 2018, 7:18 PM Ralph H Castain  wrote:
>>>> I suspect it is a question of what you tested and in which scenarios. 
>>>> Problem is that it can bite someone and there isn’t a clean/obvious 
>>>> solution that doesn’t require the user to do something - e.g., like having 
>>>> to know that they need to disable a BTL. Matias has proposed an mca-based 
>>>> approach, but I would much rather we just fix this correctly. Bandaids 
>>>> have a habit of becoming permanently forgotten - until someone pulls on it 
>>>> and things unravel.
>>>> 
>>>> 
>>>>> On Sep 20, 2018, at 4:14 PM, Patinyasakdikul, Thananon 
>>>>>  wrote:
>>>>> 
>>>>> In the summer, I tested this BTL along with the MTL and was able to use 
>>>>> both of them interchangeably with no problem. I don't know what changed. 
>>>>> libpsm2?
>>>>> 
>>>>> 
>>>>> Arm
>>>>> 
>>>>> 
>>>>>> On Thu, Sep 20, 2018, 7:06 PM Ralph H Castain  wrote:
>>>>>> We have too many discussion threads overlapping on the same email chain 
>>>>>> - so let’s break the discussion on the OFI problem into its own chain.
>>>>>> 
>>>>>> We have been investigating this locally and found there are a number of 
>>>>>> conflicts between the MTLs and the OFI/BTL stepping on each other. The 
>>>>>> correct solution is to move endpoint creation/reporting into the 
>>>>>> opal/mca/common area, but that is going to take some work and will 
>>>>>> likely impact release schedules.
>>>>>> 
>>>>>> Accordingly, we propose to remove the OFI/BTL component from v4.0.0, fix 
>>>>>> the problem in master, and then consider bringing it back as a package 
>>>>>> to v4.1 or v4.2.
>>>>>> 
>>>>>> Comments? If we agree, I’ll file a PR to remove it.
>>>>>> Ralph
>>>>>> 
>>>>>> 
>>>>>>> Begin forwarded message:
>>>>>>> 
>>>>>>> From: Peter Kjellström 
>>>>>>> Subject: Re: [OMPI devel] Announcing Open MPI v4.0.0rc1
>>>>>>> Date: Septembe

Re: [OMPI devel] OFI issues on Open MPI v4.0.0rc1

2018-09-20 Thread Ralph H Castain
I already suggested the configure option, but it doesn’t solve the problem. I 
wouldn’t be terribly surprised to find that Cray also has an undetected problem 
given the nature of the issue - just a question of the amount of testing, 
variety of environments, etc.

Nobody has to wait for the next major release, though that isn’t so far off 
anyway - there has never been an issue with bringing in a new component during 
a release series.

Let’s just fix this the right way and bring it into 4.1 or 4.2. We may want to 
look at fixing the osc/rdma/ofi bandaid as well while we are at it.

Ralph


> On Sep 20, 2018, at 4:45 PM, Patinyasakdikul, Thananon 
>  wrote:
> 
> I understand and agree with your point. My initial email was just out of 
> curiosity.
> 
> Howard tested this BTL for Cray in the summer as well. So this seems to 
> affect only OPA hardware.
> 
> I just remember that in the summer, I had to make some changes in libpsm2 to 
> get this BTL to work for OPA.  Maybe this is the problem, as the default 
> libpsm2 won't work.
> 
> So maybe we can fix this in the configure step: detect the version of 
> libpsm2 and don't build if we are not satisfied.
> 
> Another idea is that we don't build this BTL by default, so users with 
> Cray hardware can still use it if they want (just rebuild with the btl). 
> We just need to verify that it still works on Cray.  This way, OFI 
> stakeholders do not have to wait until the next major release to get this in.
> 
> 
> Arm
> 
> 
> On Thu, Sep 20, 2018, 7:18 PM Ralph H Castain <r...@open-mpi.org> wrote:
> I suspect it is a question of what you tested and in which scenarios. Problem 
> is that it can bite someone and there isn’t a clean/obvious solution that 
> doesn’t require the user to do something - e.g., like having to know that 
> they need to disable a BTL. Matias has proposed an mca-based approach, but I 
> would much rather we just fix this correctly. Bandaids have a habit of 
> becoming permanently forgotten - until someone pulls on it and things unravel.
> 
> 
>> On Sep 20, 2018, at 4:14 PM, Patinyasakdikul, Thananon 
>> <tpati...@vols.utk.edu> wrote:
>> 
>> In the summer, I tested this BTL along with the MTL and was able to use 
>> both of them interchangeably with no problem. I don't know what changed. 
>> libpsm2?
>> 
>> 
>> Arm
>> 
>> 
>> On Thu, Sep 20, 2018, 7:06 PM Ralph H Castain <r...@open-mpi.org> wrote:
>> We have too many discussion threads overlapping on the same email chain - so 
>> let’s break the discussion on the OFI problem into its own chain.
>> 
>> We have been investigating this locally and found there are a number of 
>> conflicts between the MTLs and the OFI/BTL stepping on each other. The 
>> correct solution is to move endpoint creation/reporting into the 
>> opal/mca/common area, but that is going to take some work and will likely 
>> impact release schedules.
>> 
>> Accordingly, we propose to remove the OFI/BTL component from v4.0.0, fix the 
>> problem in master, and then consider bringing it back as a package to v4.1 
>> or v4.2.
>> 
>> Comments? If we agree, I’ll file a PR to remove it.
>> Ralph
>> 
>> 
>>> Begin forwarded message:
>>> 
>>> From: Peter Kjellström <c...@nsc.liu.se>
>>> Subject: Re: [OMPI devel] Announcing Open MPI v4.0.0rc1
>>> Date: September 20, 2018 at 5:18:35 AM PDT
>>> To: "Gabriel, Edgar" <egabr...@central.uh.edu>
>>> Cc: Open MPI Developers <devel@lists.open-mpi.org>
>>> Reply-To: Open MPI Developers <devel@lists.open-mpi.org>
>>> 
>>> On Wed, 19 Sep 2018 16:24:53 +
>>> "Gabriel, Edgar" <egabr...@central.uh.edu> 
>>> wrote:
>>> 
>>>> I performed some tests on our Omnipath cluster, and I have a mixed
>>>> bag of results with 4.0.0rc1
>>> 
>>> I've also tried it on our OPA cluster (skylake+centos-7+inbox) with
>>> very similar results.
>>> 
>>>> compute-1-1.local.4351PSM2 has not been initialized
>>>> compute-1-0.local.3826PSM2 has not been initialized
>>> 
>>> yup I too see these.
>>> 
>>>> mpirun detected that one or more processes exited with non-zero
>>>> status, thus causing the job to be terminated. The first process to
>>>> do so was:
>>>> 
>>>>  Process name: [[38418,1],1]
>>>>  Exit code:255
>>>>  
>>

Re: [OMPI devel] OFI issues on Open MPI v4.0.0rc1

2018-09-20 Thread Ralph H Castain
I suspect it is a question of what you tested and in which scenarios. The 
problem is that it can bite someone, and there isn’t a clean/obvious solution 
that doesn’t require the user to do something - e.g., having to know that they 
need to disable a BTL. Matias has proposed an MCA-based approach, but I would 
much rather we just fix this correctly. Band-aids have a habit of becoming 
permanent and forgotten - until someone pulls on one and things unravel.


> On Sep 20, 2018, at 4:14 PM, Patinyasakdikul, Thananon wrote:
> 
> In the summer, I tested this BTL along with the MTL and was able to use both 
> of them interchangeably with no problem. I don't know what changed. libpsm2?
> 
> 
> Arm
> 
> 
> On Thu, Sep 20, 2018, 7:06 PM Ralph H Castain <r...@open-mpi.org> wrote:
> We have too many discussion threads overlapping on the same email chain - so 
> let’s break the discussion on the OFI problem into its own chain.
> 
> We have been investigating this locally and found there are a number of 
> conflicts between the MTLs and the OFI/BTL stepping on each other. The 
> correct solution is to move endpoint creation/reporting into the 
> opal/mca/common area, but that is going to take some work and will likely 
> impact release schedules.
> 
> Accordingly, we propose to remove the OFI/BTL component from v4.0.0, fix the 
> problem in master, and then consider bringing it back as a package to v4.1 or 
> v4.2.
> 
> Comments? If we agree, I’ll file a PR to remove it.
> Ralph
> 
> 
>> Begin forwarded message:
>> 
>> From: Peter Kjellström <c...@nsc.liu.se>
>> Subject: Re: [OMPI devel] Announcing Open MPI v4.0.0rc1
>> Date: September 20, 2018 at 5:18:35 AM PDT
>> To: "Gabriel, Edgar" <egabr...@central.uh.edu>
>> Cc: Open MPI Developers <devel@lists.open-mpi.org>
>> Reply-To: Open MPI Developers <devel@lists.open-mpi.org>
>> 
>> On Wed, 19 Sep 2018 16:24:53 +
>> "Gabriel, Edgar" <egabr...@central.uh.edu> wrote:
>> 
>>> I performed some tests on our Omnipath cluster, and I have a mixed
>>> bag of results with 4.0.0rc1
>> 
>> I've also tried it on our OPA cluster (skylake+centos-7+inbox) with
>> very similar results.
>> 
>>> compute-1-1.local.4351PSM2 has not been initialized
>>> compute-1-0.local.3826PSM2 has not been initialized
>> 
>> yup I too see these.
>> 
>>> mpirun detected that one or more processes exited with non-zero
>>> status, thus causing the job to be terminated. The first process to
>>> do so was:
>>> 
>>>  Process name: [[38418,1],1]
>>>  Exit code:255
>>>  
>>> 
>> 
>> yup.
>> 
>>> 
>>> 2.   The ofi mtl does not work at all on our Omnipath cluster. If
>>> I try to force it using 'mpirun --mca mtl ofi ...' I get the following
>>> error message.
>> 
>> Yes ofi seems broken. But not even disabling it helps me completely (I
>> see "mca_btl_ofi.so   [.] mca_btl_ofi_component_progress" in my
>> perf top...
>> 
>>> 3.   The openib btl component is always getting in the way with
>>> annoying warnings. It is not really used, but constantly complains:
>> ...
>>> [sabine.cacds.uh.edu:25996] 1 more process has sent help message
>>> help-mpi-btl-openib.txt / ib port not selected
>> 
>> Yup.
>> 
>> ...
>>> So bottom line, if I do
>>> 
>>> mpirun --mca btl ^openib --mca mtl ^ofi ...
>>> 
>>> my tests finish correctly, although mpirun will still return an error.
>> 
>> I get some things to work with this approach (two ranks on two nodes
>> for example). But a lot of things crash rather hard:
>> 
>> $ mpirun -mca btl ^openib -mca mtl
>> ^ofi ./openmpi-4.0.0rc1/imb.openmpi-4.0.0rc1
>> --
>> PSM2 was unable to open an endpoint. Please make sure that the network
>> link is active on the node and the hardware is functioning.
>> 
>>  Error: Failure in initializing endpoint
>> --
>> n909.279895hfi_userinit: assign_context command failed: Device or
>> resource busy n909.279895psmi_context_open: hfi_userinit: failed,
>> trying again (1/3)
>> ...
>>  PML add p

[OMPI devel] OFI issues on Open MPI v4.0.0rc1

2018-09-20 Thread Ralph H Castain
We have too many discussion threads overlapping on the same email chain - so 
let’s break the discussion on the OFI problem into its own chain.

We have been investigating this locally and found there are a number of 
conflicts between the MTLs and the OFI/BTL stepping on each other. The correct 
solution is to move endpoint creation/reporting into the opal/mca/common 
area, but that is going to take some work and will likely impact release 
schedules.

Accordingly, we propose to remove the OFI/BTL component from v4.0.0, fix the 
problem in master, and then consider bringing it back as a package to v4.1 or 
v4.2.

Comments? If we agree, I’ll file a PR to remove it.
Ralph


> Begin forwarded message:
> 
> From: Peter Kjellström 
> Subject: Re: [OMPI devel] Announcing Open MPI v4.0.0rc1
> Date: September 20, 2018 at 5:18:35 AM PDT
> To: "Gabriel, Edgar" 
> Cc: Open MPI Developers 
> Reply-To: Open MPI Developers 
> 
> On Wed, 19 Sep 2018 16:24:53 +
> "Gabriel, Edgar"  wrote:
> 
>> I performed some tests on our Omnipath cluster, and I have a mixed
>> bag of results with 4.0.0rc1
> 
> I've also tried it on our OPA cluster (skylake+centos-7+inbox) with
> very similar results.
> 
>> compute-1-1.local.4351PSM2 has not been initialized
>> compute-1-0.local.3826PSM2 has not been initialized
> 
> yup I too see these.
> 
>> mpirun detected that one or more processes exited with non-zero
>> status, thus causing the job to be terminated. The first process to
>> do so was:
>> 
>>  Process name: [[38418,1],1]
>>  Exit code:255
>>  
>> 
> 
> yup.
> 
>> 
>> 2.   The ofi mtl does not work at all on our Omnipath cluster. If
>> I try to force it using 'mpirun --mca mtl ofi ...' I get the following
>> error message.
> 
> Yes ofi seems broken. But not even disabling it helps me completely (I
> see "mca_btl_ofi.so   [.] mca_btl_ofi_component_progress" in my
> perf top...
> 
>> 3.   The openib btl component is always getting in the way with
>> annoying warnings. It is not really used, but constantly complains:
> ...
>> [sabine.cacds.uh.edu:25996] 1 more process has sent help message
>> help-mpi-btl-openib.txt / ib port not selected
> 
> Yup.
> 
> ...
>> So bottom line, if I do
>> 
>> mpirun --mca btl ^openib --mca mtl ^ofi ...
>> 
>> my tests finish correctly, although mpirun will still return an error.
> 
> I get some things to work with this approach (two ranks on two nodes
> for example). But a lot of things crash rather hard:
> 
> $ mpirun -mca btl ^openib -mca mtl
> ^ofi ./openmpi-4.0.0rc1/imb.openmpi-4.0.0rc1
> --
> PSM2 was unable to open an endpoint. Please make sure that the network
> link is active on the node and the hardware is functioning.
> 
>  Error: Failure in initializing endpoint
> --
> n909.279895hfi_userinit: assign_context command failed: Device or
> resource busy n909.279895psmi_context_open: hfi_userinit: failed,
> trying again (1/3)
> ...
>  PML add procs failed
>  --> Returned "Error" (-1) instead of "Success" (0)
> --
> [n908:298761] *** An error occurred in MPI_Init
> [n908:298761] *** reported by process [4092002305,59]
> [n908:298761] *** on a NULL communicator
> [n908:298761] *** Unknown error
> [n908:298761] *** MPI_ERRORS_ARE_FATAL (processes in this communicator
>  will now abort, [n908:298761] ***and potentially your MPI job)
> [n907:407748] 255 more processes have sent help message
>  help-mtl-psm2.txt / unable to open endpoint [n907:407748] Set MCA
>  parameter "orte_base_help_aggregate" to 0 to see all help / error
>  messages [n907:407748] 127 more processes have sent help message
>  help-mpi-runtime.txt / mpi_init:startup:internal-failure
>  [n907:407748] 56 more processes have sent help message
>  help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
> 
> If I disable psm2 too I get it to run (apparently on vader?)
> 
> /Peter K
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] MTT Perl client

2018-09-18 Thread Ralph H Castain
Are we good to go with this changeover? If so, I’ll delete the Perl client from 
the main MTT repo.

> On Sep 14, 2018, at 10:06 AM, Jeff Squyres (jsquyres) via devel wrote:
> 
> On Sep 14, 2018, at 12:37 PM, Gilles Gouaillardet wrote:
>> 
>> IIRC mtt-relay is not only a proxy (squid can do that too).
> 
> Probably true.  IIRC, I think mtt-relay was meant to be a 
> dirt-stupid-but-focused-to-just-one-destination relay.
> 
>> mtt results can be manually copied from a cluster behind a firewall, and 
>> then mtt-relay can “upload” these results to mtt.open-MPI.org
> 
> Yes, but then a human has to be involved, which kinda defeats at least one of 
> the goals of MTT.  Using mtt-relay allowed MTT to still function in an 
> automated fashion.
> 
> FWIW, it may not be necessary to convert mtt-relay to Python (IIRC it's 
> protocol agnostic, but like I said: it's been quite a while since I've looked 
> at that code).  It was pretty small and straightforward.  It could also just 
> stay in mtt-legacy.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] MTT Perl client

2018-09-14 Thread Ralph H Castain
I very much doubt that there is a Python equivalent yet.

> On Sep 14, 2018, at 9:23 AM, Jeff Squyres (jsquyres) via devel wrote:
> 
> It's for environments where MTT is run where it can't reach the greater 
> internet (or, at least, it can't POST to the greater internet).  You run the 
> mtt-relay on a machine that is reachable by your machines running MTT, and it 
> works as a relay to mtt.open-mpi.org so that you can submit your MTT results.
> 
> It might actually be fairly protocol agnostic, IIRC (been a while since I've 
> looked at that code).
> 
> 
> 
>> On Sep 14, 2018, at 11:23 AM, Ralph H Castain  wrote:
>> 
>> Afraid I’m not familiar with that script - what does it do?
>> 
>> 
>>> On Sep 14, 2018, at 7:46 AM, Christoph Niethammer wrote:
>>> 
>>> Works for the installation at HLRS.
>>> 
>>> Short note/question: I am using the mtt-relay script. This is written in 
>>> perl. Is there a python based replacement?
>>> 
>>> Best
>>> Christoph Niethammer
>>> 
>>> - Original message -
>>> From: "Open MPI Developers" 
>>> To: "Open MPI Developers" 
>>> CC: "Jeff Squyres" 
>>> Sent: Tuesday, September 11, 2018 20:37:40
>>> Subject: Re: [OMPI devel] MTT Perl client
>>> 
>>> Works for me.
>>> 
>>>> On Sep 11, 2018, at 12:35 PM, Ralph H Castain  wrote:
>>>> 
>>>> Hi folks
>>>> 
>>>> Per today’s telecon, I have moved the Perl MTT client into its own 
>>>> repository: https://github.com/open-mpi/mtt-legacy. All the Python client 
>>>> code has been removed from that repo.
>>>> 
>>>> The original MTT repo remains at https://github.com/open-mpi/mtt. I have a 
>>>> PR to remove all the Perl client code and associated libs/modules from 
>>>> that repo. We won’t commit it until people have had a chance to switch to 
>>>> the mtt-legacy repo and verify that things still work for them.
>>>> 
>>>> Please let us know if mtt-legacy is okay or has a problem.
>>>> 
>>>> Thanks
>>>> Ralph
>>>> 
>>>> ___
>>>> devel mailing list
>>>> devel@lists.open-mpi.org
>>>> https://lists.open-mpi.org/mailman/listinfo/devel
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> 
>>> ___
>>> devel mailing list
>>> devel@lists.open-mpi.org
>>> https://lists.open-mpi.org/mailman/listinfo/devel
>> 
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] MTT Perl client

2018-09-14 Thread Ralph H Castain
Afraid I’m not familiar with that script - what does it do?


> On Sep 14, 2018, at 7:46 AM, Christoph Niethammer  wrote:
> 
> Works for the installation at HLRS.
> 
> Short note/question: I am using the mtt-relay script. This is written in 
> perl. Is there a python based replacement?
> 
> Best
> Christoph Niethammer
> 
> - Original message -
> From: "Open MPI Developers" 
> To: "Open MPI Developers" 
> CC: "Jeff Squyres" 
> Sent: Tuesday, September 11, 2018 20:37:40
> Subject: Re: [OMPI devel] MTT Perl client
> 
> Works for me.
> 
>> On Sep 11, 2018, at 12:35 PM, Ralph H Castain  wrote:
>> 
>> Hi folks
>> 
>> Per today’s telecon, I have moved the Perl MTT client into its own 
>> repository: https://github.com/open-mpi/mtt-legacy. All the Python client 
>> code has been removed from that repo.
>> 
>> The original MTT repo remains at https://github.com/open-mpi/mtt. I have a 
>> PR to remove all the Perl client code and associated libs/modules from that 
>> repo. We won’t commit it until people have had a chance to switch to the 
>> mtt-legacy repo and verify that things still work for them.
>> 
>> Please let us know if mtt-legacy is okay or has a problem.
>> 
>> Thanks
>> Ralph
>> 
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Will info keys ever be fixed?

2018-09-11 Thread Ralph H Castain
On MacOS with gcc 7.3


> On Sep 11, 2018, at 3:02 PM, Jeff Squyres (jsquyres) via devel wrote:
> 
> Ralph --
> 
> What OS / compiler are you using?
> 
> I just compiled on MacOS (first time in a while) and filed a PR and a few 
> issues about the warnings I found, but I cannot replicate these warnings.  I 
> also built with gcc 7.3.0 on RHEL; couldn't replicate the warnings.
> 
> On MacOS, I'm using the default Xcode compilers:
> 
> $ gcc --version
> Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr 
> --with-gxx-include-dir=/usr/include/c++/4.2.1
> Apple LLVM version 9.1.0 (clang-902.0.39.2)
> Target: x86_64-apple-darwin17.7.0
> Thread model: posix
> InstalledDir: 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
> 
> 
> 
> 
> 
>> On Sep 10, 2018, at 6:57 PM, Ralph H Castain  wrote:
>> 
>> Still seeing this in today’s head of master:
>> 
>> info_subscriber.c: In function 'opal_infosubscribe_change_info':
>> ../../opal/util/info.h:112:31: warning: '%s' directive output may be 
>> truncated writing up to 36 bytes into a region of size 27 
>> [-Wformat-truncation=]
>> #define OPAL_INFO_SAVE_PREFIX "_OMPI_IN_"
>>   ^
>> info_subscriber.c:268:13: note: in expansion of macro 'OPAL_INFO_SAVE_PREFIX'
>> OPAL_INFO_SAVE_PREFIX "%s", key);
>> ^
>> info_subscriber.c:268:36: note: format string is defined here
>> OPAL_INFO_SAVE_PREFIX "%s", key);
>>^~
>> In file included from 
>> /opt/local/lib/gcc7/gcc/x86_64-apple-darwin17/7.3.0/include-fixed/stdio.h:425:0,
>> from ../../opal/class/opal_list.h:71,
>> from ../../opal/util/info_subscriber.h:30,
>> from info_subscriber.c:45:
>> info_subscriber.c:267:9: note: '__builtin_snprintf' output between 10 and 46 
>> bytes into a destination of size 36
>> snprintf(modkey, OPAL_MAX_INFO_KEY,
>> ^
>> In file included from 
>> /opt/local/lib/gcc7/gcc/x86_64-apple-darwin17/7.3.0/include-fixed/stdio.h:425:0,
>> from ../../opal/class/opal_list.h:71,
>> from ../../opal/util/info.h:30,
>> from info.c:46:
>> info.c: In function 'opal_info_dup_mode.constprop':
>> ../../opal/util/info.h:112:31: warning: '%s' directive output may be 
>> truncated writing up to 36 bytes into a region of size 28 
>> [-Wformat-truncation=]
>> #define OPAL_INFO_SAVE_PREFIX "_OMPI_IN_"
>>   ^
>> info.c:212:22: note: in expansion of macro 'OPAL_INFO_SAVE_PREFIX'
>>  OPAL_INFO_SAVE_PREFIX "%s", pkey);
>>  ^
>> info.c:212:45: note: format string is defined here
>>  OPAL_INFO_SAVE_PREFIX "%s", pkey);
>> ^~
>> In file included from 
>> /opt/local/lib/gcc7/gcc/x86_64-apple-darwin17/7.3.0/include-fixed/stdio.h:425:0,
>> from ../../opal/class/opal_list.h:71,
>> from ../../opal/util/info.h:30,
>> from info.c:46:
>> info.c:211:18: note: '__builtin_snprintf' output between 10 and 46 bytes 
>> into a destination of size 37
>>  snprintf(savedkey, OPAL_MAX_INFO_KEY+1,
>>  ^
>> 
>> 
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] mpirun error when not using span

2018-09-11 Thread Ralph H Castain
I believe the problem is actually a little different than you described. The 
issue occurs whenever the #procs combined with PE exceeds the number of cores 
on a node. It is caused by the fact that we aren’t considering the PE number 
when mapping processes - we only appear to be looking at it when binding. I’ll 
try to poke at it a bit.
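
To make the arithmetic concrete, here is a minimal sketch in C of the check the mapper would need. It is not the Open MPI mapping code: `procs_per_domain` and `nodes_needed` are hypothetical helpers, and the 2 x 18-core node layout is only what the --report-bindings output quoted below suggests.

```c
#include <assert.h>

/* Hypothetical helper: how many processes fit in one mapping domain
 * (e.g. a NUMA region) when each process is bound to `pe` cores. */
int procs_per_domain(int cores_in_domain, int pe)
{
    assert(cores_in_domain >= 0 && pe > 0);
    return cores_in_domain / pe;   /* leftover cores stay unused */
}

/* Hypothetical helper: number of nodes needed to place `nprocs`
 * processes if the mapper accounts for PE up front.
 * Returns -1 when not even one process fits on a node. */
int nodes_needed(int nprocs, int domains_per_node,
                 int cores_per_domain, int pe)
{
    int per_node = domains_per_node * procs_per_domain(cores_per_domain, pe);
    if (per_node == 0) {
        return -1;
    }
    return (nprocs + per_node - 1) / per_node;   /* round up */
}
```

With 2 NUMA domains of 18 cores each and PE=4, each domain holds 4 processes, so a node holds 8 and `-n 16` needs 2 nodes. That matches the placement the `span` run reports (ranks 0-7 on ba001, ranks 8-15 on ba002).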


> On Sep 11, 2018, at 9:17 AM, Shrader, David Lee  wrote:
> 
> Here's the xml output from lstopo. Thank you for taking a look!
> David
> 
> From: devel on behalf of Ralph H Castain
> Sent: Monday, September 10, 2018 5:12 PM
> To: OpenMPI Devel
> Subject: Re: [OMPI devel] mpirun error when not using span
>  
> Could you please send the output from “lstopo --of xml foo.xml” (the file 
> foo.xml) so I can try to replicate here?
> 
> 
>> On Sep 4, 2018, at 12:35 PM, Shrader, David Lee <dshra...@lanl.gov> wrote:
>> 
>> Hello,
>> 
>> I have run this issue by Howard, and he asked me to forward it on to the 
>> Open MPI devel mailing list. I get an error when trying to use PE=n with 
>> '--map-by numa' and not using span when using more than one node:
>> 
>> [dshrader@ba001 openmpi-3.1.2]$ mpirun -n 16 --map-by numa:PE=4 --bind-to 
>> core --report-bindings true
>> --
>> A request was made to bind to that would result in binding more
>> processes than cpus on a resource:
>> 
>>Bind to: CORE
>>Node:ba001
>>#processes:  2
>>#cpus:   1
>> 
>> You can override this protection by adding the "overload-allowed"
>> option to your binding directive.
>> --
>> 
>> The absolute values of the numbers passed to -n and PE don't really matter; 
>> the error pops up as soon as those numbers are combined in such a way that 
>> an MPI rank ends up on the second node.
>> 
>> If I add the "span" parameter, everything works as expected:
>> 
>> [dshrader@ba001 openmpi-3.1.2]$ mpirun -n 16 --map-by numa:PE=4,span 
>> --bind-to core --report-bindings true
>> [ba002.localdomain:58502] MCW rank 8 bound to socket 0[core 0[hwt 0]], 
>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: 
>> [B/B/B/B/./././././././././././././.][./././././././././././././././././.]
>> [ba002.localdomain:58502] MCW rank 9 bound to socket 0[core 4[hwt 0]], 
>> socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: 
>> [././././B/B/B/B/./././././././././.][./././././././././././././././././.]
>> [ba002.localdomain:58502] MCW rank 10 bound to socket 0[core 8[hwt 0]], 
>> socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: 
>> [././././././././B/B/B/B/./././././.][./././././././././././././././././.]
>> [ba002.localdomain:58502] MCW rank 11 bound to socket 0[core 12[hwt 0]], 
>> socket 0[core 13[hwt 0]], socket 0[core 14[hwt 0]], socket 0[core 15[hwt 
>> 0]]: 
>> [././././././././././././B/B/B/B/./.][./././././././././././././././././.]
>> [ba002.localdomain:58502] MCW rank 12 bound to socket 1[core 18[hwt 0]], 
>> socket 1[core 19[hwt 0]], socket 1[core 20[hwt 0]], socket 1[core 21[hwt 
>> 0]]: 
>> [./././././././././././././././././.][B/B/B/B/./././././././././././././.]
>> [ba002.localdomain:58502] MCW rank 13 bound to socket 1[core 22[hwt 0]], 
>> socket 1[core 23[hwt 0]], socket 1[core 24[hwt 0]], socket 1[core 25[hwt 
>> 0]]: 
>> [./././././././././././././././././.][././././B/B/B/B/./././././././././.]
>> [ba002.localdomain:58502] MCW rank 14 bound to socket 1[core 26[hwt 0]], 
>> socket 1[core 27[hwt 0]], socket 1[core 28[hwt 0]], socket 1[core 29[hwt 
>> 0]]: 
>> [./././././././././././././././././.][././././././././B/B/B/B/./././././.]
>> [ba002.localdomain:58502] MCW rank 15 bound to socket 1[core 30[hwt 0]], 
>> socket 1[core 31[hwt 0]], socket 1[core 32[hwt 0]], socket 1[core 33[hwt 
>> 0]]: 
>> [./././././././././././././././././.][././././././././././././B/B/B/B/./.]
>> [ba001.localdomain:11700] MCW rank 0 bound to socket 0[core 0[hwt 0]], 
>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: 
>> [B/B/B/B/./././././././././././././.][./././././././././././././././././.]
>> [ba001.localdomain:11700] MCW rank 1 bound to socket 0[core 4[hwt 0]], 
>> socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: 
>> [././././B/B/B/B/./././././././././.][./././././././././././././././././.]
>> [ba001.localdomain:11700] MCW rank 2 bound to socket 0[core 8[h

[OMPI devel] MTT Perl client

2018-09-11 Thread Ralph H Castain
Hi folks

Per today’s telecon, I have moved the Perl MTT client into its own repository: 
https://github.com/open-mpi/mtt-legacy. All the Python client code has been 
removed from that repo.

The original MTT repo remains at https://github.com/open-mpi/mtt. I have a PR 
to remove all the Perl client code and associated libs/modules from that repo. 
We won’t commit it until people have had a chance to switch to the mtt-legacy 
repo and verify that things still work for them.

Please let us know if mtt-legacy is okay or has a problem.

Thanks
Ralph

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] mpirun error when not using span

2018-09-10 Thread Ralph H Castain
Could you please send the output from “lstopo --of xml foo.xml” (the file 
foo.xml) so I can try to replicate here?


> On Sep 4, 2018, at 12:35 PM, Shrader, David Lee  wrote:
> 
> Hello,
> 
> I have run this issue by Howard, and he asked me to forward it on to the Open 
> MPI devel mailing list. I get an error when trying to use PE=n with '--map-by 
> numa' and not using span when using more than one node:
> 
> [dshrader@ba001 openmpi-3.1.2]$ mpirun -n 16 --map-by numa:PE=4 --bind-to 
> core --report-bindings true
> --
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
> 
>Bind to: CORE
>Node:ba001
>#processes:  2
>#cpus:   1
> 
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
> --
> 
> The absolute values of the numbers passed to -n and PE don't really matter; 
> the error pops up as soon as those numbers are combined in such a way that an 
> MPI rank ends up on the second node.
> 
> If I add the "span" parameter, everything works as expected:
> 
> [dshrader@ba001 openmpi-3.1.2]$ mpirun -n 16 --map-by numa:PE=4,span 
> --bind-to core --report-bindings true
> [ba002.localdomain:58502] MCW rank 8 bound to socket 0[core 0[hwt 0]], socket 
> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: 
> [B/B/B/B/./././././././././././././.][./././././././././././././././././.]
> [ba002.localdomain:58502] MCW rank 9 bound to socket 0[core 4[hwt 0]], socket 
> 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: 
> [././././B/B/B/B/./././././././././.][./././././././././././././././././.]
> [ba002.localdomain:58502] MCW rank 10 bound to socket 0[core 8[hwt 0]], 
> socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: 
> [././././././././B/B/B/B/./././././.][./././././././././././././././././.]
> [ba002.localdomain:58502] MCW rank 11 bound to socket 0[core 12[hwt 0]], 
> socket 0[core 13[hwt 0]], socket 0[core 14[hwt 0]], socket 0[core 15[hwt 0]]: 
> [././././././././././././B/B/B/B/./.][./././././././././././././././././.]
> [ba002.localdomain:58502] MCW rank 12 bound to socket 1[core 18[hwt 0]], 
> socket 1[core 19[hwt 0]], socket 1[core 20[hwt 0]], socket 1[core 21[hwt 0]]: 
> [./././././././././././././././././.][B/B/B/B/./././././././././././././.]
> [ba002.localdomain:58502] MCW rank 13 bound to socket 1[core 22[hwt 0]], 
> socket 1[core 23[hwt 0]], socket 1[core 24[hwt 0]], socket 1[core 25[hwt 0]]: 
> [./././././././././././././././././.][././././B/B/B/B/./././././././././.]
> [ba002.localdomain:58502] MCW rank 14 bound to socket 1[core 26[hwt 0]], 
> socket 1[core 27[hwt 0]], socket 1[core 28[hwt 0]], socket 1[core 29[hwt 0]]: 
> [./././././././././././././././././.][././././././././B/B/B/B/./././././.]
> [ba002.localdomain:58502] MCW rank 15 bound to socket 1[core 30[hwt 0]], 
> socket 1[core 31[hwt 0]], socket 1[core 32[hwt 0]], socket 1[core 33[hwt 0]]: 
> [./././././././././././././././././.][././././././././././././B/B/B/B/./.]
> [ba001.localdomain:11700] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 
> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: 
> [B/B/B/B/./././././././././././././.][./././././././././././././././././.]
> [ba001.localdomain:11700] MCW rank 1 bound to socket 0[core 4[hwt 0]], socket 
> 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: 
> [././././B/B/B/B/./././././././././.][./././././././././././././././././.]
> [ba001.localdomain:11700] MCW rank 2 bound to socket 0[core 8[hwt 0]], socket 
> 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: 
> [././././././././B/B/B/B/./././././.][./././././././././././././././././.]
> [ba001.localdomain:11700] MCW rank 3 bound to socket 0[core 12[hwt 0]], 
> socket 0[core 13[hwt 0]], socket 0[core 14[hwt 0]], socket 0[core 15[hwt 0]]: 
> [././././././././././././B/B/B/B/./.][./././././././././././././././././.]
> [ba001.localdomain:11700] MCW rank 4 bound to socket 1[core 18[hwt 0]], 
> socket 1[core 19[hwt 0]], socket 1[core 20[hwt 0]], socket 1[core 21[hwt 0]]: 
> [./././././././././././././././././.][B/B/B/B/./././././././././././././.]
> [ba001.localdomain:11700] MCW rank 5 bound to socket 1[core 22[hwt 0]], 
> socket 1[core 23[hwt 0]], socket 1[core 24[hwt 0]], socket 1[core 25[hwt 0]]: 
> [./././././././././././././././././.][././././B/B/B/B/./././././././././.]
> [ba001.localdomain:11700] MCW rank 6 bound to socket 1[core 26[hwt 0]], 
> socket 1[core 27[hwt 0]], socket 1[core 28[hwt 0]], socket 1[core 29[hwt 0]]: 
> [./././././././././././././././././.][././././././././B/B/B/B/./././././.]
> [ba001.localdomain:11700] MCW rank 7 bound to socket 1[core 30[hwt 0]], 
> socket 1[core 31[hwt 0]], socket 1[core 32[hwt 0]], socket 1[core 33[hwt 0]]: 
> 

[OMPI devel] Will info keys ever be fixed?

2018-09-10 Thread Ralph H Castain
Still seeing this in today’s head of master:

info_subscriber.c: In function 'opal_infosubscribe_change_info':
../../opal/util/info.h:112:31: warning: '%s' directive output may be truncated 
writing up to 36 bytes into a region of size 27 [-Wformat-truncation=]
 #define OPAL_INFO_SAVE_PREFIX "_OMPI_IN_"
   ^
info_subscriber.c:268:13: note: in expansion of macro 'OPAL_INFO_SAVE_PREFIX'
 OPAL_INFO_SAVE_PREFIX "%s", key);
 ^
info_subscriber.c:268:36: note: format string is defined here
 OPAL_INFO_SAVE_PREFIX "%s", key);
^~
In file included from 
/opt/local/lib/gcc7/gcc/x86_64-apple-darwin17/7.3.0/include-fixed/stdio.h:425:0,
 from ../../opal/class/opal_list.h:71,
 from ../../opal/util/info_subscriber.h:30,
 from info_subscriber.c:45:
info_subscriber.c:267:9: note: '__builtin_snprintf' output between 10 and 46 
bytes into a destination of size 36
 snprintf(modkey, OPAL_MAX_INFO_KEY,
 ^
In file included from 
/opt/local/lib/gcc7/gcc/x86_64-apple-darwin17/7.3.0/include-fixed/stdio.h:425:0,
 from ../../opal/class/opal_list.h:71,
 from ../../opal/util/info.h:30,
 from info.c:46:
info.c: In function 'opal_info_dup_mode.constprop':
../../opal/util/info.h:112:31: warning: '%s' directive output may be truncated 
writing up to 36 bytes into a region of size 28 [-Wformat-truncation=]
 #define OPAL_INFO_SAVE_PREFIX "_OMPI_IN_"
   ^
info.c:212:22: note: in expansion of macro 'OPAL_INFO_SAVE_PREFIX'
  OPAL_INFO_SAVE_PREFIX "%s", pkey);
  ^
info.c:212:45: note: format string is defined here
  OPAL_INFO_SAVE_PREFIX "%s", pkey);
 ^~
In file included from 
/opt/local/lib/gcc7/gcc/x86_64-apple-darwin17/7.3.0/include-fixed/stdio.h:425:0,
 from ../../opal/class/opal_list.h:71,
 from ../../opal/util/info.h:30,
 from info.c:46:
info.c:211:18: note: '__builtin_snprintf' output between 10 and 46 bytes into a 
destination of size 37
  snprintf(savedkey, OPAL_MAX_INFO_KEY+1,
  ^
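
The numbers in the warning add up as follows: the prefix "_OMPI_IN_" is 9 characters, the key may be up to 36, and one more byte is needed for the terminating NUL, so the worst case is 46 bytes while the destination holds 36 (or 37). Below is a minimal sketch of sizing the buffer from the prefix and detecting truncation. `MAX_KEY`, `SAVED_KEY_SIZE` and `make_saved_key` are hypothetical names, not the OPAL definitions, and the value 36 is only inferred from the warning text. This is not necessarily how the code should be or was fixed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SAVE_PREFIX "_OMPI_IN_"  /* same text as OPAL_INFO_SAVE_PREFIX above */
#define MAX_KEY 36               /* assumed key length limit, from the warning */

/* Prefix (9) + longest key (36) + terminating NUL (1) = 46 bytes. */
#define SAVED_KEY_SIZE (sizeof(SAVE_PREFIX) - 1 + MAX_KEY + 1)

/* Hypothetical helper: write prefix + key into `out`.
 * Returns 0 on success, -1 if the result did not fit. */
int make_saved_key(char *out, size_t out_size, const char *key)
{
    assert(out != NULL && key != NULL);
    int n = snprintf(out, out_size, SAVE_PREFIX "%s", key);
    /* snprintf returns the length it wanted to write, excluding the NUL */
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

A 36-byte destination fails for a full-length key, while a `SAVED_KEY_SIZE` destination always fits; checking the return value catches the truncation either way.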


___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Open MPI website borked up?

2018-09-01 Thread Ralph H Castain
I suspect this is a stale message - I’m not seeing any problem with the website


> On Aug 29, 2018, at 12:55 PM, Howard Pritchard  wrote:
> 
> Hi Folks,
> 
> Something seems to be borked up about the OMPI website.  Go to the website 
> and you'll get an odd parsing error.
> 
> Howard
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

[OMPI devel] Continued warnings?

2018-07-31 Thread Ralph H Castain
Just curious - will this ever be fixed? From today’s head of master:

In file included from info.c:46:0:
info.c: In function 'opal_info_dup_mode':
../../opal/util/info.h:112:31: warning: '%s' directive output may be truncated 
writing up to 36 bytes into a region of size 27 [-Wformat-truncation=]
 #define OPAL_INFO_SAVE_PREFIX "_OMPI_IN_"
   ^
info.c:212:22: note: in expansion of macro 'OPAL_INFO_SAVE_PREFIX'
  OPAL_INFO_SAVE_PREFIX "%s", iterator->ie_key);
  ^
info.c:212:45: note: format string is defined here
  OPAL_INFO_SAVE_PREFIX "%s", iterator->ie_key);
 ^~
info.c:211:18: note: 'snprintf' output between 10 and 46 bytes into a 
destination of size 36
  snprintf(savedkey, OPAL_MAX_INFO_KEY,
  ^
  OPAL_INFO_SAVE_PREFIX "%s", iterator->ie_key);
  ~


___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

[hwloc-devel] Create success (hwloc git dev-1242-g45371f6)

2016-09-14 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc dev-1242-g45371f6
Start time: Wed Sep 14 18:01:08 PDT 2016
End time:   Wed Sep 14 18:04:47 PDT 2016

Your friendly daemon,
Cyrador
___
hwloc-devel mailing list
hwloc-devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/hwloc-devel


[hwloc-devel] Create success (hwloc git dev-1241-gc7de8f8)

2016-09-10 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc dev-1241-gc7de8f8
Start time: Sat Sep 10 18:01:07 PDT 2016
End time:   Sat Sep 10 18:04:39 PDT 2016

Your friendly daemon,
Cyrador
___
hwloc-devel mailing list
hwloc-devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/hwloc-devel


[hwloc-devel] Create success (hwloc git 1.11.4-6-g8445aa9)

2016-09-07 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc 1.11.4-6-g8445aa9
Start time: Wed Sep  7 18:05:27 PDT 2016
End time:   Wed Sep  7 18:08:17 PDT 2016

Your friendly daemon,
Cyrador


[hwloc-devel] Create success (hwloc git dev-1238-g2377852)

2016-09-07 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc dev-1238-g2377852
Start time: Wed Sep  7 18:01:08 PDT 2016
End time:   Wed Sep  7 18:04:43 PDT 2016

Your friendly daemon,
Cyrador


[hwloc-devel] Create success (hwloc git dev-1235-gab57fb3)

2016-09-01 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc dev-1235-gab57fb3
Start time: Thu Sep  1 18:01:06 PDT 2016
End time:   Thu Sep  1 18:03:38 PDT 2016

Your friendly daemon,
Cyrador


[hwloc-devel] Create success (hwloc git 1.11.4-3-g57eb636)

2016-08-31 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc 1.11.4-3-g57eb636
Start time: Wed Aug 31 18:05:31 PDT 2016
End time:   Wed Aug 31 18:08:39 PDT 2016

Your friendly daemon,
Cyrador


[hwloc-devel] Create success (hwloc git dev-1234-g10801ef)

2016-08-31 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc dev-1234-g10801ef
Start time: Wed Aug 31 18:01:08 PDT 2016
End time:   Wed Aug 31 18:04:49 PDT 2016

Your friendly daemon,
Cyrador


[hwloc-devel] Create success (hwloc git 1.11.4-1-g77e9c1e)

2016-08-29 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc 1.11.4-1-g77e9c1e
Start time: Mon Aug 29 18:05:30 PDT 2016
End time:   Mon Aug 29 18:08:45 PDT 2016

Your friendly daemon,
Cyrador


[hwloc-devel] Create success (hwloc git dev-1232-g8fed107)

2016-08-29 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc dev-1232-g8fed107
Start time: Mon Aug 29 18:01:08 PDT 2016
End time:   Mon Aug 29 18:04:50 PDT 2016

Your friendly daemon,
Cyrador


[hwloc-devel] Create success (hwloc git 1.11.3-50-g4a28f82)

2016-08-24 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc 1.11.3-50-g4a28f82
Start time: Wed Aug 24 18:04:53 PDT 2016
End time:   Wed Aug 24 18:07:33 PDT 2016

Your friendly daemon,
Cyrador


[hwloc-devel] Create success (hwloc git dev-1231-gd71f145)

2016-08-24 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc dev-1231-gd71f145
Start time: Wed Aug 24 18:01:05 PDT 2016
End time:   Wed Aug 24 18:04:11 PDT 2016

Your friendly daemon,
Cyrador


[hwloc-devel] Create success (hwloc git dev-1230-g922cbec)

2016-08-23 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc dev-1230-g922cbec
Start time: Tue Aug 23 18:01:12 PDT 2016
End time:   Tue Aug 23 18:04:47 PDT 2016

Your friendly daemon,
Cyrador


[hwloc-devel] Create success (hwloc git 1.11.3-49-gc038b2b)

2016-08-22 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc 1.11.3-49-gc038b2b
Start time: Mon Aug 22 18:05:25 PDT 2016
End time:   Mon Aug 22 18:08:25 PDT 2016

Your friendly daemon,
Cyrador


[hwloc-devel] Create success (hwloc git dev-1226-g64d92a8)

2016-08-17 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc dev-1226-g64d92a8
Start time: Wed Aug 17 18:01:06 PDT 2016
End time:   Wed Aug 17 18:04:45 PDT 2016

Your friendly daemon,
Cyrador


[hwloc-devel] Create success (hwloc git dev-1222-gdbe0cfd)

2016-08-10 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc dev-1222-gdbe0cfd
Start time: Wed Aug 10 18:01:05 PDT 2016
End time:   Wed Aug 10 18:04:43 PDT 2016

Your friendly daemon,
Cyrador


[hwloc-devel] Create success (hwloc git 1.11.3-43-g8f0e3cd)

2016-08-10 Thread Ralph H Castain
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc 1.11.3-43-g8f0e3cd
Start time: Wed Aug 10 08:50:41 PDT 2016
End time:   Wed Aug 10 08:53:56 PDT 2016

Your friendly daemon,
Cyrador


Re: [OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-04 Thread Ralph H. Castain
Looks okay to me Brian - I went ahead and filed the CMR and sent it on to
Brad for approval.

Ralph


> On Tue, 3 Mar 2009, Brian W. Barrett wrote:
>
>> On Tue, 3 Mar 2009, Jeff Squyres wrote:
>>
>>> 1.3.1rc3 had a race condition in the ORTE shutdown sequence.  The only
>>> difference between rc3 and rc4 was a fix for that race condition.  Please
>>> test ASAP:
>>>
>>>   http://www.open-mpi.org/software/ompi/v1.3/
>>
>> I'm sorry, I've failed to test rc1 & rc2 on Catamount.  I'm getting a compile
>> failure in the ORTE code.  I'll do a bit more testing and send Ralph an
>> e-mail this afternoon.
>
>
> Attached is a patch against v1.3 branch that makes it work on Red Storm.
> I'm not sure it's right, so I'm just e-mailing it rather than committing..
> Sorry Ralph, but can you take a look? :(
>
> Brian
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] IBCM error

2008-07-14 Thread Ralph H. Castain
I've been quietly following this discussion, but now feel a need to jump
in here. I really must disagree with the idea of building either IBCM or
RDMACM support by default. Neither of these has been proven to reliably
work, or to be advantageous. Our own experiences in testing them have been
slightly negative at best. When they did work, they were slower, didn't
scale well, and were unreliable.

I'm not trying to rain on anyone's parade. These are worthwhile in the
long term. However, they clearly need further work to be "ready for prime
time".

Accordingly, I would recommend that they -only- be built if specifically
requested. Remember, most of our users just build blindly. It makes no
sense to have them build support for what can only be classed as an
experimental capability at this time.

Also, note that the OFED install is less-than-reliable wrt IBCM and
RDMACM. We have spent considerable time chasing down installation problems
that allowed the system to build, but then caused it to crash-and-burn if
we attempted to use it. We have gained experience at knowing when/where to
look now, but that doesn't lessen the reputation impact OMPI is getting as
a "buggy, cantankerous beast" according to our sys admins.

Not a reputation we should be encouraging.

Turning this off by default allows those more adventurous souls to explore
this capability, while letting our production-oriented customers install
and run in peace.

Ralph



> On Jul 14, 2008, at 9:21 AM, Pavel Shamis (Pasha) wrote:
>
>>> Should we not even build support for it?
>> I think IBCM CPC build should be enabled by default. The IBCM is
>> supplied with OFED so it should not be any problem during install.
>
> Ok.  But remember that there are at least some OS's where /dev/ucm* do
> *not* get created by default for some unknown reason (even though IBCM
> is installed).
>
>>> PRO: don't even allow the possibility of running with it, because
>>> we know that there are issues with the ibcm userspace library
>>> (i.e., reduce problem reports from users)
>>>
>>> PRO: users don't have to have libibcm installed on compute nodes
>>> (we've actually gotten some complaints about this)
>> We got complaints only for the case when ompi was built on a platform
>> with IBCM and then run on a platform without IBCM.  Also, we
>> did not have an option to disable
>> ibcm during compilation, so there was actually no way to install
>> OMPI on the compute node. We added the option and the problem was
>> resolved.
>> In most cases the OFED install is the same on all nodes, so it
>> should not be a problem to build IBCM support by default.
>
>
> Ok, sounds good.
>
> --
> Jeff Squyres
> Cisco Systems
>
>



[OMPI devel] User request: add envar?

2008-07-11 Thread Ralph H Castain
Yo folks

For those not following the user list, this request was generated today:

>>>> Absolutely, these are useful time and time again so should be part of
>>>> the API and hence stable.  Care to mention what they are and I'll add it
>>>> to my note as something to change when upgrading to 1.3 (we are looking
>>>> at testing a snapshot in the near future).
>>> 
>>> Surely:
>>> 
>>> OMPI_COMM_WORLD_SIZE        #procs in the job
>>> OMPI_COMM_WORLD_LOCAL_SIZE  #procs in this job that are sharing the node
>>> OMPI_UNIVERSE_SIZE          total #slots allocated to this user
>>>                             (across all nodes)
>>> OMPI_COMM_WORLD_RANK        proc's rank
>>> OMPI_COMM_WORLD_LOCAL_RANK  local rank on node - lowest rank'd proc on
>>>                             the node is given local_rank=0
>>> 
>>> If there are others that would be useful, now is definitely the time to
>>> speak up!
>> 
>> The only other one I'd like to see is some kind of global identifier for
>> the job but as far as I can see I don't believe that openmpi has such a
>> concept.
> 
> Not really - of course, many environments have a jobid they assign at time
> of allocation. We could create a unified identifier from that to ensure a
> consistent name was always available, but the problem would be that not all
> environments provide it (e.g., rsh). To guarantee that the variable would
> always be there, we would have to make something up in those cases.

I could easily create such an envar, even for non-managed environments. The
plan would be to use the RM-provided jobid where one was available, and to
use the mpirun jobid where not.
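
The fallback described here could be sketched roughly as follows. This is a sketch under assumptions, not the actual implementation: choose_job_id and job_id_from_env are hypothetical helpers, and the name of the resource manager's environment variable is passed in because it differs from one environment to the next.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Pick the job identifier: the RM-provided one if present and non-empty,
 * otherwise fall back to the mpirun jobid (which must always exist). */
const char *choose_job_id(const char *rm_jobid, const char *mpirun_jobid)
{
    assert(mpirun_jobid != NULL);
    if (rm_jobid != NULL && rm_jobid[0] != '\0') {
        return rm_jobid;
    }
    return mpirun_jobid;
}

/* Look up the RM jobid from an environment variable whose name is
 * supplied by the caller (hypothetical; depends on the resource manager). */
const char *job_id_from_env(const char *rm_envar, const char *mpirun_jobid)
{
    return choose_job_id(getenv(rm_envar), mpirun_jobid);
}
```

The point of the design is that the resulting value is always defined, so an application can rely on it even under rsh.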

My thought was to call it OMPI_JOB_ID, unless someone has another
suggestion.

Any objection to my doing so, and/or suggestions on alternative
implementations?

Ralph




Re: [OMPI devel] PLM consistency: priority

2008-07-11 Thread Ralph H Castain
Ummm...I actually was talking about the "PLM", not the "PML".

But I believe what you suggest concurs with what I said. In the PLM, you
could still provide multiple components you want considered, though it has
less meaning there. My suggestion really is only that we eliminate the
params to adjust relative priority as they are just confusing the user and
potentially misleading them as to what is going to happen.

Ralph



On 7/11/08 9:07 AM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:

> We don't want the user to have to select by hand the best PML. The
> logic inside the current selection process selects the best pml for
> the underlying network. However changing the priority is pretty
> meaningless from the user's point of view. So while retaining the
> selection process including priorities, we might want to remove the
> priority parameter, and use only the pml=ob1,cm syntax from the user's
> point of view.
> 
> Aurelien
> 
> On Jul 11, 2008, at 10:56, Ralph H Castain wrote:
> 
>> Okay, another fun one. Some of the PLM modules use MCA params to adjust
>> their relative selection priority. This can lead to very unexpected behavior
>> as which module gets selected will depend on the priorities of the other
>> selectable modules - which changes from release to release as people
>> independently make adjustments and/or new modules are added.
>> 
>> Fortunately, this doesn't bite us too often since many environments only
>> support one module, and since there is nothing to tell the user that the plm
>> module whose priority they raised actually -didn't- get used!
>> 
>> However, in the interest of "least astonishment", some of us working on the
>> RTE had changed our coding approach to avoid this confusion. Given that we
>> have this nice mca component select logic that takes the specified module -
>> i.e., "-mca plm foo" always yields foo if it can run, or errors out if it
>> can't - then the safest course is to remove MCA params that adjust module
>> priorities and have the user simply tell us which module they want us to
>> use.
>> 
>> Do we want to make this consistent, at least in the PLM? Or do you want to
>> leave the user guessing? :-)
>> 
>> Ralph
>> 
>> 
> 
> 





[OMPI devel] PLM consistency: priority

2008-07-11 Thread Ralph H Castain
Okay, another fun one. Some of the PLM modules use MCA params to adjust
their relative selection priority. This can lead to very unexpected behavior
as which module gets selected will depend on the priorities of the other
selectable modules - which changes from release to release as people
independently make adjustments and/or new modules are added.

Fortunately, this doesn't bite us too often since many environments only
support one module, and since there is nothing to tell the user that the plm
module whose priority they raised actually -didn't- get used!

However, in the interest of "least astonishment", some of us working on the
RTE had changed our coding approach to avoid this confusion. Given that we
have this nice mca component select logic that takes the specified module -
i.e., "-mca plm foo" always yields foo if it can run, or errors out if it
can't - then the safest course is to remove MCA params that adjust module
priorities and have the user simply tell us which module they want us to
use.
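
To make the two behaviors concrete, here is a rough sketch of the selection rule being argued for: an explicit request either yields that module or fails, and priorities only matter when the user says nothing. The struct, the function name and the priority values are all hypothetical, for illustration only; this is not the MCA base select code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a selectable module. */
typedef struct {
    const char *name;
    int priority;   /* only consulted when the user expresses no preference */
    int can_run;    /* nonzero if the module is usable in this environment */
} component_t;

/* Returns the selected module, or NULL if the requested module cannot run
 * (or is unknown), or if no module is usable at all. */
const component_t *select_component(const component_t *c, size_t n,
                                    const char *requested)
{
    assert(c != NULL || n == 0);
    const component_t *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (requested != NULL) {
            /* Explicit request: that module or an error, never a fallback. */
            if (strcmp(c[i].name, requested) == 0) {
                return c[i].can_run ? &c[i] : NULL;
            }
            continue;
        }
        if (c[i].can_run && (best == NULL || c[i].priority > best->priority)) {
            best = &c[i];
        }
    }
    return best;
}
```

With this rule there is no way for a user to ask for one module and silently get another.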

Do we want to make this consistent, at least in the PLM? Or do you want to
leave the user guessing? :-)

Ralph




Re: [OMPI devel] PLM consistency: launch agent param

2008-07-11 Thread Ralph H Castain
I suppose we could even just make it an mpirun cmd line param, at that
point. As an MCA param, though, we have typically insisted on a particular
syntax that includes framework and component...


On 7/11/08 8:41 AM, "Don Kerr" <don.k...@sun.com> wrote:

> For something as fundamental as launch do we still need to specify the
> component, could it just be "launch_agent"?
> 
> Jeff Squyres wrote:
>> Sounds good to me.  We've done similar things in other frameworks --
>> put in MCA base params for things that all components could use.  How
>> about plm_base_launch_agent?
>> 
>> 
>> On Jul 11, 2008, at 10:17 AM, Ralph H Castain wrote:
>> 
>>> Since the question of backward compatibility of params came up... ;-)
>>> 
>>> I've been perusing the various PLM modules to check consistency. One thing I
>>> noted right away is that -every- PLM module registers an MCA param to let
>>> the user specify an orted cmd. I believe this specifically was done so
>>> people could insert their favorite debugger in front of the "orted" on the
>>> spawned command line - e.g., "valgrind orted".
>>> 
>>> The problem is that this forces the user to have to figure out the name of
>>> the PLM module being used as the param is called "-mca plm_rsh_agent", or
>>> "-mca plm_lsf_orted", or...you name it.
>>> 
>>> For users that only ever operate in one environment, who cares. However,
>>> many users (at least around here) operate in multiple environments, and this
>>> creates confusion.
>>> 
>>> I propose to create a single MCA param name for this value - something like
>>> "-mca plm_launch_agent" or whatever - and get rid of all these individual
>>> registrations to reduce the user confusion.
>>> 
>>> Comments? I'll put my helmet on
>>> Ralph
>>> 
>>> 
>> 
>> 




[OMPI devel] PLM consistency: launch agent param

2008-07-11 Thread Ralph H Castain
Since the question of backward compatibility of params came up... ;-)

I've been perusing the various PLM modules to check consistency. One thing I
noted right away is that -every- PLM module registers an MCA param to let
the user specify an orted cmd. I believe this specifically was done so
people could insert their favorite debugger in front of the "orted" on the
spawned command line - e.g., "valgrind orted".

The problem is that this forces the user to have to figure out the name of
the PLM module being used as the param is called "-mca plm_rsh_agent", or
"-mca plm_lsf_orted", or...you name it.

For users that only ever operate in one environment, who cares. However,
many users (at least around here) operate in multiple environments, and this
creates confusion.

I propose to create a single MCA param name for this value - something like
"-mca plm_launch_agent" or whatever - and get rid of all these individual
registrations to reduce the user confusion.
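
As a rough illustration of what a single param would mean for the launch code, every PLM module could resolve the daemon command the same way. This is a sketch only: launch_agent_or_default is a hypothetical helper, and "plm_launch_agent" is just the name floated above, not a registered parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the command used to start the daemon: the user-supplied launch
 * agent (e.g. "valgrind orted") if one was given, else plain "orted". */
const char *launch_agent_or_default(const char *launch_agent)
{
    if (launch_agent != NULL && launch_agent[0] != '\0') {
        return launch_agent;
    }
    return "orted";
}
```

Assuming the usual OMPI_MCA_<name> environment form, a caller might pass getenv("OMPI_MCA_plm_launch_agent") regardless of which module ends up doing the launch.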

Comments? I'll put my helmet on
Ralph




Re: [OMPI devel] v1.3 RM: need a ruling

2008-07-11 Thread Ralph H Castain



On 7/11/08 7:48 AM, "Terry Dontje"  wrote:

> Jeff Squyres wrote:
>> Check that -- Ralph and I talked more about #1383 and have come up
>> with a decent/better solution that a) is not wonky and b) does not
>> involve MCA parameter synonyms.  We're working on it in an hg and will
>> put it back when done (probably within a business day or three).
>> 
>> So I think the MCA synonym stuff *isn't* needed for v1.3 after all.
>> 
> I am not dead yet!!!
> 
> So, there was also the name change of pls_rsh_agent to plm_rsh_agent
> because the pls's were sucked into plm's (or so I believe).  Anyways,
> this seems like another case to support synonyms.  Also are there other
> pls mca parameters that have had their names changed to plm?

I think you're opening a really ugly can of worms. How far back do you want
to go? v1.0? v0.1? We have a history of changing mca param names across
major releases, so trying to keep everything alive could well become a
nightmare.

I would hate to try and figure out all the changes - and what about the
params that simply have disappeared, or had their functionality absorbed by
some combination of other params?

My head aches already... :-)

Ralph

> 
> --td
>> I think the MCA param synonyms and "deprecated" stuff is useful for
>> the future, but at this point, nothing in v1.3 would use it.  So my
>> $0.02 is that we should leave it out.
>> 
>> 
>> 
>> On Jul 10, 2008, at 2:00 PM, Jeff Squyres (jsquyres) wrote:
>> 
>>> K, will do.  Note that it turns out that we did not yet solve the
>>> mpi_paffinity_alone issue, but we're working on it.  I'm working on
>>> the IOF issue ATM; will return to mpi_paffinity_alone in a bit...
>>> 
>>> 
>>> On Jul 10, 2008, at 1:56 PM, George Bosilca wrote:
>>> 
 I'm 100% with Brad on this. Please go ahead and include this feature
 in the 1.3.
 
  george.
 
 On Jul 10, 2008, at 11:33 AM, Brad Benton wrote:
 
> I think this is very reasonable to go ahead and include for 1.3.  I
> find that preferable to a 1.3-specific "wonky" workaround.  Plus,
> this sounds like something that is very good to have in general.
> 
> --brad
> 
> 
> On Wed, Jul 9, 2008 at 8:49 PM, Jeff Squyres 
> wrote:
> v1.3 RMs: Due to some recent work, the MCA parameter
> mpi_paffinity_alone disappeared -- it was moved and renamed to be
> opal_paffinity_alone.  This is Bad because we have a lot of
> historical precedent based on the MCA param name
> "mpi_paffinity_alone" (FAQ, PPT presentations, e-mails on public
> lists, etc.).  So it needed to be restored for v1.3.  I just
> noticed that I hadn't opened a ticket on this -- sorry -- I opened
> #1383 tonight.
> 
> For a variety of reasons described in the commit message r1383,
> Lenny and I first decided that it would be best to fix this problem
> by the functionality committed in r18770 (have the ability to find
> out where an MCA parameter was set).  This would allow us to
> register two MCA params: mpi_paffinity_alone and
> opal_paffinity_alone, and generally do the Right Thing (because we
> could then tell if a user had set a value or whether it was a
> default MCA param value).  This functionality will also be useful
> in the openib BTL, where there is a blend of MCA parameters and INI
> file parameters.
> 
> However, after doing that, it seemed like only a few more steps to
> implement an overall better solution: implement "synonyms" for MCA
> parameters.  I.e., register the name "mpi_paffinity_alone" as a
> synonym for opal_paffinity_alone.  Along the way, it was trivial to
> add a "deprecated" flag for MCA parameters that we no longer want
> to use anymore (this deprecated flag is also useful in the OB1 PML
> and openib BTL).
> 
> So to fix a problem that needed to be fixed for v1.3 (restore the
> MCA parameter "mpi_paffinity_alone"), I ended up implementing new
> functionality.
> 
> Can this go into v1.3, or do we need to implement some kind of
> alternate fix?  (I admit to not having thought through what it
> would take to fix without the new MCA parameter functionality -- it
> might be kinda wonky)
> 
> --
> Jeff Squyres
> Cisco Systems
> 
> 
 
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> Cisco Systems
>>> 

Re: [OMPI devel] Ticket #1267 - thread locks in ompi_proc_t code

2008-07-07 Thread Ralph H Castain
I will have to correct something here. From what I can see, it appears the
MPI code may not be creating ompi_proc_t structures, but rather creating
arrays of ompi_proc_t* pointers that are then filled with values equal to
the pointers in the ompi_proc_list held inside of ompi/proc/proc.c.

It appears, though, that this may be done in a non-thread-safe manner. When
the arrays are filled by calling ompi_proc_world or ompi_proc_all, the
objects themselves are never OBJ_RETAIN'd. As a result, when the first
thread in the code calls OBJ_RELEASE, the object is removed from the
ompi_proc_list and free'd - but the other threads that called
ompi_proc_world/all still retain pointers that now reference invalid memory.
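
The hazard, and one possible remedy, can be sketched with a toy reference-counted object. This uses C11 atomics and hypothetical names (proc_t, proc_retain, proc_release, proc_world); it is not the OPAL object system, and it assumes the list itself is protected while it is being walked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for ompi_proc_t. */
typedef struct {
    atomic_int refcount;
    int rank;
} proc_t;

proc_t *proc_new(int rank)
{
    proc_t *p = malloc(sizeof(*p));
    if (p == NULL) return NULL;
    atomic_init(&p->refcount, 1);   /* the list's own reference */
    p->rank = rank;
    return p;
}

proc_t *proc_retain(proc_t *p)
{
    atomic_fetch_add(&p->refcount, 1);
    return p;
}

/* Returns 1 if this call freed the object, 0 otherwise. */
int proc_release(proc_t *p)
{
    if (atomic_fetch_sub(&p->refcount, 1) == 1) {
        free(p);
        return 1;
    }
    return 0;
}

/* Fill out[] with retained pointers so each caller owns its references. */
void proc_world(proc_t **list, size_t n, proc_t **out)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = proc_retain(list[i]);
    }
}
```

With the retain in place, one thread releasing its array no longer invalidates the pointers held by another.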

Perhaps the best path forward is to devise a thread-safe design for this
code area and present it for people's review. I'll see what I can do.

Again, comments are welcomed
Ralph



On 7/7/08 8:22 AM, "Ralph H Castain" <r...@lanl.gov> wrote:

> I am seeking input before making a change to the ompi/proc/proc.c code to
> resolve the referenced ticket. The change could potentially impact how the
> ompi_proc_t struct is used in the rest of the MPI code base. If this doesn't
> impact you, please ignore the remainder of this note.
> 
> 
> I was asked last week to take a look at ticket #1267, filed by Tim Prins
> several months ago. This ticket references a report on the devel list about
> thread locks when calling comm_spawn and using MPI_Init_thread. The thread
> lock is caused by the constructor/destructor in the ompi_proc_t class
> explicitly removing the referenced ompi_proc_t from the static local global
> ompi_proc_list, and calling OPAL_THREAD_LOCK/OPAL_THREAD_UNLOCK around that
> list operation.
> 
> As far as I can see, Tim correctly resolved the constructor conflict by
> simply removing the thread lock/unlock and list append operation from the
> constructor. A scan of the code shows that OBJ_NEW is -only- called from
> within the ompi/proc/proc.c code, so this won't be an issue.
> 
> However, I noted several issues surrounding the creation and release of
> ompi_proc_t objects that -may- cause problems in making a similar change to
> the destructor to fix the rest of the threading problem. These may have been
> created in response to the list modification code currently existing in the
> ompi_proc_t object destructor - or they may be due to other factors.
> 
> Specifically, the MPI code base outside of ompi/proc/proc.c:
> 
> 1. -never- treats the ompi_proc_t as an opal object. Instead, the code
> specifically calls calloc to create space for the structures, and then
> manually constructs them.
> 
> 2. consistently calls OBJ_RELEASE on the resulting structures, even though
> they were never created as opal objects via OBJ_NEW.
> 
> I confess to being puzzled here as the destructor will attempt to remove the
> referenced ompi_proc_t* pointer from the ompi_proc_list in ompi/proc/proc.c,
> but (since OBJ_NEW wasn't called) that object won't be on the list. Looking
> at the code itself, it appears we rely on the list function to realize that
> this object isn't on the list and ignore the request to remove it. We don't
> check the return code from the opal_list_remove_item, and so ignore the
> returned error.
> 
> My point here is to seek comment about the proposed fix for the problem
> referenced in the ticket. My proposal is to remove the thread lock/unlock
> and list manipulation from the ompi_proc_t destructor. From what I can see
> (as described above), this should not impact the rest of the code base. I
> will then add thread lock/unlock protection explicitly to ompi_proc_finalize
> to protect its list operations.
> 
> It appears to me that this change would also open the way to allowing the
> remainder of the code base to treat ompi_proc_t as an object, using OBJ_NEW
> to correctly and consistently initialize those objects. I note that any
> change to ompi_proc_t today (which has occurred in the not-too-distant
> past!) can create a problem throughout the current code base due to the
> manual construction of this object. This is why we have objects in the first
> place - I suspect people didn't use OBJ_NEW because of the automatic change
> it induced in the ompi_proc_list in ompi/proc/proc.c.
> 
> Any comments would be welcome.
> Ralph
> 
> 




[OMPI devel] Ticket #1267 - thread locks in ompi_proc_t code

2008-07-07 Thread Ralph H Castain
I am seeking input before making a change to the ompi/proc/proc.c code to
resolve the referenced ticket. The change could potentially impact how the
ompi_proc_t struct is used in the rest of the MPI code base. If this doesn't
impact you, please ignore the remainder of this note.


I was asked last week to take a look at ticket #1267, filed by Tim Prins
several months ago. This ticket references a report on the devel list about
thread locks when calling comm_spawn and using MPI_Init_thread. The thread
lock is caused by the constructor/destructor in the ompi_proc_t class
explicitly removing the referenced ompi_proc_t from the static local global
ompi_proc_list, and calling OPAL_THREAD_LOCK/OPAL_THREAD_UNLOCK around that
list operation.

As far as I can see, Tim correctly resolved the constructor conflict by
simply removing the thread lock/unlock and list append operation from the
constructor. A scan of the code shows that OBJ_NEW is -only- called from
within the ompi/proc/proc.c code, so this won't be an issue.

However, I noted several issues surrounding the creation and release of
ompi_proc_t objects that -may- cause problems in making a similar change to
the destructor to fix the rest of the threading problem. These may have been
created in response to the list modification code currently existing in the
ompi_proc_t object destructor - or they may be due to other factors.

Specifically, the MPI code base outside of ompi/proc/proc.c:

1. -never- treats the ompi_proc_t as an opal object. Instead, the code
specifically calls calloc to create space for the structures, and then
manually constructs them.

2. consistently calls OBJ_RELEASE on the resulting structures, even though
they were never created as opal objects via OBJ_NEW.

I confess to being puzzled here as the destructor will attempt to remove the
referenced ompi_proc_t* pointer from the ompi_proc_list in ompi/proc/proc.c,
but (since OBJ_NEW wasn't called) that object won't be on the list. Looking
at the code itself, it appears we rely on the list function to realize that
this object isn't on the list and ignore the request to remove it. We don't
check the return code from the opal_list_remove_item, and so ignore the
returned error.

My point here is to seek comment about the proposed fix for the problem
referenced in the ticket. My proposal is to remove the thread lock/unlock
and list manipulation from the ompi_proc_t destructor. From what I can see
(as described above), this should not impact the rest of the code base. I
will then add thread lock/unlock protection explicitly to ompi_proc_finalize
to protect its list operations.
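
In outline, the proposal amounts to something like the following: list manipulation lives only in the creation and finalize paths, each under an explicit lock, and the per-object destructor never touches the list. This is a sketch with a plain pthread mutex and hypothetical names, not the OPAL_THREAD_LOCK macros or the real ompi_proc_finalize.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry on ompi_proc_list. */
typedef struct proc_node {
    struct proc_node *next;
    int rank;
} proc_node_t;

static proc_node_t *proc_list = NULL;
static pthread_mutex_t proc_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Creation path: the only place that appends to the list. */
int proc_list_add(int rank)
{
    assert(rank >= 0);
    proc_node_t *n = malloc(sizeof(*n));
    if (n == NULL) return -1;
    n->rank = rank;
    pthread_mutex_lock(&proc_list_lock);
    n->next = proc_list;
    proc_list = n;
    pthread_mutex_unlock(&proc_list_lock);
    return 0;
}

/* Finalize path: drains the list under the lock and returns the number of
 * entries released. Freeing an entry does not re-enter any list code. */
int proc_list_finalize(void)
{
    int released = 0;
    pthread_mutex_lock(&proc_list_lock);
    while (proc_list != NULL) {
        proc_node_t *n = proc_list;
        proc_list = n->next;
        free(n);
        released++;
    }
    pthread_mutex_unlock(&proc_list_lock);
    return released;
}
```

Because the destructor no longer takes the lock, a release that happens while the lock is held cannot deadlock on it.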

It appears to me that this change would also open the way to allowing the
remainder of the code base to treat ompi_proc_t as an object, using OBJ_NEW
to correctly and consistently initialize those objects. I note that any
change to ompi_proc_t today (which has occurred in the not-too-distant
past!) can create a problem throughout the current code base due to the
manual construction of this object. This is why we have objects in the first
place - I suspect people didn't use OBJ_NEW because of the automatic change
it induced in the ompi_proc_list in ompi/proc/proc.c.

Any comments would be welcome.
Ralph




[OMPI devel] Trunk broken with linear, direct routing

2008-07-01 Thread Ralph H Castain
Since this appears to have gone unnoticed, it may not be a big deal.
However, I have found that multi-node operations are broken if you invoke
the linear or direct routed modules.

Things work fine with the default binomial routed module.

I will be working to fix this - just a heads up.
Ralph




[OMPI devel] Framework selection

2008-07-01 Thread Ralph H Castain
I ran into something unexpected today relative to the selection of
frameworks. It was totally unplanned, and may be an error on my part - or I
may be expecting the incorrect behavior. However, since others may encounter
it unexpectedly as well, I am sending this to the list.

What I had done was:

1. set OMPI_MCA_routed=direct in my environment

2. (much later) executed: mpirun ... -mca routed binomial ...

What happened was that mpirun selected the direct routed module, while my
application procs selected the binomial module. This unfortunately doesn't
generate a warning, but rather segfaults and/or hangs at some unpredictable
time depending upon the invoked communication patterns.

It was my understanding that the cmd line should override anything in the
environment. Is this no longer true? I checked and orterun does indeed
process the cmd line prior to invoking orte_init.
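
The precedence expected here can be sketched as follows. This is an illustration in Python, not Open MPI code; resolve_mca_param and its arguments are hypothetical names, and the sketch models only the rule "cmd line beats environment", not how parameters are actually stored or propagated.

```python
def resolve_mca_param(name, cmd_line, environ, default=None):
    """Return the value of one MCA parameter, preferring the command line.

    Hypothetical helper: models the expected precedence only.
    """
    if name in cmd_line:                # e.g. from "-mca routed binomial"
        return cmd_line[name]
    env_key = "OMPI_MCA_" + name        # e.g. OMPI_MCA_routed=direct
    if env_key in environ:
        return environ[env_key]
    return default


environ = {"OMPI_MCA_routed": "direct"}
cmd_line = {"routed": "binomial"}

# Under the expected rule, mpirun and every application proc resolve
# the same module, so they cannot disagree.
mpirun_choice = resolve_mca_param("routed", cmd_line, environ)
app_choice = resolve_mca_param("routed", cmd_line, environ)
print(mpirun_choice, app_choice)
```

The behavior reported above corresponds to mpirun and the application procs applying different rules, which is exactly the mismatch this lookup is meant to rule out.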

Or did I inadvertently do something wrong here (other than the fact that I
had forgotten the envar was set)?

Thanks
Ralph




Re: [OMPI devel] mtt IBM SPAWN error

2008-06-30 Thread Ralph H Castain
Well, that error indicates that it was unable to launch the daemon on witch3
for some reason. If you look at the error reported by bash, you will see
that the "orted" binary wasn't found!

Sounds like a path error - you might check to see if witch3 has the binaries
installed, and if they are where you told the system to look...

Ralph



On 6/30/08 5:21 AM, "Lenny Verkhovsky"  wrote:

> I am not familiar with the IBM spawn test, but maybe this is the right
> behavior: if the spawn test allocates 3 ranks on the node, and then allocates
> another 3, then this test is supposed to fail due to max_slots=4.
>  
> But it fails with the following hostfile as well BUT WITH A DIFFERENT ERROR.
>  
> #cat hostfile2 
> witch2 slots=4 max_slots=4
> witch3 slots=4 max_slots=4
> witch1:/home/BENCHMARKS/IBM # /home/USERS/lenny/OMPI_ORTE_18772/bin/mpirun -np
> 3 -hostfile hostfile2 dynamic/spawn
> bash: orted: command not found
> [witch1:22789] 
> --
> A daemon (pid 22791) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
> There may be more information reported by the environment (see above).
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --
> [witch1:22789] 
> --
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --
> witch3 - daemon did not report back when launched
>  
> On Mon, Jun 30, 2008 at 9:38 AM, Lenny Verkhovsky 
> wrote:
>> Hi, 
>> trying to run mtt I failed to run IBM spawn test. It fails only when using
>> hostfile, and not when using host list.
>> ( OMPI from TRUNK )
>>  
>> This is working :
>> #mpirun -np 3 -H witch2 dynamic/spawn
>>  
>> This Fails:
>> # cat hostfile
>> witch2 slots=4 max_slots=4
>> #mpirun -np 3 -hostfile hostfile dynamic/spawn
>> [witch1:12392] 
>> --
>> There are not enough slots available in the system to satisfy the 3 slots
>> that were requested by the application:
>>   dynamic/spawn
>> 
>> Either request fewer slots for your application, or make more slots available
>> for use.
>> --
>> [witch1:12392] 
>> --
>> A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
>> launch so we are aborting.
>> 
>> There may be more information reported by the environment (see above).
>> 
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --
>> mpirun: clean termination accomplished
>>  
>>  
>> Using hostfile1 also works
>> #cat hostfile1
>> witch2
>> witch2
>> witch2
>>  
>>  
>> Best Regards
>> Lenny.
>> 
> 






Re: [OMPI devel] mtt IBM SPAWN error

2008-06-30 Thread Ralph H Castain
That's correct - and is precisely the behavior it should exhibit. The
reasons:

1. when you specify -host, we assume max_slots is infinite since you cannot
provide any info to the contrary. We therefore allow you to oversubscribe
the node to your heart's desire. However, note one problem: if your original
launch is only one proc, we will set it to be aggressive in terms of
yielding the processor. Your subsequent comm_spawn'd procs will therefore
suffer degraded performance if they oversubscribe the node.

Can't be helped - there is no way to pass enough info with -host for us to
do better.


2. when you run with -hostfile, your hostfile is telling us to allow no more
than 4 procs on the node. You used three in your original launch, leaving
only one slot available. Since each of the procs in the IBM test attempts to
spawn another, your job will fail.
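
The slot arithmetic behind both cases can be sketched as follows. This is an illustration in Python with a hypothetical can_launch helper, not the actual mapper logic; it assumes, as described above, that the three original procs together ask for three more.

```python
import math


def can_launch(requested, in_use, max_slots=math.inf):
    """True if `requested` more procs fit on a node already running `in_use`.

    Hypothetical helper; max_slots defaults to "infinite", which is the
    assumption made when only -host is given.
    """
    return in_use + requested <= max_slots


initial_procs = 3                 # mpirun -np 3
spawn_requests = initial_procs    # assumption: one child per original proc

# -hostfile with "witch2 slots=4 max_slots=4": only one slot is left.
print(can_launch(spawn_requests, initial_procs, max_slots=4))

# -host witch2: no upper bound is known, so oversubscription is allowed.
print(can_launch(spawn_requests, initial_procs))
```

With max_slots=4 the first check is False (3 + 3 > 4), while the unbounded case is True.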

We can always do more to improve the error messaging...
Ralph


On 6/30/08 12:38 AM, "Lenny Verkhovsky"  wrote:

> Hi, 
> trying to run mtt I failed to run IBM spawn test. It fails only when using
> hostfile, and not when using host list.
> ( OMPI from TRUNK )
>  
> This is working :
> #mpirun -np 3 -H witch2 dynamic/spawn
>  
> This Fails:
> # cat hostfile
> witch2 slots=4 max_slots=4
> #mpirun -np 3 -hostfile hostfile dynamic/spawn
> [witch1:12392] 
> --
> There are not enough slots available in the system to satisfy the 3 slots
> that were requested by the application:
>   dynamic/spawn
> 
> Either request fewer slots for your application, or make more slots available
> for use.
> --
> [witch1:12392] 
> --
> A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
> launch so we are aborting.
> 
> There may be more information reported by the environment (see above).
> 
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --
> mpirun: clean termination accomplished
>  
>  
> Using hostfile1 also works
> #cat hostfile1
> witch2
> witch2
> witch2
>  
>  
> Best Regards
> Lenny.
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel






Re: [OMPI devel] PML selection logic

2008-06-26 Thread Ralph H Castain
Just to complete this thread...

Brian raised a very good point, so we identified it on the weekly telecon as
a subject that really should be discussed at next week's technical meeting.
I think we can find a reasonable answer, but there are several ways it can
be done. So rather than doing our usual piecemeal approach to the solution,
it makes sense to begin talking about a more holistic design for
accommodating both needs.

Thanks Brian for pointing out the bigger picture.
Ralph



On 6/24/08 8:22 AM, "Brian W. Barrett" <brbar...@open-mpi.org> wrote:

> yeah, that could be a problem, but it's such a minority case and we've got
> to draw the line somewhere.
> 
> Of course, it seems like this is a never ending battle between two
> opposing forces...  The desire to do the "right thing" all the time at
> small and medium scale and the desire to scale out to the "big thing".
> It seems like in the quest to kill off the modex, we've run into these
> pretty often.
> 
> The modex doesn't hurt us at small scale (indeed, we're probably ok with
> the routed communication pattern up to 512 nodes or so if we don't do
> anything stupid, maybe further).  Is it time to admit defeat in this
> argument and have a configure option that turns off the modex (at the cost
> of some of these correctness checks) for the large machines, but keeps
> things simple for the common case?  I'm sure there are other things where
> this will come up, so perhaps a --enable-large-scale?  Maybe it's a dumb
> idea, but it seems like we've made a lot of compromises lately around
> this, where no one ends up really happy with the solution :/.
> 
> Brian
> 
> 
> On Tue, 24 Jun 2008, George Bosilca wrote:
> 
>> Brian hinted at a possible bug in one of his replies. How does this work in the
>> case of dynamic processes? We can envision several scenarios, but let's take a
>> simple one: 2 jobs that get connected with connect/accept. One might publish the
>> PML name (simply because the -mca argument was on) and one might not?
>> 
>> george.
>> 
>> On Jun 24, 2008, at 8:28 AM, Jeff Squyres wrote:
>> 
>>> Also sounds good to me.
>>> 
>>> Note that the most difficult part of the forward-looking plan is that we
>>> usually can't tell the difference between "something failed to initialize"
>>> and "you don't have support for feature X".
>>> 
>>> I like the general philosophy of: running out of the box always works just
>>> fine, but if you/the sysadmin is smart, you can get performance
>>> improvements.
>>> 
>>> 
>>> On Jun 23, 2008, at 4:18 PM, Shipman, Galen M. wrote:
>>> 
>>>> I concur
>>>> - galen
>>>> 
>>>> On Jun 23, 2008, at 3:44 PM, Brian W. Barrett wrote:
>>>> 
>>>>> That sounds like a reasonable plan to me.
>>>>> 
>>>>> Brian
>>>>> 
>>>>> On Mon, 23 Jun 2008, Ralph H Castain wrote:
>>>>> 
>>>>>> Okay, so let's explore an alternative that preserves the support you are
>>>>>> seeking for the "ignorant user", but doesn't penalize everyone else.
>>>>>> What we
>>>>>> could do is simply set things up so that:
>>>>>> 
>>>>>> 1. if -mca pml xyz is provided, then no modex data is added
>>>>>> 
>>>>>> 2. if it is not provided, then only rank=0 inserts the data. All other
>>>>>> procs
>>>>>> simply check their own selection against the one given by rank=0
>>>>>> 
>>>>>> Now, if a knowledgeable user or sys admin specifies what to use for
>>>>>> their
>>>>>> system, we won't penalize their startup time. A user who doesn't know
>>>>>> what
>>>>>> to do gets to run, albeit less scalably on startup.
>>>>>> 
>>>>>> Looking forward from there, we can look to a day where failing to
>>>>>> initialize
>>>>>> something that exists on the system could be detected in some other
>>>>>> fashion,
>>>>>> letting the local proc abort since it would know that other procs that
>>>>>> detected similar capabilities may well have selected that PML. For now,
>>>>>> though, this would solve the problem.
>>>>>> 
>>>>>> Make sense?
>>>>>> Ralph
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 6/23/08 1:31

Re: [OMPI devel] PML selection logic

2008-06-24 Thread Ralph H Castain
It is a good point. What I have prototyped would still handle it -
basically, it checks to see if any data has been published and does a modex
if so.

So if one side does send modex data, the other side will faithfully decode
it. I think the bigger issue will be if both sides don't, and they don't
match.

So perhaps for dynamic processes we just have to force the modex, just like
we force other things to happen that don't occur during a normal startup.
The prototype will handle that just fine, but I wasn't planning on
committing it into 1.3 just so we could test all these use-cases.


On 6/24/08 8:16 AM, "George Bosilca" <bosi...@eecs.utk.edu> wrote:

> Brian hinted at a possible bug in one of his replies. How does this work
> in the case of dynamic processes? We can envision several scenarios,
> but let's take a simple one: 2 jobs that get connected with connect/accept.
> One might publish the PML name (simply because the -mca argument was
> on) and one might not?
> 
>george.
> 
> On Jun 24, 2008, at 8:28 AM, Jeff Squyres wrote:
> 
>> Also sounds good to me.
>> 
>> Note that the most difficult part of the forward-looking plan is
>> that we usually can't tell the difference between "something failed
>> to initialize" and "you don't have support for feature X".
>> 
>> I like the general philosophy of: running out of the box always
>> works just fine, but if you/the sysadmin is smart, you can get
>> performance improvements.
>> 
>> 
>> On Jun 23, 2008, at 4:18 PM, Shipman, Galen M. wrote:
>> 
>>> I concur
>>> - galen
>>> 
>>> On Jun 23, 2008, at 3:44 PM, Brian W. Barrett wrote:
>>> 
>>>> That sounds like a reasonable plan to me.
>>>> 
>>>> Brian
>>>> 
>>>> On Mon, 23 Jun 2008, Ralph H Castain wrote:
>>>> 
>>>>> Okay, so let's explore an alternative that preserves the support
>>>>> you are
>>>>> seeking for the "ignorant user", but doesn't penalize everyone
>>>>> else. What we
>>>>> could do is simply set things up so that:
>>>>> 
>>>>> 1. if -mca pml xyz is provided, then no modex data is added
>>>>> 
>>>>> 2. if it is not provided, then only rank=0 inserts the data. All
>>>>> other procs
>>>>> simply check their own selection against the one given by rank=0
>>>>> 
>>>>> Now, if a knowledgeable user or sys admin specifies what to use
>>>>> for their
>>>>> system, we won't penalize their startup time. A user who doesn't
>>>>> know what
>>>>> to do gets to run, albeit less scalably on startup.
>>>>> 
>>>>> Looking forward from there, we can look to a day where failing to
>>>>> initialize
>>>>> something that exists on the system could be detected in some
>>>>> other fashion,
>>>>> letting the local proc abort since it would know that other procs
>>>>> that
>>>>> detected similar capabilities may well have selected that PML.
>>>>> For now,
>>>>> though, this would solve the problem.
>>>>> 
>>>>> Make sense?
>>>>> Ralph
>>>>> 
>>>>> 
>>>>> 
>>>>> On 6/23/08 1:31 PM, "Brian W. Barrett" <brbar...@open-mpi.org>
>>>>> wrote:
>>>>> 
>>>>>> The problem is that we default to OB1, but that's not the right
>>>>>> choice for
>>>>>> some platforms (like Pathscale / PSM), where there's a huge
>>>>>> performance
>>>>>> hit for using OB1.  So we run into a situation where user
>>>>>> installs Open
>>>>>> MPI, starts running, gets horrible performance, bad mouths Open
>>>>>> MPI, and
>>>>>> now we're in that game again.  Yeah, the sys admin should know
>>>>>> what to do,
>>>>>> but it doesn't always work that way.
>>>>>> 
>>>>>> Brian
>>>>>> 
>>>>>> 
>>>>>> On Mon, 23 Jun 2008, Ralph H Castain wrote:
>>>>>> 
>>>>>>> My fault - I should be more precise in my language. ;-/
>>>>>>> 
>>>>>>> #1 is not adequate, IMHO, as it forces us to -always- do a
>>>>>>> modex. It seems
>>>>>>> to me that a simpler solution

Re: [OMPI devel] PML selection logic

2008-06-23 Thread Ralph H Castain
Okay, so let's explore an alternative that preserves the support you are
seeking for the "ignorant user", but doesn't penalize everyone else. What we
could do is simply set things up so that:

1. if -mca pml xyz is provided, then no modex data is added

2. if it is not provided, then only rank=0 inserts the data. All other procs
simply check their own selection against the one given by rank=0

Now, if a knowledgeable user or sys admin specifies what to use for their
system, we won't penalize their startup time. A user who doesn't know what
to do gets to run, albeit less scalably on startup.

Looking forward from there, we can look to a day where failing to initialize
something that exists on the system could be detected in some other fashion,
letting the local proc abort since it would know that other procs that
detected similar capabilities may well have selected that PML. For now,
though, this would solve the problem.
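
The two rules above can be sketched as follows. This is an illustration in Python, not the prototype itself; publish and agrees_with_rank0 are hypothetical names, and the modex is modeled as a plain dictionary.

```python
def publish(rank, my_pml, user_specified_pml):
    """Return what this proc adds to the modex under the proposal."""
    if user_specified_pml:
        return {}                 # rule 1: the user picked the PML, add nothing
    if rank != 0:
        return {}                 # rule 2: only rank=0 inserts the data
    return {"pml": my_pml}


def agrees_with_rank0(my_pml, modex):
    """Check a proc's own selection against the one given by rank=0."""
    published = modex.get("pml")
    return published is None or published == my_pml


# Three procs, no -mca pml on the cmd line; rank 2 fell back to ob1.
selections = ["cm", "cm", "ob1"]
modex = {}
for rank, pml in enumerate(selections):
    modex.update(publish(rank, pml, user_specified_pml=False))

mismatched = [rank for rank, pml in enumerate(selections)
              if not agrees_with_rank0(pml, modex)]
print(modex, mismatched)
```

Here the modex holds a single entry from rank 0 and rank 2 is flagged. When the user specifies the PML, the modex stays empty and no proc has anything to check.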

Make sense?
Ralph



On 6/23/08 1:31 PM, "Brian W. Barrett" <brbar...@open-mpi.org> wrote:

> The problem is that we default to OB1, but that's not the right choice for
> some platforms (like Pathscale / PSM), where there's a huge performance
> hit for using OB1.  So we run into a situation where user installs Open
> MPI, starts running, gets horrible performance, bad mouths Open MPI, and
> now we're in that game again.  Yeah, the sys admin should know what to do,
> but it doesn't always work that way.
> 
> Brian
> 
> 
> On Mon, 23 Jun 2008, Ralph H Castain wrote:
> 
>> My fault - I should be more precise in my language. ;-/
>> 
>> #1 is not adequate, IMHO, as it forces us to -always- do a modex. It seems
>> to me that a simpler solution to what you describe is for the user to
>> specify -mca pml ob1, or -mca pml cm. If the latter, then you could deal
>> with the failed-to-initialize problem cleanly by having the proc directly
>> abort.
>> 
>> Again, sometimes I think we attempt to automate too many things. This seems
>> like a pretty clear case where you know what you want - the sys admin, if
>> nobody else, can certainly set that mca param in the default param file!
>> 
>> Otherwise, it seems to me that you are relying on the modex to detect that
>> your proc failed to init the correct subsystem. I hate to force a modex just
>> for that - if so, then perhaps this could again be a settable option to
>> avoid requiring non-scalable behavior for those of us who want scalability?
>> 
>> 
>> On 6/23/08 1:21 PM, "Brian W. Barrett" <brbar...@open-mpi.org> wrote:
>> 
>>> The selection code was added because frequently high speed interconnects
>>> fail to initialize properly due to random stuff happening (yes, that's a
>>> horrible statement, but true).  We ran into a situation with some really
>>> flaky machines where most of the processes would choose CM, but a couple
>>> would fail to initialize the MTL and therefore choose OB1.  This led to a
>>> hang situation, which is the worst of the worst.
>>> 
>>> I think #1 is adequate, although it doesn't handle spawn particularly
>>> well.  And spawn is generally used in environments where such network
>>> mismatches are most likely to occur.
>>> 
>>> Brian
>>> 
>>> 
>>> On Mon, 23 Jun 2008, Ralph H Castain wrote:
>>> 
>>>> Since my goal is to eliminate the modex completely for managed
>>>> installations, could you give me a brief understanding of this eventual PML
>>>> selection logic? It would help to hear an example of how and why different
>>>> procs could get different answers - and why we would want to allow them to
>>>> do so.
>>>> 
>>>> Thanks
>>>> Ralph
>>>> 
>>>> 
>>>> 
>>>> On 6/23/08 11:59 AM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:
>>>> 
>>>>> The first approach sounds fair enough to me. We should avoid 2 and 3
>>>>> as the pml selection mechanism used to be
>>>>> more complex before we reduced it to accommodate a major design bug in
>>>>> the BTL selection process. When using the complete PML selection, BTL
>>>>> would be initialized several times, leading to a variety of bugs.
>>>>> Eventually the PML selection should return to its old self, when the
>>>>> BTL bug gets fixed.
>>>>> 
>>>>> Aurelien
>>>>> 
>>>>> Le 23 juin 08 à 12:36, Ralph H Castain a écrit :
>>>>> 
>>>>>> Yo all
>>>>>> 
>>>>>>

Re: [OMPI devel] PML selection logic

2008-06-23 Thread Ralph H Castain
My fault - I should be more precise in my language. ;-/

#1 is not adequate, IMHO, as it forces us to -always- do a modex. It seems
to me that a simpler solution to what you describe is for the user to
specify -mca pml ob1, or -mca pml cm. If the latter, then you could deal
with the failed-to-initialize problem cleanly by having the proc directly
abort.

Again, sometimes I think we attempt to automate too many things. This seems
like a pretty clear case where you know what you want - the sys admin, if
nobody else, can certainly set that mca param in the default param file!

Otherwise, it seems to me that you are relying on the modex to detect that
your proc failed to init the correct subsystem. I hate to force a modex just
for that - if so, then perhaps this could again be a settable option to
avoid requiring non-scalable behavior for those of us who want scalability?


On 6/23/08 1:21 PM, "Brian W. Barrett" <brbar...@open-mpi.org> wrote:

> The selection code was added because frequently high speed interconnects
> fail to initialize properly due to random stuff happening (yes, that's a
> horrible statement, but true).  We ran into a situation with some really
> flaky machines where most of the processes would choose CM, but a couple
> would fail to initialize the MTL and therefore choose OB1.  This led to a
> hang situation, which is the worst of the worst.
> 
> I think #1 is adequate, although it doesn't handle spawn particularly
> well.  And spawn is generally used in environments where such network
> mismatches are most likely to occur.
> 
> Brian
> 
> 
> On Mon, 23 Jun 2008, Ralph H Castain wrote:
> 
>> Since my goal is to eliminate the modex completely for managed
>> installations, could you give me a brief understanding of this eventual PML
>> selection logic? It would help to hear an example of how and why different
>> procs could get different answers - and why we would want to allow them to
>> do so.
>> 
>> Thanks
>> Ralph
>> 
>> 
>> 
>> On 6/23/08 11:59 AM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:
>> 
>>> The first approach sounds fair enough to me. We should avoid 2 and 3
>>> as the pml selection mechanism used to be
>>> more complex before we reduced it to accommodate a major design bug in
>>> the BTL selection process. When using the complete PML selection, BTL
>>> would be initialized several times, leading to a variety of bugs.
>>> Eventually the PML selection should return to its old self, when the
>>> BTL bug gets fixed.
>>> 
>>> Aurelien
>>> 
>>> Le 23 juin 08 à 12:36, Ralph H Castain a écrit :
>>> 
>>>> Yo all
>>>> 
>>>> I've been doing further research into the modex and came across
>>>> something I
>>>> don't fully understand. It seems we have each process insert into
>>>> the modex
>>>> the name of the PML module that it selected. Once the modex has
>>>> exchanged
>>>> that info, it then loops across all procs in the job to check their
>>>> selection, and aborts if any proc picked a different PML module.
>>>> 
>>>> All well and good...assuming that procs actually -can- choose
>>>> different PML
>>>> modules and hence create an "abort" scenario. However, if I look
>>>> inside the
>>>> PML's at their selection logic, I find that a proc can ONLY pick a
>>>> module
>>>> other than ob1 if:
>>>> 
>>>> 1. the user specifies the module to use via -mca pml xyz or by using a
>>>> module specific mca param to adjust its priority. In this case,
>>>> since the
>>>> mca param is propagated, ALL procs have no choice but to pick that
>>>> same
>>>> module, so that can't cause us to abort (we will have already
>>>> returned an
>>>> error and aborted if the specified module can't run).
>>>> 
>>>> 2. the pml/cm module detects that an MTL module was selected, and
>>>> that it is
>>>> other than "psm". In this case, the CM module will be selected
>>>> because its
>>>> default priority is higher than that of OB1.
>>>> 
>>>> In looking deeper into the MTL selection logic, it appears to me
>>>> that you
>>>> either have the required capability or you don't. I can see that in
>>>> some
>>>> environments (e.g., rsh across unmanaged collections of machines),
>>>> it might
>>>> be possible for someone to launch across a set of mach

[OMPI devel] PML selection logic

2008-06-23 Thread Ralph H Castain
Yo all

I've been doing further research into the modex and came across something I
don't fully understand. It seems we have each process insert into the modex
the name of the PML module that it selected. Once the modex has exchanged
that info, it then loops across all procs in the job to check their
selection, and aborts if any proc picked a different PML module.
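
The current check, as described above, can be sketched as follows. This is an illustration in Python with hypothetical names (build_modex, ranks_disagreeing_with), not the actual modex code; it is meant only to show that every proc contributes an entry and every proc then walks all of them.

```python
def build_modex(selections):
    """Every proc contributes one entry: rank -> selected PML name."""
    return dict(enumerate(selections))


def ranks_disagreeing_with(my_pml, modex):
    """Loop across all procs in the job and collect any that differ."""
    return sorted(rank for rank, pml in modex.items() if pml != my_pml)


modex = build_modex(["ob1", "ob1", "cm", "ob1"])

# One entry per proc is exchanged, and a single mismatch means abort.
print(len(modex), ranks_disagreeing_with("ob1", modex))
```

The amount of data exchanged grows with the number of procs, which is the cost the rest of this note tries to avoid.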

All well and good...assuming that procs actually -can- choose different PML
modules and hence create an "abort" scenario. However, if I look inside the
PMLs at their selection logic, I find that a proc can ONLY pick a module
other than ob1 if:

1. the user specifies the module to use via -mca pml xyz or by using a
module specific mca param to adjust its priority. In this case, since the
mca param is propagated, ALL procs have no choice but to pick that same
module, so that can't cause us to abort (we will have already returned an
error and aborted if the specified module can't run).

2. the pml/cm module detects that an MTL module was selected, and that it is
other than "psm". In this case, the CM module will be selected because its
default priority is higher than that of OB1.

In looking deeper into the MTL selection logic, it appears to me that you
either have the required capability or you don't. I can see that in some
environments (e.g., rsh across unmanaged collections of machines), it might
be possible for someone to launch across a set of machines where some do and
some don't have the required support. However, in all other cases, this will
be homogeneous across the system.

Given this analysis (and someone more familiar with the PML should feel free
to confirm or correct it), it seems to me that this could be streamlined via
one or more means:

1. at the most, we could have rank=0 add the PML module name to the modex,
and other procs simply check it against their own and return an error if
they differ. This accomplishes the identical functionality to what we have
today, but with much less info in the modex.

2. we could eliminate this info from the modex altogether by requiring the
user to specify the PML module if they want something other than the default
OB1. In this case, there can be no confusion over what each proc is to use.
The CM module will attempt to init the MTL - if it cannot do so, then the
job will return the correct error and tell the user that CM/MTL support is
unavailable.

3. we could again eliminate the info by not inserting it into the modex if
(a) the default PML module is selected, or (b) the user specified the PML
module to be used. In the first case, each proc can simply check to see if
they picked the default - if not, then we can insert the info to indicate
the difference. Thus, in the "standard" case, no info will be inserted.

In the second case, we will already get an error if the specified PML module
could not be used. Hence, the modex check provides no additional info or
value.

I understand the motivation to support automation. However, in this case,
the automation actually doesn't seem to buy us very much, and it isn't
coming "free". So perhaps some change in how this is done would be in order?

Ralph





Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Ralph H Castain
Hmmm...something isn't right, Pasha. There is simply no way you should be
encountering this error. You are picking up the wrong grpcomm module.

I went ahead and fixed the grpcomm/basic module, but as I note in the commit
message, that is now an experimental area. The grpcomm/bad module is the
default for that reason.

Check to ensure you have the orte/mca/grpcomm/bad directory, and that it is
getting built. My guess is that you have a corrupted checkout or build and
that the component is either missing or not getting built.


On 6/19/08 1:37 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:

> Ralph H Castain wrote:
>> I can't find anything wrong so far. I'm waiting in a queue on Odin to try
>> there since Jeff indicated you are using rsh as a launcher, and that's the
>> only access I have to such an environment. Guess Odin is being pounded
>> because the queue isn't going anywhere.
>>   
> I use ssh. Here is the command line:
> ./bin/mpirun -np 2 -H sw214,sw214 -mca btl openib,sm,self
> ./osu_benchmarks-3.0/osu_latency
>> Meantime, I'm building on RoadRunner and will test there (TM enviro).
>> 
>> 
>> On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:
>> 
>>   
>>>> You'll have to tell us something more than that, Pasha. What kind of
>>>> environment, what rev level were you at, etc.
>>>>   
>>>>   
>>> Ahh, sorry :) I run on Linux x86_64 Sles10 sp1. (Open MPI) 1.3a1r18682M,
>>> OFED 1.3.1
>>> Pasha.
>>> 
>>>> So far as I know, the trunk is fine.
>>>> 
>>>> 
>>>> On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il>
>>>> wrote:
>>>> 
>>>>   
>>>>   
>>>>> I tried to run trunk on my machines and I got the following error:
>>>>> 
>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
>>>>> end of buffer in file base/grpcomm_base_modex.c at line 451
>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
>>>>> end of buffer in file grpcomm_basic_module.c at line 560
>>>>> [sw214:04365]
>>>>> --
>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>> likely to abort.  There are many reasons that a parallel process can
>>>>> fail during MPI_INIT; some of which are due to configuration or
>>>>> environment
>>>>> problems.  This failure appears to be an internal failure; here's some
>>>>> additional information (which may only be relevant to an Open MPI
>>>>> developer):
>>>>> 
>>>>>   orte_grpcomm_modex failed
>>>>>   --> Returned "Data unpack would read past end of buffer" (-26) instead
>>>>> of "Success" (0)
>>>>> 




Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Ralph H Castain
Ha! I found it - you left out one very important detail. You are specifying
the use of the grpcomm basic module instead of the default "bad" one.

I just checked and that module is indeed showing a problem. I'll see what I
can do.

For now, though, just use the default grpcomm and it will work fine.


On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)"  wrote:

> 
>> You'll have to tell us something more than that, Pasha. What kind of
>> environment, what rev level were you at, etc.
>>   
> Ahh, sorry :) I run on Linux x86_64 Sles10 sp1. (Open MPI) 1.3a1r18682M,
> OFED 1.3.1
> Pasha.
>> So far as I know, the trunk is fine.
>> 
>> 
>> On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" 
>> wrote:
>> 
>>   
>>> I tried to run trunk on my machines and I got the following error:
>>> 
>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
>>> end of buffer in file base/grpcomm_base_modex.c at line 451
>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
>>> end of buffer in file grpcomm_basic_module.c at line 560
>>> [sw214:04365] 
>>> --
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or environment
>>> problems.  This failure appears to be an internal failure; here's some
>>> additional information (which may only be relevant to an Open MPI
>>> developer):
>>> 
>>>   orte_grpcomm_modex failed
>>>   --> Returned "Data unpack would read past end of buffer" (-26) instead
>>> of "Success" (0)
>>> 




Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Ralph H Castain
I can't find anything wrong so far. I'm waiting in a queue on Odin to try
there since Jeff indicated you are using rsh as a launcher, and that's the
only access I have to such an environment. Guess Odin is being pounded
because the queue isn't going anywhere.

Meantime, I'm building on RoadRunner and will test there (TM enviro).


On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)"  wrote:

> 
>> You'll have to tell us something more than that, Pasha. What kind of
>> environment, what rev level were you at, etc.
>>   
> Ahh, sorry :) I run on Linux x86_64 Sles10 sp1. (Open MPI) 1.3a1r18682M,
> OFED 1.3.1
> Pasha.
>> So far as I know, the trunk is fine.
>> 
>> 
>> On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" 
>> wrote:
>> 
>>   
>>> I tried to run trunk on my machines and I got the following error:
>>> 
>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
>>> end of buffer in file base/grpcomm_base_modex.c at line 451
>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
>>> end of buffer in file grpcomm_basic_module.c at line 560
>>> [sw214:04365] 
>>> --
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or environment
>>> problems.  This failure appears to be an internal failure; here's some
>>> additional information (which may only be relevant to an Open MPI
>>> developer):
>>> 
>>>   orte_grpcomm_modex failed
>>>   --> Returned "Data unpack would read past end of buffer" (-26) instead
>>> of "Success" (0)
>>> 




Re: [OMPI devel] RML Send

2008-06-19 Thread Ralph H Castain
Okay, I've traced this down. The problem is that a DSS-internal function has
been exposed via the API, so now people can mistakenly call the wrong one.
You should -never- be using opal_dss.pack_buffer or opal_dss.unpack_buffer.
Those were supposed to be internal to the DSS only, and will definitely mess
you up if called directly.

I'll fix this problem to avoid future issues. There is a comment in dss.h
that warns you never to call those functions, but who would remember?

I sure wouldn't. I've only avoided the problem because of ignorance - I
didn't know those API's existed!

Should have a fix in later today.
Ralph



On 6/19/08 8:43 AM, "Ralph H Castain" <r...@lanl.gov> wrote:

> WOW! Somebody really screwed up the DSS by adding some new API's I'd never
> heard of before, but really can cause the system to break!
> 
> I'm going to have to straighten this mess out - it is a total disaster.
> There needs to be just ONE way of packing and unpacking, not two totally
> incompatible methods.
> 
> Will let you know when it is fixed - probably early next week.
> Ralph
>  
> 
> 
> On 6/19/08 8:34 AM, "Leonardo Fialho" <lfia...@aomail.uab.es> wrote:
> 
>> Hi Ralph,
>> 
>> My mistake, I'm really using ORTE_PROC_MY_DAEMON->jobid.
>> 
>> I have success using pack_buffer()/unpack_buffer() with the OPAL_BYTE type,
>> but something strange occurs when I use pack()/unpack(). The value of
>> num_bytes increases. For example:
>> I tried to read num_bytes=5, and after an unpack this variable holds 33! I
>> don't understand it...
>> 
>> Thanks,
>> Leonardo Fialho
>> 
>> Ralph Castain wrote:
>>> 
>>> On 6/17/08 3:35 PM, "Leonardo Fialho" <lfia...@aomail.uab.es> wrote:
>>> 
>>>   
>>>> Hi Ralph,
>>>> 
>>>> 1) Yes, I'm using ORTE_RML_TAG_DAEMON with a new "command" that I
>>>> defined in "odls_types.h".
>>>> 2) I'm packing and unpacking variables like OPAL_INT, OPAL_SIZE, ...
>>>> 3) I'm not blocking the "process_commands" function with long code.
>>>> 4) To know the daemon's vpid and jobid I used the same jobid as the
>>>> app (in this solution, it can be changed) and the vpid is ordered
>>>> sequentially (0 for mpirun and 1 to N for the orteds).
>>>> 
>>> 
>>> The jobid of the daemons is different from the jobid of the apps. So at the
>>> moment, you are actually sending the message to another app!
>>> 
>>> You can find the jobid of the daemons by extracting it as
>>> ORTE_PROC_MY_DAEMON->jobid. Please note, though, that the app has no
>>> knowledge of the contact info for that daemon, so this message will have to
>>> route through the local daemon. Happens transparently, but just wanted to be
>>> clear as to how this is working.
>>> 
>>>   
>>>> The problem is: I need to send buffered data, and I don't know the
>>>> type of this data. I'm trying to use OPAL_NULL and OPAL_DATA_VALUE to
>>>> send it but I had no success :(
>>>> 
>>> 
>>> If I recall correctly, you were trying to archive messages that flowed
>>> through the PML - correct? I would suggest just treating them as bytes and
>>> packing them as an opal_byte_object_t, something like this:
>>> 
>>> opal_byte_object_t bo;
>>> 
>>> bo.size = sizeof(my_data);
>>> bo.data = *my_data;
>>> 
>>> opal_dss.pack(*buffer, &bo, 1, OPAL_BYTE_OBJECT);
>>> 
>>> Then on the other end:
>>> 
>>> opal_byte_object_t *bo;
>>> int32_t n;
>>> 
>>> opal_dss.unpack(*buffer, &bo, &n, OPAL_BYTE_OBJECT);
>>> 
>>> You can then transfer the data into whatever storage you like. All this does
>>> is pass the #bytes and the bytes as a collected unit - you could, of course,
>>> simply pass the #bytes and bytes with independent packs if you wanted:
>>> 
>>> int32_t num_bytes;
>>> uint8_t *my_data;
>>> 
>>> opal_dss.pack(*buffer, &num_bytes, 1, OPAL_INT32);
>>> opal_dss.pack(*buffer, my_data, num_bytes, OPAL_BYTE);
>>> 
>>> ...
>>> 
>>> opal_dss.unpack(*buffer, &num_bytes, &n, OPAL_INT32);
>>> my_data = (uint8_t*)malloc(num_bytes);
>>> opal_dss.unpack(*buffer, my_data, &num_bytes, OPAL_BYTE);
>>> 
>>> 
>>> Up to you.
>>> 
>>> Hope that helps
>>> Ralph
>>> 
>>>   
>>>> Thanks in advance,
>>>> Leonardo Fialho

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Ralph H Castain
You'll have to tell us something more than that, Pasha. What kind of
environment, what rev level were you at, etc.

So far as I know, the trunk is fine.


On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" 
wrote:

> I tried to run trunk on my machines and I got the following error:
> 
> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
> end of buffer in file base/grpcomm_base_modex.c at line 451
> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
> end of buffer in file grpcomm_basic_module.c at line 560
> [sw214:04365] 
> --
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> 
>   orte_grpcomm_modex failed
>   --> Returned "Data unpack would read past end of buffer" (-26) instead
> of "Success" (0)
> 




Re: [OMPI devel] [OMPI svn] svn:open-mpi r18677

2008-06-19 Thread Ralph H Castain
I would argue that this behavior is in fact consistent - the returned state
is that all required connections have been opened and is independent of the
selected routed module. How that is done is irrelevant to the caller.

Each routed module knows precisely what connections are used for its
operation. It is therefore trivial for it to internally do the right thing.
For example, in the binomial case, no communication is required whatsoever
for an MPI proc (only a daemon would ever warmup its connections to its
parent and/or children). In the direct module, the old wireup is required.
In a topo-aware module, we may want to do some other pattern.

In all cases, the precise pattern to be used depends upon whether we are
warming up the connections of a daemon, the HNP, or an application process.
We will shortly be calling "warmup_routes" for all three cases, though for
now the actions taken may be "null" in some cases.

So we might as well let each routed module do what it thinks is required. I
don't see much advantage in having something that digs the info out of the
module, and then attempts to reconstruct what the module already knew how to
do. What matters is that the end state is consistent - what happens under
the covers is solely determined by the selected routed module.
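
For readers following the thread, here is a rough sketch in C of the generic
approach George describes in the quoted message below: ask the routing layer
for the first hop toward every destination, then keep the unique ones. It is
not the ORTE routed API. The names next_hop_fn, direct_next_hop,
relay_next_hop and unique_first_hops are all made up for illustration, and
ranks are plain ints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: given my rank and a destination rank, return the
 * rank that the message is handed to first. */
typedef int (*next_hop_fn)(int self, int dest);

/* "Direct"-style routing (assumed behavior): each destination is its own
 * first hop, so every peer needs its own connection. */
int direct_next_hop(int self, int dest)
{
    (void)self;
    return dest;
}

/* Relay-style routing (assumed behavior): everything leaves through a
 * single relay, taken to be rank 0 in this sketch. */
int relay_next_hop(int self, int dest)
{
    (void)self;
    (void)dest;
    return 0;
}

/* Collect the unique first hops toward all destinations other than self.
 * hops must have room for nprocs entries. Returns the number of entries
 * written. The duplicate check is a linear scan, so the whole pass is
 * quadratic in the worst case. */
size_t unique_first_hops(next_hop_fn next_hop, int self, int nprocs, int *hops)
{
    size_t count = 0;

    assert(next_hop != NULL && hops != NULL && nprocs >= 0);
    for (int dest = 0; dest < nprocs; dest++) {
        size_t i;
        int hop;

        if (dest == self) {
            continue;               /* nothing to connect to for myself */
        }
        hop = next_hop(self, dest);
        if (hop == self) {
            continue;               /* I am the relay for this destination */
        }
        for (i = 0; i < count; i++) {
            if (hops[i] == hop) {
                break;              /* already on the list */
            }
        }
        if (i == count) {
            hops[count++] = hop;    /* first time this hop shows up */
        }
    }
    return count;
}
```

With the direct callback the list contains every peer, and with the relay
callback it collapses to a single entry. That is the trade-off under
discussion: the generic pass works for any module, while a module that
already knows its own pattern can skip the pass entirely.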

Ralph


On 6/19/08 10:05 AM, "George Bosilca"  wrote:

> Ralph,
> 
> I don't necessarily agree with this statement. There is a generic
> method to do the correct wireup, and this method works independent of
> the selected routed algorithms.
> 
> One can use the routed to ask for the next hop for each of the
> destinations, make a unique list out of these first hop destinations,
> and then finally generate the connections to them. Of course there is
> a cost associated with this method. Creating the temporary list will
> be quite expensive, but this list will be smaller for highly
> optimized routed components. Eventually, a more optimized approach
> will be to use the get_routing_tree function in order to gather the
> direct routes, and then start the connections to these children. This
> approach is not more complex than the current implementation, and gives
> us the benefit of having a consistent behavior in all cases.
> 
>george.
> 
> On Jun 19, 2008, at 3:48 PM, r...@osl.iu.edu wrote:
> 
>> Author: rhc
>> Date: 2008-06-19 09:48:26 EDT (Thu, 19 Jun 2008)
>> New Revision: 18677
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/18677
>> 
>> Log:
>> Shift responsibility for preconnecting the oob to the orte routed
>> framework, which is the only place that knows what needs to be done.
>> Only the direct module will actually do anything - it uses the same
>> algo as the original preconnect function.
>> 
>> Text files modified:
>>   trunk/ompi/mca/dpm/dpm.h                            |  7 +-
>>   trunk/ompi/runtime/mpiruntime.h                     |  1
>>   trunk/ompi/runtime/ompi_mpi_init.c                  | 29 +++--
>>   trunk/ompi/runtime/ompi_mpi_preconnect.c            | 80 ---
>>   trunk/orte/mca/grpcomm/basic/grpcomm_basic_module.c | 13 -
>>   trunk/orte/mca/odls/base/odls_base_default_fns.c    |  5 +
>>   trunk/orte/mca/routed/binomial/routed_binomial.c    | 20 +++++
>>   trunk/orte/mca/routed/direct/routed_direct.c        | 56 +++
>>   trunk/orte/mca/routed/linear/routed_linear.c        |  7 +++
>>   trunk/orte/mca/routed/routed.h                      | 10
>>   trunk/orte/orted/orted_comm.c                       |  5 ++
>>   trunk/orte/util/nidmap.c                            | 81 ++++++-
>>   12 files changed, 184 insertions(+), 130 deletions(-)




Re: [OMPI devel] RML Send

2008-06-19 Thread Ralph H Castain
WOW! Somebody really screwed up the DSS by adding some new API's I'd never
heard of before, but really can cause the system to break!

I'm going to have to straighten this mess out - it is a total disaster.
There needs to be just ONE way of packing and unpacking, not two totally
incompatible methods.

Will let you know when it is fixed - probably early next week.
Ralph



On 6/19/08 8:34 AM, "Leonardo Fialho" <lfia...@aomail.uab.es> wrote:

> Hi Ralph,
> 
> My mistake, I'm really using ORTE_PROC_MY_DAEMON->jobid.
> 
> I have success using pack_buffer()/unpack_buffer() with the OPAL_BYTE type,
> but something strange occurs when I use pack()/unpack(). The value of
> num_bytes increases. For example:
> I tried to read num_bytes=5, and after an unpack this variable holds 33! I
> don't understand it...
> 
> Thanks,
> Leonardo Fialho
> 
> Ralph Castain wrote:
>> 
>> On 6/17/08 3:35 PM, "Leonardo Fialho" <lfia...@aomail.uab.es> wrote:
>> 
>>   
>>> Hi Ralph,
>>> 
>>> 1) Yes, I'm using ORTE_RML_TAG_DAEMON with a new "command" that I
>>> defined in "odls_types.h".
>>> 2) I'm packing and unpacking variables like OPAL_INT, OPAL_SIZE, ...
>>> 3) I'm not blocking the "process_commands" function with long code.
>>> 4) To know the daemon's vpid and jobid I used the same jobid as the
>>> app (in this solution, it can be changed) and the vpid is ordered
>>> sequentially (0 for mpirun and 1 to N for the orteds).
>>> 
>> 
>> The jobid of the daemons is different from the jobid of the apps. So at the
>> moment, you are actually sending the message to another app!
>> 
>> You can find the jobid of the daemons by extracting it as
>> ORTE_PROC_MY_DAEMON->jobid. Please note, though, that the app has no
>> knowledge of the contact info for that daemon, so this message will have to
>> route through the local daemon. Happens transparently, but just wanted to be
>> clear as to how this is working.
>> 
>>   
>>> The problem is: I need to send buffered data, and I don't know the
>>> type of this data. I'm trying to use OPAL_NULL and OPAL_DATA_VALUE to
>>> send it but I had no success :(
>>> 
>> 
>> If I recall correctly, you were trying to archive messages that flowed
>> through the PML - correct? I would suggest just treating them as bytes and
>> packing them as an opal_byte_object_t, something like this:
>> 
>> opal_byte_object_t bo;
>> 
>> bo.size = sizeof(my_data);
>> bo.data = *my_data;
>> 
>> opal_dss.pack(*buffer, &bo, 1, OPAL_BYTE_OBJECT);
>> 
>> Then on the other end:
>> 
>> opal_byte_object_t *bo;
>> int32_t n;
>> 
>> opal_dss.unpack(*buffer, &bo, &n, OPAL_BYTE_OBJECT);
>> 
>> You can then transfer the data into whatever storage you like. All this does
>> is pass the #bytes and the bytes as a collected unit - you could, of course,
>> simply pass the #bytes and bytes with independent packs if you wanted:
>> 
>> int32_t num_bytes;
>> uint8_t *my_data;
>> 
>> opal_dss.pack(*buffer, &num_bytes, 1, OPAL_INT32);
>> opal_dss.pack(*buffer, my_data, num_bytes, OPAL_BYTE);
>> 
>> ...
>> 
>> opal_dss.unpack(*buffer, &num_bytes, &n, OPAL_INT32);
>> my_data = (uint8_t*)malloc(num_bytes);
>> opal_dss.unpack(*buffer, my_data, &num_bytes, OPAL_BYTE);
>> 
>> 
>> Up to you.
>> 
>> Hope that helps
>> Ralph
>> 
>>   
>>> Thanks in advance,
>>> Leonardo Fialho
>>> 
>>> 
>>> Ralph H Castain wrote:
>>> 
>>>> I'm not sure exactly how you are trying to do this, but the usual procedure
>>>> would be:
>>>> 
>>>> 1. call opal_dss.pack(*buffer, *data, #data, data_type) for each thing you
>>>> want to put in the buffer. So you might call this to pack a string:
>>>> 
>>>> opal_dss.pack(*buffer, , 1, OPAL_STRING);
>>>> 
>>>> 2. once you have everything packed into the buffer, you send the buffer
>>>> with
>>>> 
>>>> orte_rml.send_buffer(*dest, *buffer, dest_tag, 0);
>>>> 
>>>> What you will need is a tag that the daemon is listening on that won't
>>>> interfere with its normal operations - i.e., what you send won't get held
>>>> forever waiting to get serviced, and your servicing won't block us from
>>>> responding to a ctrl-c. You can probably use ORTE_RML_TAG_DAEMON, but you
>>>>

Re: [OMPI devel] RML Send

2008-06-17 Thread Ralph H Castain
I'm not sure exactly how you are trying to do this, but the usual procedure
would be:

1. call opal_dss.pack(*buffer, *data, #data, data_type) for each thing you
want to put in the buffer. So you might call this to pack a string:

opal_dss.pack(*buffer, , 1, OPAL_STRING);

2. once you have everything packed into the buffer, you send the buffer with

orte_rml.send_buffer(*dest, *buffer, dest_tag, 0);

What you will need is a tag that the daemon is listening on that won't
interfere with its normal operations - i.e., what you send won't get held
forever waiting to get serviced, and your servicing won't block us from
responding to a ctrl-c. You can probably use ORTE_RML_TAG_DAEMON, but you
need to ensure you don't block anything.
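
To make the pack-then-send pattern in steps 1 and 2 concrete for data of
unknown type, here is a minimal sketch in plain C that treats the payload as
raw bytes and stores a 32-bit count in front of them. This is a toy buffer,
not the real opal_dss or opal_buffer_t: the names toy_buf_t, toy_pack_bytes
and toy_unpack_bytes are invented for illustration, and the count is written
in host byte order, which a real wire format would have to pin down.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy growable byte buffer (hypothetical; not opal_buffer_t). */
typedef struct {
    uint8_t *base;  /* start of storage */
    size_t   used;  /* bytes packed so far */
    size_t   cap;   /* bytes allocated */
    size_t   rd;    /* read cursor used while unpacking */
} toy_buf_t;

/* Make sure 'extra' more bytes fit; returns 0 on success, -1 on failure. */
static int toy_reserve(toy_buf_t *b, size_t extra)
{
    size_t ncap;
    uint8_t *p;

    if (b->used + extra <= b->cap) {
        return 0;
    }
    ncap = b->cap ? b->cap : 64;
    while (ncap < b->used + extra) {
        ncap *= 2;
    }
    p = realloc(b->base, ncap);
    if (p == NULL) {
        return -1;
    }
    b->base = p;
    b->cap = ncap;
    return 0;
}

/* Pack a 32-bit byte count followed by the bytes themselves. */
int toy_pack_bytes(toy_buf_t *b, const void *data, int32_t num_bytes)
{
    assert(b != NULL && num_bytes >= 0);
    if (toy_reserve(b, sizeof(num_bytes) + (size_t)num_bytes) != 0) {
        return -1;
    }
    memcpy(b->base + b->used, &num_bytes, sizeof(num_bytes));
    b->used += sizeof(num_bytes);
    if (num_bytes > 0) {
        memcpy(b->base + b->used, data, (size_t)num_bytes);
    }
    b->used += (size_t)num_bytes;
    return 0;
}

/* Unpack one count+bytes record into freshly allocated storage.
 * The caller frees *out. Returns -1 instead of reading past the end. */
int toy_unpack_bytes(toy_buf_t *b, uint8_t **out, int32_t *num_bytes)
{
    int32_t n;
    uint8_t *p;

    assert(b != NULL && out != NULL && num_bytes != NULL);
    if (b->used - b->rd < sizeof(n)) {
        return -1;                      /* no room for a count */
    }
    memcpy(&n, b->base + b->rd, sizeof(n));
    if (n < 0 || b->used - b->rd - sizeof(n) < (size_t)n) {
        return -1;                      /* count exceeds remaining data */
    }
    p = malloc(n > 0 ? (size_t)n : 1);
    if (p == NULL) {
        return -1;
    }
    memcpy(p, b->base + b->rd + sizeof(n), (size_t)n);
    b->rd += sizeof(n) + (size_t)n;
    *out = p;
    *num_bytes = n;
    return 0;
}
```

The receiver learns the size before it allocates, so neither side needs to
know what the bytes mean. Packing and unpacking must use the same layout on
both ends, or the count will be read from the wrong offset.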

BTW: how is the app figuring out the name of the remote daemon? The proc
will have access to the daemon's vpid (assuming it knows the nodename where
the daemon is running) in the ESS, but not the jobid - I assume you are
using some method to compute the daemon jobid from the apps?


On 6/17/08 12:08 PM, "Leonardo Fialho"  wrote:

> Hi All,
> 
> I'm using RML to send log messages from a PML to an ORTE daemon (located
> on another node). I succeeded in sending the message header, but now I
> need to send the message data (buffer). How can I do it? The problem is:
> what data type do I need to use for packing/unpacking? I tried
> OPAL_DATA_VALUE but had no success...
> 
> Thanks,





Re: [OMPI devel] [OMPI svn] svn:open-mpi r18625

2008-06-09 Thread Ralph H Castain
Okay, it's fixed now in r18629


On 6/9/08 3:23 PM, "Ralph H Castain" <r...@lanl.gov> wrote:

> Visibility issue - fix coming in a minute...
> 
> 
> On 6/9/08 3:10 PM, "Ralph H Castain" <r...@lanl.gov> wrote:
> 
>> Interesting - it compiles for me under three different environments.
>> 
>> Let me check - perhaps something isn't getting committed properly
>> 
>> 
>> On 6/9/08 3:07 PM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:
>> 
>>> This commit looks like it does not compile.
>>> orterun.o: In function `orterun':
>>> ../../../../trunk/orte/tools/orterun/orterun.c:525: undefined
>>> reference to `orte_totalview_init_before_spawn'
>>> orterun.o: In function `job_completed':
>>> ../../../../trunk/orte/tools/orterun/orterun.c:603: undefined
>>> reference to `orte_totalview_finalize'
>>> orterun.o: In function `parse_globals':
>>> ../../../../trunk/orte/tools/orterun/orterun.c:1106: undefined
>>> reference to `orte_run_debugger'
>>> collect2: ld returned 1 exit status
>>> 
>>> Aurelien
>>> 
>>> 
>>> On 9 June 08 at 16:34, r...@osl.iu.edu wrote:
>>> 
>>>> Author: rhc
>>>> Date: 2008-06-09 16:34:14 EDT (Mon, 09 Jun 2008)
>>>> New Revision: 18625
>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/18625
>>>> 
>>>> Log:
>>>> Refs #1255
>>>> 
>>>> This commit repairs the debugger initialization procedure. I am not
>>>> closing the ticket, however, pending Jeff's review of how it
>>>> interfaces to the ompi_debugger code he implemented. There were
>>>> duplicate symbols being created in that code, but not used anywhere.
>>>> I replaced them with the ORTE-created symbols instead. However,
>>>> since they aren't used anywhere, I have no way of checking to ensure
>>>> I didn't break something.
>>>> 
>>>> So the ticket can be checked by Jeff when he returns from
>>>> vacation... :-)
>>>> 
>>>> Added:
>>>>   trunk/orte/util/totalview.c
>>>>   trunk/orte/util/totalview.h
>>>> Removed:
>>>>   trunk/orte/tools/orterun/totalview.c
>>>>   trunk/orte/tools/orterun/totalview.h
>>>> Text files modified:
>>>>   trunk/ompi/debuggers/ompi_debuggers.c             | 13 ++---
>>>>   trunk/orte/mca/plm/base/plm_base_launch_support.c | 14 +-
>>>>   trunk/orte/tools/orterun/Makefile.am              |  5 ++---
>>>>   trunk/orte/tools/orterun/orterun.c                |  7 +--
>>>>   trunk/orte/util/Makefile.am                       |  6 --
>>>>   trunk/orte/util/show_help.h                       |  4 ++--
>>>>   6 files changed, 20 insertions(+), 29 deletions(-)
>>>> 
>>>> 
>>>> Diff not shown due to size (41811 bytes).
>>>> To see the diff, run the following command:
>>>> 
>>>> svn diff -r 18624:18625 --no-diff-deleted
>>>> 
>>>> ___
>>>> svn mailing list
>>>> s...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>>> 
>>> 





Re: [OMPI devel] [OMPI svn] svn:open-mpi r18625

2008-06-09 Thread Ralph H Castain
Visibility issue - fix coming in a minute...


On 6/9/08 3:10 PM, "Ralph H Castain" <r...@lanl.gov> wrote:

> Interesting - it compiles for me under three different environments.
> 
> Let me check - perhaps something isn't getting committed properly
> 
> 
> On 6/9/08 3:07 PM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:
> 
>> This commit looks like it does not compile.
>> orterun.o: In function `orterun':
>> ../../../../trunk/orte/tools/orterun/orterun.c:525: undefined
>> reference to `orte_totalview_init_before_spawn'
>> orterun.o: In function `job_completed':
>> ../../../../trunk/orte/tools/orterun/orterun.c:603: undefined
>> reference to `orte_totalview_finalize'
>> orterun.o: In function `parse_globals':
>> ../../../../trunk/orte/tools/orterun/orterun.c:1106: undefined
>> reference to `orte_run_debugger'
>> collect2: ld returned 1 exit status
>> 
>> Aurelien
>> 
>> 
>> On 9 June 08 at 16:34, r...@osl.iu.edu wrote:
>> 
>>> Author: rhc
>>> Date: 2008-06-09 16:34:14 EDT (Mon, 09 Jun 2008)
>>> New Revision: 18625
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/18625
>>> 
>>> Log:
>>> Refs #1255
>>> 
>>> This commit repairs the debugger initialization procedure. I am not
>>> closing the ticket, however, pending Jeff's review of how it
>>> interfaces to the ompi_debugger code he implemented. There were
>>> duplicate symbols being created in that code, but not used anywhere.
>>> I replaced them with the ORTE-created symbols instead. However,
>>> since they aren't used anywhere, I have no way of checking to ensure
>>> I didn't break something.
>>> 
>>> So the ticket can be checked by Jeff when he returns from
>>> vacation... :-)
>>> 
>>> Added:
>>>   trunk/orte/util/totalview.c
>>>   trunk/orte/util/totalview.h
>>> Removed:
>>>   trunk/orte/tools/orterun/totalview.c
>>>   trunk/orte/tools/orterun/totalview.h
>>> Text files modified:
>>>   trunk/ompi/debuggers/ompi_debuggers.c             | 13 ++---
>>>   trunk/orte/mca/plm/base/plm_base_launch_support.c | 14 +-
>>>   trunk/orte/tools/orterun/Makefile.am              |  5 ++---
>>>   trunk/orte/tools/orterun/orterun.c                |  7 +--
>>>   trunk/orte/util/Makefile.am                       |  6 --
>>>   trunk/orte/util/show_help.h                       |  4 ++--
>>>   6 files changed, 20 insertions(+), 29 deletions(-)
>>> 
>>> 
>>> Diff not shown due to size (41811 bytes).
>>> To see the diff, run the following command:
>>> 
>>> svn diff -r 18624:18625 --no-diff-deleted
>>> 





Re: [OMPI devel] [OMPI svn] svn:open-mpi r18625

2008-06-09 Thread Ralph H Castain
Interesting - it compiles for me under three different environments.

Let me check - perhaps something isn't getting committed properly


On 6/9/08 3:07 PM, "Aurélien Bouteiller"  wrote:

> This commit looks like it does not compile.
> orterun.o: In function `orterun':
> ../../../../trunk/orte/tools/orterun/orterun.c:525: undefined
> reference to `orte_totalview_init_before_spawn'
> orterun.o: In function `job_completed':
> ../../../../trunk/orte/tools/orterun/orterun.c:603: undefined
> reference to `orte_totalview_finalize'
> orterun.o: In function `parse_globals':
> ../../../../trunk/orte/tools/orterun/orterun.c:1106: undefined
> reference to `orte_run_debugger'
> collect2: ld returned 1 exit status
> 
> Aurelien
> 
> 
> On 9 June 08 at 16:34, r...@osl.iu.edu wrote:
> 
>> Author: rhc
>> Date: 2008-06-09 16:34:14 EDT (Mon, 09 Jun 2008)
>> New Revision: 18625
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/18625
>> 
>> Log:
>> Refs #1255
>> 
>> This commit repairs the debugger initialization procedure. I am not
>> closing the ticket, however, pending Jeff's review of how it
>> interfaces to the ompi_debugger code he implemented. There were
>> duplicate symbols being created in that code, but not used anywhere.
>> I replaced them with the ORTE-created symbols instead. However,
>> since they aren't used anywhere, I have no way of checking to ensure
>> I didn't break something.
>> 
>> So the ticket can be checked by Jeff when he returns from
>> vacation... :-)
>> 
>> Added:
>>   trunk/orte/util/totalview.c
>>   trunk/orte/util/totalview.h
>> Removed:
>>   trunk/orte/tools/orterun/totalview.c
>>   trunk/orte/tools/orterun/totalview.h
>> Text files modified:
>>   trunk/ompi/debuggers/ompi_debuggers.c             | 13 ++---
>>   trunk/orte/mca/plm/base/plm_base_launch_support.c | 14 +-
>>   trunk/orte/tools/orterun/Makefile.am              |  5 ++---
>>   trunk/orte/tools/orterun/orterun.c                |  7 +--
>>   trunk/orte/util/Makefile.am                       |  6 --
>>   trunk/orte/util/show_help.h                       |  4 ++--
>>   6 files changed, 20 insertions(+), 29 deletions(-)
>> 
>> 
>> Diff not shown due to size (41811 bytes).
>> To see the diff, run the following command:
>> 
>> svn diff -r 18624:18625 --no-diff-deleted
>> 





Re: [OMPI devel] Communication between entities

2008-05-29 Thread Ralph H Castain
I see, thanks for the explanation!

I'm afraid you'll have no choice, though, but to relay the message via the
local daemon. I know that creates a window of vulnerability, but it cannot
be helped.

Passing full contact info for all daemons to all procs would take us back a
few steps and cause a whole lot of sockets to be opened...


On 5/29/08 8:04 AM, "Leonardo Fialho" <lfia...@aomail.uab.es> wrote:

> Ralph,
> 
> I want to implement a receiver-based message log (called the RADIC
> architecture) that stores the log file on another node (so that no stable
> storage is necessary).
> 
> I developed a wrapper around the PML that manages the messages and then
> stores them locally (or in stable storage), but now I need to migrate this
> "log file" to another node. Only the PML needs this file (to generate it and
> to recover after a failure), but the ORTE daemons store and manage the files
> to launch them when one node dies.
> 
> In this approach the ORTE daemons are treated as application "protectors",
> and the applications are the "protected".
> 
> Thanks,
> Leonardo
> 
> 
> Ralph H Castain wrote:
>> There is no way to send a message to a daemon located on another node
>> without relaying it through the local daemon. The application procs have no
>> knowledge of the contact info for any daemon other than their own, so even
>> using the direct routed module would not work.
>> 
>> Can you provide some reason why the normal relay is unacceptable? And why
>> the PML would want to communicate with a daemon, which, after all, is -not-
>> an MPI process and has no idea what a PML is?
>> 
>> 
>> On 5/29/08 7:41 AM, "Leonardo Fialho" <lfia...@aomail.uab.es> wrote:
>> 
>>   
>>> Hi All,
>>> 
>>> If, inside a PML component, I need to send a message to the ORTE daemon
>>> located on another node, how can I do it?
>>> 
>>> Is it safe to create a thread to manage this communication independently,
>>> or does Open MPI have a service for this (like RML in the ORTE environment)?
>>> 
>>> I saw a socket connection between the application and the local ORTE
>>> daemon, but I don't want to send the message to the local ORTE daemon and
>>> then have it send the same message to the remote ORTE daemon...
>>> 
>>> Thanks,
>>> 
>> 
>> 
>> 
> 





Re: [OMPI devel] Communication between entities

2008-05-29 Thread Ralph H Castain
There is no way to send a message to a daemon located on another node
without relaying it through the local daemon. The application procs have no
knowledge of the contact info for any daemon other than their own, so even
using the direct routed module would not work.

Can you provide some reason why the normal relay is unacceptable? And why
the PML would want to communicate with a daemon, which, after all, is -not-
an MPI process and has no idea what a PML is?


On 5/29/08 7:41 AM, "Leonardo Fialho"  wrote:

> Hi All,
> 
> If, inside a PML component, I need to send a message to the ORTE daemon
> located on another node, how can I do it?
> 
> Is it safe to create a thread to manage this communication independently,
> or does Open MPI have a service for this (like RML in the ORTE environment)?
> 
> I saw a socket connection between the application and the local ORTE
> daemon, but I don't want to send the message to the local ORTE daemon and
> then have it send the same message to the remote ORTE daemon...
> 
> Thanks,





Re: [OMPI devel] Open MPI session directory location

2008-05-28 Thread Ralph H Castain
After chatting with Jeff to better understand the ompi_info issue, I
consolidated all the ORTE-level MCA param registrations that are relevant to
users and had ompi_info call it. You will now see them displayed by
ompi_info.

Ralph


On 5/27/08 1:57 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> Oops, sorry.
> 
> We were having problems with the memory allocator when ompi_info
> called orte_init().  I think it might be best to call the ORTE MCA
> registration function directly...
> 
> 
> On May 27, 2008, at 10:40 AM, Ralph H Castain wrote:
> 
>> I see the problem (I think). A recent change was made to ompi_info
>> so it no
>> longer calls orte_init. As a result, none of the ORTE-level params
>> (i.e.,
>> those params registered outside of ORTE frameworks) are being
>> reported.
>> 
>> I'll chat with Jeff and see how we resolve the problem.
>> 
>> 
>> On 5/27/08 8:32 AM, "Ralph H Castain" <r...@lanl.gov> wrote:
>> 
>>> It "should" be visible now...not sure why it isn't. It conforms to the
>>> naming rules and -used- to be reported by ompi_info...
>>> 
>>> 
>>> 
>>> On 5/27/08 8:31 AM, "Shipman, Galen M." <gship...@ornl.gov> wrote:
>>> 
>>>> Make that "ompi_info".
>>>> 
>>>> We need to make that visible via orte_info.
>>>> I thought this was done at some point, perhaps it got overwritten?
>>>> 
>>>> Thanks,
>>>> 
>>>> Galen
>>>> 
>>>> On May 27, 2008, at 10:27 AM, Ralph H Castain wrote:
>>>> 
>>>>> -mca orte_tmpdir_base foo
>>>>> 
>>>>> 
>>>>> 
>>>>> On 5/27/08 8:24 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>>  Is there a way to change where Open MPI creates session
>>>>>> directory. I
>>>>>> can't find mca parameter that specifies this.
>>>>>> 
>>>>>> --
>>>>>> Gleb.
> 




Re: [OMPI devel] Open MPI session directory location

2008-05-27 Thread Ralph H Castain
I see the problem (I think). A recent change was made to ompi_info so it no
longer calls orte_init. As a result, none of the ORTE-level params (i.e.,
those params registered outside of ORTE frameworks) are being reported.

I'll chat with Jeff and see how we resolve the problem.


On 5/27/08 8:32 AM, "Ralph H Castain" <r...@lanl.gov> wrote:

> It "should" be visible now...not sure why it isn't. It conforms to the
> naming rules and -used- to be reported by ompi_info...
> 
> 
> 
> On 5/27/08 8:31 AM, "Shipman, Galen M." <gship...@ornl.gov> wrote:
> 
>> Make that "ompi_info".
>> 
>> We need to make that visible via orte_info.
>> I thought this was done at some point, perhaps it got overwritten?
>> 
>> Thanks,
>> 
>> Galen
>> 
>> On May 27, 2008, at 10:27 AM, Ralph H Castain wrote:
>> 
>>> -mca orte_tmpdir_base foo
>>> 
>>> 
>>> 
>>> On 5/27/08 8:24 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>>   Is there a way to change where Open MPI creates session
>>>> directory. I
>>>> can't find mca parameter that specifies this.
>>>> 
>>>> --
>>>> Gleb.




Re: [OMPI devel] Open MPI session directory location

2008-05-27 Thread Ralph H Castain
It "should" be visible now...not sure why it isn't. It conforms to the
naming rules and -used- to be reported by ompi_info...



On 5/27/08 8:31 AM, "Shipman, Galen M." <gship...@ornl.gov> wrote:

> Make that "ompi_info".
> 
> We need to make that visible via orte_info.
> I thought this was done at some point, perhaps it got overwritten?
> 
> Thanks,
> 
> Galen
> 
> On May 27, 2008, at 10:27 AM, Ralph H Castain wrote:
> 
>> -mca orte_tmpdir_base foo
>> 
>> 
>> 
>> On 5/27/08 8:24 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
>> 
>>> Hi,
>>> 
>>>   Is there a way to change where Open MPI creates its session
>>> directory? I can't find an MCA parameter that specifies this.
>>> 
>>> --
>>> Gleb.




Re: [OMPI devel] [RFC] mca_base_select()

2008-05-06 Thread Ralph H Castain
Hmmm...well, I hit a problem (of course!). I have mca-no-build on the filem
framework on my Mac. If I just mpirun -n 3 ./hello, I get the following
error:

--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_filem_base_select failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS

--

After looking at the source code for filem_select, I can run just fine if I
specify -mca filem none on the cmd line. Otherwise, it looks like your
select logic insists that at least one component must be built and
selectable?

Is that generally true, or is your filem framework the exception? I think
this would not be a good general requirement - frankly, I don't think it is
good for any framework to have such a requirement.
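
To make the behavior in question concrete, here is a minimal Python sketch of
"pick the highest priority, close the rest" selection. It is not the C code in
opal/mca/base; the Component class, the select_highest_priority name, and the
allow_none switch are all hypothetical and exist only to show the difference
between treating "nothing selectable" as an error and tolerating it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Component:
    """Hypothetical stand-in for an MCA component (not the real C struct)."""
    name: str
    # query() returns (module, priority), or None if the component declines.
    query: Callable[[], Optional[Tuple[object, int]]]
    closed: bool = False


def select_highest_priority(components: List[Component],
                            allow_none: bool = False) -> Optional[Component]:
    """Pick the component with the highest priority and close the others."""
    best: Optional[Component] = None
    best_priority = 0
    for comp in components:
        result = comp.query()
        if result is None:
            continue  # this component declined to be selected
        _module, priority = result
        if best is None or priority > best_priority:
            best, best_priority = comp, priority
    # Close every component that was not selected.
    for comp in components:
        if comp is not best:
            comp.closed = True
    if best is None and not allow_none:
        # Strict behavior: an empty or all-declining framework is an error.
        raise RuntimeError("no component selected")
    return best
```

With allow_none=False an empty component list raises, which is the kind of
failure reported above; with allow_none=True the caller simply gets None back
and can carry on without that framework.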

Ralph



On 5/6/08 12:09 PM, "Josh Hursey"  wrote:

> This has been committed in r18381
> 
> Please let me know if you have any problems with this commit.
> 
> Cheers,
> Josh
> 
> On May 5, 2008, at 10:41 AM, Josh Hursey wrote:
> 
>> Awesome.
>> 
>> The branch is updated to the latest trunk head. I encourage folks to
>> check out this repository and make sure that it builds on their
>> system. A normal build of the branch should be enough to find out if
>> there are any cut-n-paste problems (though I tried to be careful,
>> mistakes do happen).
>> 
>> I haven't heard any problems so this is looking like it will come in
>> tomorrow after the teleconf. I'll ask again there to see if there are
>> any voices of concern.
>> 
>> Cheers,
>> Josh
>> 
>> On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:
>> 
>>> This all sounds good to me!
>>> 
>>> On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:
>>> 
>>>> What:  Add mca_base_select() and adjust frameworks & components to
>>>>        use it.
>>>> Why:   Consolidation of code for general goodness.
>>>> Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
>>>> When:  Code ready now. Documentation ready soon.
>>>> Timeout: May 6, 2008 (After teleconf) [1 week]
>>>>
>>>> Discussion:
>>>> ---
>>>> For a number of years a few developers have been talking about
>>>> creating an MCA base component selection function. For various
>>>> reasons this was never implemented. Recently I decided to give it a
>>>> try.
>>>>
>>>> A base select function will allow Open MPI to provide completely
>>>> consistent selection behavior for many of its frameworks (18 of 31
>>>> to be exact at the moment). The primary goal of this work is to
>>>> improve code maintainability through code reuse. Other benefits also
>>>> result, such as a slightly smaller memory footprint.
>>>>
>>>> The mca_base_select() function represents the most commonly used
>>>> logic for component selection: Select the one component with the
>>>> highest priority and close all of the not selected components. This
>>>> function can be found at the path below in the branch:
>>>> opal/mca/base/mca_base_components_select.c
>>>>
>>>> To support this I had to formalize a query() function in the
>>>> mca_base_component_t of the form:
>>>> int mca_base_query_component_fn(mca_base_module_t **module, int *priority);
>>>>
>>>> This function is specified after the open and close component
>>>> functions in this structure so as to allow compatibility with
>>>> frameworks that do not use the base selection logic. Frameworks that
>>>> do *not* use this function are *not* affected by this commit.
>>>> However, every component in the frameworks that use the
>>>> mca_base_select function must adjust their component query function
>>>> to fit that specified above.
>>>>
>>>> 18 frameworks in Open MPI have been changed. I have updated all of
>>>> the components in the 18 frameworks available in the trunk on my
>>>> branch. The affected frameworks are:
>>>> - OPAL Carto
>>>> - OPAL crs
>>>> - OPAL maffinity
>>>> - OPAL memchecker
>>>> - OPAL paffinity
>>>> - ORTE errmgr
>>>> - ORTE ess
>>>> - ORTE Filem
>>>> - ORTE grpcomm
>>>> - ORTE odls
>>>> - ORTE plm
>>>> - ORTE ras
>>>> - ORTE rmaps
>>>> - ORTE routed
>>>> - ORTE snapc
>>>> - OMPI crcp
>>>> - OMPI dpm
>>>> - OMPI pubsub
>>>>
>>>> There was a question of the memory footprint change as a result of
>>>> this commit. I used 'pmap' to determine process memory footprint of
>>>> a hello world MPI program. Static and Shared build numbers are below
>>>> along with variations on launching locally and to a single node
>>>> allocated by SLURM. All of this was on Indiana University's Odin
>>>> machine. We compare against the trunk (r18276)

[OMPI devel] Loadbalancing

2008-04-23 Thread Ralph H Castain
I added a new "loadbalance" feature to OMPI today in r18252.

Brief summary: adding --loadbalance to the mpirun cmd line will cause the
round-robin mapper to balance your specified #procs across the available
nodes.

More detail:
Several users had noted that mapping byslot always caused us to
preferentially load the first nodes in an allocation, potentially leaving
other nodes unused. If they mapped bynode, of course, this wouldn't happen -
but then they were forced to a specific rank-to-node relationship.

What they wanted was to have the ranks numbered byslot, but to have the ppn
balanced across the entire allocation.

This is now supported via the --loadbalance cmd line option. Here is an
example of its effect (again, remember that loadbalance only impacts mapping
byslot):

         no-lb      lb       bynode
node0:   0,1,2,3    0,1,2    0,3,6
node1:   4,5,6      3,4      1,4
node2:              5,6      2,5


As you can see, the effect of --loadbalance is to balance the ppn across all
the available nodes while retaining byslot rank associations. In this case,
instead of leaving one node unused, we take advantage of all available
resources.
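
The three policies in the example can be sketched in a few lines of Python.
This is only an illustration of the mapping rules as described here, not the
round-robin mapper's actual code. The function names are made up, the example
is assumed to be 7 procs on 3 nodes with 4 slots each (inferred from the no-lb
column), and giving the leftover procs to the first nodes is an assumption
that happens to match the lb column.

```python
from typing import Dict, List


def map_byslot(nprocs: int, nodes: List[str], slots: int) -> Dict[str, List[int]]:
    """Fill each node up to its slot count before moving to the next node."""
    mapping: Dict[str, List[int]] = {node: [] for node in nodes}
    rank = 0
    for node in nodes:
        # Ranks beyond the total slot count are left unmapped in this sketch.
        while rank < nprocs and len(mapping[node]) < slots:
            mapping[node].append(rank)
            rank += 1
    return mapping


def map_loadbalance(nprocs: int, nodes: List[str]) -> Dict[str, List[int]]:
    """Keep byslot-style consecutive ranks, but spread the ppn over all nodes."""
    base, extra = divmod(nprocs, len(nodes))
    mapping: Dict[str, List[int]] = {}
    rank = 0
    for i, node in enumerate(nodes):
        # Assumption: the first `extra` nodes each take one additional proc.
        ppn = base + (1 if i < extra else 0)
        mapping[node] = list(range(rank, rank + ppn))
        rank += ppn
    return mapping


def map_bynode(nprocs: int, nodes: List[str]) -> Dict[str, List[int]]:
    """Deal ranks out one at a time, cycling through the nodes."""
    mapping: Dict[str, List[int]] = {node: [] for node in nodes}
    for rank in range(nprocs):
        mapping[nodes[rank % len(nodes)]].append(rank)
    return mapping
```

Calling these with nprocs=7 and nodes=["node0", "node1", "node2"] (slots=4 for
byslot) gives the same three columns as the example above. Note that this
loadbalance sketch ignores per-node slot limits, which a real mapper would
have to respect.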

Hope this proves helpful
Ralph




Re: [OMPI devel] Memchecker: breaks trunk again

2008-04-21 Thread Ralph H Castain
Thanks Brian - I had been told precisely the opposite priority rule just a
few weeks ago by someone else, hence my confusion.
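
For anyone else who had the rule backwards, the precedence described in the
quoted reply below can be modeled as a simple merge. This is a Python sketch
of the rule only, not of how configure actually processes a platform file;
the effective_options name and the dict representation are hypothetical.

```python
from typing import Dict


def effective_options(platform_file: Dict[str, str],
                      command_line: Dict[str, str]) -> Dict[str, str]:
    """Merge configure options; platform-file values win on conflict."""
    merged = dict(command_line)   # start from the command-line options
    merged.update(platform_file)  # platform-file settings override them
    return merged


# enable_memchecker=yes in the platform file beats --disable-memchecker.
platform = {"enable_memchecker": "yes"}
cmd_line = {"enable_memchecker": "no", "with_valgrind": "no"}
result = effective_options(platform, cmd_line)
```

Under this rule a command-line option only takes effect when the platform
file does not set the same option.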


On 4/21/08 8:48 AM, "Brian W. Barrett" <brbar...@open-mpi.org> wrote:

> On Mon, 21 Apr 2008, Ralph H Castain wrote:
> 
>> So it appears to be a combination of memchecker=yes automatically requiring
>> valgrind, and the override on the configure line of a param set by a
>> platform file not working.
> 
> So I can't speak to the valgrind/memchecker issue, but can to the
> platform/configure issue.  The platform file was intended to provide a
> mechanism to allow repeatability in builds.  By design, options in the
> platform file have higher priority than options given on the configure
> command line.
> 
> Brian




Re: [OMPI devel] Memchecker: breaks trunk again

2008-04-21 Thread Ralph H Castain



On 4/21/08 8:10 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> Hmm.  I do not have this problem on OS X (where I do not have Valgrind
> installed) as of this morning's trunk.  configure correctly determines
> that valgrind is not present and therefore continues on:
> 
> --- MCA component memchecker:valgrind (m4 configuration macro)
> checking for MCA component memchecker:valgrind compile mode... static
> checking if MCA component memchecker:valgrind can compile... no
> 
> ...etc.
> 
> Note that your message indicates that configure thinks that valgrind
> support was explicitly requested.  In this case, configure thinks,
> "you explicitly requested it, I cannot provide it, so I should abort
> rather than give unexpected results."

I know that - but I am not explicitly requesting it. In fact, I explicitly
put --without-valgrind, to no effect.

Here is what I have discovered may be the source of the problem. I had
inserted an enable_memchecker=yes line in my platform file. This apparently
has the unfortunate side effect of now requiring valgrind. IMHO, this should
not be set up that way per my earlier comment about requiring debuggers
unless memchecker has absolutely no other way to run - which looking at the
framework, would not appear to be true.

I then tried still using my platform file, but adding --disable-memchecker
to the configure line. The disable request was apparently ignored, at least
as far as the valgrind part of the request is concerned. The build failed at
the same point.

Removing the enable_memchecker=yes line from my platform file allows me to
successfully navigate configure.

So it appears to be a combination of memchecker=yes automatically requiring
valgrind, and the override on the configure line of a param set by a
platform file not working.

I can send you some stuff off-list in a little bit, if you still need it.


> 
> Can you send your full configure output and config.log?
> 
> 
> On Apr 21, 2008, at 9:51 AM, Ralph H Castain wrote:
> 
>> As an FYI for anyone similarly afflicted:
>> 
>> The only solution I have found is to gut the file
>> opal/mca/memchecker/valgrind/configure.m4:
>> 
>> # MCA_memchecker_valgrind_CONFIG([action-if-found], [action-if-not-found])
>> # ---
>> AC_DEFUN([MCA_memchecker_valgrind_CONFIG],[
>> 
>>     happy=0  # none_needed
>>     happy_value=0  # none_needed
>>     memchecker_valgrind_happy=0  # This should suffice to get rid of the component
>>     should_build=2
>>     want_component=0
>> ])dnl
>> 
>> Nothing else will allow you to build unless you have the valgrind
>> headers
>> installed on your machine.
>> 
>> Ralph
>> 
>> 
>> 
>> On 4/21/08 7:28 AM, "Ralph H Castain" <r...@lanl.gov> wrote:
>> 
>>> I am finding that the memchecker code is again breaking the trunk,
>>> specifically on any machine that does not have valgrind installed.
>>> Apparently, memchecker now forces a requirement for valgrind?
>>> 
>>> Here is what I get:
>>> 
>>> --- MCA component memchecker:valgrind (m4 configuration macro)
>>> checking for MCA component memchecker:valgrind compile mode... static
>>> checking checking for the valgrind include directory ... none needed
>>> checking valgrind/valgrind.h usability... no
>>> checking valgrind/valgrind.h presence... no
>>> checking for valgrind/valgrind.h... no
>>> configure: WARNING: *** Could not find valgrind header files, as
>>> valgrind
>>> support was requested
>>> configure: error: *** Cannot continue
>>> 
>>> 
>>> Could somebody please fix this? I thought we had decided many moons
>>> ago that
>>> we would not require specific debuggers in default build
>>> configurations - I
>>> am somewhat surprised, therefore, to find that memchecker is "on" by
>>> default, and now requires valgrind!
>>> 
>>> I have tried --disable-memchecker, but nothing will allow me to get
>>> past
>>> this error.
>>> 
>>> Thanks
>>> Ralph




[OMPI devel] Vprotocol build problem

2008-04-21 Thread Ralph H Castain
I am now simply trying some of our vaunted configure system's options to see
what actually works, and what doesn't.

Here is one that does NOT work:

enable_mca_no_build=pml-v

Generates the following build error:

configure: error: conditional "OMPI_BUILD_vprotocol_pessimist_DSO" was never
defined.
Usually this means the macro was only invoked conditionally.


Could somebody please fix this? Although I know it is "on" by default,
people should be able to turn it "off" - or we need to tell them "you
can't".

Thanks
Ralph




Re: [OMPI devel] Memchecker: breaks trunk again

2008-04-21 Thread Ralph H Castain
As an FYI for anyone similarly afflicted:

The only solution I have found is to gut the file
opal/mca/memchecker/valgrind/configure.m4:

# MCA_memchecker_valgrind_CONFIG([action-if-found], [action-if-not-found])
# ---
AC_DEFUN([MCA_memchecker_valgrind_CONFIG],[

    happy=0  # none_needed
    happy_value=0  # none_needed
    memchecker_valgrind_happy=0  # This should suffice to get rid of the component
    should_build=2
    want_component=0
])dnl

Nothing else will allow you to build unless you have the valgrind headers
installed on your machine.

Ralph



On 4/21/08 7:28 AM, "Ralph H Castain" <r...@lanl.gov> wrote:

> I am finding that the memchecker code is again breaking the trunk,
> specifically on any machine that does not have valgrind installed.
> Apparently, memchecker now forces a requirement for valgrind?
> 
> Here is what I get:
> 
> --- MCA component memchecker:valgrind (m4 configuration macro)
> checking for MCA component memchecker:valgrind compile mode... static
> checking checking for the valgrind include directory ... none needed
> checking valgrind/valgrind.h usability... no
> checking valgrind/valgrind.h presence... no
> checking for valgrind/valgrind.h... no
> configure: WARNING: *** Could not find valgrind header files, as valgrind
> support was requested
> configure: error: *** Cannot continue
> 
> 
> Could somebody please fix this? I thought we had decided many moons ago that
> we would not require specific debuggers in default build configurations - I
> am somewhat surprised, therefore, to find that memchecker is "on" by
> default, and now requires valgrind!
> 
> I have tried --disable-memchecker, but nothing will allow me to get past
> this error.
> 
> Thanks
> Ralph




[OMPI devel] Memchecker: breaks trunk again

2008-04-21 Thread Ralph H Castain
I am finding that the memchecker code is again breaking the trunk,
specifically on any machine that does not have valgrind installed.
Apparently, memchecker now forces a requirement for valgrind?

Here is what I get:

--- MCA component memchecker:valgrind (m4 configuration macro)
checking for MCA component memchecker:valgrind compile mode... static
checking checking for the valgrind include directory ... none needed
checking valgrind/valgrind.h usability... no
checking valgrind/valgrind.h presence... no
checking for valgrind/valgrind.h... no
configure: WARNING: *** Could not find valgrind header files, as valgrind
support was requested
configure: error: *** Cannot continue


Could somebody please fix this? I thought we had decided many moons ago that
we would not require specific debuggers in default build configurations - I
am somewhat surprised, therefore, to find that memchecker is "on" by
default, and now requires valgrind!

I have tried --disable-memchecker, but nothing will allow me to get past
this error.

Thanks
Ralph






[OMPI devel] Using do-not-launch, display-map, and do-not-resolve to test mappings

2008-04-17 Thread Ralph H Castain
Brief summary:
In r18190, I have restored the --do-not-launch capability, and added a
--do-not-resolve flag. This note describes how you can use those to build
and test application mappings without first getting an allocation and/or
launching it.

Longer description:

Users and developers have both expressed a need to develop potentially
complex process mappings "offline" - i.e., before attempting to actually
launch the application. This has been particularly problematic when the
mappings are large and target managed environments where obtaining an
allocation can take quite some time to clear the queue.

We used to have a "do-not-launch" flag that would allow the system to
allocate and map a job, but then exit without attempting to launch it. This
had been "disabled" during ORTE changes in recent months. We still had the
ability to "display-map" however, but the procedure would often hang or
abort as the system would attempt to resolve all network names in a
hostfile.

To resolve these problems, I have:

1. re-implemented the "do-not-launch" flag so that it works properly. It is
set by specifying --do-not-launch on the mpirun command line.

2. added a --do-not-resolve option to the mpirun command line that instructs
the system to not attempt to resolve network names.


For an example of how these can be used, consider the case where you want to
build a sequential map of processes versus hostfile names via the new RMAPS
seq module. It will be a big job, so you would like to ensure that the map
is correct before (a) sitting in a queue for hours/days waiting to get an
allocation, and (b) finding out it is wrong and having to abort.

What you can do is use these new options to build and test your map
-without- getting an allocation by:

1. build a hostfile that describes your desired mapping - it would have a
list of host names in rank order of where you want a process to go. These
hosts can have any names - we won't be trying to resolve them, so the fact
that they are not necessarily reachable on the network is irrelevant.

2. do an mpirun of your job, including -mca rmaps seq -hostfile my_hosts
--do-not-launch --do-not-resolve --display-map on the cmd line. This
instructs mpirun to use the seq mapper, which will subsequently use the
specified hostfile to do the mapping. It also tells mpirun to display the
resulting map so you can see where your procs would have gone, but to not
attempt to find them on the network and to -not- attempt to launch the job.

What you'll get is a node-by-node display of which proc ranks are assigned to
each node. Once you get this looking the way you want, you can then simply
submit the job to your target cluster with confidence that the procs will be
mapped the way you wanted.
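
As a rough illustration of what the sequential mapping in step 2 does with
such a hostfile, here is a small Python sketch. It is not the RMAPS seq
module: seq_map and display_map are made-up names, the hostfile is assumed to
hold one bare host name per line, and the printed layout is not the real
--display-map output format.

```python
from typing import Dict, List


def seq_map(hostfile_text: str) -> Dict[str, List[int]]:
    """Assign rank i to the host named on the i-th non-empty line."""
    mapping: Dict[str, List[int]] = {}
    rank = 0
    for line in hostfile_text.splitlines():
        host = line.strip()
        if not host:
            continue  # skip blank lines
        # Host names are treated as plain labels; nothing is resolved.
        mapping.setdefault(host, []).append(rank)
        rank += 1
    return mapping


def display_map(mapping: Dict[str, List[int]]) -> str:
    """Render the map node by node, in order of first appearance."""
    return "\n".join(
        f"{host}: {','.join(str(r) for r in ranks)}"
        for host, ranks in mapping.items()
    )


hosts = "nodeA\nnodeB\nnodeA\n\nnodeC\n"
print(display_map(seq_map(hosts)))
```

Because the host names are never looked up, the same check works for hosts
that do not exist yet or are not reachable from the machine where the map is
being tested.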


Hope that helps
Ralph



