[OMPI devel] Announcing Open MPI v4.0.0rc5

2018-10-23 Thread Geoffrey Paulsen
Announcing (our hopefully last) RC5 of Open MPI v4.0.0.
Available at: https://www.open-mpi.org/software/ompi/v4.0/
Differences in v4.0.0rc5 from v4.0.0rc4:
* Fix race condition in btl/vader when writing header
* Fix a double free error when using hostfile
* Fix configury for internal PMIx
* Ignore --with-foo=external arguments in subdirs
* Remove C99-style comments in mpi.h
* Fix race condition in opal_free_list_get.  Fixes #2921
* Fix hang/timeout during finalize in osc/ucx
* Fixed zero-size window processing in osc/ucx
* Fix return code from mca_pml_ucx_init()
* Add worker flush before osc/ucx module free
* btl/uct bugfixes and code cleanup.  Fixes issues #5820, #5821
* Fix javadoc build failure with OpenJDK 11
* Add ompi datatype attribute to release ucp_datatype in pml/ucx
* Squash a bunch of harmless compiler warnings
* fortran/use-mpi-f08: Correct f08 routine signatures
* Fortran: add CHARACTER and LOGICAL support to MPI_Sizeof()
* mpiext/pcollreq: Correct f08 routine signatures
* make dist: Add missing file to tarball
* Disabled openib/verbs
* Removed components: pml/bfo, crs/blcr, crs/criu and crs/dmtcp
 
 


[OMPI devel] IBM CI re-enabled.

2018-10-18 Thread Geoffrey Paulsen
 
I've re-enabled IBM CI for PRs.
 


[OMPI devel] I Shut down IBM CI last night

2018-10-18 Thread Geoffrey Paulsen
Devel,  I shut down IBM CI last night to upgrade our UCX and IB drivers.  Still tinkering, but it should be back online in under an hour.
 
---
Geoffrey Paulsen
Software Engineer, IBM Spectrum MPI
Email: gpaul...@us.ibm.com


[OMPI devel] Announcing Open MPI v4.0.0rc4

2018-10-08 Thread Geoffrey Paulsen
Announcing Open MPI v4.0.0rc4.
Please download from https://www.open-mpi.org/software/ompi/v4.0/ and provide feedback from your favorite platforms.
 
Changes from rc3 include:
PR #5780 - Fortran 08 bindings fixes.  fortran/use-mpi-f08: corrections to PMPI signatures of the collectives interface to state correct intent for inout arguments and use the ASYNCHRONOUS attribute in non-blocking collective calls.
PR #5834 - Two more vader fixes:
  1. Issue #5814 - work around Oracle C v5.15 compiler bug
  2. Ensure the fast box tag is always read first
PR #5794 - mtl ofi: Change from opt-in to opt-out provider selection (see the example below)
PR #5802 - mpi.h: remove MPI_UB/MPI_LB when not enabling MPI-1 compat
PR #5823 - btl/tcp: output the IP address correctly
PR #5826 - TCP BTL socklen fixes (Issue #3035)
PR #5790 - shmem/lock: progress communications while waiting for shmem_lock
PR #5791 - opal/common/ucx: use the __func__ macro instead of __FUNCTION__
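For the mtl/ofi provider-selection change: with opt-out selection, steering the MTL to a particular libfabric provider is done through the MTL's include/exclude MCA parameters.  A hedged example (the parameter names mtl_ofi_provider_include / mtl_ofi_provider_exclude are my recollection of the ofi MTL's knobs; verify with "ompi_info --param mtl ofi"):

    # Only consider the psm2 provider:
    mpirun --mca mtl ofi --mca mtl_ofi_provider_include psm2 -np 4 ./my_app
    # Or exclude specific providers (the new opt-out style):
    mpirun --mca mtl ofi --mca mtl_ofi_provider_exclude shm,sockets -np 4 ./my_app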
 


[OMPI devel] Announcing Open MPI v4.0.0rc3

2018-09-28 Thread Geoffrey Paulsen
The Third release candidate for Open MPI v4.0.0 (rc3) has been built and is available at:
https://www.open-mpi.org/software/ompi/v4.0/
Only one NEWS-worthy difference from v4.0.0rc2:
- Fix a problem with ORTE not reporting error messages if an application terminated normally but exited with a non-zero error code.  Thanks to Emre Brookes for reporting.

All major differences from the v3.1.x series:
- OSHMEM updated to the OpenSHMEM 1.4 API.
- Do not build the Open SHMEM layer when there are no SPMLs available.  Currently, this means the Open SHMEM layer will only build if an MXM or UCX library is found.
- A UCX BTL was added for enhanced MPI RMA support using UCX.
- With this release, the OpenIB BTL now only supports iWarp and RoCE by default.
- Updated internal HWLOC to 2.0.1.
- Updated internal PMIx to 3.0.1.
- Change the priority for selecting external versus internal HWLOC and PMIx packages to build.  Starting with this release, configure by default selects available external HWLOC and PMIx packages over the internal ones.
- Updated internal ROMIO to 3.2.1.
- Removed support for the MXM MTL.
- Removed support for SCIF.
- Improved CUDA support when using UCX.
- Enable use of CUDA allocated buffers for OMPIO.
- Improved support for two phase MPI I/O operations when using OMPIO.
- Added support for Software-based Performance Counters, see https://github.com/davideberius/ompi/wiki/How-to-Use-Software-Based-Performance-Counters-(SPCs)-in-Open-MPI
- Various improvements to MPI RMA performance when using RDMA capable interconnects.
- Update memkind component to use the memkind 1.6 public API.
- Fix problems with use of newer map-by mpirun options.  Thanks to Tony Reina for reporting.
- Fix rank-by algorithms to properly rank by object and span.
- Allow for running as root if two environment variables are set.  Requested by Axel Huebl.
- Fix a problem with building the Java bindings when using Java 10.  Thanks to Bryce Glover for reporting.
- Fix a problem with ORTE not reporting error messages if an application terminated normally but exited with a non-zero error code.  Thanks to Emre Brookes for reporting.


[OMPI devel] Open MPI web-ex now.

2018-09-25 Thread Geoffrey Paulsen
web-ex: https://cisco.webex.com/ciscosales/j.php?MTID=m94bcdafd80c2e40b480b2c97c702293a
 
 


[OMPI devel] Announcing Open MPI v4.0.0rc2

2018-09-22 Thread Geoffrey Paulsen
The second release candidate for the Open MPI v4.0.0 release has been built and will be available tonight at:
https://www.open-mpi.org/software/ompi/v4.0/
 
Major differences from v4.0.0rc1 include: 
- Removed support for SCIF.
- Enable use of CUDA allocated buffers for OMPIO.
- Fix a problem with ORTE not reporting error messages if an application terminated normally but exited with a non-zero error code.  Thanks to Emre Brookes for reporting.
All major differences from the v3.1.x series include:

 
- OSHMEM updated to the OpenSHMEM 1.4 API.
- Do not build the Open SHMEM layer when there are no SPMLs available.  Currently, this means the Open SHMEM layer will only build if an MXM or UCX library is found.
- A UCX BTL was added for enhanced MPI RMA support using UCX.
- With this release, the OpenIB BTL now only supports iWarp and RoCE by default.
- Updated internal HWLOC to 2.0.1.
- Updated internal PMIx to 3.0.1.
- Change the priority for selecting external versus internal HWLOC and PMIx packages to build.  Starting with this release, configure by default selects available external HWLOC and PMIx packages over the internal ones.
- Updated internal ROMIO to 3.2.1.
- Removed support for the MXM MTL.
- Removed support for SCIF.
- Improved CUDA support when using UCX.
- Enable use of CUDA allocated buffers for OMPIO.
- Improved support for two phase MPI I/O operations when using OMPIO.
- Added support for Software-based Performance Counters, see https://github.com/davideberius/ompi/wiki/How-to-Use-Software-Based-Performance-Counters-(SPCs)-in-Open-MPI
- Various improvements to MPI RMA performance when using RDMA capable interconnects.
- Update memkind component to use the memkind 1.6 public API.
- Fix problems with use of newer map-by mpirun options.  Thanks to Tony Reina for reporting.
- Fix rank-by algorithms to properly rank by object and span.
- Allow for running as root if two environment variables are set.  Requested by Axel Huebl.
- Fix a problem with building the Java bindings when using Java 10.  Thanks to Bryce Glover for reporting.
- Fix a problem with ORTE not reporting error messages if an application terminated normally but exited with a non-zero error code.  Thanks to Emre Brookes for reporting.


[OMPI devel] Announcing Open MPI v4.0.0rc1

2018-09-16 Thread Geoffrey Paulsen
The first release candidate for the Open MPI v4.0.0 release is posted at 
https://www.open-mpi.org/software/ompi/v4.0/
Major changes include:


4.0.0 -- September, 2018


- OSHMEM updated to the OpenSHMEM 1.4 API.
- Do not build Open SHMEM layer when there are no SPMLs available.
  Currently, this means the Open SHMEM layer will only build if
  an MXM or UCX library is found.
- A UCX BTL was added for enhanced MPI RMA support using UCX.
- With this release, the OpenIB BTL now only supports iWarp and RoCE by default.
- Updated internal HWLOC to 2.0.1
- Updated internal PMIx to 3.0.1
- Change the priority for selecting external versus internal HWLOC
  and PMIx packages to build.  Starting with this release, configure
  by default selects available external HWLOC and PMIx packages over
  the internal ones.
- Updated internal ROMIO to 3.2.1.
- Removed support for the MXM MTL.
- Improved CUDA support when using UCX.
- Improved support for two phase MPI I/O operations when using OMPIO.
- Added support for Software-based Performance Counters, see
  https://github.com/davideberius/ompi/wiki/How-to-Use-Software-Based-Performance-Counters-(SPCs)-in-Open-MPI
- Various improvements to MPI RMA performance when using RDMA
  capable interconnects.
- Update memkind component to use the memkind 1.6 public API.
- Fix problems with use of newer map-by mpirun options.  Thanks to
  Tony Reina for reporting.
- Fix rank-by algorithms to properly rank by object and span
- Allow for running as root if two environment variables are set (example below).
  Requested by Axel Huebl.
- Fix a problem with building the Java bindings when using Java 10.
  Thanks to Bryce Glover for reporting.
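Regarding the run-as-root item above: to the best of my knowledge the two variables are OMPI_ALLOW_RUN_AS_ROOT and OMPI_ALLOW_RUN_AS_ROOT_CONFIRM, mirroring mpirun's --allow-run-as-root option; treat the exact names as an assumption and check the mpirun man page.  A minimal sketch:

    # Both variables must be set for mpirun to proceed when invoked as root:
    export OMPI_ALLOW_RUN_AS_ROOT=1
    export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
    mpirun -np 2 ./my_app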

Our goal is to release 4.0.0 by mid-October, so any testing is appreciated.

 


[OMPI devel] Github deprecated "Github services" Does this affect us?

2018-06-07 Thread Geoffrey Paulsen
Devel,
 
I just came across GitHub's deprecation announcement of GitHub Services:
https://developer.github.com/changes/2018-04-25-github-services-deprecation/
Does anyone know if this will affect Open MPI at all, and do we need to change any processes because of this?
---
Geoffrey Paulsen


[OMPI devel] Today's Open-MPI discussion notes highlighting potential new runtime approach.

2018-06-05 Thread Geoffrey Paulsen
All,
In today's Open MPI Web-Ex (minutes here: https://github.com/open-mpi/ompi/wiki/WeeklyTelcon_20180605) we discussed the future of Open MPI's ORTE runtime (mpirun / orteds, launching, etc.).  Nothing was decided, but please take a look, discuss on the mailing lists, and/or come to next week's Web-Ex for more discussion.  This discussion is in the context of Open MPI v5.0, for which we haven't yet decided on a schedule (but v4.0 branches from master mid-July and releases mid-September).
We'd love to hear your input.
Thanks,
Geoff Paulsen


[OMPI devel] Today's Open MPI Web-Ex. 29 minutes.

2017-12-12 Thread Geoffrey Paulsen
Web-Ex: https://cisco.webex.com/ciscosales/j.php?MTID=me125278da54f7bbeb722fc30d5b73a2f
 
Minutes: https://github.com/open-mpi/ompi/wiki/WeeklyTelcon_20171212


[OMPI devel] Open MPI WebEx:

2017-04-04 Thread Geoffrey Paulsen
WebEx: https://cisco.webex.com/ciscosales/j.php?MTID=me125278da54f7bbeb722fc30d5b73a2f
 
Agenda / Minutes: https://github.com/open-mpi/ompi/wiki/WeeklyTelcon_20170404
 
+ MTT testing of Master
---
Geoffrey Paulsen
Software Engineer, IBM Spectrum MPI
Phone: 720-349-2832
Email: gpaul...@us.ibm.com
www.ibm.com


[OMPI devel] RFC: Open MPI v3.0 branching soon (next week). Move to date-based release

2017-02-21 Thread Geoffrey Paulsen
RFC: Open MPI v3.0 branching soon (next week). Move to date-based release
 
At the face-to-face in San Jose (minutes: https://github.com/open-mpi/ompi/wiki/Meeting-Minutes-2017-01) we agreed that starting with v3.0, we would switch to three date-based releases each year, released on the 15th of February, June, and October.  At the face-to-face, we agreed that for the first cycle we would branch for v3.0 on June 15th and release on October 15th (and branch for the next release that same day).  Date-based releases mean we might have to ship with possibly critical bugs, but this gives us more motivation to get testing done EARLY.
 
In today's WebEx (minutes: https://github.com/open-mpi/ompi/wiki/WeeklyTelcon_20170221) we discussed accelerating the transition to date-based releases so that we could ship v3.0 on June 15th.  To do this, we'd need to branch v3.x from master soon.  We set the preliminary date to branch v3.x from master next Tuesday, Feb 28th.  What does the community think about this?  Can everyone who has new features destined for v3.0 get them into master within a week?  Once v3.x branches, only bugfixes would be accepted to that branch.  The good news is that any features that don't make the v3.x branch would go out in the next release (which we have decreed would be branched June 15th and shipped October 15th).
 
We'd like to solicit input from the community in this thread on the devel list by Monday, Feb 27th.  Please answer the following questions:
 
1) Are you okay with branching for v3.0 Tuesday Feb 28th?  If not, please discuss reasons, and possible solutions.
 
2) Is anyone working on any new features that they'd like to get out with v3.0 but is not yet in master?  Remember if it misses v3.0, there will be another opportunity with v3.1 in 4 months.
 


[OMPI devel] New Open MPI Community Bylaws to discuss

2016-10-11 Thread Geoffrey Paulsen
We have been discussing new Bylaws for the Open MPI Community.  The primary motivator is to allow non-members to commit code.  Details in the proposal (link below).
 
Old Bylaws / Procedures:  https://github.com/open-mpi/ompi/wiki/Admistrative-rules
New Bylaws proposal: https://github.com/open-mpi/ompi/wiki/Proposed-New-Bylaws
 
Open MPI members will be voting on October 25th.  Please voice any comments or concerns.


[OMPI devel] Minutes from Telcon today

2016-01-26 Thread Geoffrey Paulsen
https://github.com/open-mpi/ompi/wiki/WeeklyTelcon_20160126
Open MPI Weekly Telcon
Dialup Info: (Do not post to public mailing list or public wiki)
 
Attendees
Geoff Paulsen, Jeff Squyres, Brad Benton, Edgar Gabriel, Geoffroy Vallee, Joshua Ladd, Nathan Hjelm, Ralph Castain, Ryan Grant, Sylvain Jeaugey, Todd Kordenbrock
 
Agenda
 
Review 1.10
Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.3
- 1.10.2 went out the door.  Already have a bug (Gilles) that Ralph fixed.  Another bug: broken Fortran F08 bindings (Jeff), seen late last night.
  Issue: https://github.com/open-mpi/ompi/issues/1323
  If it's broken, how did it pass testing?  Jeff needs a day or two to dig into it.
- Need to verify that library versions are still correct? - Jeff took care of it.
- MPI_Abort investigation (Ralph)? - Periodically have this issue where MPI_Abort + MTT has some issue.  Perl is suspect; Ralph will look into Ruby or another language.
- 1.10 C strided mutex lock issue (Nathan)?
- High CPU utilization on the async progress thread (Ralph)?  Ralph fixed; a one-off in 1.10, not in master.  In 1.10.2.
 
Review 2.0.x
Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
Blocker issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
- Issue 1252 - Nathan's progression decay function progress?  Looking at files today.  udcm, openib_error_handler - opal_outputs would be sufficient.
- Issue 1215 - Group comm errors (Ralph) - deal with a race condition in ORTE collectives.  Launch goes down the tree; the mutex goes across the tree, so it is possible to receive a modex message before you receive the launch message.
Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
- Group comms weren't working for comms of powers of 2 (Nathan)?  Fixed.
- ROMIO default for OMPI on Lustre (only), PR 896?  894, 890, 900, 901 - Jeff and Howard are good with these.  Jeff?  Taking all of those as merged.
- Issue 1292 - Asked Ralph if this is the right way to fix this (Ralph).
- Issue 1177 - Large message writev, fixed but not merged to master.  Test working everywhere but OS X / BSD (George).  OS X / BSD limits large message total size to 32K?  Not going to fix for 2.0.0; someone can write code to handle OS X / BSD.
- Issue 1299 - Hang (Nathan)?  Need to go ahead and fix this today.  Gilles has a patch; Nathan just needs to verify.
- 2.0.0 does not compile on Solaris due to statfs().  Now that we moved to OMPIO, we're hitting the problem.  Edgar is working on it; Solaris has a different number of args and return code.
- Issue 1301 - Check max CQ size before creating CQ (Josh).  If it passes Jenkins, happy.  UD OOB (Mellanox runs).  Approved, pending Jenkins.
- HW threads - Ralph?  Talk to Mike about the use case?  A commit has been done and moved to 1.10.  Pinged Gilles that it should go to 2.0 also.
- Travis status on 2.0?  Going well.
- Nathan is good with 2.0 for 1-sided.
- PR 918 - Ralph reviewed on master.  Gilles PRed it to 2.0.
- PR 919 - hwloc - Ralph will review.
- PR 911 - Use correct endpoint.  Just got word from nVidia that this is good.
- PR 917 - Ryan will look at today.  The LANL hardware that hits this is going away.  Doesn't affect Aries; Aries doesn't have get_alignment().  Want this in.
 
Review Master?
- BTL flags = 305: perf got horrible?  Edgar?  Worked around by removing this on his cluster.  Don't understand why; he always used to set it, but now doesn't.
- OMPIO not finding PVFS2 - configure work Edgar is
 
MTT status:
- Cisco was showing timeouts.  Jeff found 2 things on the cluster; the specific problem couldn't be replicated.
- Not handling OOB on master or 1.10.  The Cisco cluster has 4 or 5 IP addresses on each node; eth0 was down on one node, and the timeout on eth0 was taking quite a while.  Jeff removed those two nodes.  Unusual for the real world; OOB verbosity exposes it.  Long-running problem; need a good solution.
 
Status Updates:
- Cisco - Been working on the cluster and on release issues with Howard.  Have a couple of small scalability improvements for usNIC.
- ORNL - Not much to report.  Any progress with Ubuntu package ownership?  Geoffroy will look on Saturday.
- UTK - Not much to report.
- NVIDIA - Sylvain: not much.  A user issue not

[OMPI devel] Please sign up on wiki if you're coming to Face 2 Face in Dallas Feb 23-25

2016-01-12 Thread Geoffrey Paulsen
Hello,
 
  Please sign up on the wiki (https://github.com/open-mpi/ompi/wiki/Meeting-2016-02) if you're planning to come to the Developer's conference hosted by IBM in Dallas [Feb 23-25].
 
  Thanks,
  Geoff
---
Geoffrey Paulsen
Software Engineer, IBM Platform MPI
Phone: 720-349-2832
Email: gpaul...@us.ibm.com
www.ibm.com



[OMPI devel] Minutes from today's Telcon

2016-01-12 Thread Geoffrey Paulsen
Also available here:
 
https://github.com/open-mpi/ompi/wiki/WeeklyTelcon_20160112
 
Open MPI Weekly Telcon
Dialup Info: (Do not post to public mailing list or public wiki)
 
Attendees
Brad Benton, Edgar Gabriel, Geoffroy Vallee, George, Howard, Josh Hursey, Nathan Hjelm, Ralph, Ryan Grant, Sylvain Jeaugey, Todd Kordenbrock
 
Minutes
 
Review 1.10
Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.2
- mpirun hangs ONLY on SLES 12.  Minimum 40 procs/node, at the very end of mpirun.  Only seeing it in certain cases; not sure what's going on.  Is mpirun not exiting because the ORTED is not exiting?  Nathan saw this on 2.0.  Wait for Paul Hargrove.
- No objections to Ralph shipping 1.10.2.
 
Review 2.0.x
Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
Blocker issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
- Group comms weren't working for comms of powers of 2.  Nathan found a massive memory issue.
- https://github.com/open-mpi/ompi/issues/1252 - Nathan is working on a decay function for progress functions to "fix" this (see the sketch after this list).  Nathan's been delayed until later this week; could get done by the middle of next week.  George commented that the openib btl specifically could be made to only progress if there is a send/recv message posted.  uGNI progress could only check for datagrams periodically (only a 200ns hit).  Prefer to stick with Nathan's original decay function without modifying openib.
- https://github.com/open-mpi/ompi/issues/1225 - TotalView debugger problem + PMIx.  SLURM users use srun, which doesn't have this issue.  DDT does NOT have this issue either; don't know why it's different.  Attach FIFO: mpirun waits on a pipe for the debugger to write a 1 on that pipe.  Don't see how that CAN work.  Nathan's been using attach rather than "mpirun --debug"; attach happens after launch, so it doesn't go through this step.  Nathan thinks it's not so critical since attach works.  Anything will work, as long as you're ATTACHING to a running job rather than launching through the debugger.  Barring a breakthrough with PMIx notify in the next week, we'll do an RC2 and just carefully document what works/doesn't as far as debuggers.  Will disable "mpirun --debug" and print an error on the 2.0 branch that says it's broken.  No longer a blocker for 2.0.0 due to schedule; still want to fix this for the next release.
- No new features (except for
- Howard will review
- Review group comm
- Don't know if we'll bother with the pls filesystem.
- UCX using modex stuff.
- OMPIO + Lustre is slow on the 2.0.0 (and master) branches.  Discussed making ROMIO the default for OMPI on Lustre (only).
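For anyone curious what the "decay function" for progress means in practice, here is a minimal illustrative sketch in C (my own illustration of the general idea, NOT Nathan's actual patch): a progress function that keeps reporting no events gets polled exponentially less often, up to a cap, and snaps back to every-iteration polling as soon as it finds work.

    /* Illustrative sketch only -- not the actual OPAL implementation. */
    #define DECAY_MAX_SKIP 512          /* assumed cap on the back-off */

    typedef struct {
        int (*progress)(void);          /* returns number of events handled */
        int skip;                       /* poll once every (skip + 1) calls */
        int countdown;                  /* calls remaining until next poll */
    } progress_entry_t;

    static int decayed_progress(progress_entry_t *e)
    {
        if (e->countdown-- > 0) {
            return 0;                   /* skip this iteration entirely */
        }
        int events = e->progress();
        if (events > 0) {
            e->skip = 0;                /* found work: poll every time again */
        } else if (e->skip < DECAY_MAX_SKIP) {
            e->skip = e->skip ? e->skip * 2 : 1;  /* idle: back off */
        }
        e->countdown = e->skip;
        return events;
    }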
 
Review Master?
- Bunch of failures on the master branch.  No chance to look at yet.  Cisco and Ivy cluster.
- Nathan's seeing a "resource deadlock avoided" on MPI_Waitall.  Some TCP BTL issue; looks like something going on down there.  Should be fairly easy to test this.  Cisco TCP one-sided stuff.  Nathan will see if he can figure this out.  Haven't changed one-sided pt2pt recently; surprised.  Maybe proc locks on by default?  Need to work this out.  Just changed locks from being conditional to being unconditional.
- Edgar found some Lustre issues.  OMPI master has bad MPI-IO performance on Lustre.  Looked reasonable on master before, but now performance is poor; not completely sure when performance changed.  For Lustre itself, could switch back to ROMIO as the default.  GPFS and others will look good, but Lustre is bad.  Can't have OMPIO as the default on Lustre.  Problem for the 2.0.0 AND master branches.
- https://github.com/open-mpi/ompi/issues/398 is ready for a pull request.  Nathan - should go to 2.1 (since the mpool changes were pushed to 2.1).
- https://github.com/open-mpi/ompi/pull/1118 - The mpool rewrite should be ready to go, but want George to look at it and make comments.

[OMPI devel] No meeting today 12/29/2015 either.

2015-12-29 Thread Geoffrey Paulsen
I think many people are out this week.
 
Please note that Ralph respun 1.10.2rc3.
 
See everyone next Tuesday Jan 5th, 2016.  Have a Happy New Year!
 
 



[OMPI devel] Hotels for Feb Face 2 Face

2015-12-16 Thread Geoffrey Paulsen
I've updated the wiki to include a map of 3 hotels near DFW that offer a shuttle both to/from DFW and the IBM Innovation Center, for those who wish to go without a car.
 
https://github.com/open-mpi/ompi/wiki/Meeting-2016-02
---
Geoffrey Paulsen
Software Engineer, IBM Platform MPI
Phone: 720-349-2832
Email: gpaul...@us.ibm.com
www.ibm.com



[OMPI devel] No Meeting 12/22/2015

2015-12-15 Thread Geoffrey Paulsen
In today's telecon we decided to skip next week's meeting.
 



[OMPI devel] Minutes from Weekly Telecon 12/15/2015

2015-12-15 Thread Geoffrey Paulsen
https://github.com/open-mpi/ompi/wiki/WeeklyTelcon_20151215
 
Also, reminder, NO meeting next week 12/22/2015.



[OMPI devel] Agenda 12/8

2015-12-07 Thread Geoffrey Paulsen
Open MPI Meeting 12/8/2015
--- Attendees --
Agenda:
- Review 1.10
  o Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.2
  o 1.10.2 release candidate before the holidays?
- Review 2.x
  o Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
  o Blocker issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
    + 1064 - Ralph / Jeff, is this do-able by December?
    + Dynamic add_procs is busted now when the value is set to 0 (not related to PMIx)
  o Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
    + One of us will go through ALL issues for 2.0.0 to ask if they can be moved out to a future release.
  o RFC on embedded PMIx version handling
  o RFC process wiki page?
- MTT status
- Status update: LANL, Houston, HLRS, IBM
--- Status Update Rotation ---
Cisco, ORNL, UTK, NVIDIA
Mellanox, Sandia, Intel
LANL, Houston, HLRS, IBM



[OMPI devel] Meeting Notes 12/1/2015

2015-12-01 Thread Geoffrey Paulsen
Open MPI Meeting 12/1/2015 
--- Attendees ---
Geoff Paulsen, Jeff Squyres, Geoffroy Vallee, Howard, Ryan Grant, Sylvain Jeaugey (new nVidia contact, replacing Rolf; previously at Bull Computing for 10 years; lives in Santa Clara), Todd Kordenbrock
Agenda:
- Solicit a volunteer to run the weekly telecon
- Review 1.10
  o Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.2
    + One PR for 1.10.2 (PR 782)
    + Need someone to clarify on this, to resolve.
    + After we decide if it's right, a core developer will need to create a PR for master.
    + The rest of the PRs are for 1.10.3 (March or April 2016?)
  o When do we want to start release work for 1.10.2?
    + How about a 1.10.2 release candidate before the holidays?
    + Ralph will send email about this to the devel list to solicit discussion.
- Review 2.x
  o Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
  o Blocker issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
    + 1064 - Ralph / Jeff, is this do-able by December?
    + Dynamic add_procs is busted now when the value is set to 0 (not related to PMIx)
  o Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
    + One of us will go through ALL issues for 2.0.0 to ask if they can be moved out to a future release.
  o RFC on embedded PMIx version handling
    + PMIx, once it's stabilized: treat it just like hwloc or libevent.
    + PMIx would have separate releases, and Open MPI can cherry-pick specific releases if needed.
    + When PMIx has a new release, we'll create a new directory and, once it's validated and ready to go, remove older ones.
    + Ralph will create a tarball for that.
    + PMIx needs to be in 2.0.0; need to update it and move to the right naming convention while doing that.
  o RFC process - Every so often there are 'big deal' issues / PRs, and it's hard to spot these BIG ones.
    + Ralph is proposing that if you're making a major change or a change to core code: send an RFC to the devel list before you do it!  (And again with the PR when it's ready; put "RFC" in the PR title.)
    + Good idea to send out the RFC before you start, so others can give a heads-up or comment.
    + Prevents potential conflicts of parallel development.
    + Howard - Nice to have the affected components, and the reason for wanting the change.
    + Jeff - Had a nice format for RFCs before: short / long versions.  Might want to nail that down.
    + Jeff - Proposes we put "RFC" in the PR title.
    + Jeff - Should the body and format be in the PR?
    + Discussion about proposed work should be on the devel email list; discussion about already-written code is on the PR.
    + Jeff proposes a wiki page describing this process:
      Where - What does it affect?
      When - When can we discuss?  Give at least 1 week for others to reply.
      What - Summary.
      Why - Some justification, better than "I was bored".
      Down below, deeper discussion.
  o Supercomputing reports
    + The OMPI BoF went well.  Over 100 people in the room.  Slides are on the OMPI website and on Jeff's blog.
    + People appreciated the BoF format of "status, roadmap, what's going well, what needs more attention, etc."
    + The PMIx BoF went well too.  Scaling improvements went REALLY well.
    + PMIx showed a really good slope; they thought it was the wire-up times of daemons.  Mellanox needs to remove the requirement to remove LID and GID, but still like a year out.
  o Status update: Mellanox, Sandia, Intel
    + Mellanox (via Ralph)
      1. Artem will be working with Ralph et al. to finish off the OMPI-side issues in the PMIx integration.
      2. Igor Ivanov will continue to fix memory corruption bugs uncovered by Valgrind.
      3. Artem and Igor will start looking at making the necessary changes to the UCX PML to use the direct modex.
      4. Mellanox plans to submit the UCX PML for inclusion in 1.10.3.
      5. Mellanox plans to submit the missing routines needed for OSHMEM 1.2 spec compliance for inclusion in 1.10.3.  Igor Ivanov will be leading this.
    + Sandia (Ryan Grant)
      - Put Portals triggered ops on master.  Will run tests there for a while and then PR to the 2.0 branch.
    + Intel (Ralph)
      - PMIx: working on pull requests.
      - HPC stuff is occupying a lot of his time: announcing OpenHPC to create a community distribution optimized for HPC, building on top of OPAL.
  o Howard has a request for Sylvain / nVidia
    + Sylvain stopped Rolf's MTT yesterday, hoping to have it back by the end of the week.
    + MLX5 HCAs - on master there are lots of errors; not sure if they are because of software.
    + The nVidia cluster shows up bugs before other clusters.
    + Right now master under defaults is running really clean, but turning on dynamic add_procs shows lots of issues in Comm_dup and other Comm creation code.
--- Status Update Rotation ---
LANL, Houston, HLRS, IBM
Cisco, ORNL, UTK, NVIDIA
Mellanox, Sandia, Intel



[OMPI devel] Doodle to find time to discuss issues/398

2015-11-03 Thread Geoffrey Paulsen
Anyone interested, please add your name to doodle, and we'll find a time that everyone can meet.
 
http://doodle.com/poll/3gk6bx4dzgrpsqva
 
--- Agenda ---
 
In the Open MPI call today we discussed a few aspects of https://github.com/open-mpi/ompi/issues/398:
1) Moving ompi_info_t down to opal_info_t to allow lower-level components access to this functionality.
2) Implementing OMPI_Comm_Info get/set in a way that can be reused for Windows and Files also.  There are a number of issues around how the standard words the return values from get() that are left up to the implementation, for example:
  - values for non-explicitly-set keys that the MPI layer is using.
  - values for non-explicitly-set keys that the MPI layer is not using.
  - values for explicitly overwritten values.
  - communication (to the user via docs??) of what hints Open MPI recognizes.
  - communication (to the user via docs??) of what values are required to be the same/different on all ranks of the Comm.
  - additional consistency checking of values in debugging mode?
  - ability to print/log unrecognized hints.
A small example of the user-visible API in question follows below.
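To make the get()/set() questions concrete, here is a small standard-MPI sketch ("my_hint" is a made-up key; whether an unset or unused key comes back from MPI_Comm_get_info, and with what value, is exactly the implementation-defined behavior listed above):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Explicitly set one hint on the communicator. */
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "my_hint", "true");
        MPI_Comm_set_info(MPI_COMM_WORLD, info);
        MPI_Info_free(&info);

        /* Read the hints back; the standard leaves it up to the
         * implementation which keys appear here, and with what values. */
        MPI_Info used;
        MPI_Comm_get_info(MPI_COMM_WORLD, &used);

        char value[MPI_MAX_INFO_VAL + 1];
        int flag;
        MPI_Info_get(used, "my_hint", MPI_MAX_INFO_VAL, value, &flag);
        if (flag) {
            printf("my_hint is present, value = %s\n", value);
        } else {
            printf("my_hint was not retained\n");
        }

        MPI_Info_free(&used);
        MPI_Finalize();
        return 0;
    }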
---
Geoffrey Paulsen
Software Engineer, IBM Platform MPI
Phone: 720-349-2832
Email: gpaul...@us.ibm.com
www.ibm.com



[OMPI devel] IBM Innovation Center Reserved for Open MPI Face-2-Face

2015-10-20 Thread Geoffrey Paulsen
We have the Dallas IBM Innovation Center (http://ibm.com/partnerworld/iic/dallas.htm) reserved 2/23 - 2/25, 2016.
 
IBM Innovation Center - Dallas
1177 South Beltline Rd
Coppell, TX 75019
469-549-8444
 
https://www.google.com/maps/place/IBM+Innovation+Center+-+Dallas/@32.942725,-96.9965226,17z/
 
 
We've reserved two rooms, including a large classroom ("Hollerith"), from 8:30am - 5pm each day.
There is also a "Think Bar" for us to lounge about in our PJs and eat lunch.  I think it's not as "Bar"-like as some would prefer.
 
---
Geoffrey Paulsen
Software Engineer, IBM Platform MPI
Phone: 720-349-2832
Email: gpaul...@us.ibm.com
www.ibm.com


[MTT devel] Detail link error

2015-08-21 Thread Geoffrey Paulsen
At http://mtt.open-mpi.org/index.php?_start_timestamp=past+24+hours_start_timestamp=show_http_username=all_http_username=show_local_username=all_local_username=hide_platform_name=all_platform_name=show_platform_hardware=all_platform_hardware=show_os_name=all_os_name=show_mpi_name=all_mpi_name=show_mpi_version=all_mpi_version=show_suite_name=all_suite_name=show_test_name=all_test_name=hide_np=all_np=show_full_command=_full_command=show=test_run_trial=
If I click on "Details", I get:
Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 80 bytes) in /nfs/data/osl/www/mtt.open-mpi.org/reporter/dashboard.inc on line 267
 
---
Geoffrey Paulsen
Software Engineer, IBM Platform MPI
Phone: 720-349-2832
Email: gpaul...@us.ibm.com
www.ibm.com



Re: [OMPI devel] OMPI_PROC_BIND value is invalid errors

2015-06-30 Thread Geoffrey Paulsen

I discussed this with Robert Ho, who was working with Ralph on this option.  He
believes it's possible that the PGI compiler / runtime does not understand
OMP_PROC_BIND=SPREAD, which was only introduced in OpenMP 4.0.

Unfortunately I can't find any docs, as http://www.pgroup.com/index.htm
is down right now.

We have PGI version 11.8, which only supports OpenMP version 3.0 and does
not list OMP_PROC_BIND at all.

In 11.8, PGI supported MP_BIND=yes, which would request that the PGI runtime
libraries bind processes or threads in a parallel region to physical
processors (the default is no).
It also supported MP_BLIST=a,b,c,d (when MP_BIND was set to yes) to map how
you wanted threads or processes bound to physical processors 0,1,2,3.

There is a note in the documentation that setting MP_BIND does NOT affect
the compiler's behavior at all, only the runtime library.
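For reference, a tiny OpenMP test program that makes the binding behavior visible (sched_getcpu() is Linux/glibc-specific); run it with OMP_PROC_BIND=spread under an OpenMP 4.0 compiler, or with MP_BIND=yes MP_BLIST=0,1,2,3 under the older PGI runtime:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sched.h>   /* sched_getcpu(), Linux-specific */
    #include <omp.h>

    int main(void)
    {
        /* Each thread reports where the runtime placed it, so the effect
         * of OMP_PROC_BIND (or PGI's MP_BIND/MP_BLIST) is observable. */
        #pragma omp parallel
        {
            printf("thread %d of %d running on cpu %d\n",
                   omp_get_thread_num(), omp_get_num_threads(),
                   sched_getcpu());
        }
        return 0;
    }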


Regards,

Geoffrey (Geoff) Paulsen
Software Engineer - Platform MPI
Phone: 1-720-349-2832
E-mail: gpaul...@us.ibm.com
IBM
1177 S Belt Line Rd
Coppell, TX 75019-4642
United States





From: Howard Pritchard
To: Open MPI Developers
Date: 06/29/2015 09:27 PM
Subject: Re: [OMPI devel] OMPI_PROC_BIND value is invalid errors
Sent by: "devel"



I decided just to disable the carver/pgi mtt runs.


2015-06-29 15:10 GMT-06:00 Ralph Castain :
  Very strange then - again, can you run it with the verbose flag and send
  me the output? I can't replicate what you are seeing.


  On Mon, Jun 29, 2015 at 4:05 PM, Howard Pritchard 
  wrote:
   ibm dataplex and laki ~= cray.  nothing to do with cray.
   Cray runs fine since I use aprun there.


   2015-06-29 13:54 GMT-06:00 Ralph Castain :
 Hmmm...is this some Cray weirdness? I checked the code and it looks
 right, and it runs correctly for me on both Mac and Linux. All it is
 doing is calling "setenv", so I'm wondering if there is something
 environ-specific going on here?

  I added some debug in case that might help - can you run it on the
 Cray with "--mca rtc_base_verbose 5" on the cmd line?


 On Mon, Jun 29, 2015 at 1:19 PM, Jeff Squyres (jsquyres) <
 jsquy...@cisco.com> wrote:
  Ahh... it's OMP_PROC_BIND, not OMPI_PROC_BIND.

  Yes, Ralph just added this.

  I chatted with him about this on the phone moments ago; he's pretty
  sure he knows where to go look to find the problem.


  > On Jun 29, 2015, at 12:00 PM, Howard Pritchard  wrote:
  >
  > laki is also showing the errors:
  >
  >
  > Here's the shortened url:
  >
  > http://goo.gl/Ra264U
  >
  > looks like the badness started with the latest nightly.
  > I think there was some activity in the orte binding area recently.
  >
  > Howard
  >
  >
  >
  >
  > 2015-06-29 9:52 GMT-06:00 Jeff Squyres (jsquyres) <
  jsquy...@cisco.com>:
  > Can you provide an MTT short URL to show the results?
  >
  > Or, if the MTT results are not on the community reporter, can you
  show a bit more context in the output?
  >
  >
  > > On Jun 29, 2015, at 11:47 AM, Howard Pritchard <
  hpprit...@gmail.com> wrote:
  > >
  > > Hi Folks,
  > >
  > > I'm seeing an error I've not seen before in the MTT runs on the
  ibm dataplex
  > > at NERSC.  The mpirun launched jobs are failing with
  > >
  > > OMPI_PROC_BIND value is invalid
  > >
  > > errors.
  > >
  > > This is is for the trivial ring tests.
  > >
  > > Is anyone else seeing these types of errors?
  > >
  > > Howard
  > >
  > > ___
  > > devel mailing list
  > > de...@open-mpi.org
  > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
  > > Link to this post:
  http://www.open-mpi.org/community/lists/devel/2015/06/17558.php
  >
  >
  > --
  > Jeff Squyres
  > jsquy...@cisco.com
  > For corporate legal information go to:
  http://www.cisco.com/web/about/doing_business/legal/cri/
  >
  > ___
  > devel mailing list
  > de...@open-mpi.org
  > Subscription: