Re: [OMPI devel] Fortran 2018 ISO_Fortran_binding.h will be available from gcc 9

2019-04-09 Thread Jeff Squyres (jsquyres) via devel
To follow up for the web archives: see Gilles' PR for this in 
https://github.com/open-mpi/ompi/pull/6569.


> On Apr 3, 2019, at 12:33 AM, Gilles Gouaillardet  wrote:
> 
> Folks,
> 
> 
> FYI, and as posted by Damian Rouson in the de...@mpich.org ML
> 
> 
>> The upcoming GCC 9 release will contain the first version of gfortran that 
>> provides the Fortran 2018 ISO_Fortran_binding.h header file, which MPICH 
>> requires in order to build MPI 3's mpi_f08 module.  If anyone is interested 
>> in testing, please check out the current GCC trunk and submit any related 
>> issues to the gfortran developers at fort...@gcc.gnu.org.
>> 
>> I'm cc'ing Paul Thomas, who contributed the ISO_Fortran_binding patch to 
>> gfortran under contract for Sourcery Institute.
>> 
> 
> 
> 
> As far as I am aware, only the Intel and Cray compilers currently support this 
> feature.
> 
> 
> Cheers,
> 
> Gilles
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Face to face meeting: need agenda items

2019-03-26 Thread Jeff Squyres (jsquyres) via devel
We talked about this on the weekly Webex today and decided two things:

1. There are not enough agenda items to require a face-to-face meeting.  We 
have therefore decided to cancel this meeting; we'll see everyone at the next 
face-to-face meeting (which will be in the summer / fall / whenever makes 
sense).

We're really sorry for those who already booked airline / hotel tickets.  There 
just aren't enough topics to justify a face-to-face meeting this time. :-(

2. There is one sizable topic that really could benefit from a face-to-face 
discussion: PRRTE.  Unfortunately, it's not enough to justify the existing 
3-day timeslot for the face-to-face meeting.  So a separate Doodle will be sent 
around, aiming at a 1-day meeting just to talk about / work on PRRTE.  More 
details will be included in that proposal (location, webex, etc.).




> On Mar 25, 2019, at 11:30 AM, Jeff Squyres (jsquyres) via devel 
>  wrote:
> 
> I do not see any new additions to the agenda since last week:
> 
>https://github.com/open-mpi/ompi/wiki/Meeting-2019-04
> 
> If we don't get a substantial set of new items on the agenda by tomorrow, 
> it's going to be tempting to cancel this face-to-face meeting (i.e., defer it 
> to summer/fall when we have things that will be useful to all be together in 
> a room to discuss).
> 
> 
>> On Mar 19, 2019, at 11:44 AM, Jeff Squyres (jsquyres)  
>> wrote:
>> 
>> Folks --
>> 
>> The agenda for the face-to-face meeting is pretty light 
>> (https://github.com/open-mpi/ompi/wiki/Meeting-2019-04).
>> 
>> If we don't get enough items in the agenda, we might want to cancel the 
>> meeting.
>> 
>> *** If you have agenda items, please put them on the wiki ASAP ***
>> 
>> We should probably make a decision on whether to cancel during next 
>> Tuesday's webex (I'm guessing some of you may have booked travel already).
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Face to face meeting: need agenda items

2019-03-25 Thread Jeff Squyres (jsquyres) via devel
I do not see any new additions to the agenda since last week:

https://github.com/open-mpi/ompi/wiki/Meeting-2019-04

If we don't get a substantial set of new items on the agenda by tomorrow, it's 
going to be tempting to cancel this face-to-face meeting (i.e., defer it to 
summer/fall when we have things that will be useful to all be together in a 
room to discuss).


> On Mar 19, 2019, at 11:44 AM, Jeff Squyres (jsquyres)  
> wrote:
> 
> Folks --
> 
> The agenda for the face-to-face meeting is pretty light 
> (https://github.com/open-mpi/ompi/wiki/Meeting-2019-04).
> 
> If we don't get enough items in the agenda, we might want to cancel the 
> meeting.
> 
> *** If you have agenda items, please put them on the wiki ASAP ***
> 
> We should probably make a decision on whether to cancel during next Tuesday's 
> webex (I'm guessing some of you may have booked travel already).
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] Proposal: Github "stale" bot

2019-03-19 Thread Jeff Squyres (jsquyres) via devel
I have proposed the use of the Github Probot "stale" bot:

https://probot.github.io/apps/stale/
https://github.com/open-mpi/ompi/pull/6495

The short version of what this bot does is:

1. After a period of inactivity, a label will be applied to mark an issue as 
stale, and optionally a comment will be posted to notify contributors that the 
Issue or Pull Request will be closed.

2. If the Issue or Pull Request is updated, or anyone comments, then the stale 
label is removed and nothing further is done until it becomes stale again.

3. If no more activity occurs, the Issue or Pull Request will be automatically 
closed with an optional comment.

Specifically, the PR I propose sets the Stalebot config as:

- After 60 days of inactivity, issues/PRs will get a warning
- After 7 more days of inactivity, issues/PRs will be closed and the "Auto 
closed" label will be applied
- Issues/PRs with the "help wanted" or "good first issue" labels will be 
ignored by the Stalebot

Thoughts?

If we move ahead with this: given that this will apply to *all* OMPI 
issues/PRs, we might want to take a whack at closing a whole pile of old 
issues/PRs first before unleashing the Stalebot.

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] Face to face meeting: need agenda items

2019-03-19 Thread Jeff Squyres (jsquyres) via devel
Folks --

The agenda for the face-to-face meeting is pretty light 
(https://github.com/open-mpi/ompi/wiki/Meeting-2019-04).

If we don't get enough items in the agenda, we might want to cancel the meeting.

*** If you have agenda items, please put them on the wiki ASAP ***

We should probably make a decision on whether to cancel during next Tuesday's 
webex (I'm guessing some of you may have booked travel already).

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Help regarding Openmpi source code

2019-03-12 Thread Jeff Squyres (jsquyres) via devel
Additionally, the code implementing the PMPI interface is generated through 
#defines.

Depending on the platform, this happens either as Clement describes (with 
#defines for MPI_Foo --> PMPI_Foo) or, if the platform doesn't support weak 
symbols (e.g., MacOS), by essentially compiling the source code for the C API 
functions twice:

1. once as "normal"
2. a second time with #define MPI_Foo PMPI_Foo (for all values of Foo) -- see 
the sketch below
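
To make that concrete, here is a minimal, self-contained sketch of both 
approaches (illustrative only: the names carry a "_demo" suffix, the build 
lines in the comments are just one plausible way to do it, and none of this is 
the actual Open MPI source):

/* pmpi_sketch.c -- cc -o pmpi_sketch pmpi_sketch.c  (Linux/ELF; the alias
 * attribute is the GCC/Clang spelling of "weak symbol" support) */
#include <stdio.h>

/* The single real implementation, exported under the profiling name. */
int PMPI_Send_demo(const void *buf, int count)
{
    printf("real send of %d elements\n", count);
    return 0;
}

/* (a) Platforms with weak symbols: the MPI_ name is just a weak alias, so a
 *     tool library may override it while the PMPI_ name stays reachable. */
int MPI_Send_demo(const void *buf, int count)
    __attribute__((weak, alias("PMPI_Send_demo")));

/* (b) Platforms without weak symbols (e.g., MacOS): compile the same source
 *     a second time with the public name rewritten, so the library ends up
 *     containing two copies of the code, e.g.:
 *
 *         cc -c send_body.c                          -> defines MPI_Send
 *         cc -c -D'MPI_Send=PMPI_Send' send_body.c   -> defines PMPI_Send
 */

int main(void)
{
    return MPI_Send_demo(NULL, 4);   /* reaches the same code as PMPI_ */
}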



> On Mar 4, 2019, at 12:33 PM, Clement Foyer  wrote:
> 
> Hi,
> 
> The reason you cannot find it is that it is not actually defined as such. 
> The PMPI interface is the profiling interface. The principle is that the 
> real symbol in your shared library is PMPI_Send, with MPI_Send being a weak 
> alias for it. So, if a third party wants to intercept calls to MPI_Send (as 
> an example, to count how many times this function is called), it just needs 
> to provide its own library that presents the MPI_Send symbol, does its work 
> (e.g., increments some counter to keep track of how many times this function 
> was called by the user), and then calls the PMPI_Send function provided by 
> Open MPI. As the ompi symbol for MPI_Send is defined as weak, it won’t 
> collide with the already-defined symbol and will thus simply be ignored 
> instead of raising a warning at link time. 
> 
> Note: in order to have the third-party library loaded in time so that its 
> MPI_Send symbol is considered and not the Open MPI one, you need to set your 
> LD_PRELOAD variable accordingly.
> 
> You would end up with user code calling the MPI_Send provided by the third 
> party, which in turn calls the PMPI_Send provided by the Open MPI library. 
> Without this third-party library, the user code calls MPI_Send, which is 
> actually an alias for the PMPI_Send provided by the Open MPI library.
> 
> There might be some other implications, but I may let the people more 
> proficient than myself explain them :)
> 
> Regards,
> Clément
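
To illustrate the interception pattern Clement describes, here is a small 
sketch of what such a third-party library could look like (a made-up example, 
not an existing tool; the build/run lines in the comment are only one plausible 
way to use it):

/* sendcount.c -- hypothetical MPI_Send-counting interposer.
 *
 *     mpicc -shared -fPIC -o libsendcount.so sendcount.c
 *     LD_PRELOAD=./libsendcount.so mpirun -np 2 ./your_app
 */
#include <mpi.h>
#include <stdio.h>

static long send_count = 0;

/* Our MPI_Send shadows the MPI library's weak MPI_Send symbol... */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    send_count++;                          /* the tool's "work" */
    return PMPI_Send(buf, count, datatype, /* ...then forwards to the */
                     dest, tag, comm);     /* real implementation */
}

/* Report the count when the application shuts MPI down. */
int MPI_Finalize(void)
{
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d called MPI_Send %ld times\n", rank, send_count);
    return PMPI_Finalize();
}
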
> 
> On 04/03/2019, 16:45, "devel on behalf of vishakha patil" 
> <pvishakha.offic...@gmail.com> wrote:
> 
> Hi,
> 
> Greetings for the day! 
> 
> I am an MTech (Computer) student of Savitribai Phule Pune University, 
> Maharashtra, India.
> For my project, I have downloaded a nightly snapshot tarball of the openmpi 
> v4.0.x series.
> It built successfully. But while traversing the code (manually as well as 
> using cscope) I am not able to find the implementation of PMPI_* (PMPI_Send 
> etc.) which is called from the MPI_Send function in the send.c file.
> I have gone through the readme and make files but am not able to find it. 
> Could you please help me with this? 
> I need this to get a complete understanding of the openmpi algorithms, as it 
> is part of my MTech project.
> 
> Please let me know if any other details required. Thank you!
> 
> Regards,
> Vishakha
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org 
> https://lists.open-mpi.org/mailman/listinfo/devel
> 
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

[OMPI devel] Gentle reminder: sign up for the face to face

2019-02-26 Thread Jeff Squyres (jsquyres) via devel
Gentle reminder to please sign up for the face-to-face meeting and add your 
items to the wiki:

https://github.com/open-mpi/ompi/wiki/Meeting-2019-04

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] Queued up Open MPI mails

2019-02-07 Thread Jeff Squyres (jsquyres) via devel
As you can probably tell from the floodgate of backlogged Open MPI mails that 
probably just landed in your inbox, there was some kind of issue at our mail 
list provider (but it only affected some of our lists).  They just released all 
the backlogged emails.

Enjoy the onslaught!

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] Open MPI face-to-face

2019-02-07 Thread Jeff Squyres (jsquyres) via devel
It has been settled: Tue Apr 23 - Thu Apr 25, 2019, in San Jose, CA (probably 
at Cisco).

Please add your names and agenda items:

https://github.com/open-mpi/ompi/wiki/Meeting-2019-04

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] One-sided tests in MTT

2019-02-07 Thread Jeff Squyres (jsquyres) via devel
I'm re-sending this back to the devel list, because I just now realized that 
Nathan sent this to the devel list from an address that was not subscribed, so 
it was rejected / never went through to the entire list (I got it because I was 
CC'ed).


> On Jan 30, 2019, at 1:55 PM, Nathan Hjelm  wrote:
> 
> 
> For rma-mt:
> 
> Build:
> autoreconf -vif
> configure
> 
> Run:
> cd src
> mpirun  -n 2 -N 1 (if multinode) ./rmamt_bw -o put (or get) -s flush (pscw, 
> fence, lock, etc) -t  -x (binds threads to cores)
> 
> If that exits successfully then that is a pretty good smoke-test that 
> multi-threaded RMA is working.
> 
> ARMCI:
> 
> Build:
> ./autogen.pl
> ./configure
> make
> 
> Run:
> make check
> 
> ARMCI was broken with osc/rdma for a couple of releases and we didn't know. 
> It is worth running the checks with OMPI_MCA_osc=sm,rdma, 
> OMPI_MCA_osc=sm,pt2pt, and OMPI_MCA_osc=sm,ucx to test each possible 
> configuration.
> 
> -Nathan
> 
> On Jan 30, 2019, at 11:26 AM, "Jeff Squyres (jsquyres) via devel" 
>  wrote:
> 
>> Yo Nathan --
>> 
>> I see you just added 2 suites of one-sided tests to the MTT repo. Huzzah!
>> 
>> Can you provide some simple recipes -- frankly, for someone who doesn't 
>> want/care to know how the tests work :-) -- on how to:
>> 
>> 1. Build the test suites
>> 2. Run in MTT
>> 
>> Thanks!
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> 
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] MPIX FP16 datatypes

2019-01-31 Thread Jeff Squyres (jsquyres) via devel
All --

Fujitsu has submitted a comprehensive PR to add FP16 datatypes to OMPI under 
"MPIX_*" names.

It adds quite a bit of infrastructure in the datatype and op areas, and then 
exposes that infrastructure through "MPIX_" names in an mpiext extension.

From a technical standpoint, this PR seems to be just about ready.  But it's a 
big change, and we'd like a few more eyes on it.  Here's some information from 
KAWASHIMA Takahiro:

All, could you comment if you have opinions? I am about to merge FP16 (half 
precision floating point) datatype support. The corresponding C/C++ types are 
not yet standardized, but they are proposed in ISO/IEC WGs. The background is 
described in an issue and a slide in the MPI Forum. Links to related pages are 
listed in my page.

Please respond by COB Thursday, Feb 7, 2019.  Thanks.
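
As a feel for what native FP16 support buys users, here is a small 
illustrative sketch (not taken from the Fujitsu PR; the uint16_t stand-in is 
an assumption for illustration, not the PR's API) of what you have to do today 
without it -- ship the raw 16-bit storage:

#include <mpi.h>
#include <stdint.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    uint16_t buf[4] = {0};   /* stand-in storage for 4 half-precision values */
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Without a real FP16 datatype we can only describe the buffer as raw
     * 16-bit integers (or bytes): fine for point-to-point... */
    if (0 == rank) {
        MPI_Send(buf, 4, MPI_UINT16_T, 1, 0, MPI_COMM_WORLD);
    } else if (1 == rank) {
        MPI_Recv(buf, 4, MPI_UINT16_T, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received 4 FP16 values (as raw bits)\n");
    }

    /* ...but MPI_Reduce/MPI_Allreduce with MPI_SUM would sum the *bit
     * patterns*, not the floating-point values.  A native half-precision
     * datatype (exposed under an MPIX_* name) plus the matching op support
     * is what closes that gap. */

    MPI_Finalize();
    return 0;
}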

--
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

[OMPI devel] One-sided tests in MTT

2019-01-30 Thread Jeff Squyres (jsquyres) via devel
Yo Nathan --

I see you just added 2 suites of one-sided tests to the MTT repo.  Huzzah!

Can you provide some simple recipes -- frankly, for someone who doesn't 
want/care to know how the tests work :-) -- on how to:

1. Build the test suites
2. Run in MTT

Thanks!

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] Location poll for April Open MPI dev face-to-face meeting

2019-01-22 Thread Jeff Squyres (jsquyres) via devel
The week has been chosen: April 22.

Now take this poll to register your preference for the location (we're still 
checking availability, but this at least gets everyone's preferences down):


https://docs.google.com/forms/d/e/1FAIpQLSdrJw7xfVNo3nAfoB4dsnMu7ihiZ0WCjglo2KBZqvY_3BZkkg/viewform

Please fill this out by COB Friday, Jan 25, 2019.

Thanks.

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] Doodle poll for April face-to-face meeting

2019-01-16 Thread Jeff Squyres (jsquyres) via devel
Please don't forget to fill out the Doodle poll to select the week for the next 
Open MPI developer's face-to-face meeting by COB THIS FRIDAY, JANUARY 18 2019:

https://doodle.com/poll/vvvzwuiizy7mnx64

Thanks!

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] Item for next Tuesday's agenda: remove openib BTL

2019-01-15 Thread Jeff Squyres (jsquyres) via devel
Rats -- I forgot to get this on this morning's webex agenda.

Geoff Paulsen -- please add this to next Tuesday's agenda:

"Remove openib and affiliated stuff"
https://github.com/open-mpi/ompi/pull/6270

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] Next Open MPI face-to-face dev meeting

2019-01-15 Thread Jeff Squyres (jsquyres) via devel
Here's a Doodle poll to pick the week of the next face-to-face meeting.  We're 
targeting the April(ish) timeframe:

https://doodle.com/poll/vvvzwuiizy7mnx64

Please fill out this doodle by the end of this week (i.e., by COB Friday, 18 
Jan 2019).

Thank you!

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] For discussion on the Webex tomorrow

2019-01-14 Thread Jeff Squyres (jsquyres) via devel
All --

I'd like to discuss this on the webex tomorrow:

https://github.com/open-mpi/ompi/issues/6278

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] Time to remove openib BTL?

2019-01-10 Thread Jeff Squyres (jsquyres) via devel
There's a compile issue in the openib BTL on master - see 
https://github.com/open-mpi/ompi/issues/6265.

It's probably not hard to fix, but... do we care?

Is it just time to remove the openib BTL on master?

Per https://github.com/open-mpi/ompi/wiki/5.0.x-FeatureList, the openib BTL is 
on the chopping block, anyway...

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Did someone enable Travis?

2019-01-08 Thread Jeff Squyres (jsquyres) via devel
It looks like Travis was enabled within the last day or so -- 
https://travis-ci.org/open-mpi/ompi/pull_requests shows 3 Travis builds on PRs, 
all submitted within the last 24 hours.  The last PR build before that was 2 
years ago.

I've taken the liberty of re-disabling Travis.

Does anyone know how it got re-enabled?  I.e., was it intentional / we should 
turn it back on / fix it?


> On Jan 8, 2019, at 5:57 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
> Did someone enable Travis CI on GitHub:open-mpi/ompi?
> 
> I thought we had specifically disabled Travis after we kept running into 
> problems with it...?
> 
> I ask because it's failing on some PRs for reasons that seem to have nothing 
> to do with the PR.  I don't know if our Travis setup has bit rotted, if 
> there's a genuine problem, or if Travis is just acting wonky...
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] Did someone enable Travis?

2019-01-08 Thread Jeff Squyres (jsquyres) via devel
Did someone enable Travis CI on GitHub:open-mpi/ompi?

I thought we had specifically disabled Travis after we kept running into 
problems with it...?

I ask because it's failing on some PRs for reasons that seem to have nothing to 
do with the PR.  I don't know if our Travis setup has bit rotted, if there's a 
genuine problem, or if Travis is just acting wonky...

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Help needed debugging openmpi 3.1 builds for Fedora

2018-12-17 Thread Jeff Squyres (jsquyres) via devel
Orion --

Thanks for the bug report; I filed it here:

https://github.com/open-mpi/ompi/issues/6200


> On Dec 16, 2018, at 9:55 PM, Orion Poplawski  wrote:
> 
> On 12/15/18 1:13 PM, devel@lists.open-mpi.org wrote:
>> I'm testing out rebuilding Fedora packages with openmpi 3.1 in Fedora COPR:
>> https://copr.fedorainfracloud.org/coprs/g/scitech/openmpi3.1/builds/
>> A number of packages are failing running tests only on Fedora Rawhide x86_64 
>> with processes killed with signal 4 (Illegal instruction).  For example:
>> + 
>> PYTHONPATH=/builddir/build/BUILDROOT/mpi4py-3.0.0-6.git39ca78422646.fc30.x86_64/usr/lib64/python2.7/site-packages/openmpi
>>  + mpiexec -n 1 python2 test/runtests.py -v --no-builddir 
>> --thread-level=serialized -e spawn
>> BUILDSTDERR: 
>> --
>> BUILDSTDERR: Primary job  terminated normally, but 1 process returned
>> BUILDSTDERR: a non-zero exit code. Per user-direction, the job has been 
>> aborted.
>> BUILDSTDERR: 
>> --
>> BUILDSTDERR: 
>> --
>> BUILDSTDERR: mpiexec noticed that process rank 0 with PID 0 on node 
>> 656ae442c6bf45fe9b45c5481f41bc45 exited on signal 4 (Illegal instruction).
>> BUILDSTDERR: 
>> --
>> Unfortunately I have been unable to reproduce this in any local mock builds. 
>>  So I'm left wondering if this is some kind of peculiarity with the COPR 
>> builders or if there is a real problem with openmpi.  Any suggestions for 
>> how to further debug this would be greatly appreciated.
>> (PID 0 seems very odd - that seems to be the same in the different failures)
>> - Orion
> 
> I believe I have tracked this down to the libpsm2 library.  If anyone here is 
> interested, I've filed a bug here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1659852
> 
> -- 
> Orion Poplawski
> Manager of NWRA Technical Systems  720-772-5637
> NWRA, Boulder/CoRA Office FAX: 303-415-9702
> 3380 Mitchell Lane   or...@nwra.com
> Boulder, CO 80301 https://www.nwra.com/
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] 2.1.6rc1

2018-11-28 Thread Jeff Squyres (jsquyres) via devel
I just pushed rc1 for what we hope to be the final final final no really trust 
me final version of the 2.1.x series.  We may sit on this release until 
January, just because everyone is still recovering from SC'18+US Thanksgiving 
holiday, and we have the MPI Forum next week, and the Christmas holiday shortly 
thereafter.  That being said, we don't expect many (any?) more additions to the 
2.1.6 before release.

   https://www.open-mpi.org/software/ompi/v2.1/

2.1.6rc1 includes the following fixes:

- Update the openib BTL to handle a newer flavor of the
  ibv_exp_query() API.  Thanks to Angel Beltre (and others) for
  reporting the issue.
- Fix a segv when specifying a username in a hostfile.  Thanks to
  StackOverflow user @derangedhk417 for reporting the issue.
- Work around Oracle compiler v5.15 bug (which resulted in a failure
  to compile Open MPI source code).
- Disable CUDA async receive support in the openib BTL by default
  because it is broken for sizes larger than the GPUDirect RDMA
  limit.  User can set the MCA variable btl_openib_cuda_async_recv to
  true to re-enable CUDA async receive support.
- Various atomic and shared memory consistency bug fixes, especially
  affecting the vader (shared memory) BTL and PMIx.
- Add openib BTL support for BCM57XXX and BCM58XXX Broadcom HCAs.
- Fix segv in oob/ud component.  Thanks to Balázs Hajgató for
  reporting the issue.

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

[OMPI devel] Open MPI SC'18 State of the Union BOF slides

2018-11-16 Thread Jeff Squyres (jsquyres) via devel
Thanks to all who came to the Open MPI SotU BOF at SC'18 in Dallas, TX, USA 
this week!  It was great talking with you all.

Here are the slides that we presented:

https://www.open-mpi.org/papers/sc-2018/

Please feel free to ask any followup questions on the users or devel lists.

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Openmpi configure: Could not determine the fortran compiler flag

2018-10-22 Thread Jeff Squyres (jsquyres) via devel
> On Oct 22, 2018, at 12:58 PM, Santiago Serebrinsky  
> wrote:
> 
> Precisely, that was the problem.
> 
> I disabled Fortran support to move ahead and see if I could manage that way 
> (and I found I couldn't!), at least in some of my uses. But I (and perhaps 
> others as well) would still need to have Fortran support, so the issue is not 
> moot at all.

Ah, ok.

I did see a curious error in your config.log:

-
12164 configure:62128: gfortran   conftest.f90 -Isubdir   -lz
12165 f951.exe: Fatal Error: Reading module 'ompi_mod_flag' at line 21 column 
49: Unexpected EOF
12166 compilation terminated.^M
12167 configure:62135: $? = 1
-

Which looks like it failed to compile a program that used the test Fortran 
module that configure created.

Specifically, the overall test is here:

https://github.com/open-mpi/ompi/blob/master/config/ompi_fortran_find_module_include_flag.m4#L36-L72

It basically does this:

1. Make a "subdir"
2. Cd into that "subdir"
3. Compile a trivial Fortran program that should emit a Fortran module 
4. Cd back into ..
5. Try compiling a trivial Fortran test program that uses the module that was 
just emitted, using a few different CLI options to specify the subdir where the 
test Fortran module can be found

The first option -- "-I" -- seems to work, but it seems to think that the 
emitted Fortran module is invalid.  That's where we get that config.log error.

I admit that I'm a bit confused as to why gfortran thinks the module file is 
invalid ("Unexpected EOF").  You might want to try replicating what the test is 
doing manually to see if your gfortran really is emitting invalid modules...?

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Openmpi configure: Could not determine the fortran compiler flag

2018-10-22 Thread Jeff Squyres (jsquyres) via devel
For what it's worth, it looks like Open MPI's configure found your mingw 
fortran compiler, but was unable to determine what flag to use to find fortran 
modules -- that's what caused configure to abort.

From your later messages, it looks like you just ended up disabling Fortran 
support, so this is somewhat moot, but I wanted to tie up this email thread for 
the archive.


> On Oct 21, 2018, at 6:01 PM, Santiago Serebrinsky  
> wrote:
> 
> Compiler:
>   $ which gfortran.exe
>   /mingw64/bin/gfortran.exe
> 
> I am attaching config.log (renamed to keep track of the error produced).
> 
> PS: To try moving further, I did 
>  ./configure --prefix=$HOME/usr/local --disable-mpi-fortran
> which led me to a later error. This is posted in a separate thread.
> 
> 
> On Sun, Oct 21, 2018 at 2:25 PM Jeff Squyres (jsquyres) via devel 
>  wrote:
> Also, please send the entire output from configure as well as the config.log 
> file (please compress).
> 
> Thanks!
> 
> 
> > On Oct 21, 2018, at 4:08 AM, Marco Atzeri  wrote:
> > 
> > Am 21.10.2018 um 09:56 schrieb Santiago Serebrinsky:
> >> Hi all,
> >> I am using Msys2 from PortableApps under Win10. More precisely,
> >> |$ uname -a MSYS_NT-10.0-WOW Galapagos 2.11.1(0.329/5/3) 2018-09-10 13:25 
> >> i686 Msys |
> >> I mean to install openmpi. Since I found no pre-built package (I would 
> >> love to have it!), I downloaded openmpi-3.1.2. When I
> >> |./configure --prefix=$HOME/usr/local |
> >> after many config detections, I get
> >> |checking for Fortran compiler module include flag... configure: WARNING: 
> >> *** Could not determine the fortran compiler flag to indicate where 
> >> modules reside configure: error: *** Cannot continue|
> > 
> > what fortran compiler do you have ?
> > 
> > ---
> > This e-mail was checked for viruses by Avast antivirus software.
> > https://www.avast.com/antivirus
> > 
> > ___
> > devel mailing list
> > devel@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
> 


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Openmpi configure: Could not determine the fortran compiler flag

2018-10-21 Thread Jeff Squyres (jsquyres) via devel
Also, please send the entire output from configure as well as the config.log 
file (please compress).

Thanks!


> On Oct 21, 2018, at 4:08 AM, Marco Atzeri  wrote:
> 
> Am 21.10.2018 um 09:56 schrieb Santiago Serebrinsky:
>> Hi all,
>> I am using Msys2 from PortableApps under Win10. More precisely,
>> |$ uname -a MSYS_NT-10.0-WOW Galapagos 2.11.1(0.329/5/3) 2018-09-10 13:25 
>> i686 Msys |
>> I mean to install openmpi. Since I found no pre-built package (I would love 
>> to have it!), I downloaded openmpi-3.1.2. When I
>> |./configure --prefix=$HOME/usr/local |
>> after many config detections, I get
>> |checking for Fortran compiler module include flag... configure: WARNING: 
>> *** Could not determine the fortran compiler flag to indicate where modules 
>> reside configure: error: *** Cannot continue|
> 
> what fortran compiler do you have ?
> 
> ---
> This e-mail was checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] btl/vader: race condition in finalize on OS X

2018-10-02 Thread Jeff Squyres (jsquyres) via devel
FYI: https://github.com/open-mpi/ompi/issues/5798 brought up what may be the 
same issue.


> On Oct 2, 2018, at 3:16 AM, Gilles Gouaillardet  wrote:
> 
> Folks,
> 
> 
> When running a simple helloworld program on OS X, we can end up with the 
> following error message
> 
> 
> A system call failed during shared memory initialization that should
> not have.  It is likely that your MPI job will now either abort or
> experience performance degradation.
> 
>   Local host:  c7.kmc.kobe.rist.or.jp
>   System call: unlink(2) 
> /tmp/ompi.c7.1000/pid.23376/1/vader_segment.c7.17d80001.54
>   Error:   No such file or directory (errno 2)
> 
> 
> the error does not occur on linux by default since the vader segment is in 
> /dev/shm by default.
> 
> the patch below can be used to evidence the issue on linux
> 
> 
> diff --git a/opal/mca/btl/vader/btl_vader_component.c 
> b/opal/mca/btl/vader/btl_vader_component.c
> index 115bceb..80fec05 100644
> --- a/opal/mca/btl/vader/btl_vader_component.c
> +++ b/opal/mca/btl/vader/btl_vader_component.c
> @@ -204,7 +204,7 @@ static int mca_btl_vader_component_register (void)
> OPAL_INFO_LVL_3, 
> MCA_BASE_VAR_SCOPE_GROUP, &mca_btl_vader_component.single_copy_mechanism);
>  OBJ_RELEASE(new_enum);
> 
> -if (0 == access ("/dev/shm", W_OK)) {
> +if (0 && 0 == access ("/dev/shm", W_OK)) {
>  mca_btl_vader_component.backing_directory = "/dev/shm";
>  } else {
>  mca_btl_vader_component.backing_directory = 
> opal_process_info.job_session_dir;
> 
> 
> From my analysis, here is what happens :
> 
>  - each rank is supposed to have its own vader_segment unlinked by btl/vader 
> in vader_finalize().
> 
> - but this file might have already been destroyed by another task in 
> orte_ess_base_app_finalize()
> 
>   if (NULL == opal_pmix.register_cleanup) {
> orte_session_dir_finalize(ORTE_PROC_MY_NAME);
> }
> 
>   *all* the tasks end up removing 
> opal_os_dirpath_destroy("/tmp/ompi.c7.1000/pid.23941/1")
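
As a stand-alone illustration of the ordering problem described above (not 
Open MPI code; the path names are made up), the ENOENT comes from exactly this 
kind of race, where one process removes the shared session directory before 
another process gets to unlink its own file inside it:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    const char *dir  = "/tmp/ompi_race_demo";
    const char *file = "/tmp/ompi_race_demo/vader_segment.demo";

    mkdir(dir, 0700);
    close(open(file, O_CREAT | O_WRONLY, 0600));   /* "my" segment file */

    pid_t pid = fork();
    if (0 == pid) {
        /* The "other task": cleans up the whole session directory. */
        unlink(file);
        rmdir(dir);
        _exit(0);
    }

    waitpid(pid, NULL, 0);   /* the other task finishes its cleanup first */

    /* "vader_finalize": unlink my own segment... which is already gone. */
    if (unlink(file) < 0) {
        printf("unlink(%s): %s (errno %d)\n", file, strerror(errno), errno);
    }
    return 0;
}
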
> 
> 
> I am not really sure about the best way to fix this.
> 
>  - one option is to perform an intra node barrier in vader_finalize()
> 
>  - an other option would be to implement an opal_pmix.register_cleanup
> 
> 
> Any thoughts ?
> 
> 
> Cheers,
> 
> 
> Gilles
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Error in TCP BTL??

2018-10-01 Thread Jeff Squyres (jsquyres) via devel
I get that 100% time in the runs on MacOS, too (with today's HEAD):

--
$ mpirun -np 4 --mca btl tcp,self ring_c
Process 0 sending 10 to 1, tag 201 (4 processes in ring)
[JSQUYRES-M-26UT][[5535,1],0][btl_tcp_endpoint.c:742:mca_btl_tcp_endpoint_start_connect]
 bind() failed: Invalid argument (22)
[JSQUYRES-M-26UT:85104] *** An error occurred in MPI_Send
[JSQUYRES-M-26UT:85104] *** reported by process [362741761,0]
[JSQUYRES-M-26UT:85104] *** on communicator MPI_COMM_WORLD
[JSQUYRES-M-26UT:85104] *** MPI_ERR_OTHER: known error not in list
[JSQUYRES-M-26UT:85104] *** MPI_ERRORS_ARE_FATAL (processes in this 
communicator will now abort,
[JSQUYRES-M-26UT:85104] ***and potentially your MPI job)
--


> On Oct 1, 2018, at 2:12 PM, Ralph H Castain  wrote:
> 
> I’m getting this error when trying to run a simple ring program on my Mac:
> 
> [Ralphs-iMac-2.local][[21423,14],0][btl_tcp_endpoint.c:742:mca_btl_tcp_endpoint_start_connect]
>  bind() failed: Invalid argument (22)
> 
> Anyone recognize the problem? It causes the job to immediately abort. This is 
> with current head of master this morning - it was working when I last used 
> it, but it has been an unknown period of time.
> Ralph
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Patcher on MacOS

2018-09-28 Thread Jeff Squyres (jsquyres) via devel
I didn't start working on a patch -- all I did was open #5671.


> On Sep 28, 2018, at 5:51 PM, Nathan Hjelm via devel 
>  wrote:
> 
> Nope.  We just never bothered to disable it on osx. I think Jeff was working 
> on a patch.
> 
> -Nathan
> 
>> On Sep 28, 2018, at 3:21 PM, Barrett, Brian via devel 
>>  wrote:
>> 
>> Is there any practical reason to have the memory patcher component enabled 
>> for MacOS?  As far as I know, we don’t have any transports which require 
>> memory hooks on MacOS, and with the recent deprecation of the syscall 
>> interface, it emits a couple of warnings.  It would be nice to crush said 
>> warnings and the easiest way would be to not build the component.
>> 
>> Thoughts?
>> 
>> Brian
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Patcher on MacOS

2018-09-28 Thread Jeff Squyres (jsquyres) via devel
I asked a similar question recently:

https://github.com/open-mpi/ompi/issues/5671


> On Sep 28, 2018, at 5:21 PM, Barrett, Brian via devel 
>  wrote:
> 
> Is there any practical reason to have the memory patcher component enabled 
> for MacOS?  As far as I know, we don’t have any transports which require 
> memory hooks on MacOS, and with the recent deprecation of the syscall 
> interface, it emits a couple of warnings.  It would be nice to crush said 
> warnings and the easiest way would be to not build the component.
> 
> Thoughts?
> 
> Brian
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Mac OS X 10.4.x users?

2018-09-28 Thread Jeff Squyres (jsquyres) via devel
Fun fact: we cause configure to fail for OS X <= 10.4 anyway:

https://github.com/open-mpi/ompi/blob/master/configure.ac#L328-L347

According to the comment, we do this because of a known-bad implementation of 
pty in the OS X kernel that causes kernel panics.

So I think we're definitely safe removing a OS X 10.4.x workaround.


> On Sep 28, 2018, at 2:18 PM, Ralph H Castain  wrote:
> 
> Good lord - break away!!
> 
>> On Sep 28, 2018, at 11:11 AM, Barrett, Brian via devel 
>>  wrote:
>> 
>> All -
>> 
>> In trying to clean up some warnings, I noticed one (around pack/unpack in 
>> net/if.h) that is due to a workaround of a bug in MacOS X 10.4.x and 
>> earlier.  The simple way to remove the warning would be to remove the 
>> workaround, which would break the next major version of Open MPI on 10.4.x 
>> and earlier on 64 bit systems.  10.5.x was released 11 years ago and didn’t 
>> drop support for any 64 bit systems.  I posted a PR which removes support 
>> for 10.4.x and earlier (through the README) and removes the warning 
>> generated workaround (https://github.com/open-mpi/ompi/pull/5803).
>> 
>> Does anyone object to breaking 10.4.x and earlier?
>> 
>> Brian
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

[OMPI devel] How to know when CI on a PR is done

2018-09-19 Thread Jeff Squyres (jsquyres) via devel
FYI for developers: I just merged a little python script in the ompi-scripts 
repo that watches for when CI on a PR is done.  I find this very handy, 
especially because Open MPI's CI can take anywhere from 15 minutes to multiple 
hours.  This script lets me file a PR and then move on to something else -- 
I'll get a notification when the CI on the PR completes (via a separate script 
-- see the PR description and the commit message for an example / more info).

Comments / improvements welcome.



Begin forwarded message:

From: Jeff Squyres <notificati...@github.com>
Subject: [open-mpi/ompi-scripts] wait-for-pr-ci-completion.py: wait for Github 
PR CI (#18)
Date: September 18, 2018 at 6:50:48 PM EDT
To: open-mpi/ompi-scripts <ompi-scri...@noreply.github.com>
Cc: Jeff Squyres <jsquy...@cisco.com>, Your activity 
<your_activ...@noreply.github.com>
Reply-To: open-mpi/ompi-scripts 
<reply+00096b6409dc3872a7fd141b4b0aafed3b008c317d39e99a92cf000117b9434892a169ce158c4...@reply.github.com>


As the script name implies, wait for a Github PR's CI to complete.
This script does no notification; it just waits -- when the CI
completes, this script completes. You'll typically want to execute
another command after this script completes, for example:

$ ./wait-for-pr-ci-completion.py \
--pr https://github.com/open-mpi/ompi/pull/5731; \
pushover CI for PR5731 is done


where pushover is a script I use to send push notifications to my
phone.

See the comments at the beginning of this script to see its
requirements and how to use it.

This script may get copied out of the Open MPI script repo, so I took
the liberty of including the license in the file.

Signed-off-by: Jeff Squyres j...@squyres.com



Here's a sample output from running this script:

$ ./wait-for-pr-ci-completion.py --pr 
https://github.com/open-mpi/ompi/pull/5731 ; pushover PR 5731 done
2018-09-18 17:22:33,106 INFO: PR 5731: opal_config_asm.m4: fix typo in new C11 
atomic code
2018-09-18 17:22:33,106 INFO: PR 5731 is open
2018-09-18 17:22:33,638 INFO: Found new pending CI: IBM CI (PGI Compiler) 
(Build started)
2018-09-18 17:22:33,638 INFO: Found new pending CI: IBM CI (GNU Compiler) 
(Build started)
2018-09-18 17:22:33,638 INFO: Found new pending CI: IBM CI (XL Compiler) (Build 
started)
2018-09-18 17:22:33,639 INFO: Found new pending CI: Pull Request Build Checker 
(Build started for merge commit.)
2018-09-18 17:22:33,639 INFO: Found new success CI: Signed-off-by checker 
(Commit is signed off.  Yay!)
2018-09-18 17:22:33,639 INFO: Found new success CI: Commit email checker (Good 
email address.  Yay!)
2018-09-18 17:23:34,417 INFO: Found new pending CI: Mellanox (Build started 
sha1 is merged.)
2018-09-18 17:24:35,156 INFO: Found update pending CI: IBM CI (PGI Compiler) 
(Build started...)
2018-09-18 17:29:38,226 ERROR: Got Connection error.  Sleeping and trying 
again...
2018-09-18 17:29:43,927 INFO: Found update pending CI: IBM CI (XL Compiler) 
(Build started...)
2018-09-18 17:31:46,507 INFO: Found update pending CI: IBM CI (GNU Compiler) 
(Build started...)
2018-09-18 17:42:56,214 ERROR: Got Connection error.  Sleeping and trying 
again...
2018-09-18 17:46:04,160 INFO: Found update pending CI: IBM CI (XL Compiler) 
([6/7] Running Run Examples...)
2018-09-18 17:47:04,962 INFO: Found update pending CI: IBM CI (GNU Compiler) 
([6/7] Running Run Examples...)
2018-09-18 17:47:04,962 INFO: Found update success CI: IBM CI (XL Compiler) 
(All Tests Passed!)
2018-09-18 17:48:05,860 INFO: Found update success CI: IBM CI (GNU Compiler) 
(All Tests Passed!)
2018-09-18 17:50:07,405 INFO: Found update pending CI: IBM CI (PGI Compiler) 
(Building Tests...)
2018-09-18 17:51:08,152 INFO: Found update success CI: IBM CI (PGI Compiler) 
(All Tests Passed!)
2018-09-18 18:01:16,773 INFO: Found update success CI: Mellanox (Build 
finished. )
2018-09-18 18:07:21,745 INFO: Found update success CI: Pull Request Build 
Checker (All Tests Passed!
 )
2018-09-18 18:07:21,745 INFO: All CI statuses are complete:
2018-09-18 18:07:21,746 INFO: PASSED IBM CI (PGI Compiler): All Tests Passed!
2018-09-18 18:07:21,746 INFO: PASSED IBM CI (GNU Compiler): All Tests Passed!
2018-09-18 18:07:21,746 INFO: PASSED IBM CI (XL Compiler): All Tests Passed!
2018-09-18 18:07:21,746 INFO: PASSED Pull Request Build Checker: All Tests 
Passed!
2018-09-18 18:07:21,746 INFO: PASSED Signed-off-by checker: Commit is signed 
off.  Yay!
2018-09-18 18:07:21,746 INFO: PASSED Commit email checker: Good email address.  
Yay!
2018-09-18 18:07:21,746 INFO: PASSED Mellanox: Build finished.
*** Message sent to pushover successfully:
PR 5731 done



You can view, comment on, or merge this pull request online at:

  https://github.com/open-mpi/ompi-scripts/pull/18

Commit Summary

  *   wait-for-pr-ci-completion.py: wait for Github PR CI

File Changes

  *   A 

[OMPI devel] New "State" labels in github

2018-09-18 Thread Jeff Squyres (jsquyres) via devel
Brian and I just added some new "State" labels on GitHub to help with managing 
all the open issues.  Please add and keep up to date the "State" labels on your 
open issues.

See this wiki page for more information (might wanna bookmark it):


https://github.com/open-mpi/ompi/wiki/SubmittingBugs#assign-appropriate-labels

Thank you!

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Announcing Open MPI v4.0.0rc1

2018-09-18 Thread Jeff Squyres (jsquyres) via devel
On Sep 18, 2018, at 3:46 PM, Thananon Patinyasakdikul  
wrote:
> 
> I tested on our cluster (UTK). I will give a thumbs up but I have some 
> comments.
> 
> What I understand with 4.0.
> - openib btl is disabled by default (can be turned on by mca)

It is disabled by default *for InfiniBand*.  It is still enabled by default for 
RoCE and iWARP.

> - pml ucx will be the default for infiniband hardware.
> - btl uct is for one-sided but can also be used for two sided as well (needs 
> explicit mca).
> 
> My question is, what if the user does not have UCX installed (but they have 
> infiniband hardware). The user will not have fast transport for their 
> hardware. In my testing, this release falls back to btl/tcp if I don't 
> specify the mca to use uct or force openib. Will this be a problem? 

This is a question for Mellanox.

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] MTT Perl client

2018-09-14 Thread Jeff Squyres (jsquyres) via devel
On Sep 14, 2018, at 12:37 PM, Gilles Gouaillardet 
 wrote:
> 
> IIRC mtt-relay is not only a proxy (squid can do that too).

Probably true.  IIRC, I think mtt-relay was meant to be a 
dirt-stupid-but-focused-to-just-one-destination relay.

> mtt results can be manually copied from a cluster behind a firewall, and then 
> mtt-relay can “upload” these results to mtt.open-MPI.org

Yes, but then a human has to be involved, which kinda defeats at least one of 
the goals of MTT.  Using mtt-relay allowed MTT to still function in an 
automated fashion.

FWIW, it may not be necessary to convert mtt-relay to python (IIRC it's 
protocol agnostic, but like I said: it's been quite a while since I've looked 
at that code).  It was pretty small and straightforward.  It could also just 
stay in mtt-legacy.

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] MTT Perl client

2018-09-14 Thread Jeff Squyres (jsquyres) via devel
It's for environments where MTT is run where it can't reach the greater 
internet (or, at least, it can't POST to the greater internet).  You run the 
mtt-relay on a machine that is reachable by your machines running MTT, and it 
works as a relay to mtt.open-mpi.org so that you can submit your MTT results.

It might actually be fairly protocol agnostic, IIRC (been a while since I've 
looked at that code).



> On Sep 14, 2018, at 11:23 AM, Ralph H Castain  wrote:
> 
> Afraid I’m not familiar with that script - what does it do?
> 
> 
>> On Sep 14, 2018, at 7:46 AM, Christoph Niethammer  wrote:
>> 
>> Works for the installation at HLRS.
>> 
>> Short note/question: I am using the mtt-relay script. This is written in 
>> perl. Is there a python based replacement?
>> 
>> Best
>> Christoph Niethammer
>> 
>> - Original Message -
>> From: "Open MPI Developers" 
>> To: "Open MPI Developers" 
>> CC: "Jeff Squyres" 
>> Sent: Tuesday, September 11, 2018 20:37:40
>> Subject: Re: [OMPI devel] MTT Perl client
>> 
>> Works for me.
>> 
>>> On Sep 11, 2018, at 12:35 PM, Ralph H Castain  wrote:
>>> 
>>> Hi folks
>>> 
>>> Per today’s telecon, I have moved the Perl MTT client into its own 
>>> repository: https://github.com/open-mpi/mtt-legacy. All the Python client 
>>> code has been removed from that repo.
>>> 
>>> The original MTT repo remains at https://github.com/open-mpi/mtt. I have a 
>>> PR to remove all the Perl client code and associated libs/modules from that 
>>> repo. We won’t commit it until people have had a chance to switch to the 
>>> mtt-legacy repo and verify that things still work for them.
>>> 
>>> Please let us know if mtt-legacy is okay or has a problem.
>>> 
>>> Thanks
>>> Ralph
>>> 
>>> ___
>>> devel mailing list
>>> devel@lists.open-mpi.org
>>> https://lists.open-mpi.org/mailman/listinfo/devel
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> 
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Will info keys ever be fixed?

2018-09-11 Thread Jeff Squyres (jsquyres) via devel
Ralph --

What OS / compiler are you using?

I just compiled on MacOS (first time in a while) and filed a PR and a few 
issues about the warnings I found, but I cannot replicate these warnings.  I 
also built with gcc 7.3.0 on RHEL; couldn't replicate the warnings.

On MacOS, I'm using the default Xcode compilers:

$ gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr 
--with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin





> On Sep 10, 2018, at 6:57 PM, Ralph H Castain  wrote:
> 
> Still seeing this in today’s head of master:
> 
> info_subscriber.c: In function 'opal_infosubscribe_change_info':
> ../../opal/util/info.h:112:31: warning: '%s' directive output may be 
> truncated writing up to 36 bytes into a region of size 27 
> [-Wformat-truncation=]
>  #define OPAL_INFO_SAVE_PREFIX "_OMPI_IN_"
>^
> info_subscriber.c:268:13: note: in expansion of macro 'OPAL_INFO_SAVE_PREFIX'
>  OPAL_INFO_SAVE_PREFIX "%s", key);
>  ^
> info_subscriber.c:268:36: note: format string is defined here
>  OPAL_INFO_SAVE_PREFIX "%s", key);
> ^~
> In file included from 
> /opt/local/lib/gcc7/gcc/x86_64-apple-darwin17/7.3.0/include-fixed/stdio.h:425:0,
>  from ../../opal/class/opal_list.h:71,
>  from ../../opal/util/info_subscriber.h:30,
>  from info_subscriber.c:45:
> info_subscriber.c:267:9: note: '__builtin_snprintf' output between 10 and 46 
> bytes into a destination of size 36
>  snprintf(modkey, OPAL_MAX_INFO_KEY,
>  ^
> In file included from 
> /opt/local/lib/gcc7/gcc/x86_64-apple-darwin17/7.3.0/include-fixed/stdio.h:425:0,
>  from ../../opal/class/opal_list.h:71,
>  from ../../opal/util/info.h:30,
>  from info.c:46:
> info.c: In function 'opal_info_dup_mode.constprop':
> ../../opal/util/info.h:112:31: warning: '%s' directive output may be 
> truncated writing up to 36 bytes into a region of size 28 
> [-Wformat-truncation=]
>  #define OPAL_INFO_SAVE_PREFIX "_OMPI_IN_"
>^
> info.c:212:22: note: in expansion of macro 'OPAL_INFO_SAVE_PREFIX'
>   OPAL_INFO_SAVE_PREFIX "%s", pkey);
>   ^
> info.c:212:45: note: format string is defined here
>   OPAL_INFO_SAVE_PREFIX "%s", pkey);
>  ^~
> In file included from 
> /opt/local/lib/gcc7/gcc/x86_64-apple-darwin17/7.3.0/include-fixed/stdio.h:425:0,
>  from ../../opal/class/opal_list.h:71,
>  from ../../opal/util/info.h:30,
>  from info.c:46:
> info.c:211:18: note: '__builtin_snprintf' output between 10 and 46 bytes into 
> a destination of size 37
>   snprintf(savedkey, OPAL_MAX_INFO_KEY+1,
>   ^
> 
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
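
For reference, the pattern gcc is complaining about can be reproduced outside 
of Open MPI in a few lines; the constants below simply mirror the sizes in the 
quoted warning (this is not the Open MPI source, and the exact flags/wording 
will vary by gcc version):

#include <stdio.h>

#define SAVE_PREFIX "_OMPI_IN_"   /* 9 characters, like OPAL_INFO_SAVE_PREFIX */
#define MAX_KEY     36            /* like OPAL_MAX_INFO_KEY */

int main(int argc, char **argv)
{
    char key[MAX_KEY + 1];
    char modkey[MAX_KEY];

    /* The key may legitimately be up to MAX_KEY characters long... */
    snprintf(key, sizeof(key), "%s", argc > 1 ? argv[1] : "some_key");

    /* ...so prefix (9) + key (up to 36) can need up to 46 bytes, but only
     * 36 - 9 = 27 bytes remain for the "%s" part of modkey.  gcc 7+ can
     * therefore warn that this snprintf "may be truncated". */
    snprintf(modkey, sizeof(modkey), SAVE_PREFIX "%s", key);

    printf("%s\n", modkey);
    return 0;
}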


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] MTT Perl client

2018-09-11 Thread Jeff Squyres (jsquyres) via devel
Works for me.

> On Sep 11, 2018, at 12:35 PM, Ralph H Castain  wrote:
> 
> Hi folks
> 
> Per today’s telecon, I have moved the Perl MTT client into its own 
> repository: https://github.com/open-mpi/mtt-legacy. All the Python client 
> code has been removed from that repo.
> 
> The original MTT repo remains at https://github.com/open-mpi/mtt. I have a PR 
> to remove all the Perl client code and associated libs/modules from that 
> repo. We won’t commit it until people have had a chance to switch to the 
> mtt-legacy repo and verify that things still work for them.
> 
> Please let us know if mtt-legacy is okay or has a problem.
> 
> Thanks
> Ralph
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch v4.0.x updated. v1.10.7-1907-g71d3afd

2018-09-11 Thread Jeff Squyres (jsquyres) via devel
On Sep 11, 2018, at 2:17 PM, Jeff Squyres (jsquyres) via devel 
 wrote:
> 
>> diff --git a/VERSION b/VERSION
>> index 6fadf03..a9706a3 100644
>> --- a/VERSION
>> +++ b/VERSION
> 
>> +libmpi_mpifh_so_version=61:0:21
> 
> Just curious: any reason this one is 60 and all the others are 61?

Er -- I said that backwards: any reason this one is 61 and all the rest are 60?

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch v4.0.x updated. v1.10.7-1907-g71d3afd

2018-09-11 Thread Jeff Squyres (jsquyres) via devel
On Sep 9, 2018, at 4:29 PM, Gitdub  wrote:
> 
> diff --git a/VERSION b/VERSION
> index 6fadf03..a9706a3 100644
> --- a/VERSION
> +++ b/VERSION


> +libmpi_mpifh_so_version=61:0:21

Geoff --

Just curious: any reason this one is 60 and all the others are 61?

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Cannot find libverbs when without-verbs is used

2018-09-11 Thread Jeff Squyres (jsquyres) via devel
I notice from your configure log that you're building Mellanox MXM support.

Does that pull in libibverbs as a dependent library?


> On Sep 11, 2018, at 7:23 AM, Mijakovic, Robert  
> wrote:
> 
> Hi guys,
> 
> I have configured OpenMPI to build without-verbs but the build fails with an 
> error saying that ld cannot find libverbs.
> 
> Configure:
> ==> 
> '/home/hpc/pr28fa/di52sut/spack_lrz/spack/var/spack/stage/openmpi-3.1.2-jg4gwt4cjfgu66vyq5pox7yavfwzri3m/openmpi-3.1.2/configure'
>  
> '--prefix=/home/hpc/pr28fa/di52sut/spack_lrz/spack/opt/x86_avx2/linux-sles12-x86_64/gcc-7.3.0/openmpi-3.1.2-jg4gwt4cjfgu66vyq5pox7yavfwzri3m'
>  '--enable-shared' 
> '--with-wrapper-ldflags=-Wl,-rpath,/lrz/sys/compilers/gcc/7.3.0/lib64' 
> '--enable-static' '--without-pmi' '--enable-mpi-cxx' 
> '--with-zlib=/home/hpc/pr28fa/di52sut/spack_lrz/spack/opt/x86_avx2/linux-sles12-x86_64/gcc-7.3.0/zlib-1.2.11-ajxhsmrlv2kvicpk3gdckgrroxr45mdl'
>  '--without-psm' '--without-psm2' '--without-verbs' 
> '--with-mxm=/opt/mellanox/mxm' '--without-ucx' '--without-libfabric' 
> '--without-alps' '--without-lsf' '--without-tm' '--without-slurm' 
> '--without-sge' '--without-loadleveler' '--disable-memchecker' 
> '--with-hwloc=/home/hpc/pr28fa/di52sut/spack_lrz/spack/opt/x86_avx2/linux-sles12-x86_64/gcc-7.3.0/hwloc-1.11.9-c4ktzih4jwg673rwwzgy4zvofd75tgvo'
>  '--disable-java' '--disable-mpi-java' '--without-cuda' 
> '--enable-cxx-exceptions’
> 
> Build:
>   CCLD libmpi.la
> /home/hpc/pr28fa/di52sut/spack_lrz/spack/opt/x86_avx2/linux-sles12-x86_64/gcc-7.3.0/binutils-2.31.1-ntosmj7bfrraftmq4jbvwbu6xnt3kbrz/bin/ld:
>  cannot find -libverbs
> 
> Attach please find the complete log.
> 
> 
> Thank you for your time.
> 
> Best regards,
> Robert
> --
> Dr. Robert Mijaković
> 
> Leibniz Supercomputing Centre
> HPC Systems and Services
> Boltzmannstr. 1
> D-85748 Garching
> Room I.2.034
> Phone:  +49 89 35831 8734
> Fax: +49 89 35831 9700
> Mobile:+49 (157) 786 605 00
> mailto:robert.mijako...@lrz.de
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Open MPI website borked up?

2018-09-04 Thread Jeff Squyres (jsquyres) via devel
Yes, there was a problem for a short while last week; it was fixed.


> On Sep 1, 2018, at 4:55 PM, Ralph H Castain  wrote:
> 
> I suspect this is a stale message - I’m not seeing any problem with the 
> website
> 
> 
>> On Aug 29, 2018, at 12:55 PM, Howard Pritchard  wrote:
>> 
>> Hi Folks,
>> 
>> Something seems to be borked up about the OMPI website.  Got to website and 
>> you'll
>> get some odd parsing error appearing.
>> 
>> Howard
>> 
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Test mail

2018-09-01 Thread Jeff Squyres (jsquyres) via devel
Heh.  That was a backlog email that was delivered quite a ways after it was 
actually sent.  Safe to ignore this thread.

> On Aug 28, 2018, at 6:34 PM, Nathan Hjelm  wrote:
> 
> no
> 
> Sent from my iPhone
> 
>> On Aug 27, 2018, at 8:51 AM, Jeff Squyres (jsquyres) via devel 
>>  wrote:
>> 
>> Will this get through?
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> 
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] Test mail

2018-08-28 Thread Jeff Squyres (jsquyres) via devel
Will this get through?

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] lists.open-mpi.org appears to be back

2018-08-28 Thread Jeff Squyres (jsquyres) via devel
I originally sent this mail on Saturday, but it looks like lists.open-mpi.org 
was *not* actually back at this time.

I'm finally starting to see all the backlogged messages on Tuesday, around 5pm 
US Eastern time.  So I think lists.open-mpi.org is finally back in service.

Sorry for the interruption, folks.



> On Aug 26, 2018, at 3:22 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
> The lists.open-mpi.org server went offline due to an outage at our hosting 
> provider sometime in the evening on Aug 22 / early morning Aug 23 (US Eastern 
> time).  As of yesterday morning (Saturday, Aug 25), the list server now 
> appears to be back online; I've seen at least a few backlogged emails finally 
> come through.
> 
> If you sent a mail in the last few days and it doesn't show up on 
> https://www.mail-archive.com/devel@lists.open-mpi.org/ or 
> https://www.mail-archive.com/users@lists.open-mpi.org/, you may need to 
> resend it.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] lists.open-mpi.org appears to be back

2018-08-28 Thread Jeff Squyres (jsquyres) via devel
The lists.open-mpi.org server went offline due to an outage at our hosting 
provider sometime in the evening on Aug 22 / early morning Aug 23 (US Eastern 
time).  As of yesterday morning (Saturday, Aug 25), the list server now appears 
to be back online; I've seen at least a few backlogged emails finally come 
through.

If you sent a mail in the last few days and it doesn't show up on 
https://www.mail-archive.com/devel@lists.open-mpi.org/ or 
https://www.mail-archive.com/users@lists.open-mpi.org/, you may need to resend 
it.

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] lists.open-mpi.org appears to be back

2018-08-28 Thread Jeff Squyres (jsquyres) via devel
The lists.open-mpi.org server went offline due to an outage at our hosting 
provider sometime in the evening on Aug 22 / early morning Aug 23 (US Eastern 
time).  The list server now appears to be back online; I've seen at least a few 
backlogged emails finally come through.

If you sent a mail in the last few days and it doesn't show up on 
https://www.mail-archive.com/devel@lists.open-mpi.org/ or 
https://www.mail-archive.com/users@lists.open-mpi.org/, you may need to resend 
it.

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] v2.1.5rc1 is out

2018-08-17 Thread Jeff Squyres (jsquyres) via devel
Thanks for the testing.

I'm assuming the MXM failure has been around for a while, and the correct way 
to fix it is to upgrade to a newer Open MPI and/or use UCX.


> On Aug 17, 2018, at 11:01 AM, Vallee, Geoffroy R.  wrote:
> 
> FYI, that segfault problem did not occur when I tested 3.1.2rc1.
> 
> Thanks,
> 
>> On Aug 17, 2018, at 10:28 AM, Pavel Shamis  wrote:
>> 
>> It looks to me like mxm related failure ? 
>> 
>> On Thu, Aug 16, 2018 at 1:51 PM Vallee, Geoffroy R.  
>> wrote:
>> Hi,
>> 
>> I ran some tests on Summitdev here at ORNL:
>> - the UCX problem is solved and I get the expected results for the tests 
>> that I am running (netpipe and IMB).
>> - without UCX:
>>* the performance numbers are below what would be expected but I 
>> believe at this point that the slight performance deficiency is due to other 
>> users using other parts of the system. 
>>* I also encountered the following problem while running IMB_EXT and 
>> I now realize that I had the same problem with 2.1.4rc1 but did not catch it 
>> at the time:
>> [summitdev-login1:112517:0] Caught signal 11 (Segmentation fault)
>> [summitdev-r0c2n13:91094:0] Caught signal 11 (Segmentation fault)
>>  backtrace 
>> 2 0x00073864 mxm_handle_error()  
>> /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:641
>> 3 0x00073fa4 mxm_error_signal_handler()  
>> /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:616
>> 4 0x00017b24 ompi_osc_rdma_component_query()  osc_rdma_component.c:0
>> 5 0x000d4634 ompi_osc_base_select()  ??:0
>> 6 0x00065e84 ompi_win_create()  ??:0
>> 7 0x000a2488 PMPI_Win_create()  ??:0
>> 8 0x1000b28c IMB_window()  ??:0
>> 9 0x10005764 IMB_init_buffers_iter()  ??:0
>> 10 0x10001ef8 main()  ??:0
>> 11 0x00024980 generic_start_main.isra.0()  libc-start.c:0
>> 12 0x00024b74 __libc_start_main()  ??:0
>> ===
>>  backtrace 
>> 2 0x00073864 mxm_handle_error()  
>> /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:641
>> 3 0x00073fa4 mxm_error_signal_handler()  
>> /var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:616
>> 4 0x00017b24 ompi_osc_rdma_component_query()  osc_rdma_component.c:0
>> 5 0x000d4634 ompi_osc_base_select()  ??:0
>> 6 0x00065e84 ompi_win_create()  ??:0
>> 7 0x000a2488 PMPI_Win_create()  ??:0
>> 8 0x1000b28c IMB_window()  ??:0
>> 9 0x10005764 IMB_init_buffers_iter()  ??:0
>> 10 0x10001ef8 main()  ??:0
>> 11 0x000000024980 generic_start_main.isra.0()  libc-start.c:0
>> 12 0x00024b74 __libc_start_main()  ??:0
>> ===
>> 
>> FYI, the 2.x series is not important to me so it can stay as is. I will move 
>> on to testing 3.1.2rc1.
>> 
>> Thanks,
>> 
>> 
>>> On Aug 15, 2018, at 6:07 PM, Jeff Squyres (jsquyres) via devel 
>>>  wrote:
>>> 
>>> Per our discussion over the weekend and on the weekly webex yesterday, 
>>> we're releasing v2.1.5.  There are only two changes:
>>> 
>>> 1. A trivial link issue for UCX.
>>> 2. A fix for the vader BTL issue.  This is how I described it in NEWS:
>>> 
>>> - A subtle race condition bug was discovered in the "vader" BTL
>>> (shared memory communications) that, in rare instances, can cause
>>> MPI processes to crash or incorrectly classify (or effectively drop)
>>> an MPI message sent via shared memory.  If you are using the "ob1"
>>> PML with "vader" for shared memory communication (note that vader is
>>> the default for shared memory communication with ob1), you need to
>>> upgrade to v2.1.5 to fix this issue.  You may also upgrade to the
>>> following versions to fix this issue:
>>> - Open MPI v3.0.1 (released March, 2018) or later in the v3.0.x
>>>   series
>>> - Open MPI v3.1.2 (expected end of August, 2018) or later
>>> 
>>> This vader fix was deemed serious enough to warrant a 2.1.5 release.  
>>> This really will be the end of the 2.1.x series.  Trust me; my name is Joe 
>>> Isuzu.
>>> 
>>> 2.1.5rc1 will be available from the usual location in a few minutes (the 
>>> website will update in about 7 minutes):
>>> 
>>>   https://www.open-mpi.org/software/ompi/v2.1/
>>> 
>>> -- 
>>> Jeff Squyres
>>

[OMPI devel] v2.1.5rc1 is out

2018-08-15 Thread Jeff Squyres (jsquyres) via devel
Per our discussion over the weekend and on the weekly webex yesterday, we're 
releasing v2.1.5.  There are only two changes:

1. A trivial link issue for UCX.
2. A fix for the vader BTL issue.  This is how I described it in NEWS:

- A subtle race condition bug was discovered in the "vader" BTL
  (shared memory communications) that, in rare instances, can cause
  MPI processes to crash or incorrectly classify (or effectively drop)
  an MPI message sent via shared memory.  If you are using the "ob1"
  PML with "vader" for shared memory communication (note that vader is
  the default for shared memory communication with ob1), you need to
  upgrade to v2.1.5 to fix this issue.  You may also upgrade to the
  following versions to fix this issue:
  - Open MPI v3.0.1 (released March, 2018) or later in the v3.0.x
series
  - Open MPI v3.1.2 (expected end of August, 2018) or later

This vader fix was deemed serious enough to warrant a 2.1.5 release.  This 
really will be the end of the 2.1.x series.  Trust me; my name is Joe Isuzu.

2.1.5rc1 will be available from the usual location in a few minutes (the 
website will update in about 7 minutes):

https://www.open-mpi.org/software/ompi/v2.1/

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] Next Open MPI face-to-face meeting

2018-08-14 Thread Jeff Squyres (jsquyres) via devel
It's been settled:

9am Tue, Oct 16 - noonish Thu Oct 18
At Cisco, San Jose, CA, USA

Put your name on the wiki if you're going to attend (so that I can get guest 
badge+wifi for you):

https://github.com/open-mpi/ompi/wiki/Meeting-2018-09

Start adding agenda items.

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Open MPI v2.1.4rc1

2018-08-10 Thread Jeff Squyres (jsquyres) via devel
Thanks Geoffroy.

I don't think I'm worried about this for v2.1.4, and the UCX community hasn't 
responded.  So I'm going to release 2.1.4 as-is.


> On Aug 9, 2018, at 3:33 PM, Vallee, Geoffroy R.  wrote:
> 
> Hi,
> 
> I tested on Summitdev here at ORNL and here are my comments (but I only have 
> a limited set of data for summitdev so my feedback is somewhat limited):
> - netpipe/mpi is showing a slightly lower bandwidth than the 3.x series (I do 
> not believe it is a problem).
> - I am facing a problem with UCX, it is unclear to me that it is relevant 
> since I am using UCX master and I do not know whether it is expected to work 
> with OMPI v2.1.x. Note that I am using the same tool for testing all other 
> releases of Open MPI and I never had that problem before, having in mind that 
> I only tested the 3.x series so far.
> 
> make[2]: Entering directory 
> `/autofs/nccs-svm1_home1/gvh/.ompi-release-tester/scratch/summitdev/2.1.4rc1/scratch/UCX/ompi_build/ompi/mca/pml/ucx'
> /bin/sh ../../../../libtool  --tag=CC   --mode=link gcc -std=gnu99  -O3 
> -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -module 
> -avoid-version  -o mca_pml_ucx.la -rpath 
> /ccs/home/gvh/.ompi-release-tester/scratch/summitdev/2.1.4rc1/scratch/UCX/ompi_install/lib/openmpi
>  pml_ucx.lo pml_ucx_request.lo pml_ucx_datatype.lo pml_ucx_component.lo -lucp 
>  -lrt -lm -lutil  
> libtool: link: gcc -std=gnu99 -shared  -fPIC -DPIC  .libs/pml_ucx.o 
> .libs/pml_ucx_request.o .libs/pml_ucx_datatype.o .libs/pml_ucx_component.o   
> -lucp -lrt -lm -lutil  -O3 -pthread   -pthread -Wl,-soname -Wl,mca_pml_ucx.so 
> -o .libs/mca_pml_ucx.so
> /usr/bin/ld: cannot find -lucp
> collect2: error: ld returned 1 exit status
> make[2]: *** [mca_pml_ucx.la] Error 1
> make[2]: Leaving directory 
> `/autofs/nccs-svm1_home1/gvh/.ompi-release-tester/scratch/summitdev/2.1.4rc1/scratch/UCX/ompi_build/ompi/mca/pml/ucx'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory 
> `/autofs/nccs-svm1_home1/gvh/.ompi-release-tester/scratch/summitdev/2.1.4rc1/scratch/UCX/ompi_build/ompi'
> make: *** [all-recursive] Error 1
> 
> My 2 cents,
> 
>> On Aug 6, 2018, at 5:04 PM, Jeff Squyres (jsquyres) via devel 
>>  wrote:
>> 
>> Open MPI v2.1.4rc1 has been pushed.  It is likely going to be the last in 
>> the v2.1.x series (since v4.0.0 is now visible on the horizon).  It is just 
>> a bunch of bug fixes that have accumulated since v2.1.3; nothing huge.  
>> We'll encourage users who are still using the v2.1.x series to upgrade to 
>> this release; it should be a non-event for anyone who has already upgraded 
>> to the v3.0.x or v3.1.x series.
>> 
>>   https://www.open-mpi.org/software/ompi/v2.1/
>> 
>> If no serious-enough issues are found, we plan to release 2.1.4 this Friday, 
>> August 10, 2018.
>> 
>> Please test!
>> 
>> Bug fixes/minor improvements:
>> - Disable the POWER 7/BE block in configure.  Note that POWER 7/BE is
>> still not a supported platform, but it is no longer automatically
>> disabled.  See
>> https://github.com/open-mpi/ompi/issues/4349#issuecomment-374970982
>> for more information.
>> - Fix bug with request-based one-sided MPI operations when using the
>> "rdma" component.
>> - Fix issue with large data structure in the TCP BTL causing problems
>> in some environments.  Thanks to @lgarithm for reporting the issue.
>> - Minor Cygwin build fixes.
>> - Minor fixes for the openib BTL:
>> - Support for the QLogic RoCE HCA
>> - Support for the Broadcom Cumulus RoCE HCA
>> - Enable support for HDR link speeds
>> - Fix MPI_FINALIZED hang if invoked from an attribute destructor
>> during the MPI_COMM_SELF destruction in MPI_FINALIZE.  Thanks to
>> @AndrewGaspar for reporting the issue.
>> - Java fixes:
>> - Modernize Java framework detection, especially on OS X/MacOS.
>>   Thanks to Bryce Glover for reporting and submitting the fixes.
>> - Prefer "javac -h" to "javah" to support newer Java frameworks.
>> - Fortran fixes:
>> - Use conformant dummy parameter names for Fortran bindings.  Thanks
>>   to Themos Tsikas for reporting and submitting the fixes.
>> - Build the MPI_SIZEOF() interfaces in the "TKR"-style "mpi" module
>>   whenever possible.  Thanks to Themos Tsikas for reporting the
>>   issue.
>> - Fix array of argv handling for the Fortran bindings of
>>   MPI_COMM_SPAWN_MULTIPLE (and its associated man page).
>> - Make NAG Fortran compiler support more robust in configure.
>> - Disable the "pt2pt" one-sided MPI component when MPI_THREAD_MULTIPLE
>> is used.

[OMPI devel] Open MPI v2.1.4rc1

2018-08-06 Thread Jeff Squyres (jsquyres) via devel
Open MPI v2.1.4rc1 has been pushed.  It is likely going to be the last in the 
v2.1.x series (since v4.0.0 is now visible on the horizon).  It is just a bunch 
of bug fixes that have accumulated since v2.1.3; nothing huge.  We'll encourage 
users who are still using the v2.1.x series to upgrade to this release; it 
should be a non-event for anyone who has already upgraded to the v3.0.x or 
v3.1.x series.

https://www.open-mpi.org/software/ompi/v2.1/

If no serious-enough issues are found, we plan to release 2.1.4 this Friday, 
August 10, 2018.

Please test!

Bug fixes/minor improvements:
- Disable the POWER 7/BE block in configure.  Note that POWER 7/BE is
  still not a supported platform, but it is no longer automatically
  disabled.  See
  https://github.com/open-mpi/ompi/issues/4349#issuecomment-374970982
  for more information.
- Fix bug with request-based one-sided MPI operations when using the
  "rdma" component.
- Fix issue with large data structure in the TCP BTL causing problems
  in some environments.  Thanks to @lgarithm for reporting the issue.
- Minor Cygwin build fixes.
- Minor fixes for the openib BTL:
  - Support for the QLogic RoCE HCA
  - Support for the Broadcom Cumulus RoCE HCA
  - Enable support for HDR link speeds
- Fix MPI_FINALIZED hang if invoked from an attribute destructor
  during the MPI_COMM_SELF destruction in MPI_FINALIZE.  Thanks to
  @AndrewGaspar for reporting the issue.  (A small sketch of this
  pattern follows the list below.)
- Java fixes:
  - Modernize Java framework detection, especially on OS X/MacOS.
Thanks to Bryce Glover for reporting and submitting the fixes.
  - Prefer "javac -h" to "javah" to support newer Java frameworks.
- Fortran fixes:
  - Use conformant dummy parameter names for Fortran bindings.  Thanks
to Themos Tsikas for reporting and submitting the fixes.
  - Build the MPI_SIZEOF() interfaces in the "TKR"-style "mpi" module
whenever possible.  Thanks to Themos Tsikas for reporting the
issue.
  - Fix array of argv handling for the Fortran bindings of
MPI_COMM_SPAWN_MULTIPLE (and its associated man page).
  - Make NAG Fortran compiler support more robust in configure.
- Disable the "pt2pt" one-sided MPI component when MPI_THREAD_MULTIPLE
  is used.  This component is simply not safe in MPI_THREAD_MULTIPLE
  scenarios, and will not be fixed in the v2.1.x series.
- Make the "external" hwloc component fail gracefully if it is tries
  to use an hwloc v2.x.y installation.  hwloc v2.x.y will not be
  supported in the Open MPI v2.1.x series.
- Fix "vader" shared memory support for messages larger than 2GB.
  Thanks to Heiko Bauke for the bug report.
- Configure fixes for external PMI directory detection.  Thanks to
  Davide Vanzo for the report.
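
To make the MPI_FINALIZED item above concrete, here is a minimal sketch of the
pattern that used to hang (my own illustration, not the reporter's code): an
attribute delete callback attached to MPI_COMM_SELF that calls MPI_Finalized()
while MPI_Finalize() is tearing that communicator down.

    #include <mpi.h>
    #include <stdio.h>

    /* Delete callback: runs when MPI_COMM_SELF is destroyed inside
     * MPI_Finalize().  Calling MPI_Finalized() here is legal and used to
     * hang in the affected releases. */
    static int self_attr_destructor(MPI_Comm comm, int keyval,
                                    void *attr_val, void *extra_state)
    {
        int flag = 0;
        MPI_Finalized(&flag);   /* typically still 0 at this point */
        printf("destructor ran, finalized=%d\n", flag);
        return MPI_SUCCESS;
    }

    int main(int argc, char **argv)
    {
        int keyval;
        MPI_Init(&argc, &argv);
        MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, self_attr_destructor,
                               &keyval, NULL);
        MPI_Comm_set_attr(MPI_COMM_SELF, keyval, NULL);
        MPI_Finalize();   /* destroys MPI_COMM_SELF, firing the callback */
        return 0;
    }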


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Continued warnings?

2018-07-31 Thread Jeff Squyres (jsquyres) via devel
Confirmed and filed https://github.com/open-mpi/ompi/issues/5502.


> On Jul 31, 2018, at 9:31 AM, Ralph H Castain  wrote:
> 
> Just curious - will this ever be fixed? From today’s head of master:
> 
> In file included from info.c:46:0:
> info.c: In function 'opal_info_dup_mode':
> ../../opal/util/info.h:112:31: warning: '%s' directive output may be 
> truncated writing up to 36 bytes into a region of size 27 
> [-Wformat-truncation=]
>  #define OPAL_INFO_SAVE_PREFIX "_OMPI_IN_"
>^
> info.c:212:22: note: in expansion of macro 'OPAL_INFO_SAVE_PREFIX'
>   OPAL_INFO_SAVE_PREFIX "%s", iterator->ie_key);
>   ^
> info.c:212:45: note: format string is defined here
>   OPAL_INFO_SAVE_PREFIX "%s", iterator->ie_key);
>  ^~
> info.c:211:18: note: 'snprintf' output between 10 and 46 bytes into a 
> destination of size 36
>   snprintf(savedkey, OPAL_MAX_INFO_KEY,
>   ^
>   OPAL_INFO_SAVE_PREFIX "%s", iterator->ie_key);
>   ~
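
For context: the warning is pointing out that a 36-byte destination cannot
always hold the 9-character "_OMPI_IN_" prefix plus a maximum-length key.  As a
sketch only (stand-in names and sizes, not the actual fix tracked in the issue
above), the usual way to quiet -Wformat-truncation is to size the destination
for the worst case:

    #include <stdio.h>

    /* Illustrative stand-ins for the real OPAL constants. */
    #define SAVE_PREFIX   "_OMPI_IN_"   /* 9 characters               */
    #define MAX_INFO_KEY  36            /* longest key, excluding NUL */

    static void save_key(const char *key)
    {
        /* prefix + key + terminating NUL, so snprintf() can never truncate */
        char savedkey[sizeof(SAVE_PREFIX) - 1 + MAX_INFO_KEY + 1];

        snprintf(savedkey, sizeof(savedkey), SAVE_PREFIX "%s", key);
        puts(savedkey);
    }

    int main(void)
    {
        save_key("example_key");
        return 0;
    }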
> 
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

[OMPI devel] v4.0.x -> v5.0.x: .so versioning, etc.

2018-07-18 Thread Jeff Squyres (jsquyres) via devel
Devel community:

We had a lengthy discussion on the weekly webex yesterday about a request from 
our downstream packagers: if possible, they would strongly prefer if we did not 
change the major .so version in the upcoming v4.0.x.

The exact rationale for this gets quite complex, but the short version is: 
there's a bazillion packages dependent upon libmpi.so, and it would be really 
nice if the distros didn't have to recompile all of them because we arbitrarily 
changed the libmpi.so major version in v4.0.x.

Note, however, that v4.0.x will be doing the following:

1. Continuing to not build the MPI C++ bindings by default, although you can 
enable them via --enable-mpi-cxx
2. Not including declarations in mpi.h / mpif.h / the "mpi" and "mpi_f08"
modules for the MPI-1 functions and globals that were deleted by the MPI-3.0
spec, although you can enable them via --enable-mpi1-compat.

  --> Note that the symbols for these functions/globals are still in libmpi.so 
for ABI reasons -- you'll just run into *compile* errors if you try to use 
functions like MPI_Attr_get(), because it won't be declared in mpi.h.
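
As a purely illustrative example (mine, not from the original discussion) of
what that compile error looks like from an application's point of view: the
MPI-1 spelling below only compiles against a default v4.0.x mpi.h if Open MPI
was configured with --enable-mpi1-compat, while the MPI-2 replacement keeps
working unchanged.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int flag;
        int *tag_ub;

        MPI_Init(&argc, &argv);

        /* MPI-1 spelling, deleted in MPI-3.0: not declared in mpi.h by
         * default (the symbol stays in libmpi.so for ABI reasons only):
         *
         *     MPI_Attr_get(MPI_COMM_WORLD, MPI_TAG_UB, &tag_ub, &flag);
         */

        /* MPI-2 replacement that continues to compile: */
        MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &tag_ub, &flag);
        if (flag) {
            printf("MPI_TAG_UB = %d\n", *tag_ub);
        }

        MPI_Finalize();
        return 0;
    }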

This is part of a long-term plan to *actually* delete both the C++ bindings and 
the deleted MPI-1 functions and globals in Open MPI v5.0.0 (sometime in 2019).  
That is: --enable-mpi-cxx and --enable-mpi1-compat will go away, and all those 
symbols will no longer be available.

For more background / detail, see:

- https://github.com/open-mpi/ompi/issues/5447
- https://www.mail-archive.com/ompi-packagers@lists.open-mpi.org/msg00015.html

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] [p]ompi_foo_f symbols in mpi_f08.mod

2018-07-17 Thread Jeff Squyres (jsquyres) via devel
On Jul 17, 2018, at 8:49 PM, Gilles Gouaillardet  wrote:
> 
> I noted the internal Fortran bindings (e.g. [p]ompi_barrier_f and friends) 
> are defined in the user facing mpi_f08.mod.
> 
> My impressions are :
> 
>  1. pompi_barrier_f and friends are never used (e.g. pbarrier_f08.F90 calls 
> ompi_barrier_f and *not* pompi_barrier_f)
> 
> 2. these symbols could be part of an internal module that is only used at 
> build time, and hence do not have to end up in mpi_f08.mod
> 
> 1) should the pompi_barrier_f and friends be called/removed/left untouched ?

I presume you are referring to the pompi_FOO_f functions (and not the 
ompi_FOO_f functions), right?  (I ask because your opening sentence refers to 
"[p]ompi_barrier_f")

I think you noted that both barrier_f08.F90 and pbarrier_f08.F90 invoke ompi_barrier_f().  So we 
definitely need the ompi_FOO_f() functions.

But you're right -- I don't see a use of the pompi_FOO_f() functions.  I can't 
think of why they would be invoked at all.  I think they're safe to remove.

> 2) is there any rationale (and which one) for having [p]ompi_foo_f symbols in 
> mpi_f08.mod ?

Per above, I don't think there's any use for the pompi_FOO_f symbols, though.

The ompi_FOO_f symbols are *prototyped* in the mpi_f08 module -- they are not 
defined there.

We need the ompi_FOO_f symbols (which are the actual OMPI Fortran 
implementations in ompi/mpi/fortran/mpif-h/*_f.c) so that we can call them from 
F08 code.

That being said, they *are* internal symbols, and we could *probably* put them 
in a standalone, internal-only module (which I think is what you are saying in 
point "2.", above).  I don't think that that would created a dependency from 
mpi_f08.mod to our internal module...?

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] openmpi 3.1.x examples

2018-07-17 Thread Jeff Squyres (jsquyres) via devel
On Jul 17, 2018, at 1:18 AM, Marco Atzeri  wrote:
> 
> I was aware, as I am not building it anymore, however
> probably we should exclude the C++ from default examples.

examples/Makefile won't build the C++ (or Fortran or OSHMEM) examples if the 
corresponding bindings weren't built.

My $0.02: As long as we're shipping the C++ bindings -- even if you have to 
turn them on manually -- we should still ship the C++ examples.

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


[OMPI devel] Cygwin Fortran compiler options (was: openmpi 3.1.x examples)

2018-07-16 Thread Jeff Squyres (jsquyres) via devel
Split the Fortran issue off into its own thread (and kept it on devel; no need 
for ompi-packagers)


On Jul 13, 2018, at 4:35 PM, Marco Atzeri  wrote:
> 
> the fortran problem is due to a compiler setting
> It works with
> 
> $ mpifort -g  hello_usempi.f90  -o hello_usempi -fintrinsic-modules-path 
> /usr/lib

Should we add something to configure to detect/use this flag automatically?

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] openmpi 3.1.x examples

2018-07-16 Thread Jeff Squyres (jsquyres) via devel
On Jul 13, 2018, at 4:35 PM, Marco Atzeri  wrote:
> 
>> For one. The C++ bindings are no longer part of the standard and they are 
>> not built by default in v3.1x. They will be removed entirely in Open MPI 
>> v5.0.0.

Hey Marco -- you should probably join our packagers mailing list:

https://lists.open-mpi.org/mailman/listinfo/ompi-packagers

Low volume, but intended exactly for packagers like you.  It's fairly recent; 
we realized we needed to keep in better communication with our downstream 
packagers.

(+ompi-packagers to the CC)

As Nathan mentioned, we stopped building the MPI C++ bindings by default in 
Open MPI 3.0.  You can choose to build them with the configure --enable-mpi-cxx.

This is the current plan:

- In v4.0, we're no longer building a bunch of other deleted MPI-1 functions by 
default (which can be restored via --enable-mpi1-compat, and --enable-mpi-cxx 
will still work).

- In v5.0, delete all the C++ bindings and the deleted MPI-1 functions.

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Open MPI: Undefined reference to pthread_atfork

2018-06-22 Thread Jeff Squyres (jsquyres) via devel
Can you send all the information listed here:

https://www.open-mpi.org/community/help/


> On Jun 22, 2018, at 4:44 PM, lille stor  wrote:
> 
> Thanks again Jeff.
>  
> Here is the output of running "mpic++" with parameter "--showme" (in bold 
> what "mpic++" added):
>  
> g++ test.cpp -lUtils -pthread -Wl,-rpath -Wl,/home/dummy/openmpi/build/lib 
> -Wl,--enable-new-dtags -L/home/dummy/openmpi/build/lib -lmpi
>  
> This fails to compile with the error "Undefined reference to pthread_atfork".
>  
> Nonetheless, if a program uses directly "libmpi.so" it can be compiled 
> successfully with the parameters "-L/home/dummy/openmpi/build/lib -lmpi" like 
> you suggested in a previous email).
>  
> The problem is that the program I am trying to compile does not use directly 
> "libmpi.so" but through the library "libMyUtils.so". This makes the usage of 
> parameters "-L/home/dummy/openmpi/build/lib -lmpi" not possible.
>  
> Any more suggestions?
>  
> Thanks,
>  
> L.
>  
>  
>  
>  
> Sent: Friday, June 22, 2018 at 10:17 PM
> From: "Jeff Squyres (jsquyres)" 
> To: "lille stor" 
> Cc: "Open MPI Developers List" 
> Subject: Re: [OMPI devel] Open MPI: Undefined reference to pthread_atfork
> On Jun 22, 2018, at 4:09 PM, lille stor  wrote:
> >
> > I tried compile the C++ program using "mpic++" like you suggested but 
> > unfortunately g++ still throws the same errror 
> > ("/home/dummy/openmpi/build/lib/libopen-pal.so.20: undefined reference to 
> > pthread_atfork").
> >
> > I suspect that the problem maybe to the fact that the C++ program does not 
> > use directly Open MPI library but through another library (hence the 
> > parameter "-Wl,-rpath-link,/home/dummy/openmpi/build/lib" when compiling 
> > it), therefore one cannot pass the usual parameters 
> > "-L/home/dummy/openmpi/build/lib -lmpi" to g++. To summarize the dependency 
> > flow: program -> library -> Open MPI library.
> 
> If your program uses mpic++ instead of g++, then mpic++ should add all the 
> relevant parameters to compile an Open MPI program (including, if necessary, 
> -pthread or -lpthread).
> 
> What does "mpic++ --showme" show?
> 
> Can you compile a simple C or C++ MPI program manually? (using mpicc / 
> mpic++, without going through the secondary program)
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
>  


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Open MPI: Undefined reference to pthread_atfork

2018-06-22 Thread Jeff Squyres (jsquyres) via devel
On Jun 22, 2018, at 4:09 PM, lille stor  wrote:
> 
> I tried compile the C++ program using "mpic++" like you suggested but 
> unfortunately g++ still throws the same errror 
> ("/home/dummy/openmpi/build/lib/libopen-pal.so.20: undefined reference to 
> pthread_atfork").
>  
> I suspect that the problem maybe to the fact that the C++ program does not 
> use directly Open MPI library but through another library (hence the 
> parameter "-Wl,-rpath-link,/home/dummy/openmpi/build/lib" when compiling it), 
> therefore one cannot pass the usual parameters 
> "-L/home/dummy/openmpi/build/lib -lmpi" to g++.  To summarize the dependency 
> flow: program -> library -> Open MPI library.

If your program uses mpic++ instead of g++, then mpic++ should add all the 
relevant parameters to compile an Open MPI program (including, if necessary, 
-pthread or -lpthread).

What does "mpic++ --showme" show?

Can you compile a simple C or C++ MPI program manually? (using mpicc / mpic++, 
without going through the secondary program)
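
For reference, a "simple C MPI program" for that check can be as small as the
sketch below (my illustration, not part of the original exchange); building it
with the wrapper compiler exercises exactly the flags that "mpicc --showme" /
"mpic++ --showme" report, including any pthread-related ones.

    /* hello.c -- build and run with, e.g.:
     *     mpicc hello.c -o hello     (or mpic++ for a C++ translation unit)
     *     mpirun -np 2 ./hello
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }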

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Open MPI: Undefined reference to pthread_atfork

2018-06-22 Thread Jeff Squyres (jsquyres) via devel
I think Ralph is a little confused -- 2.1.3 is recent enough.  :-)

Are you using "mpic++" to compile your application?  That should add in all the 
relevant flags that are needed to compile an Open MPI C++ application.


> On Jun 22, 2018, at 3:29 PM, r...@open-mpi.org wrote:
> 
> OMPI 2.1.3??? Is there any way you could update to something more recent?
> 
>> On Jun 22, 2018, at 12:28 PM, lille stor  wrote:
>> 
>> Hi,
>> 
>>  
>> When compiling a C++ source file named test.cpp that needs a shared library 
>> named libUtils.so (which in its turn needs Open MPI shared library, hence 
>> the parameter -Wl,-rpath-link,/home/dummy/openmpi/build/lib ) as follows:
>> 
>> g++ test.cpp -lUtils -Wl,-rpath-link,/home/dummy/openmpi/build/lib
>> An error is thrown /home/dummy/openmpi/build/lib/libopen-pal.so.20: 
>> undefined reference to pthread_atfork.
>> 
>> I passed -pthread and -lpthread (before and after -lUtils) to g++ but none 
>> of these solved the error.
>> 
>>  
>> Environment where this error is thrown:
>> 
>> OS: Ubuntu 14.04
>> Compiler: g++ 4.9
>> MPI: Open MPI 2.1.3
>>  
>> Thank you for your help,
>> 
>> L.
>> 
>>  
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] New binding option

2018-06-21 Thread Jeff Squyres (jsquyres) via devel
On Jun 21, 2018, at 10:26 AM, r...@open-mpi.org wrote:
> 
>>> Alternatively, processes can be assigned to processors based on
>>> their local rank on a node using the \fI--bind-to cpuset:ordered\fP option
>>> with an associated \fI--cpu-list "0,2,5"\fP. This directs that the first
>>> rank on a node be bound to cpu0, the second rank on the node be bound
>>> to cpu1, and the third rank on the node be bound to cpu5. Note that an
>>> error will result if more processes are assigned to a node than cpus
>>> are provided.
>> 
>> Question about this: do the CPUs in the list correspond to the Linux virtual 
>> processor IDs?  E.g., do they correspond to what one would pass to 
>> numactl(1)?
> 
> I didn’t change the meaning of the list - it is still the local cpu ID per 
> hwloc
> 
>> Also, a minor quibble: it might be a little confusing to have --bind-to 
>> cpuset, and then have to specify a CPU list (not a CPU set).  Should it be 
>> --cpuset-list or --cpuset?
> 
> Your PR is welcome! Historically, that option has always been --cpu-list and 
> I didn’t change it

Oh, I see!  I didn't realize / forgot / whatever that --cpu-list is an existing 
option.

Let me change my question, then: should "--bind-to cpuset" be changed to 
"--bind-to cpulist"?  (Or even "cpu-list" to exactly match the existing 
"--cpu-list" CLI option)  This would be for two reasons:

1. Make the terminology agree between the two options.
2. Don't use the term "cpuset" because that has a specific meaning in Linux 
(that isn't tied to hwloc's logical processor IDs)

(Yes, I'm happy to do a PR to do this)

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] New binding option

2018-06-21 Thread Jeff Squyres (jsquyres) via devel
On Jun 21, 2018, at 9:41 AM, r...@open-mpi.org wrote:
> 
> Alternatively, processes can be assigned to processors based on
> their local rank on a node using the \fI--bind-to cpuset:ordered\fP option
> with an associated \fI--cpu-list "0,2,5"\fP. This directs that the first
> rank on a node be bound to cpu0, the second rank on the node be bound
> to cpu1, and the third rank on the node be bound to cpu5. Note that an
> error will result if more processes are assigned to a node than cpus
> are provided.

Question about this: do the CPUs in the list correspond to the Linux virtual 
processor IDs?  E.g., do they correspond to what one would pass to numactl(1)?

Also, a minor quibble: it might be a little confusing to have --bind-to cpuset, 
and then have to specify a CPU list (not a CPU set).  Should it be 
--cpuset-list or --cpuset?

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Removing the oob/ud component

2018-06-19 Thread Jeff Squyres (jsquyres) via devel
Oob/tcp should likely be good enough.

Do you want to force the two and see if there's a noticeable difference for you?

mpirun --mca oob tcp ...
mpirun --mca oob ud ...



> On Jun 19, 2018, at 6:14 PM, Ben Menadue  wrote:
> 
> Hi Jeff,
> 
> What’s the replacement that it should use instead? I’m pretty sure oob/ud is 
> being picked by default on our IB cluster. Or is oob/tcp good enough?
> 
> Cheers,
> Ben
> 
>> On 20 Jun 2018, at 5:20 am, Jeff Squyres (jsquyres) via devel 
>>  wrote:
>> 
>> We talked about this on the webex today, but for those of you who weren't 
>> there: we're talking about removing the oob/ud component:
>> 
>>   https://github.com/open-mpi/ompi/pull/5300
>> 
>> We couldn't find anyone who still cares about this component (e.g., LANL, 
>> Mellanox, ...etc.), and no one is maintaining it.  If no one says anything 
>> within the next 2 weeks, we'll remove oob/ud before the branch for v4.0.0.
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> 
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

[OMPI devel] Removing the oob/ud component

2018-06-19 Thread Jeff Squyres (jsquyres) via devel
We talked about this on the webex today, but for those of you who weren't 
there: we're talking about removing the oob/ud component:

https://github.com/open-mpi/ompi/pull/5300

We couldn't find anyone who still cares about this component (e.g., LANL, 
Mellanox, ...etc.), and no one is maintaining it.  If no one says anything 
within the next 2 weeks, we'll remove oob/ud before the branch for v4.0.0.

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] openmpi-3.1.0 cygwin patch

2018-06-16 Thread Jeff Squyres (jsquyres) via devel
Thanks Marco.  I've filed https://github.com/open-mpi/ompi/pull/5277 here for 
master; will follow up with PRs to the release branches after that passes CI / 
is merged.


> On Jun 11, 2018, at 10:43 AM, Marco Atzeri  wrote:
> 
> On 5/28/2018 11:58 AM, Marco Atzeri wrote:
>> On 5/24/2018 11:07 AM, Marco Atzeri wrote:
>>> On 5/23/2018 2:58 PM, Gilles Gouaillardet wrote:
 Marco,
 
 Have you tried to build Open MPI with an external (e.g. Cygwin provided) 
 libevent library ?
 If that works, I think that would be the preferred method.
 
 Cheers,
 
 Gilles
>>> 
>>> I will try.
>>> If I remember right there was an issue in the past as
>>> somewhere a WIN32 was defined and it was screwing up the build.
>>> 
>>> Regards
>>> Marco
>>> 
>> I am validating a patch workaround to see if it works with both
>> internal and external libevent.
>> The build with external libevent passed all
>> osu-micro-benchmarks-5.4.2 MPI tests
> 
> attached patch allows build of 3.1.0 on cygwin 32 bit
> and 64 bit versions, configured with
> 
>--with-libevent=external \
>--disable-mca-dso \
>--disable-sysv-shmem \
>--enable-cxx-exceptions \
>--with-threads=posix \
>--without-cs-fs \
>--with-mpi-param_check=always \
>--enable-contrib-no-build=vt,libompitrace \
> --enable-mca-no-build=paffinity,installdirs-windows,timer-windows,shmem-sysv,patcher
> 
> the 64 bit version also use
>  --enable-builtin-atomics
> 
> Tested with libevent 2.0.22-1
> 
> Regards
> Marco
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Shared object dependencies

2018-06-12 Thread Jeff Squyres (jsquyres) via devel
On Jun 12, 2018, at 7:34 AM, Gabriel, Edgar  wrote:
> 
> Well, I am still confused. What is different on nixOS vs. other linux distros 
> that makes this error appear,

Fair enough.  I don't think I realized nixOS was a Linux distro.

That being said, every time I think I understand linkers, I find out that I 
don't know jack about linkers.  :-(

> and is it relevant enough for the backport or should we just go forward for 
> 4.0? Is it again a RTLD_GLOBAL issue as it was back 2014?

Yeah, we should probably figure this one out.  I don't know the answer here.

> And last but not least, I raised one serious question on the github discussion 
> about the mca parameter names.
> 
> No way to backport to the 2.x series, btw; that version did not even have 
> common/ompio yet, as that was introduced in the 3.0 release. 

Good to know.

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Shared object dependencies

2018-06-12 Thread Jeff Squyres (jsquyres) via devel
On Jun 12, 2018, at 7:21 AM, Gilles Gouaillardet 
 wrote:
> 
> I think this also depends on the linker (configuration ?) and possibly the 
> order the libraries are dlopen’ed.
> 
> Note the issue was initially reported (as warnings only) from ompi_info, so 
> there is a possibility that we all missed it.
> 
> That being said, the errors make perfect sense to me.
> 
> fwiw, I installed a NixOS virtual machine and reproduced the issue right away.

OIC -- right -- this was reported on NixOS, not vanilla Linux.  Ok.

These fixes will need to be back-ported to at least 3.0.x and 3.1.x, right?

Do they need to also go to v2.1.x?

-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Shared object dependencies

2018-06-12 Thread Jeff Squyres (jsquyres) via devel
How is it that Edgar is not running into these issues?

Edgar: are you compiling with --disable-dlopen, perchance?


> On Jun 12, 2018, at 6:04 AM, Gilles Gouaillardet 
>  wrote:
> 
> Edgar,
> 
> Regarding this specific problem, the issue is mca_fcoll_individual.so did not 
> depend on libmca_common_ompio.so,
> the PR does address that (among other abstraction violations)
> 
> What about following up on github?
> 
> Cheers,
> 
> Gilles
> 
> On Tuesday, June 12, 2018, Gabriel, Edgar  wrote:
> So, I am still surprised to see this error message: if you look at, let's say, 
> just one error message (and all others are the same):
> 
> > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> > > undefined symbol: mca_common_ompio_file_write (ignored)
> 
> How come the symbol mca_common_ompio_file_write cannot be found? It 
> is in the common library; that symbol should always be there, isn't it? 
> Your fix, Gilles (which we can discuss), will not address this problem in my 
> opinion. The symbols that are accessed from the ompio component at this point 
> are used through a function pointer, not by name, and that should work in my 
> opinion (e.g. we do not call mca_io_ompio_set_aggregator_props directly, but 
> we call the function pointer fh->f_set_aggregator_props), and the same with 
> the mca parameters, we access them through a function that is stored as a 
> function pointer on the file handle structure.
> 
> Thanks
> Edgar
>  
> 
> > -Original Message-
> > From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Gilles
> > Gouaillardet
> > Sent: Tuesday, June 12, 2018 3:28 AM
> > To: devel@lists.open-mpi.org
> > Subject: Re: [OMPI devel] Shared object dependencies
> > 
> > Tyson,
> > 
> > 
> > thanks for taking the time to do some more tests.
> > 
> > 
> > This is really a bug in Open MPI, and unlike what I thought earlier, there 
> > are still some abstraction violations here and there related to ompio.
> > 
> > 
> > I filed https://github.com/open-mpi/ompi/pull/5263 in order to address them
> > 
> > 
> > Meanwhile, you can configure Open MPI with --disable-dlopen and hopefully
> > that will be enough to hide the issue.
> > 
> > 
> > Cheers,
> > 
> > 
> > Gilles
> > 
> > 
> > On 6/12/2018 5:58 AM, Tyson Whitehead wrote:
> > > I have now also tried release 3.1.0.  Same thing (where I have replaced
> > > /nix/store/glx60yay0hmmizhlxhqhnx9w3k4j9g1z-openmpi-3.1.0 with )
> > >
> > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> > > undefined symbol: mca_common_ompio_file_write (ignored)
> > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > mca_fcoll_dynamic_gen2: .../lib/openmpi/mca_fcoll_dynamic_gen2.so:
> > > undefined symbol: mca_common_ompio_register_print_entry (ignored)
> > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > mca_fcoll_dynamic: .../lib/openmpi/mca_fcoll_dynamic.so: undefined
> > > symbol: mca_common_ompio_register_print_entry (ignored)
> > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > mca_fcoll_two_phase: .../lib/openmpi/mca_fcoll_two_phase.so: undefined
> > > symbol: mca_common_ompio_register_print_entry (ignored)
> > > [orc-login2:107400] mca_base_component_repository_open: unable to open
> > > mca_fcoll_static: .../lib/openmpi/mca_fcoll_static.so: undefined
> > > symbol: mca_common_ompio_register_print_entry (ignored)
> > >   Package: Open MPI nixbld@localhost Distribution
> > >  Open MPI: 3.1.0
> > >Open MPI repo revision: v3.1.0
> > > Open MPI release date: May 07, 2018
> > >      Open RTE: 3.1.0
> > >Open RTE repo revision: v3.1.0
> > > Open RTE release date: May 07, 2018
> > >  OPAL: 3.1.0
> > > OPAL repo revision: v3.1.0
> > > OPAL release date: May 07, 2018
> > >
> > > I straced the process, and, as far as I could tell, it was just mostly
> > > opening the shared objects in alphabetical order.  Would appreciate
> > > any insight, such as whether this is normal behaviour I can ignore or
> > > not?
> > >
> > > Thanks!  -Tyson
> > > On Fri, 8 Jun 2018 at 17:37, Tyson Whitehead 
> > wrote:
> > >> This email starts out talking about version 1.10.7 to give a complete
> > >> picture.  I tested 2.1.3 as well, it also exhibits this issue,
> > >> although to a lesser extent though, and am asking for help on that
> > >> release.
> > >>
> > >> I was compiling the OpenMPI 1.10.7 shipped with NixOS against a newer
> > >> libibverbs with a large set of drivers and get some strange errors
> > >> when when running opmi_info (I've replaced the common prefix
> > >> /nix/store/9zm0pqsh67fw0xi5cpnybnd7hgzryffs-openmpi-1.10.7 with ...)
> > >>
> > >> [mon241:04077] mca: base: 

Re: [OMPI devel] Shared object dependencies

2018-06-08 Thread Jeff Squyres (jsquyres) via devel
Before digging any deeper, did you perchance install multiple versions of Open 
MPI into the same prefix?

If so, remember that Open MPI installs lots of plugins.  The exact set of 
plugins changes every release.  So if you install version A.B.C in to 
/opt/openmpi, and then install version X.Y.Z in to /opt/openmpi, note that the 
installation of X.Y.Z did not *uninstall* A.B.C first.  Hence, you might still 
have some stale A.B.C components in the tree that Open MPI X.Y.Z may try to 
open.  Since the underlying libraries that these plugins use have now been 
upgraded to X.Y.Z, the stale A.B.C component may (and likely will) fail to open.

If that's not what is happening, let us know and we can dig deeper.
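
Background on those "unable to open ... undefined symbol ... (ignored)"
messages: each component is dlopen()ed at runtime, and a stale or under-linked
plugin fails at exactly that step, after which it is simply skipped.  A
stripped-down illustration of the failure mode (the path and the flags here are
mine for the sketch, not necessarily what Open MPI literally passes):

    /* plugin_probe.c -- compile with:  cc plugin_probe.c -o plugin_probe -ldl */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        const char *path =
            "/opt/openmpi/lib/openmpi/mca_fcoll_individual.so";
        void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);

        if (NULL == handle) {
            /* If the plugin references a symbol that is neither linked into
             * it nor already visible in the process, dlerror() reports
             * something like ".../mca_fcoll_individual.so: undefined
             * symbol: ..." -- which is what ompi_info is relaying. */
            fprintf(stderr, "unable to open %s: %s (ignored)\n",
                    path, dlerror());
            return 1;
        }
        dlclose(handle);
        return 0;
    }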


> On Jun 8, 2018, at 5:37 PM, Tyson Whitehead  wrote:
> 
> This email starts out talking about version 1.10.7 to give a complete
> picture.  I tested 2.1.3 as well, it also exhibits this issue,
> although to a lesser extent though, and am asking for help on that
> release.
> 
> I was compiling the OpenMPI 1.10.7 shipped with NixOS against a newer
> libibverbs with a large set of drivers and get some strange errors
> when when running opmi_info (I've replaced the common prefix
> /nix/store/9zm0pqsh67fw0xi5cpnybnd7hgzryffs-openmpi-1.10.7 with ...)
> 
> [mon241:04077] mca: base: component_find: unable to open
> .../lib/openmpi/mca_btl_openib: .../lib/openmpi/mca_btl_openib.so:
> undefined symbol: mca_mpool_grdma_evict (ignored)
> [mon241:04077] mca: base: component_find: unable to open
> .../lib/openmpi/mca_fcoll_individual:
> .../lib/openmpi/mca_fcoll_individual.so: undefined symbol:
> mca_io_ompio_file_write (ignored)
> [mon241:04077] mca: base: component_find: unable to open
> .../lib/openmpi/mca_fcoll_ylib: .../lib/openmpi/mca_fcoll_ylib.so:
> undefined symbol: ompi_io_ompio_scatter_data (ignored)
> [mon241:04077] mca: base: component_find: unable to open
> .../lib/openmpi/mca_fcoll_dynamic:
> .../lib/openmpi/mca_fcoll_dynamic.so: undefined symbol:
> ompi_io_ompio_allgatherv_array (ignored)
> [mon241:04077] mca: base: component_find: unable to open
> .../lib/openmpi/mca_fcoll_two_phase:
> .../lib/openmpi/mca_fcoll_two_phase.so: undefined symbol:
> ompi_io_ompio_set_aggregator_props (ignored)
> [mon241:04077] mca: base: component_find: unable to open
> .../lib/openmpi/mca_fcoll_static: .../lib/openmpi/mca_fcoll_static.so:
> undefined symbol: ompi_io_ompio_allgather_array (ignored)
> Package: Open MPI nixbld@ Distribution
>   Open MPI: 1.10.7
> Open MPI repo revision: v1.10.6-48-g5e373bf
>  Open MPI release date: May 16, 2017
>   Open RTE: 1.10.7
> Open RTE repo revision: v1.10.6-48-g5e373bf
>  Open RTE release date: May 16, 2017
>   OPAL: 1.10.7
> OPAL repo revision: v1.10.6-48-g5e373bf
>  OPAL release date: May 16, 2017
> ...
> 
> I dug into the first of these (figured out what library provided it,
> looked at the declared dependencies, poked around in the automake
> file) , and, as far as I could determine, it seems that
> mca_btl_openib.so simply isn't linked to list mca_mpool_grdma.so
> (which provides the symbol) as a dependency.
> 
> Seeing as 1.10.7 is no longer supported.  I figured I would try 2.1.3
> in case this has been fixed.  I compiled it up as well, and it seems
> all but the mca_fcoll_individual one have been resolved (I've replaced
> /nix/store/4kh0zbn8pmdqhvwagicswg70rwnpm570-openmpi-2.1.3 with ...)
> 
> [mon241:05544] mca_base_component_repository_open: unable to open
> mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
> undefined symbol: ompio_io_ompio_file_read (ignored)
> Package: Open MPI nixbld@ Distribution
>   Open MPI: 2.1.3
> Open MPI repo revision: v2.1.2-129-gcfd8f3f
>  Open MPI release date: Mar 13, 2018
>   Open RTE: 2.1.3
> Open RTE repo revision: v2.1.2-129-gcfd8f3f
>  Open RTE release date: Mar 13, 2018
>   OPAL: 2.1.3
> OPAL repo revision: v2.1.2-129-gcfd8f3f
>  OPAL release date: Mar 13, 2018
> ...
> 
> Again I was able to find this symbol in the mca_io_ompio.so library.
> I looked through the source again, and it seems pretty clear that the
> function is indeed called, but the library isn't linked to list the
> mca_io_ompio.so library as a dependency
> 
> Looking through the various shared libraries in the .../lib/openmpi
> directory though, and it seems none of them have dependencies on each
> other.  How is this suppose to work?  Is the component library just
> suppose to load everything so all symbols get resolved?  Is the above
> error I'm seeing an error then?
> 
> Any insight would be appreciated.
> 
> Thanks!  -Tyson
> 
> PS:  Please note that the openmpi code was compiled without any
> patches and without any special configure flags other than
> --prefix= (NixOS also adds --disable-static and
> --disable-dependency-tracking by default, but I removed those, it
> didn't make a difference).
> 

Re: [OMPI devel] Github deprecated "Github services" Does this affect us?

2018-06-08 Thread Jeff Squyres (jsquyres) via devel
The only thing it affects that we were using was Travis.

But a) we're no longer using Travis, and b) I'm sure Travis will address the 
issue, anyway.


> On Jun 7, 2018, at 10:06 AM, Geoffrey Paulsen  wrote:
> 
> Devel,
>  
>I just came across Github's deprecation announcement of Github Services.
> https://developer.github.com/changes/2018-04-25-github-services-deprecation/
> 
>Does anyone know if this will affect Open-MPI at all, and do we need to 
> change any processes because of this?
> 
> ---
> Geoffrey Paulsen
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Remove prun tool from OMPI?

2018-06-05 Thread Jeff Squyres (jsquyres) via devel
No objection from me.

> On Jun 5, 2018, at 12:09 PM, r...@open-mpi.org wrote:
> 
> Hey folks
> 
> Does anyone have heartburn if I remove the “prun” tool from ORTE? I don’t 
> believe anyone is using it, and it doesn’t look like it even works.
> 
> I ask because the name conflicts with PRRTE and can cause problems when 
> running OMPI against PRRTE
> 
> Ralph
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Master broken

2018-06-04 Thread Jeff Squyres (jsquyres) via devel
Fix should now be merged.


> On Jun 4, 2018, at 1:20 PM, Peter Kjellström  wrote:
> 
> On Sun, 3 Jun 2018 08:41:42 -0700
> Thananon Patinyasakdikul  wrote:
> 
>> It is tested against 1.5. It should not work with a lower version. I
>> will fix it.
> 
> FWIW, it also builds ok against up to date libfabric (1.6.1).
> 
> /Peter
> 
>> Arm
>> 
>> On Sun, Jun 3, 2018, 7:43 AM r...@open-mpi.org 
>> wrote:
>> 
>>> Here are more problems with a different version of libfabric:
>>> 
>>> btl_ofi_component.c: In function ‘validate_info’:
>>> btl_ofi_component.c:64:23: error: ‘FI_MR_VIRT_ADDR’ undeclared
>>> (first use in this function)
>>>  (mr_mode & ~(FI_MR_VIRT_ADDR | FI_MR_ALLOCATED |
>>> FI_MR_PROV_KEY)) == 0)) {
>>>   ^~~
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


-- 
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel