[MTT devel] MTT docs now online

2018-07-12 Thread r...@open-mpi.org
We finally got the bugs out, and the documentation for the Python MTT implementation is now online at https://open-mpi.github.io/mtt . It also picked up the Perl stuff, but we’ll just ignore that little detail :-) Thanks to Akshaya Jagannadharao for all the hard work.

Re: [OMPI devel] Odd warning in OMPI v3.0.x

2018-07-06 Thread r...@open-mpi.org
OK, I’ll fix it > On Jul 6, 2018, at 3:09 PM, Nathan Hjelm via devel > wrote: > > Looks like a bug to me. The second argument should be a value in v3.x.x. > > -Nathan > >> On Jul 6, 2018, at 4:00 PM, r...@open-mpi.org wrote: >> >> I’m seei

[OMPI devel] Odd warning in OMPI v3.0.x

2018-07-06 Thread r...@open-mpi.org
I’m seeing this when building the v3.0.x branch: runtime/ompi_mpi_init.c:395:49: warning: passing argument 2 of ‘opal_atomic_cmpset_32’ makes integer from pointer without a cast [-Wint-conversion] if (!opal_atomic_cmpset_32(&ompi_mpi_state, &expected, desired)) {
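
For context, a rough sketch of the mismatch being discussed: the v3.x opal_atomic_cmpset_32() takes the expected old value by value, while the newer compare-exchange style call on master takes a pointer to it. Names and signatures below are illustrative, not copied from the tree:

    int32_t expected = 0;   /* value we expect to find (placeholder) */
    int32_t desired  = 1;   /* value to install (placeholder) */

    /* v3.x-style API: expected is passed by value */
    opal_atomic_cmpset_32(&ompi_mpi_state, expected, desired);

    /* master-style API: expected is passed by pointer */
    opal_atomic_compare_exchange_strong_32(&ompi_mpi_state, &expected, desired);

    /* Passing &expected to the v3.x function is what produces the
       -Wint-conversion warning quoted above. */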

[OMPI devel] Fwd: [pmix] Release candidates available for testing

2018-07-01 Thread r...@open-mpi.org
FYI - v3.0.0 will go into master for the OMPI v4 branch. v2.1.2 should go into updates for OMPI v3.1 and v3.0 branches Ralph > Begin forwarded message: > > From: "r...@open-mpi.org" > Subject: [pmix] Release candidates available for testing > Date: June 29, 2018 at

Re: [OMPI devel] Open MPI: Undefined reference to pthread_atfork

2018-06-22 Thread r...@open-mpi.org
OMPI 2.1.3??? Is there any way you could update to something more recent? > On Jun 22, 2018, at 12:28 PM, lille stor wrote: > > Hi, > > > When compiling a C++ source file named test.cpp that needs a shared library > named libUtils.so (which in its turn needs Open MPI shared library, hence

Re: [OMPI devel] New binding option

2018-06-21 Thread r...@open-mpi.org
> On Jun 21, 2018, at 7:37 AM, Jeff Squyres (jsquyres) via devel > wrote: > > On Jun 21, 2018, at 10:26 AM, r...@open-mpi.org wrote: >> >>>> Alternatively, processes can be assigned to processors based on >>>> their local rank on a node using

Re: [OMPI devel] New binding option

2018-06-21 Thread r...@open-mpi.org
> On Jun 21, 2018, at 6:47 AM, Jeff Squyres (jsquyres) via devel > wrote: > > On Jun 21, 2018, at 9:41 AM, r...@open-mpi.org wrote: >> >> Alternatively, processes can be assigned to processors based on >> their local rank on a node using the \fI--bi

[OMPI devel] New binding option

2018-06-21 Thread r...@open-mpi.org
Hello all I have added a new binding option to OMPI master: Alternatively, processes can be assigned to processors based on their local rank on a node using the \fI--bind-to cpuset:ordered\fP option with an associated \fI--cpu-list "0,2,5"\fP. This directs that the first rank on a node be bound
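
As a usage illustration of the option described above (process count and executable are placeholders; the binding of ranks beyond the first is inferred from the truncated man-page text):

    $ mpirun -np 3 --bind-to cpuset:ordered --cpu-list "0,2,5" ./a.out

i.e., the first local rank is bound to processor 0, and presumably the second and third local ranks to processors 2 and 5 respectively.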

Re: [OMPI devel] ARM failure on PR to master

2018-06-10 Thread r...@open-mpi.org
Now moved to https://github.com/open-mpi/ompi/pull/5258 - same error > On Jun 8, 2018, at 9:04 PM, r...@open-mpi.org wrote: > > Can someone who knows/cares about ARM perhaps take a look at PR > https://github.com/open-mpi/ompi/p

[OMPI devel] ARM failure on PR to master

2018-06-08 Thread r...@open-mpi.org
Can someone who knows/cares about ARM perhaps take a look at PR https://github.com/open-mpi/ompi/pull/5247 ? I’m hitting an error in the ARM CI tests that I can’t understand: --> Running example: hello_c

[OMPI devel] PRRTE+OMPI status

2018-06-07 Thread r...@open-mpi.org
Hi folks I now have it so that you can run MTT using OMPI against PRRTE. Current results look promising: +-+-+-+--+--+--+--+--+--+ | Phase | Section

Re: [OMPI devel] Remove prun tool from OMPI?

2018-06-06 Thread r...@open-mpi.org
I have renamed prun for now - will do the update in a bit > On Jun 5, 2018, at 12:20 PM, Thomas Naughton wrote: > > > On Tue, 5 Jun 2018, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote: > >> >> >>> On Jun 5, 2018, at 11:59 AM, Thomas Naught

Re: [OMPI devel] Remove prun tool from OMPI?

2018-06-05 Thread r...@open-mpi.org
Thanks, > --tjn > > _ > Thomas Naughton naught...@ornl.gov > Research Associate (865) 576-4184 > > > On Tue, 5 Jun 2018, r...@o

Re: [OMPI devel] Remove prun tool from OMPI?

2018-06-05 Thread r...@open-mpi.org
naught...@ornl.gov > Research Associate (865) 576-4184 > > > On Tue, 5 Jun 2018, r...@open-mpi.org wrote: > >> Hey folks >> >> Does anyone have heartburn if I remove the “prun” tool from ORTE? I don’t >> believe anyone is using it, and it do

[OMPI devel] Remove prun tool from OMPI?

2018-06-05 Thread r...@open-mpi.org
Hey folks Does anyone have heartburn if I remove the “prun” tool from ORTE? I don’t believe anyone is using it, and it doesn’t look like it even works. I ask because the name conflicts with PRRTE and can cause problems when running OMPI against PRRTE Ralph

Re: [OMPI devel] Master broken

2018-06-03 Thread r...@open-mpi.org
IN && ret != -FI_EINTR)) { ^ What the heck version was this tested against??? > On Jun 3, 2018, at 7:32 AM, r...@open-mpi.org wrote: > > On my system, which has libfabric installed (but maybe an older version than

[OMPI devel] Master broken

2018-06-03 Thread r...@open-mpi.org
On my system, which has libfabric installed (but maybe an older version than expected?): btl_ofi_component.c: In function ‘mca_btl_ofi_component_progress’: btl_ofi_component.c:557:63: error: ‘FI_EINTR’ undeclared (first use in this function) } else if (OPAL_UNLIKELY(ret != -FI_EAGAIN
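
A hedged sketch of one way to stay buildable against a libfabric release that predates FI_EINTR; whether this matches the fix actually applied to the OFI BTL is not known from this thread:

    #include <rdma/fi_errno.h>
    #include <errno.h>

    /* Older libfabric headers do not define FI_EINTR; fall back to the
       plain errno value so comparisons like (ret != -FI_EINTR) still
       compile.  Illustrative only. */
    #ifndef FI_EINTR
    #define FI_EINTR EINTR
    #endif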

Re: [OMPI devel] Master warnings?

2018-06-02 Thread r...@open-mpi.org
No problem - I just commented because earlier in the week it had built clean, so I was surprised to get the flood. This was with gcc 6.3.0, so not that old > On Jun 2, 2018, at 7:19 AM, Nathan Hjelm wrote: > > Should have it fixed today or tomorrow. Guess I didn't have a sufficiently > old

[OMPI devel] Master warnings?

2018-06-01 Thread r...@open-mpi.org
Geez guys - what happened? In file included from monitoring_prof.c:47:0: ../../../../ompi/include/mpi.h:423:9: warning: ‘__error__’ attribute ignored [-Wattributes] __mpi_interface_removed__("MPI_Comm_errhandler_fn was removed in MPI-3.0; use MPI_Comm_errhandler_function instead");
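
The warning text itself points at the remedy for user code that still references the removed name: use the MPI-3.0 typedef MPI_Comm_errhandler_function instead of MPI_Comm_errhandler_fn. A minimal sketch (the handler body is illustrative):

    #include <mpi.h>
    #include <stdio.h>

    /* Matches the MPI_Comm_errhandler_function signature (the MPI-3.0 name). */
    static void my_errhandler(MPI_Comm *comm, int *errcode, ...)
    {
        fprintf(stderr, "MPI reported error %d\n", *errcode);
    }

    int main(int argc, char **argv)
    {
        MPI_Errhandler eh;
        MPI_Init(&argc, &argv);
        MPI_Comm_create_errhandler(my_errhandler, &eh);
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, eh);
        MPI_Errhandler_free(&eh);
        MPI_Finalize();
        return 0;
    }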

[OMPI devel] Some disturbing warnings on master today

2018-05-30 Thread r...@open-mpi.org
In file included from /usr/include/stdio.h:411:0, from ../../opal/util/malloc.h:24, from ../../opal/include/opal_config_bottom.h:331, from ../../opal/include/opal_config.h:2919, from ../../opal/util/argv.h:33,

Re: [OMPI devel] Running on Kubernetes

2018-05-28 Thread r...@open-mpi.org
osal.md>. > In this version we've managed to not use ssh, relying on `kubectl exec` > instead. It's still pretty "ghetto", but at least we've managed to train some > tensorflow models with it. :) Please take a look and let me know what you > think. > > Thanks, > >

Re: [OMPI devel] [OMPI users] 3.x - hang in MPI_Comm_disconnect

2018-05-22 Thread r...@open-mpi.org
sses but then > using MPI_Comm_disconnect when closing the cluster. I think the idea is that > they can then create and destroy clusters several times within the same R > script. But of course, that won’t work here when you can’t disconnect > processes. > > Cheers, > Ben >

Re: [OMPI devel] About supporting HWLOC 2.0.x

2018-05-22 Thread r...@open-mpi.org
Arg - just remembered. I should have noted in my comment that I started with that PR and did make a few further adjustments, though not much. > On May 22, 2018, at 8:49 AM, Jeff Squyres (jsquyres) > wrote: > > Geoffroy -- check out

Re: [OMPI devel] About supporting HWLOC 2.0.x

2018-05-22 Thread r...@open-mpi.org
I’ve been running with hwloc 2.0.1 for quite some time now without problem, including use of the shared memory segment. It would be interesting to hear what changes you had to make. However, that said, there is a significant issue in ORTE when trying to map-by NUMA as hwloc 2.0.1 no longer

Re: [OMPI devel] [OMPI users] 3.x - hang in MPI_Comm_disconnect

2018-05-21 Thread r...@open-mpi.org
Comm_connect and Comm_disconnect are both broken in OMPI v2.0 and above, including OMPI master - the precise reasons differ across the various releases. From what I can tell, the problem is in the OMPI side (as opposed to PMIx). I’ll try to file a few issues (since the problem is different in
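
For reference, a minimal sketch of the connect/disconnect pattern the original report describes (spawn workers, use them, then disconnect so a new "cluster" can be created later). The worker path and process count are placeholders; this only illustrates the calls involved, not a fix:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm workers;
        int errcodes[4];

        MPI_Init(&argc, &argv);
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &workers, errcodes);

        /* ... exchange work with the spawned processes ... */

        MPI_Comm_disconnect(&workers);   /* the call reported to hang */
        MPI_Finalize();
        return 0;
    }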

Re: [OMPI devel] Open MPI 3.1.0rc4 posted

2018-04-17 Thread r...@open-mpi.org
I’ll let you decide about 3.1.0. FWIW: I think Gilles’ fix should work for external PMIx v1.2.5 as well. > On Apr 17, 2018, at 7:56 AM, Barrett, Brian via devel > wrote: > > Do we honestly care for 3.1.0? I mean, we went 6 months without it working > and no one

Re: [OMPI devel] Running on Kubernetes

2018-03-16 Thread r...@open-mpi.org
I haven’t really spent any time with Kubernetes, but it seems to me you could just write a Kubernetes plm (and maybe an odls) component and bypass the ssh stuff completely given that you say there is a launcher API. > On Mar 16, 2018, at 11:02 AM, Jeff Squyres (jsquyres) >

[OMPI devel] Fabric manager interactions: request for comments

2018-02-05 Thread r...@open-mpi.org
Hello all The PMIx community is starting work on the next phase of defining support for network interactions, looking specifically at things we might want to obtain and/or control via the fabric manager. A very preliminary draft is shown here:

Re: [OMPI devel] cannot push directly to master anymore

2018-01-31 Thread r...@open-mpi.org
> On Jan 31, 2018, at 8:41 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote: > > On Jan 31, 2018, at 11:33 AM, r...@open-mpi.org wrote: >> >> If CI takes 30 min, then not a problem - when CI takes 6 hours (as it >> sometimes does), then that’

Re: [OMPI devel] cannot push directly to master anymore

2018-01-31 Thread r...@open-mpi.org
> On Jan 31, 2018, at 7:36 AM, Jeff Squyres (jsquyres) > wrote: > > On Jan 31, 2018, at 10:14 AM, Gilles Gouaillardet > wrote: >> >> I tried to push some trivial commits directly to the master branch and >> was surprised that is no more

Re: [OMPI devel] hwloc2 and cuda and non-default cudatoolkit install location

2017-12-20 Thread r...@open-mpi.org
FWIW: what we do in PMIx (where we also have some overlapping options) is to add in OMPI a new --enable-pmix-foo option and then have the configury in the corresponding OMPI component convert it to use inside of the embedded PMIx itself. It isn’t a big deal - just have to do a little code to

Re: [OMPI devel] hwloc 2 thing

2017-12-13 Thread r...@open-mpi.org
s > there anyway we can split the work between the hosts.. > > > Thanks for your help. > > Best regards, > Silpa > > Sent from Yahoo Mail on Android > <https://overview.mail.yahoo.com/mobile/?.src=Android> > On Sat, Jul 22, 2017 at 6:28 PM, r...@open-mp

Re: [OMPI devel] Enable issue tracker for ompi-www repo?

2017-11-04 Thread r...@open-mpi.org
Hi Chris It was just an oversight - I have turned on the issue tracker, so feel free to post, or a PR is also welcome Ralph > On Nov 4, 2017, at 5:03 AM, Gilles Gouaillardet > wrote: > > Chris, > > feel free to issue a PR, or fully describe the issue so a

Re: [OMPI devel] Cuda build break

2017-10-04 Thread r...@open-mpi.org
Fix is here: https://github.com/open-mpi/ompi/pull/4301 > On Oct 4, 2017, at 11:19 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote: > > Thanks Ralph. > >> On Oct 4, 2017, at 2:07 PM, r...@open-mpi.org wrot

Re: [OMPI devel] HWLOC / rmaps ppr build failure

2017-10-04 Thread r...@open-mpi.org
Thanks! Fix is here: https://github.com/open-mpi/ompi/pull/4301 > On Oct 4, 2017, at 11:10 AM, Brice Goglin wrote: > > Looks like you're using a hwloc < 1.11. If you want to support this old > API while using the 1.11 names,

Re: [OMPI devel] Cuda build break

2017-10-04 Thread r...@open-mpi.org
I’ll fix > On Oct 4, 2017, at 10:57 AM, Sylvain Jeaugey wrote: > > See my last comment on #4257 : > > https://github.com/open-mpi/ompi/pull/4257#issuecomment-332900393 > > We should completely disable CUDA in hwloc. It is breaking the build, but > more importantly, it

Re: [OMPI devel] HWLOC / rmaps ppr build failure

2017-10-04 Thread r...@open-mpi.org
Hmmm...I suspect this is a hwloc v2 vs v1 issue. I’ll fix it > On Oct 4, 2017, at 10:54 AM, Barrett, Brian via devel > wrote: > > It looks like a change in either HWLOC or the rmaps ppr component is causing > Cisco build failures on master for the last couple of

Re: [OMPI devel] Jenkins nowhere land again

2017-10-03 Thread r...@open-mpi.org
he only options for the OMPI builder are to either wait until Nathan or I > get home and get our servers running again or to not test OS X (which has its > own problems). I don’t have a strong preference here, but I also don’t want > to make the decision unilaterally. > > Brian >

[OMPI devel] Jenkins nowhere land again

2017-10-03 Thread r...@open-mpi.org
We are caught between two infrastructure failures: Mellanox can’t pull down a complete PR; OMPI is hanging on the OS-X server. Can someone put us out of our misery? Ralph

Re: [OMPI devel] Map by socket broken in 3.0.0?

2017-10-03 Thread r...@open-mpi.org
Found the bug - see https://github.com/open-mpi/ompi/pull/4291 Will PR for the next 3.0.x release > On Oct 2, 2017, at 9:55 PM, Ben Menadue wrote: > > Hi, > > I’m having trouble using map by socket on remote nodes. > >

[OMPI devel] ORTE DVM update

2017-09-18 Thread r...@open-mpi.org
Hi all The DVM on master is working again. You will need to use the new “prun” tool instead of “orterun” to submit your jobs - note that “prun” automatically finds the DVM, and so there is no longer any need to have orte-dvm report its URI, nor does prun take the “-hnp” argument. The
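
A hedged sketch of the updated workflow described above (host list, process counts, and application names are placeholders, and option spellings may differ slightly):

    $ orte-dvm --host node1,node2 &    # start the persistent DVM
    $ prun -n 4 ./my_app               # prun locates the running DVM on its own
    $ prun -n 2 ./other_app            # further jobs reuse the same DVM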

Re: [OMPI devel] Stale PRs

2017-09-06 Thread r...@open-mpi.org
then and probably is no longer even relevant (and has lots of conflicts as a result) Ralph > On Aug 31, 2017, at 11:15 AM, r...@open-mpi.org wrote: > > Thanks George - wasn’t picking on you, just citing the oldest one on the > list. Once that goes in, I’ll be poking the next :-)

Re: [OMPI devel] Open MPI 3.1 Feature List

2017-09-05 Thread r...@open-mpi.org
We currently have PMIx v2.1.0beta in OMPI master. This includes cross-version support - i.e., OMPI v3.1 would be able to run against an RM using any PMIx version. At the moment, the shared memory (or dstore) support isn’t working across versions, but I’d consider that a “bug” that will

Re: [OMPI devel] configure --with paths don't allow for absolute path specification

2017-09-02 Thread r...@open-mpi.org
u > _cannot_ when using --with-pmi=/usr/include/slurm if pmi.h and pmi2.h are > installed *only* in /usr/include/slurm. > > > On Saturday, September 2, 2017 9:55 AM, "r...@open-mpi.org" > <r...@open-mpi.org> wrote: > > > I’m honestly confused by this

Re: [OMPI devel] configure --with paths don't allow for absolute path specification

2017-09-02 Thread r...@open-mpi.org
I’m honestly confused by this as I don’t understand what you are trying to accomplish. Neither OMPI nor PMIx uses those headers. PMIx provides them just as a convenience for anyone wanting to compile a PMI based code, and so that we could internally write functions that translate from PMI to

Re: [OMPI devel] Stale PRs

2017-08-31 Thread r...@open-mpi.org
er solution that what > we have today, unfortunately not perfect as it would require additions to the > configure. Waiting for reviews. > > George. > > > On Thu, Aug 31, 2017 at 10:12 AM, r...@open-mpi.org > <mailto:r...@open-mpi.org> <r...@open-mpi.org <m

Re: [OMPI devel] Stale PRs

2017-08-31 Thread r...@open-mpi.org
Thanks to those who made a first pass at these old PRs. The oldest one is now dated Dec 2015 - nearly a two-year-old change for large messages over the TCP BTL, waiting for someone to commit. > On Aug 30, 2017, at 7:34 AM, r...@open-mpi.org wrote: > > Hey folks > > This is get

Re: [OMPI devel] [2.1.2rc3] libevent SEGV on FreeBSD/amd64

2017-08-30 Thread r...@open-mpi.org
Yeah, that caught my eye too as that is impossibly large. We only have a handful of active queues - looks to me like there is some kind of alignment issue. Paul - has this configuration worked with prior versions of OMPI? Or is this something new? Ralph > On Aug 30, 2017, at 4:17 PM, Larry

[OMPI devel] Stale PRs

2017-08-30 Thread r...@open-mpi.org
Hey folks This is getting ridiculous - we have PRs sitting on GitHub that are more than a year old! If they haven’t been committed in all that time, they can’t possibly be worth anything now. Would people _please_ start paying attention to their PRs? Either close them, or update/commit them.

Re: [MTT devel] trying again to ppost results to "IU" using pyclient

2017-08-15 Thread r...@open-mpi.org
I have not been able to do so, sadly. > On Aug 15, 2017, at 2:31 PM, Howard Pritchard wrote: > > HI Folks, > > Thanks Josh. That doesn't seem to help much though. Exactly which URL > should we be submitting a POST request to get a serial > number, and what should the

Re: [OMPI devel] Verbosity for "make check"

2017-08-08 Thread r...@open-mpi.org
Okay, I’ll update that PR accordingly > On Aug 8, 2017, at 10:51 AM, Jeff Squyres (jsquyres) > wrote: > > Per our discussion on the webex today about getting verbosity out of running > "make check" (e.g., to see what the heck is going on in >

Re: [OMPI devel] PMIX visibility

2017-07-25 Thread r...@open-mpi.org
George - I believe this PR fixes the problems. At least, it now runs on OSX for me: https://github.com/open-mpi/ompi/pull/3957 > On Jul 25, 2017, at 5:27 AM, r...@open-mpi.org wrote: > > Ouch - sorry about that. pmix_setenv is a

Re: [OMPI devel] PMIX visibility

2017-07-25 Thread r...@open-mpi.org
Ouch - sorry about that. pmix_setenv is actually defined down in the code base, so let me investigate why it got into pmix_common. > On Jul 24, 2017, at 10:26 PM, George Bosilca wrote: > > The last PMIX import broke the master on all platforms that support > visibility. I

Re: [OMPI devel] hwloc 2 thing

2017-07-22 Thread r...@open-mpi.org
> avoid the issues on ld libraries while running. > > thanks, > silpa > > > > > On Friday, 21 July 2017 8:52 AM, "r...@open-mpi.org" <r...@open-mpi.org> > wrote: > > > Yes - I have a PR about cleared that will remove the hwloc2 install. It ne

Re: [OMPI devel] hwloc 2 thing

2017-07-20 Thread r...@open-mpi.org
Yes - I have a PR about cleared that will remove the hwloc2 install. It needs to be redone > On Jul 20, 2017, at 8:18 PM, Howard Pritchard wrote: > > Hi Folks, > > I'm noticing that if I pull a recent version of master with hwloc 2 support > into my local repo, that my

Re: [OMPI devel] LD_LIBRARY_PATH and environment variables not getting set in remote hosts

2017-07-20 Thread r...@open-mpi.org
You must be kidding - 1.2.8??? We wouldn’t even know where to begin to advise you on something that old - I’m actually rather surprised it even compiled on a new Linux. > On Jul 20, 2017, at 4:22 AM, saisilpa b via devel > wrote: > > HI Gilles, > > Thanks for your

Re: [OMPI devel] Issue/PR tagging

2017-07-19 Thread r...@open-mpi.org
e a Target:v??? label > If an issue is fixed in master, but not merged into branches, don’t close the > issue > > I think that’s about it. There’s some workflows we want to build to automate > enforcing many of these things, but for now, it’s just hints to help the RMs > not lose t

[OMPI devel] Issue/PR tagging

2017-07-19 Thread r...@open-mpi.org
Hey folks I know we made some decisions last week about how to tag issues and PRs to make things easier to track for release branches, but the wiki notes don’t cover what we actually decided to do. Can someone briefly summarize? I honestly have forgotten if we tag issues, or tag PRs Ralph

Re: [OMPI devel] Open MPI 3.0.0 first release candidate posted

2017-06-29 Thread r...@open-mpi.org
I tracked down a possible source of the oob/tcp error - this should address it, I think: https://github.com/open-mpi/ompi/pull/3794 > On Jun 29, 2017, at 3:14 PM, Howard Pritchard wrote: > > Hi Brian, > > I tested this rc

Re: [OMPI devel] SLURM 17.02 support

2017-06-27 Thread r...@open-mpi.org
d with exit code 1 > > Don’t think it really matters, since v2.x probably wasn’t what the customer > wanted. > > Brian > >> On Jun 19, 2017, at 7:18 AM, Howard Pritchard <hpprit...@gmail.com >> <mailto:hpprit...@gmail.com>> wrote: >> >> Hi Ra

[OMPI devel] PMIx Working Groups: Call for participants

2017-06-26 Thread r...@open-mpi.org
Hello all There are two new PMIx working groups starting up to work on new APIs and attributes to support application/tool interactions with the system management stack in the following areas: 1. tiered storage support - prepositioning of files/binaries/libraries, directed hot/warm/cold

Re: [OMPI devel] orterun busted

2017-06-23 Thread r...@open-mpi.org
Odd - I guess my machine is just consistently lucky, as was the CI’s when this went thru. The problem field is actually stale - we haven’t used it in years - so I simply removed it from orte_process_info. https://github.com/open-mpi/ompi/pull/3741

Re: [OMPI devel] Abstraction violation!

2017-06-22 Thread r...@open-mpi.org
available machines have an mpi.h somewhere in the default path because we always install _something_. I wonder if our master would fail in a distro that didn’t have an MPI installed... > On Jun 22, 2017, at 5:02 PM, r...@open-mpi.org wrote: > > It apparently did come in that way. We just n

[OMPI devel] Abstraction violation!

2017-06-22 Thread r...@open-mpi.org
I don’t understand what someone was thinking, but you CANNOT #include “mpi.h” in opal/util/info.c. It has broken pretty much every downstream project. Please fix this! Ralph

Re: [OMPI devel] orte-clean not cleaning left over temporary I/O files in /tmp

2017-06-20 Thread r...@open-mpi.org
make should be under the session directory, not directly in /tmp. > On May 9, 2017, at 2:10 AM, Christoph Niethammer <nietham...@hlrs.de> wrote: > > Hi, > > I am using Open MPI 2.1.0. > > Best > Christoph > > - Original Message - > F

Re: [OMPI devel] SLURM 17.02 support

2017-06-19 Thread r...@open-mpi.org
; wrote: >> >> Hi Ralph >> >> I think a helpful error message would suffice. >> >> Howard >> >> r...@open-mpi.org <r...@open-mpi.org> schrieb am Di. 13. Juni 2017 um 11:15: >> Hey folks >> >> Brian brought this up today on t

Re: [OMPI devel] Coverity strangeness

2017-06-16 Thread r...@open-mpi.org
t;> false positive >> >> >> if you have contacts at coverity, it would be interesting to report this >> false positive >> >> >> >> Cheers, >> >> >> Gilles >> >> >> On 6/16/2017 12:02 PM, r...@open-mpi.

[OMPI devel] Coverity strangeness

2017-06-15 Thread r...@open-mpi.org
I’m trying to understand some recent coverity warnings, and I confess I’m a little stumped - so I figured I’d ask out there and see if anyone has a suggestion. This is in the PMIx repo, but it is reported as well in OMPI (down in opal/mca/pmix/pmix2x/pmix). The warnings all take the following

[OMPI devel] SLURM 17.02 support

2017-06-13 Thread r...@open-mpi.org
Hey folks Brian brought this up today on the call, so I spent a little time investigating. After installing SLURM 17.02 (with just --prefix as config args), I configured OMPI with just --prefix config args. Getting an allocation and then executing “srun ./hello” failed, as expected. However,

Re: [OMPI devel] ompi_info "developer warning"

2017-06-05 Thread r...@open-mpi.org
even "- " > > > Cheers, > > > Gilles > > ----- Original Message - > > So we are finally getting rid of the 80 chars per line limit? > > George. > > > > On Sun, Jun 4, 2017 at 11:23 PM, r...@open-mpi.org <mailto:r...@op

Re: [OMPI devel] ompi_info "developer warning"

2017-06-05 Thread r...@open-mpi.org
I added the change to https://github.com/open-mpi/ompi/pull/3651. We’ll just have to hope that people intuitively understand that “-“ means “disabled”. > On Jun 5, 2017, at 7:01 AM, r...@open-mpi.org wrote: > > Fine with me - I don

Re: [OMPI devel] ompi_info "developer warning"

2017-06-04 Thread r...@open-mpi.org
; example > "MCA (-) pml monitoring" > > > Cheers, > > Gilles > > On 6/3/2017 5:26 AM, r...@open-mpi.org wrote: >> I keep seeing this when I run ompi_info --all: >> >> ***

[OMPI devel] ompi_info "developer warning"

2017-06-02 Thread r...@open-mpi.org
I keep seeing this when I run ompi_info --all: ** *** DEVELOPER WARNING: A field in ompi_info output is too long and *** will appear poorly in the prettyprint output. *** *** Value: "MCA (disabled) pml monitoring" ***

[OMPI devel] Master MTT results

2017-06-01 Thread r...@open-mpi.org
Hey folks I scanned the nightly MTT results from last night on master, and the RTE looks pretty solid. However, there are a LOT of onesided segfaults occurring, and I know that will eat up people’s disk space. Just wanted to ensure folks were aware of the problem Ralph

Re: [OMPI devel] Time to remove Travis?

2017-06-01 Thread r...@open-mpi.org
I’d vote to remove it - it’s too unreliable anyway > On Jun 1, 2017, at 6:30 AM, Jeff Squyres (jsquyres) > wrote: > > Is it time to remove Travis? > > I believe that the Open MPI PRB now covers all the modern platforms that > Travis covers, and we have people actively

Re: [OMPI devel] mapper issue with heterogeneous topologies

2017-05-31 Thread r...@open-mpi.org
I don’t believe we check topologies prior to making that decision - this is why we provide map-by options. Seems to me that this oddball setup has a simple solution - all he has to do is set a mapping policy for that environment. Can even be done in the default mca param file. I wouldn’t

Re: [OMPI devel] Open MPI 3.x branch naming

2017-05-31 Thread r...@open-mpi.org
> On May 31, 2017, at 7:48 AM, Jeff Squyres (jsquyres) > wrote: > > On May 30, 2017, at 11:37 PM, Barrett, Brian via devel > wrote: >> >> We have now created a v3.0.x branch based on today’s v3.x branch. I’ve >> reset all outstanding v3.x PRs

Re: [OMPI devel] PMIX busted

2017-05-31 Thread r...@open-mpi.org
Sorry for the hassle... > On May 31, 2017, at 7:31 AM, George Bosilca <bosi...@icl.utk.edu> wrote: > > After removing all leftover files and redoing the autogen things went back to > normal. Sorry for the noise. > > George. > > > > On Wed, May 31,

Re: [OMPI devel] PMIX busted

2017-05-31 Thread r...@open-mpi.org
No - I just rebuilt it myself, and I don’t see any relevant MTT build failures. Did you rerun autogen? > On May 31, 2017, at 7:02 AM, George Bosilca wrote: > > I have problems compiling the current master. Anyone else has similar issues ? > > George. > > > CC

[OMPI devel] Please turn off MTT on v1.10

2017-05-30 Thread r...@open-mpi.org
The v1.10 series is closed and no new commits will be made to that branch. So please turn off any MTT runs you have scheduled for that branch - this will allow people to commit tests that will not run on the v1.10 series. Thanks Ralph

[OMPI devel] Stale PRs

2017-05-26 Thread r...@open-mpi.org
Hey folks We’re seeing a number of stale PRs hanging around again - these are PRs that were submitted against master (in some cases, months ago) that cleared CI and were never committed. Could people please take a look at their PRs and either commit them or delete them? We are trying to get

Re: [OMPI devel] Updating the v1.10.7 tag

2017-05-19 Thread r...@open-mpi.org
n the future, since that release series is now > effectively done. > > More specifically: hopefully everyone does the "git tag -d ..." instructions > and this becomes a moot point. > > > >> On May 19, 2017, at 11:25 AM, r...@open-mpi.org wrote: >>

Re: [OMPI devel] Updating the v1.10.7 tag

2017-05-19 Thread r...@open-mpi.org
, and it is a good idea to do it. > On May 19, 2017, at 8:03 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote: > > On May 19, 2017, at 5:06 AM, r...@open-mpi.org wrote: >> >> $ git tag -d v1.10.7 >> $ git pull (or w

[OMPI devel] Updating the v1.10.7 tag

2017-05-19 Thread r...@open-mpi.org
Hi folks I apparently inadvertently tagged the wrong hash the other night when tagging v1.10.7. I have corrected it, but if you updated your clone _and_ checked out the v1.10.7 tag in the interim, you might need to manually delete the tag on your clone and re-pull. It’s trivial to do: $ git
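
Based on the partial instructions quoted in the follow-up above, the recovery presumably looks like the following (the fetch step is an assumption; any command that re-downloads tags should do):

    $ git tag -d v1.10.7        # drop the stale local tag
    $ git fetch --tags          # (or git pull, per the original note) pick up the corrected tag
    $ git checkout v1.10.7      # now points at the intended hash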

Re: [OMPI devel] Combining Binaries for Launch

2017-05-15 Thread r...@open-mpi.org
So long as both binaries use the same OMPI version, I can’t see why there would be an issue. It sounds like you are thinking of running an MPI process on the GPU itself (instead of using an offload library)? People have done that before - IIRC, the only issue is trying to launch a process onto

Re: [OMPI devel] Socket buffer sizes

2017-05-15 Thread r...@open-mpi.org
Thanks - already done, as you say > On May 15, 2017, at 7:32 AM, Håkon Bugge wrote: > > Dear Open MPIers, > > > Automatic tuning of socket buffers has been in the linux kernel since > 2.4.17/2.6.7. That is some time ago. I remember, at the time, that we removed > the

Re: [OMPI devel] Quick help with OMPI_COMM_WORLD_LOCAL_RANK

2017-05-12 Thread r...@open-mpi.org
If you configure with --enable-debug, then you can set the following mca params on your cmd line: --mca plm_base_verbose 5 will show you the details of the launch --mca odls_base_verbose 5 will show you the details of the fork/exec > On May 12, 2017, at 10:30 AM, Kumar, Amit
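
Putting that advice into a single illustrative invocation (application and process count are placeholders):

    $ ./configure --enable-debug ...       # per the note above, a debug build is needed first
    $ mpirun -np 2 --mca plm_base_verbose 5 --mca odls_base_verbose 5 ./a.out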

Re: [OMPI devel] Quick help with OMPI_COMM_WORLD_LOCAL_RANK

2017-05-12 Thread r...@open-mpi.org
That’s a pretty ancient release, but a quick glance at the source code indicates that you should always see it when launched via mpirun, and never when launched via srun > On May 12, 2017, at 9:22 AM, Kumar, Amit wrote: > > Dear OpenMPI, > > Under what circumstances I
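
A quick way to see the behavior described here (illustrative; the shell one-liner just prints the variable for each launched process):

    $ mpirun -np 2 sh -c 'echo rank=$OMPI_COMM_WORLD_LOCAL_RANK'   # set when launched via mpirun
    $ srun -n 2 sh -c 'echo rank=$OMPI_COMM_WORLD_LOCAL_RANK'      # empty when launched via srun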

[OMPI devel] OMPI v1.10.7rc1 ready for evaluation

2017-05-12 Thread r...@open-mpi.org
Hi folks We want/need to release a final version of the 1.10 series that will contain all remaining cleanups. Please take a gander at it. https://www.open-mpi.org/software/ompi/v1.10/ Changes: 1.10.7 -- - Fix bug in TCP BTL that impacted

Re: [OMPI devel] orte-clean not cleaning left over temporary I/O files in /tmp

2017-05-08 Thread r...@open-mpi.org
What version of OMPI are you using? > On May 8, 2017, at 8:56 AM, Christoph Niethammer wrote: > > Hello > > According to the manpage "...orte-clean attempts to clean up any processes > and files left over from Open MPI jobs that were run in the past as well as > any

Re: [OMPI devel] Open MPI 3.x branch naming

2017-05-05 Thread r...@open-mpi.org
+1 Go for it :-) > On May 5, 2017, at 2:34 PM, Barrett, Brian via devel > wrote: > > To be clear, we’d do the move all at once on Saturday morning. Things that > would change: > > 1) nightly tarballs would rename from openmpi-v3.x--.tar.gz > to

Re: [OMPI devel] v3 branch - Problem with LSF

2017-05-05 Thread r...@open-mpi.org
I would suggest not bringing it over in isolation - we planned to do an update that contains a lot of related changes, including the PMIx update. Probably need to do that pretty soon given the June target. > On May 5, 2017, at 3:04 PM, Vallee, Geoffroy R. wrote: > > Hi, >

Re: [OMPI devel] remote spawn - have no children

2017-05-03 Thread r...@open-mpi.org
(s) are then responsible to send command to orted to start mpi > application? > Which event names should I search for? > > Thank you, > Justin > > - Original Message - >> From: r...@open-mpi.org >> To: "OpenMPI Devel" <devel@lists.open-mpi.org> &g

Re: [OMPI devel] remote spawn - have no children

2017-05-03 Thread r...@open-mpi.org
in the system. Note that the output has nothing to do with spawning your mpi_hello - it is solely describing the startup of the daemons. > On May 3, 2017, at 6:26 AM, r...@open-mpi.org wrote: > > The orte routed framework does that for you - there is an API for that > purpose. > >

Re: [OMPI devel] remote spawn - have no children

2017-05-03 Thread r...@open-mpi.org
The orte routed framework does that for you - there is an API for that purpose. > On May 3, 2017, at 12:17 AM, Justin Cinkelj wrote: > > Important detail first: I get this message from significantly modified Open > MPI code, so problem exists solely due to my mistake.

Re: [OMPI devel] openib oob module

2017-04-21 Thread r...@open-mpi.org
t; Subject: Re: [OMPI devel] openib oob module > > Folks, > > > fwiw, i made https://github.com/open-mpi/ompi/pull/3393 and it works for me > on a mlx4 cluster (Mellanox QDR) > > > Cheers, > > > Gilles > > > On 4/21/2017 1:31 AM, r...@open-mpi.org wr

Re: [OMPI devel] openib oob module

2017-04-20 Thread r...@open-mpi.org
enable oob verbose in my last test. Here is the updated output file. > > Thanks, > Shiqing > > From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of > r...@open-mpi.org > Sent: Thursday, April 20, 2017 4:29 PM > To: OpenMPI Devel > Subject: Re:

Re: [OMPI devel] openib oob module

2017-04-20 Thread r...@open-mpi.org
> Shiqing > > > From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of > r...@open-mpi.org > Sent: Thursday, April 20, 2017 3:49 PM > To: OpenMPI Devel > Subject: Re: [OMPI devel] openib oob module > > Hi Shiqing! > > Been a long time - hop

Re: [OMPI devel] openib oob module

2017-04-20 Thread r...@open-mpi.org
Hi Shiqing! Been a long time - hope you are doing well. I see no way to bring the oob module back now that the BTLs are in the OPAL layer - this is why it was removed as the oob is in ORTE, and thus not accessible from OPAL. Ralph > On Apr 20, 2017, at 6:02 AM, Shiqing Fan

Re: [OMPI devel] Program which runs wih 1.8.3, fails with 2.0.2

2017-04-19 Thread r...@open-mpi.org
Fully expected - if ORTE can’t start one or more daemons, then the MPI job itself will never be executed. There was an SGE integration issue in the 2.0 series - I fixed it, but IIRC it didn’t quite make the 2.0.2 release. In fact, I just checked and it did indeed miss that release. You have
