Re: [OMPI devel] Announcing Open MPI v5.0.0rc2

2022-01-10 Thread Ralph Castain via devel
look at the referenced code area to see if there needs to be some Cygwin-related tweak. Ralph > On Jan 9, 2022, at 11:09 PM, Marco Atzeri via devel > wrote: > > On 10.01.2022 06:50, Marco Atzeri wrote: >> On 09.01.2022 15:54, Ralph Castain via devel wrote: >>> Hi Ma

Re: [OMPI devel] Announcing Open MPI v5.0.0rc2

2022-01-09 Thread Ralph Castain via devel
Hi Marco Try the patch here (for the prrte 3rd-party subdirectory): https://github.com/openpmix/prrte/pull/1173 Ralph > On Jan 9, 2022, at 12:29 AM, Marco Atzeri via devel > wrote: > > On 01.01.2022 20:07, Barrett, Brian wrote: >> Marco - >> There are some patches that haven't made it to

Re: [OMPI devel] openmpi/pmix compatibility

2021-10-10 Thread Ralph Castain via devel
It was a bug (typo in the attribute name when backported from OMPI master) in OMPI 4.1.1 - it has been fixed. > On Oct 9, 2021, at 9:18 PM, Orion Poplawski via devel > wrote: > > It looks like openmpi 4.1.1 is not compatible with pmix 4.1.0 - is that > expected? > > In file included from

Re: [OMPI devel] Support timelines

2021-09-16 Thread Ralph Castain via devel
Answered on packager list - apologies that it didn't get answered there in a timely fashion. > On Sep 16, 2021, at 6:56 AM, Orion Poplawski via devel > wrote: > > Is there any documentation that would indicate how long the 4.0 (or any > particular release series) will be supported? This

[OMPI devel] New host/hostfile policy

2021-06-16 Thread Ralph Castain via devel
We've been struggling a bit lately with the problem of resolving multiple names for the same host. Part of the problem has been the need to minimize DNS resolves as systems were taking way too long to perform them, resulting in very long startup times. I've done my best to minimize this and

[OMPI devel] Auto-forward of envars

2021-04-14 Thread Ralph Castain via devel
PMIx and PRRTE both read and forward their respective default MCA parameters from default system and user-level param files: /etc/pmix-mca-params.conf /.pmix/mca-params.conf /etc/prte-mca-params.conf /.prte/mca-params.conf PMIx will also do the same thing for OMPI default system and user-level
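
The snippet above describes the standard MCA parameter-file mechanism. A minimal sketch of a user-level PRRTE file, assuming the usual "param = value" syntax those files use (the parameter shown is illustrative, not taken from the message):

    # ~/.prte/mca-params.conf -- user-level PRRTE defaults, read at startup
    # one "param = value" pair per line; '#' begins a comment
    prte_tmpdir_base = /scratch/tmp   # illustrative parameter only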

Re: [OMPI devel] Slurm integration and rankfiles....

2021-03-21 Thread Ralph Castain via devel
can't possibly describe an arbitrary (rankfile) layout, so I was nervous about why they would be required if a rankfile was provided... Martyn On Mon, 15 Mar 2021 at 19:57, Ralph Castain via devel <devel@lists.open-mpi.org> wrote: Martyn? Why are you saying SLURM_TASKS_PE

[OMPI devel] Accessing HWLOC from inside OMPI

2021-03-17 Thread Ralph Castain via devel
Hi folks I've written a wiki page explaining how OMPI handles HWLOC from inside the OMPI code base starting with OMPI v5. The link is on the home page under the Developer Documents (Accessing the HWLOC topology tree from inside the MPI/OPAL layers):

Re: [OMPI devel] Slurm integration and rankfiles....

2021-03-15 Thread Ralph Castain via devel
22:19:09 +0000 > Ralph Castain via devel wrote: > >> Why would it not be set? AFAICT, Slurm is supposed to always set that >> envar, or so we've been told. > > Maybe confusion on the exact name? > > AFAIK slurm always sets SLURM_TASKS_PER_NODE but only sets > SLU

Re: [OMPI devel] Slurm integration and rankfiles....

2021-03-12 Thread Ralph Castain via devel
unset in the configuration. Martyn On Thu, 11 Mar 2021 at 16:09, Ralph Castain via devel <devel@lists.open-mpi.org> wrote: What version of Slurm is this? > On Mar 11, 2021, at 8:03 AM, Martyn Foster via devel > <devel@lists.open-mpi.org> wrote: > > Hi al

Re: [OMPI devel] Slurm integration and rankfiles....

2021-03-11 Thread Ralph Castain via devel
What version of Slurm is this? > On Mar 11, 2021, at 8:03 AM, Martyn Foster via devel > wrote: > > Hi all, > > Using a rather trivial example > mpirun -np 1 -rf rankfile ./HelloWorld > on a Slurm system; > -- > While
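
For context on the -rf option used above, a sketch of a rankfile in the syntax Open MPI documents; hostnames and slot numbers here are placeholders:

    # rankfile: pin each MPI rank to an explicit host/socket/core
    rank 0=node01 slot=0:0
    rank 1=node01 slot=0:1
    rank 2=node02 slot=1:0

    $ mpirun -np 3 -rf rankfile ./HelloWorld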

Re: [OMPI devel] mpirun alternative

2021-03-09 Thread Ralph Castain via devel
having the independent MPI jobs leave a trace there, such that they can find each other and create the initial socket. 2. You could replace ssh/rsh with a no-op script (that returns success such that the mpirun process thinks it successfully started the processes), and then handcraft the environmen

Re: [OMPI devel] mpirun alternative

2021-03-05 Thread Ralph Castain via devel
I'm afraid that won't work - there is no way for the job to "self assemble". One could create a way to do it, but it would take some significant coding in the guts of OMPI to get there. On Mar 5, 2021, at 9:40 AM, Gabriel Tanase via devel <devel@lists.open-mpi.org> wrote: Hi all, I

[OMPI devel] Remove stale OPAL dss and opal_tree code

2021-02-18 Thread Ralph Castain via devel
Hi folks I'm planning on removing the OPAL dss (pack/unpack) code as part of my work to reduce the code base I historically supported. The pack/unpack functionality is now in PMIx (has been since v3.0 was released), and so we had duplicate capabilities spread across OPAL and PRRTE. I have

Re: [OMPI devel] [EXTERNAL] Support for AMD M100?

2021-02-11 Thread Ralph Castain via devel
FWIW: now that I am out of Intel, we are planning on upping the PMIx support for GPUs in general, so I expect we'll be including this one. Support will include providing info on capabilities (for both local and remote devices), distances from every proc to each of its local GPUs, affinity

Re: [OMPI devel] configure problem on master

2021-02-04 Thread Ralph Castain via devel
Sounds like I need to resync the PMIx lustre configury with the OMPI one - I'll do that. On Feb 4, 2021, at 11:56 AM, Gabriel, Edgar via devel <devel@lists.open-mpi.org> wrote: I have a weird problem running configure on master on our cluster. Basically, configure fails when I request

Re: [OMPI devel] HWLOC duplication relief

2021-02-03 Thread Ralph Castain via devel
2021, at 8:09 AM, Ralph Castain via devel > wrote: > > What if we do this: > > - if you are using PMIx v4.1 or above, then there is no problem. Call > PMIx_Load_topology and we will always return a valid pointer to the topology, > subject to the caveat that all members
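
A hedged C sketch of the PMIx v4.1+ path mentioned above; the pmix_topology_t field names follow the published PMIx definition, but treat the ownership comment as an assumption rather than OMPI code:

    #include <string.h>
    #include <pmix.h>
    #include <hwloc.h>

    static hwloc_topology_t get_shared_topology(void)
    {
        pmix_topology_t topo;
        memset(&topo, 0, sizeof(topo));   /* start from an empty descriptor */

        if (PMIX_SUCCESS != PMIx_Load_topology(&topo)) {
            return NULL;   /* caller may fall back to local discovery */
        }
        /* topo.source names the provider (e.g., "hwloc"); topo.topology is
         * the tree. Use it read-only - PMIx retains ownership. */
        return (hwloc_topology_t) topo.topology;
    }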

Re: [OMPI devel] HWLOC duplication relief

2021-02-03 Thread Ralph Castain via devel
see short of asking every library to provide us with the ability to pass hwloc_topology_t down to them. Outside of that obvious answer, I suppose we could put the hwloc_topology_t address into the environment and have them connect that way? > On Feb 3, 2021, at 7:36 AM, Ralph Castain via

Re: [OMPI devel] HWLOC duplication relief

2021-02-03 Thread Ralph Castain via devel
> using the topology pointer in the process must use the same hwloc > version (e.g. not 2.0 vs 2.4). shmem_adopt() verifies that the exported > and importer are compatible. But passing the topology pointer doesn't > provide any way to verify that the caller doesn't use its own > incom

[OMPI devel] HWLOC duplication relief

2021-02-02 Thread Ralph Castain via devel
Hi folks Per today's telecon, here is a link to a description of the HWLOC duplication issue for many-core environments and methods by which you can mitigate the impact. https://openpmix.github.io/support/faq/avoid-hwloc-dup George: for lower-level libs like treematch or HAN, you might want

Re: [OMPI devel] mpirun 4.1.0 segmentation fault

2021-02-01 Thread Ralph Castain via devel
It could be a Slurm issue, but I'm seeing one thing that makes me suspicious that this might be a problem reported elsewhere. Andrej - what version of Slurm are you using here? > On Feb 1, 2021, at 5:34 PM, Gilles Gouaillardet via devel > wrote: > > Andrej, > > that really looks like a

Re: [OMPI devel] mpirun 4.1.0 segmentation fault

2021-02-01 Thread Ralph Castain via devel
The Slurm launch component would only disqualify itself if it didn't see a Slurm allocation - i.e., there is no SLURM_JOBID in the environment. If you want to use mpirun in a Slurm cluster, you need to: 1. get an allocation from Slurm using "salloc" 2. then run "mpirun" Did you remember to
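
A minimal sketch of that two-step flow; node/process counts and the application name are placeholders:

    $ salloc -N 2 -n 8        # 1. get an allocation; Slurm exports SLURM_JOBID
    $ mpirun -np 8 ./my_app   # 2. mpirun detects the allocation from the env
    $ exit                    # release the allocation when done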

Re: [OMPI devel] mpirun 4.1.0 segmentation fault

2021-02-01 Thread Ralph Castain via devel
Andrej I fail to understand why you continue to think that PMI has anything to do with this problem. I see no indication of a PMIx-related issue in anything you have provided to date. In the output below, it is clear what the problem is - you locked it to the "slurm" launcher (with -mca plm

Re: [OMPI devel] mpirun 4.1.0 segmentation fault

2021-01-31 Thread Ralph Castain via devel
Just trying to understand - why are you saying this is a pmix problem? Obviously, something to do with mpirun is failing, but I don't see any indication here that it has to do with pmix. Can you add --enable-debug to your configure line and inspect the core file from the dump? > On Jan 31,

Re: [OMPI devel] Submodule change

2021-01-16 Thread Ralph Castain via devel
Ralph Castain via devel <devel@lists.open-mpi.org> wrote: Just a point of clarification since there was a comment on the PR that made this change. This is _not_ a permanent situation, nor was it done because PMIx had achieved some magic milestone. We changed the submodule to

Re: [OMPI devel] Submodule change

2020-12-17 Thread Ralph Castain via devel
prefer, but we can do either. On Dec 17, 2020, at 9:03 AM, Ralph Castain via devel <devel@lists.open-mpi.org> wrote: Hi folks I just switched OMPI's PMIx submodule to point at the new v4.0 branch. When you want to update, you may need to do the following after a "git pull"

[OMPI devel] Submodule change

2020-12-17 Thread Ralph Castain via devel
Hi folks I just switched OMPI's PMIx submodule to point at the new v4.0 branch. When you want to update, you may need to do the following after a "git pull": git submodule sync git submodule update --init --recursive --remote to get yourself onto the proper branch. Ralph

Re: [OMPI devel] OpenMPI with Slurm support

2020-08-05 Thread Ralph Castain via devel
e/z04/ompi_slurm >> --with-pmix=/lustre/z04/pmix --with-pmi=/lustre/z04/pmix --with-slurm >> --with-cuda=/lustre/sw/nvidia/hpcsdk/Linux_x86_64/cuda/10.1/include/ >> --with-libevent=/lustre/z04/pmix/libevent/ >> >> and I can see that the header file is in the pmix build: >>

Re: [OMPI devel] OpenMPI with Slurm support

2020-08-05 Thread Ralph Castain via devel
w/nvidia/hpcsdk/Linux_x86_64/cuda/10.1/include/ > --with-libevent=/lustre/z04/pmix/libevent/ > > and I can see that the header file is in the pmix build: > $ls /lustre/z04/pmix/include/ > pmi2.h pmix_common.h pmix.h pmix_server.h pmix_version.h > pmi.h pmix_extend.h

Re: [OMPI devel] OpenMPI with Slurm support

2020-08-05 Thread Ralph Castain via devel
For OMPI, I would recommend installing PMIx: https://github.com/openpmix/openpmix/releases/tag/v3.1.5 > On Aug 5, 2020, at 12:40 AM, Luis Cebamanos via devel > wrote: > > Hi all, > > We are trying to install OpenMPI with Slurm support on a recently > upgraded system. Unfortunately libpmi,

Re: [OMPI devel] Libevent changes

2020-07-10 Thread Ralph Castain via devel
We forgot to discuss this at the last telecon - GP, would you please ensure it is on next week's agenda? FWIW: I agree that this should not have been committed. We need to stop doing local patches to public packages and instead focus on getting them into the upstream (which has still not been

Re: [OMPI devel] Announcing Open MPI v4.0.4rc2

2020-06-06 Thread Ralph Castain via devel
I would have hoped that the added protections we put into PMIx would have resolved ds12 as well as ds21, but it is possible those changes didn't get into OMPI v4.0.x. Regardless, I think you should be just fine using the gds/hash component for cygwin. I would suggest simply "locking" that param

Re: [OMPI devel] OMPI master fatal error in pml_ob1_sendreq.c

2020-05-04 Thread Ralph Castain via devel
lviewtech.com> All Done! mic:/amd/home/jdelsign/PMIx> Does that mean there is something wrong with microway2? If that were the case, then why would it ever work? On 2020-05-04 12:08, Ralph Castain via devel wrote: What happens if you run your "3 procs on two nodes" case using just microway1

Re: [OMPI devel] OMPI master fatal error in pml_ob1_sendreq.c

2020-05-04 Thread Ralph Castain via devel
port 1024 allowed to connect to? George. On Mon, May 4, 2020 at 11:36 AM John DelSignore via devel <devel@lists.open-mpi.org> wrote: Inline below... On 2020-05-04 11:09, Ralph Castain via devel wrote: Staring at this some more, I do have the following questions: * in your fir

Re: [OMPI devel] OMPI master fatal error in pml_ob1_sendreq.c

2020-05-04 Thread Ralph Castain via devel
Good to confirm - thanks! This does indeed look like an issue in the btl/tcp component's reachability code. On May 4, 2020, at 8:34 AM, John DelSignore <jdelsign...@perforce.com> wrote: Inline below... On 2020-05-04 11:09, Ralph Castain via devel wrote: Staring at this some m

Re: [OMPI devel] OMPI master fatal error in pml_ob1_sendreq.c

2020-05-04 Thread Ralph Castain via devel
Staring at this some more, I do have the following questions: * in your first case, it looks like "prte" was started from microway3 - correct? * in the second case, that worked, it looks like "mpirun" was executed from microway1 - correct? * in the third case, you state that "mpirun" was again

[OMPI devel] ORTE->PRRTE: some consequences to communicate to users

2020-04-28 Thread Ralph Castain via devel
So here is an interesting consequence of moving from ORTE to PRRTE. In ORTE, you could express any mapping policy as an MCA param - e.g., the following: OMPI_MCA_rmaps_base_mapping_policy=core OMPI_MCA_rmaps_base_display_map=1 would be the equivalent of a cmd line that included "--map-by core
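
To make the equivalence concrete (the PRRTE-side parameter name below is an assumption; check your version for the exact spelling):

    # ORTE era: MCA params mirrored the cmd line exactly
    export OMPI_MCA_rmaps_base_mapping_policy=core
    export OMPI_MCA_rmaps_base_display_map=1
    # ...was the same as:
    mpirun --map-by core --display-map ./app
    # With PRRTE owning the mapper, the envar moves to the PRTE_MCA_ prefix:
    export PRTE_MCA_rmaps_default_mapping_policy=core   # illustrative name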

Re: [OMPI devel] MPI_Info args to spawn - resolving deprecated values?

2020-04-24 Thread Ralph Castain via devel
2020, at 9:51 AM, Ralph Castain via devel <devel@lists.open-mpi.org> wrote: > > We have deprecated a number of cmd line options (e.g., bynode, npernode, > npersocket) - what do we want to do about their MPI_Info equivalents when > calling comm_spawn? > > Do I sil

[OMPI devel] Mapping/ranking/binding defaults for OMPI v5

2020-04-24 Thread Ralph Castain via devel
I just want to confirm the default behaviors we want for OMPI v5. This is what we have currently set: * if the user specifies nothing: if np <=2: map-by core, rank-by core, bind-to core if np > 2: map-by socket, rank-by core, bind-to socket * if the user only specifies map-by:
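
Expressed as command lines, the proposed defaults would act like this (application name is a placeholder):

    mpirun -np 2 ./app   # as if: --map-by core   --rank-by core --bind-to core
    mpirun -np 8 ./app   # as if: --map-by socket --rank-by core --bind-to socket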

[OMPI devel] Cross-job shared memory support

2020-04-12 Thread Ralph Castain via devel
There was a recent discussion regarding whether or not two jobs could communicate via shared memory. I recalled adding support for this, but thought that Nathan needed to do something in "vader" to enable it. Turns out I remembered correctly about adding the support - but I believe "vader"

[OMPI devel] MPI_Info args to spawn - resolving deprecated values?

2020-04-08 Thread Ralph Castain via devel
We have deprecated a number of cmd line options (e.g., bynode, npernode, npersocket) - what do we want to do about their MPI_Info equivalents when calling comm_spawn? Do I silently convert them? Should we output a deprecation warning? Return an error? Ralph

[OMPI devel] External libevent, hwloc, pmix intertwined

2020-04-02 Thread Ralph Castain via devel
Hey folks I have been fighting the build system for the last two days and discovered something a little bothersome. It appears that there are only two ways to build OMPI: * with all three of libevent, hwloc, and pmix internal * with all three of libevent, hwloc, and pmix external In other

Re: [OMPI devel] Taking advantage of PMIx: Hierarchical collective support

2020-03-22 Thread Ralph Castain via devel
for the specified proc, one per NIC, ordered as above. I'll be posting some example code illustrating the use of all these in the near future and will alert anyone interested when I do. Ralph > On Mar 22, 2020, at 11:36 AM, Ralph Castain via devel > wrote: > > I'll be writing a ser

[OMPI devel] Taking advantage of PMIx: Hierarchical collective support

2020-03-22 Thread Ralph Castain via devel
I'll be writing a series of notes containing thoughts on how to exploit PMIx-provided information, especially covering aspects that might not be obvious (e.g., attributes that might not be widely known). This first note covers the topic of collective optimization. PMIx provides network-related

Re: [OMPI devel] Add multi nic support for ofi MTL using hwloc

2020-03-20 Thread Ralph Castain via devel
using the topology and close that PR as "unneeded" On Mar 20, 2020, at 9:35 AM, Barrett, Brian <bbarr...@amazon.com> wrote: But does raise the question; should we call get_topology() for belt and suspenders in OFI? Or will that cause your concerns from the start of

Re: [OMPI devel] Add multi nic support for ofi MTL using hwloc

2020-03-20 Thread Ralph Castain via devel
https://github.com/open-mpi/ompi/pull/7547 fixes it and has an explanation as to why it wasn't catching us elsewhere in the MPI code On Mar 20, 2020, at 9:22 AM, Ralph Castain via devel <devel@lists.open-mpi.org> wrote: Odd - the topology object gets filled in during init, well

Re: [OMPI devel] Add multi nic support for ofi MTL using hwloc

2020-03-20 Thread Ralph Castain via devel
e, but wanted to figure out if this > was expected and, if so, if we had options for getting the right data from > PMIx early enough in the process. Sorry, this is part of the runtime changes > I haven't been following closely enough. > > Brian >

Re: [OMPI devel] Add multi nic support for ofi MTL using hwloc

2020-03-18 Thread Ralph Castain via devel
iam > > On 3/17/20, 11:54 PM, "devel on behalf of Ralph Castain via devel" > > wrote: > >CAUTION: This email originated from outside of the organization. Do not > click links or open attachments unless you can confirm the sender and know >

[OMPI devel] Add multi nic support for ofi MTL using hwloc

2020-03-18 Thread Ralph Castain via devel
Hey folks I saw the referenced "new feature" on the v5 feature spreadsheet and wanted to ask a quick question. Is the OFI MTL going to be doing its own hwloc topology discovery for this feature? Or is it going to access the topology info via PMIx and the OPAL hwloc abstraction? I ask because

Re: [OMPI devel] New coll component

2020-03-05 Thread Ralph Castain via devel
You have a missing symbol in your component: undefined symbol: ompi_coll_libpnbc_osc_neighbor_alltoall_init (ignored) On Mar 5, 2020, at 5:57 AM, Luis Cebamanos via devel <devel@lists.open-mpi.org> wrote: Hi folks, We are developing a (hopefully) new component for the coll

Re: [OMPI devel] Today's OMPI master is failing with "ompi_mpi_init: ompi_rte_init failed"

2020-03-04 Thread Ralph Castain via devel
I checked this with a fresh clone and everything is working fine, so I expect this is a stale submodule issue again. I've asked John to check. > On Mar 4, 2020, at 8:05 AM, John DelSignore via devel > wrote: > > Hi, > > I've been working with Ralph to try to get the PMIx debugging interfaces

Re: [OMPI devel] Adding a new RAS module

2020-02-29 Thread Ralph Castain via devel
You'll have to do it in the PRRTE project: https://github.com/openpmix/prrte OMPI has removed the ORTE code base and replaced it with PRRTE, which is effectively the same code but taken from a repo that multiple projects support. You can use any of the components in there as a template - I

[OMPI devel] Github is sad today

2020-02-27 Thread Ralph Castain via devel
Just an FYI: GitHub is degraded today, especially on the webhooks and actions that we depend upon for things like CI. Hopefully, they will get it fixed soon. Ralph

[OMPI devel] Conflicting definitions

2020-02-20 Thread Ralph Castain via devel
Hey folks Now that we have multiple projects sharing a build system, we need to be careful how we name our #if's. For example, using: #ifdef MCA_FOO_BAR #define MCA_FOO_BAR ... might have been fine in the past, but you wind up clobbering another project that also has a "foo_bar" component.
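
A short C sketch of the requested discipline; the FOO_BAR names are placeholders:

    /* Risky: a generic guard can clobber another project's "foo_bar"
     * component now that several projects share one build system. */
    #ifndef MCA_FOO_BAR
    #define MCA_FOO_BAR 1
    #endif

    /* Safer: scope the name with the owning project's prefix. */
    #ifndef OMPI_MCA_FOO_BAR
    #define OMPI_MCA_FOO_BAR 1
    #endif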

[OMPI devel] Deprecated configure options in OMPI v5

2020-02-19 Thread Ralph Castain via devel
What do we want to do with the following options? These have either been renamed (changing from "orte..." to a "prrte" equivalent) or are no longer valid: --enable-orterun-prefix-by-default --enable-mpirun-prefix-by-default These are now --enable-prte-prefix-by-default. Should I error out via

[OMPI devel] OMPI Developer Meeting for Feb 2020 - Minutes

2020-02-19 Thread Ralph Castain via devel
Hi folks I integrated the minutes from this week's meeting into the meeting's wiki page: https://github.com/open-mpi/ompi/wiki/Meeting-2020-02 Feel free to update and/or let me know of errors or omissions Ralph

[OMPI devel] Command line and envar processing

2020-02-19 Thread Ralph Castain via devel
Hey folks Based on the discussion at the OMPI developer's meeting this week, I have created the following wiki page explaining how OMPI's command line and envars will be processed for OMPI v5: https://github.com/open-mpi/ompi/wiki/Command-Line-Envar-Parsing Feel free to comment and/or ask

[OMPI devel] Fix your MTT scripts!

2020-02-09 Thread Ralph Castain via devel
We are seeing many failures on MTT because of errors on the cmd line. Note that by request of the OMPI community, PRRTE is strictly enforcing the Posix "dash" syntax: * a single-dash must be used only for single-character options. You can combine the single-character options like "-abc" as
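
Illustrative command lines under that strict POSIX parsing (option letters are placeholders):

    mpirun -a -b -c ./app        # single-character options, one dash each
    mpirun -abc ./app            # equivalent combined form
    mpirun --map-by core ./app   # multi-character options take two dashes
    mpirun -map-by core ./app    # rejected: multi-character with one dash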

[OMPI devel] ORTE has been removed!

2020-02-08 Thread Ralph Castain via devel
FYI: pursuant to the objectives outlined last year, I have committed PR #7202 and removed ORTE from the OMPI repository. It has been replaced with a PRRTE submodule pointed at the PRRTE master branch. At the same time, we replaced the embedded PMIx code tree with a submodule pointed to the PMIx

Re: [OMPI devel] Git submodules are coming

2020-02-07 Thread Ralph Castain via devel
FWIW: I have major problems when rebasing if that rebase runs across the point where a submodule is added. Every file that was removed and replaced by the submodule generates a conflict. Only solution I could find was to whack the subdirectory containing the files-to-be-replaced and work thru

Re: [OMPI devel] 3.1.6rc2: Cygwin fifo warning

2020-02-03 Thread Ralph Castain via devel
It is the latter one it is complaining about: > /tmp/ompi.LAPTOP-82F08ILC.197609/pid.93/0/debugger_attach_fifo I have no idea why it is complaining. > On Feb 3, 2020, at 2:03 PM, Marco Atzeri via devel > wrote: > > On 03.02.2020 at 18:15, Ralph Castain via dev

Re: [OMPI devel] 3.1.6rc2: Cygwin fifo warning

2020-02-03 Thread Ralph Castain via devel
Hi Marco mpirun isn't trying to run a debugger. It is opening a fifo pipe in case a debugger later wishes to attach to the running job - it is used by an MPIR-based debugger to let mpirun know that it is attaching. My guess is that the code is attempting to create the fifo in an unacceptable

Re: [OMPI devel] Git submodules are coming

2020-01-08 Thread Ralph Castain via devel
Actually, I take that back - making a separate PR to change the opal/pmix embedded component to a submodule was way too painful. I simply added it to the existing #7202. > On Jan 7, 2020, at 1:33 PM, Ralph Castain via devel > wrote: > > Just an FYI: there will soon be THREE PRs

Re: [OMPI devel] Git submodules are coming

2020-01-07 Thread Ralph Castain via devel
Just an FYI: there will soon be THREE PRs introducing submodules - I am breaking #7202 into two pieces. The first will replace opal/pmix with direct use of PMIx everywhere and replace the embedded pmix component with a submodule pointing to PMIx master, and the second will replace ORTE with

Re: [OMPI devel] PMIX ERROR: INIT spurious message on 3.1.5

2020-01-07 Thread Ralph Castain via devel
I was able to create the fix - it is in OMPI master. I have provided a patch for OMPI v3.1.5 here: https://github.com/open-mpi/ompi/pull/7276 Ralph > On Jan 3, 2020, at 6:04 PM, Ralph Castain via devel > wrote: > > I'm afraid the fix uncovered an issue in the ds

Re: [OMPI devel] PMIX ERROR: INIT spurious message on 3.1.5

2020-01-03 Thread Ralph Castain via devel
I'm afraid the fix uncovered an issue in the ds21 component that will require Mellanox to address it - unsure of the timetable for that to happen. > On Jan 3, 2020, at 6:28 AM, Ralph Castain via devel > wrote: > > I committed something upstream in PMIx master and v3.1 that proba

Re: [OMPI devel] PMIX ERROR: INIT spurious message on 3.1.5

2020-01-03 Thread Ralph Castain via devel
wrote: > > Is there a configure test we can add to make this kind of behavior be the > default? > > >> On Jan 1, 2020, at 11:50 PM, Marco Atzeri via devel >> wrote: >> >> thanks Ralph >> >> gds = ^ds21 >> works as expected >>

Re: [OMPI devel] Reachable framework integration

2020-01-02 Thread Ralph Castain via devel
interfaces? On Jan 2, 2020, at 9:35 AM, George Bosilca <bosi...@icl.utk.edu> wrote: Ralph, I think the first use is still pending reviews (more precisely my review) at https://github.com/open-mpi/ompi/pull/7134. George. On Wed, Jan 1, 2020 at 9:53 PM Ralph Castain via dev

[OMPI devel] Reachable framework integration

2020-01-01 Thread Ralph Castain via devel
Hey folks I can't find where the opal/reachable framework is being used in OMPI. I would like to utilize it in the PRRTE oob/tcp component, but need some guidance on how to do so, or pointers to an example. Ralph

Re: [OMPI devel] PMIX ERROR: INIT spurious message on 3.1.5

2019-12-31 Thread Ralph Castain via devel
PMIx likely defaults to the ds12 component - which will work fine but a tad slower than ds21. It is likely something to do with the way cygwin handles memory locks. You can avoid the error message by simply adding "gds = ^ds21" to your default MCA param file (the pmix one - should be named
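
Putting that directive in the file named above ('^' excludes the listed component):

    # ~/.pmix/mca-params.conf (or the system-level pmix-mca-params.conf)
    gds = ^ds21    # use any gds component except ds21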

[OMPI devel] ORTE replacement

2019-12-25 Thread Ralph Castain via devel
Hi folks The move to replace ORTE with PRRTE is now ready to go (the OSHMEM team needs to fix something in that project). This means that all further development activity and/or PRs involving ORTE should be transferred to the PRRTE project (https://github.com/openpmix/prrte). Existing PRs that

Re: [OMPI devel] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-13 Thread Ralph Castain via devel
ting of those paths is why we are moving to PMIx-based tool support in OMPI v5. HTH Ralph On Nov 13, 2019, at 10:40 AM, Ralph Castain via devel <devel@lists.open-mpi.org> wrote: Agreed and understood. My point was only that I'm not convinced the problem was "fixed" as i

Re: [OMPI devel] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-13 Thread Ralph Castain via devel
il. I think that unless there's already a problem in the code, the debugger should not be able to interfere at all. Cheers, John D. On 11/12/19 6:51 PM, Ralph Castain via devel wrote: Again, John, I'm not convinced your last statement is true. However, I think it is "good enough" fo

Re: [OMPI devel] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-12 Thread Ralph Castain via devel
thread polling on the flag to see the update - something the volatile keyword doesn't do on its own. I think it's also much cleaner as it eliminates an arbitrary sleep from the code - which I see as a good thing as well. "Ralph Castain via devel" ---11/12/2019 09:24:23 AM---> On Nov

Re: [OMPI devel] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-12 Thread Ralph Castain via devel
ing for a signal from another thread to wake up } \ pthread_mutex_unlock(); \ }while(0); Is much more standard when dealing with threads updating a shared variable - and might lead to a more expected result in this case. On the other end, this would require the thread updating this variable to: pthread_mute
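
A self-contained C sketch of the mutex/condition-variable pattern the quoted macro is getting at; all names are illustrative, not OMPI code:

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static bool flag = false;

    static void wait_for_flag(void)      /* the waiting thread, no polling */
    {
        pthread_mutex_lock(&lock);
        while (!flag) {                      /* guards against spurious wakeups */
            pthread_cond_wait(&cond, &lock); /* sleeps, releasing the lock */
        }
        pthread_mutex_unlock(&lock);
    }

    static void set_flag(void)           /* the updating thread */
    {
        pthread_mutex_lock(&lock);
        flag = true;
        pthread_cond_signal(&cond);      /* wake the waiter deterministically */
        pthread_mutex_unlock(&lock);
    }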

Re: [OMPI devel] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-12 Thread Ralph Castain via devel
something the volatile keyword doesn't do on its own. I think it's also much cleaner as it eliminates an arbitrary sleep from the code - which I see as a good thing as well. "Ralph Castain via devel" ---11/12/2019 09:24:23 AM---> On Nov 11, 2019, at 4:53 PM, Gilles Gouaillardet via

Re: [OMPI devel] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-12 Thread Ralph Castain via devel
handled by PMIx, at least in the master branch)? IIRC, that progress thread only runs if explicitly asked to do so by MCA param. We don't need that code any more as PMIx takes care of it. > > Cheers, > > Gilles > > On 11/12/2019 9:27 AM, Ralph Castain via devel wrote: >

Re: [OMPI devel] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-11 Thread Ralph Castain via devel
Hi John Sorry to say, but there is no way to really answer your question as the OMPI community doesn't actively test MPIR support. I haven't seen any reports of hangs during MPI_Init from any release series, including 4.x. My guess is that it may have something to do with the debugger

Re: [MTT devel] mtt/server/gds

2019-07-25 Thread Ralph Castain
-devel-boun...@lists.open-mpi.org> on behalf of Ralph Castain <r...@open-mpi.org> Reply-To: Development list for the MPI Testing Tool <mtt-devel@lists.open-mpi.org> Date: Thursday, July 25, 2019 at 12:13 PM To: Development list for the MPI Testing Tool <mtt-

Re: [MTT devel] mtt/server/gds

2019-07-25 Thread Ralph Castain
What "gds" directory are you talking about? Can you provide the full path to it? On Jul 25, 2019, at 11:04 AM, Rezanka, Deb via mtt-devel mailto:mtt-devel@lists.open-mpi.org> > wrote: Does anyone know if the gds directory is being used at all? If not, could we get rid of it.  -- Deb Rezanka  B

[OMPI devel] Network support in OMPI

2019-07-24 Thread Ralph Castain via devel
Hi folks I mentioned this very briefly at the Tues telecon, but didn't explain it well as there just wasn't adequate time available. With the recent updates of the embedded PMIx code, OMPI's mpirun now has the ability to fully support pre-launch network resource assignment for processes. This

Re: [OMPI devel] [OMPI] issue with mpirun

2019-07-12 Thread Ralph Castain via devel
printed only once by the second case. On Fri, Jul 12, 2019 at 6:00 PM Ralph Castain via devel <devel@lists.open-mpi.org> wrote: Afraid I don't know anything about that program, but it looks like it is printing the same number of times in both cases. It only appears to be more in th

Re: [OMPI devel] [OMPI] issue with mpirun

2019-07-12 Thread Ralph Castain via devel
<cs15mtech11...@iith.ac.in> wrote: Thanks, Ralph. Why is the output of the program (mm-llvm.out) being run printed only once, while the mpirun from Intel prints as many times as mentioned in the command line? On Thu, Jul 11, 2019 at 11:08 PM Ralph Castain via devel <devel@lists.open-m

Re: [OMPI devel] [OMPI] issue with mpirun

2019-07-11 Thread Ralph Castain via devel
Because OMPI binds to core by default when np=2. If you have an OpenMP process, you want to add "--bind-to numa" to your mpirun cmd line. On Jul 11, 2019, at 10:28 AM, Dangeti Tharun kumar via devel <devel@lists.open-mpi.org> wrote: Hi Devs, I have built openmpi with LLVM-8
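
A sketch of the suggested hybrid setup; the application name and thread count are placeholders:

    export OMP_NUM_THREADS=8            # OpenMP threads per MPI rank
    mpirun -np 2 --bind-to numa ./app   # widen binding beyond one core so
                                        # the OpenMP threads have room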

[OMPI devel] Jenkins setup

2016-07-22 Thread Ralph Castain
Hi folks I’m setting up a local Jenkins server on my box and could use some directions. I have Jenkins up, and have pulled down the jenkinsci/ghprb-plugin.git repo, but I can’t find on our wiki pages the instructions on how to configure Jenkins jobs for this purpose, or how to setup the “hook”

Re: [OMPI devel] singleton broken on master

2016-07-21 Thread Ralph Castain
Fix included in PR https://github.com/open-mpi/ompi/pull/1897 > On Jul 21, 2016, at 5:34 AM, Gilles Gouaillardet > wrote: > > Ralph, > > I noted singleton are broken on master. > git bisect points to the commit in which PMIx_tool were introduced. > if you

Re: [OMPI devel] about Mellanox Jenkins

2016-07-21 Thread Ralph Castain
b71f0735a83edc45178f2 > if you care). I just filed a v2.x PR to get the fix over there, too. > > However, it looks like Travis doesn't merge to current HEAD when it's doing > building, so existing PRs -- if they are not rebased -- won't see the Travis > fix. > > >>

Re: [OMPI devel] about Mellanox Jenkins

2016-07-21 Thread Ralph Castain
might be. > > 2016-07-21 21:23 GMT+06:00 Ralph Castain <r...@open-mpi.org>: > I’m checking this - could be something to do with the recent PMIx update > >> On Jul 21, 2016, at 8:21 AM, Artem Polyakov <artpo...@gmail.com>

Re: [OMPI devel] about Mellanox Jenkins

2016-07-21 Thread Ralph Castain
I’m checking this - could be something to do with the recent PMIx update > On Jul 21, 2016, at 8:21 AM, Artem Polyakov wrote: > > I see the same error with `sm,self` and `vader,self` in the PR > https://github.com/open-mpi/ompi/pull/1883 >

Re: [MTT devel] Missing the teleconf today

2016-07-20 Thread Ralph Castain
Crumb - we need to chat about the migration issue, so let me know when we can connect > On Jul 20, 2016, at 9:31 AM, Josh Hursey wrote: > > I have to miss the teleconf today. Let me know if there is anything that > needs my immediate attention. I should be on next week.

[OMPI devel] autogen.pl broken on master

2016-07-19 Thread Ralph Castain
When trying to run autogen.pl --no-ompi --no-oshmem: configure.ac:1307: error: m4_require: circular dependency of AC_LANG_COMPILER(Fortran) ../../lib/autoconf/lang.m4:329: AC_LANG_COMPILER_REQUIRE is expanded from... ../../lib/autoconf/general.m4:2678: AC_LINK_IFELSE is expanded from...

Re: [OMPI devel] OSHMEM out-of-date?

2016-07-18 Thread Ralph Castain
Sorry - this is on today’s master > On Jul 17, 2016, at 8:31 PM, Artem Polyakov <artpo...@gmail.com> wrote: > > What is it? What repository? > > On Monday, July 18, 2016, Ralph Castain wrote: > In file included from > ../../../../oshmem/shmem/fort

[OMPI devel] OSHMEM out-of-date?

2016-07-18 Thread Ralph Castain
In file included from ../../../../oshmem/shmem/fortran/prototypes_shmem.h:14:0, from ../../../../oshmem/shmem/fortran/bindings.h:15, from pshmem_put_f.c:13: pshmem_put_f.c: In function 'shmem_put_f': ../../../../oshmem/shmem/fortran/shmem_fortran_pointer.h:15:28:

Re: [OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-16 Thread Ralph Castain
I found the reason for the notification and fixed that as well - should all be done now > On Jul 16, 2016, at 10:37 AM, Ralph Castain <r...@open-mpi.org> wrote: > > Kewl - thanks! I will take care of this, but to me the most pressing issue is > why this event notification

Re: [OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-16 Thread Ralph Castain
> in send_notification() from orted_submit.c, info is > OPAL_PMIX_EVENT_NON_DEFAULT, but in pmix2x.c and pmix_ext20.c, > PMIX_EVENT_NON_DEFAULT is tested. > If I use OPAL_PMIX_EVENT_NON_DEFAULT in pmix*, that fixes the issue > > Cheers, > > Gilles > > On Sund

Re: [OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-16 Thread Ralph Castain
register a handler for this event (so nondefault is true and > ompi_errhandler_callback is not invoked here)? > > Cheers, > > Gilles > > On Friday, July 15, 2016, Ralph Castain <r...@open-mpi.org> wrote: >

Re: [OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-15 Thread Ralph Castain
investigate this until Monday, but so far, my guess is > that if the constructor is fixed, then RHEL6 will fail like RHEL7 ... > > fwiw, the intercomm_create used to fail in Cisco mtt because of too many > tasks and no over subscription, now it fails because of this bug. > > Cheer

Re: [OMPI devel] MPI_Comm_spawn broken on master on RHEL7

2016-07-15 Thread Ralph Castain
That would break debugger attach. Sounds to me like it’s just an uninitialized variable for in_event_hdlr? > On Jul 15, 2016, at 1:20 AM, Gilles Gouaillardet wrote: > > Ralph, > > i noticed MPI_Comm_spawn is broken on master and on RHEL7 > > for some reason i cannot yet

Re: [OMPI devel] 2.0.0rc4 Crash in MPI_File_write_all_end

2016-07-13 Thread Ralph Castain
Hmmm…I see where the singleton on master might be broken - will check later today > On Jul 13, 2016, at 11:37 AM, Eric Chamberland > wrote: > > Hi Howard, > > ok, I will wait for 2.0.1rcX... ;) > > I've put in place a script to download/compile

[OMPI devel] RFC: RML changes

2016-07-10 Thread Ralph Castain
We have provided an early “preview” of an upcoming change to the ORTE RML framework: https://github.com/open-mpi/ompi/pull/1857 This change introduces a new “ofi” RML component that enables ORTE to communicate across any libfabric-supported
