Hi,
We see this on our cluster as well; we traced it to Python loading
shared library extensions using RTLD_LOCAL.
The Python module (mpi4py?) has a dependency on libmpi.so, which in turn has a
dependency on libhcoll.so. So the Python module is being loaded with
RTLD_LOCAL, anything
memory).
See https://github.com/open-mpi/ompi/issues/8170 and
https://github.com/openpmix/prrte/pull/1141
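Until that's addressed upstream, two workarounds, as a sketch (the libmpi path and mpi4py are assumptions on my part):

```shell
# 1) Preload libmpi so its symbols land in the global scope before the
#    extension is dlopen()ed with RTLD_LOCAL (path illustrative):
# LD_PRELOAD=/opt/openmpi/lib/libmpi.so python3 script.py
# 2) Or flip the interpreter's dlopen flags before importing the module:
python3 - <<'EOF'
import sys, os
sys.setdlopenflags(sys.getdlopenflags() | os.RTLD_GLOBAL)
# import mpi4py.MPI   # would now expose libmpi.so's symbols globally
print(bool(sys.getdlopenflags() & os.RTLD_GLOBAL))
EOF
```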
Brice
On 11/11/2021 at 05:33, Ben Menadue via devel wrote:
> Hi,
>
> Quick question: what's the equivalent of "--map-by numa" for the new
> PRRTE-based runtime for v5.0?
Hi,
Quick question: what's the equivalent of "--map-by numa" for the new
PRRTE-based runtime for v5.0? I can see "package" and "l3cache" in the
help, which are close, but don't quite match "numa" for our system.
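For concreteness, a sketch of what I've been comparing (./a.out is a placeholder; option spellings taken from the respective --help output):

```shell
# v4.x accepted NUMA as a mapping level directly:
mpirun --map-by numa -np 8 ./a.out      # Open MPI v4.x
# v5 (PRRTE) candidates; whether package or l3cache coincides with a
# NUMA node depends on whether sub-NUMA clustering is enabled:
mpirun --map-by package -np 8 ./a.out
mpirun --map-by l3cache -np 8 ./a.out
# hwloc shows how packages, L3s, and NUMA nodes nest on a given node:
lstopo --no-io
```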
In more detail...
We have dual-socket CLX- and SKL-based nodes with sub-NUMA
Hi,
The v4.0.2 tag in GitHub is broken at the moment -- trying to go to it
just takes you to the v4.0.2 _branch_, which looks to be a separate,
much more recent fork from master:
https://github.com/open-mpi/ompi/tree/v4.0.2
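In the meantime, asking git for the tag namespace explicitly sidesteps the name clash; a throwaway demo (names illustrative) of why the bare name is ambiguous:

```shell
set -e
# A tag and a branch may share a name; only the explicit ref form pins
# down which one you get.
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m "release"
git tag v4.0.2                            # the release tag
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m "later work"
git branch v4.0.2                         # a branch with the same name
git rev-parse refs/tags/v4.0.2^{commit}   # the tagged release commit
git rev-parse refs/heads/v4.0.2           # the branch tip (a different commit)
```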
Cheers,
Ben
Hi Gilles,
> On 2 Nov 2018, at 11:03 am, Gilles Gouaillardet wrote:
> I noted the stack traces refer to opal_cuda_memcpy(). Is this issue specific to
> CUDA environments ?
No, this is just on normal CPU-only nodes. But memcpy always goes through
opal_cuda_memcpy when CUDA support is enabled,
Hi,
One of our users is reporting an issue using MPI_Allgatherv with a large
derived datatype; it segfaults inside Open MPI. Using a debug build of Open MPI
3.1.2 produces a ton of messages like this before the segfault:
[r3816:50921] ../../../../../opal/datatype/opal_datatype_pack.h:53
Hi Jeff,
What’s the replacement that it should use instead? I’m pretty sure oob/ud is
being picked by default on our IB cluster. Or is oob/tcp good enough?
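For now I can pin it by hand while we work out the right answer; a sketch with MCA parameters (./a.out is a placeholder, and which components exist depends on the build):

```shell
# Select the TCP oob component explicitly instead of letting the
# priority logic pick oob/ud:
mpirun --mca oob tcp -np 2 ./a.out
export OMPI_MCA_oob=tcp        # same thing via the environment
ompi_info --param oob all      # list the oob components in this build
```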
Cheers,
Ben
> On 20 Jun 2018, at 5:20 am, Jeff Squyres (jsquyres) via devel
> wrote:
>
> We talked about this on the webex today, but
> (since the problem is different in
> the various releases) in the next few days that points to the problems.
>
> Comm_spawn is okay, FWIW
>
> Ralph
>
>
>> On May 21, 2018, at 8:00 PM, Ben Menadue <ben.mena...@nci.org.au
>> <mailto:ben.mena...@nci.org.au>
, and
pmix_progress_threads).
That said, I’m not sure why get_tracker is reporting 32 procs; there are only 16
running here (i.e. 1 original + 15 spawned).
Or should I post this over in the PMIx list instead?
Cheers,
Ben
> On 17 May 2018, at 9:59 am, Ben Menadue <ben.mena...@nci.org.au> wrote
Hi,
I’m having trouble using map-by socket on remote nodes.
Running on the same node as mpirun works fine (except for that spurious
debugging line):
$ mpirun -H localhost:16 -map-by ppr:2:socket:PE=4 -display-map /bin/true
[raijin7:22248] SETTING BINDING TO CORE
Data for JOB [11140,1] offset 0
Hi,
I’m seeing an extraneous “DONE” message being printed with OpenMPI 3.0.0 when
mapping by core:
[bjm900@raijin7 pt2pt]$ mpirun -np 2 ./osu_bw > /dev/null
[bjm900@raijin7 pt2pt]$ mpirun -map-by core -np 2 ./osu_bw > /dev/null
[raijin7:14376] DONE
This patch gets rid of the offending line —
You’re welcome to pull down the patch and locally apply it if it would help.
Ralph
> On Aug 24, 2016, at 5:29 PM, r...@open-mpi.org wrote:
>
> Hmmm...bet I know why. Let me poke a bit.
>
>> On Aug 24, 2016, at 5:18 PM, Ben Menadue <ben.mena...@nci.org.au> wrote:
>>
could pull the patch in advance if it is holding you up.
>
>
>> On Aug 23, 2016, at 11:46 PM, Ben Menadue <ben.mena...@nci.org.au> wrote:
>>
>> Hi,
>>
>> One of our users has noticed that binding is disabled in 2.0.0 when
>> --oversubscribe is passed
Hi,
One of our users has noticed that binding is disabled in 2.0.0 when
--oversubscribe is passed, which is hurting their performance, likely
through migrations between sockets. It looks to be because of 294793c
(PR#1228).
They need to use --oversubscribe as for some reason the developers
, but that was before
my time.
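For anyone else hitting this before a fix lands, explicitly requesting a binding seems worth trying; a sketch (./a.out is a placeholder, and I haven’t confirmed this overrides the new default on 2.0.0):

```shell
# 2.0.0 drops binding when --oversubscribe is given; asking for it
# explicitly (and reporting what was done) makes the behaviour visible:
mpirun --oversubscribe --bind-to core --report-bindings -np 32 ./a.out
```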
Cheers,
Ben
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Dave Turner
Sent: Friday, 4 March 2016 3:28 PM
To: Ben Menadue <ben.mena...@nci.org.au>
Cc: Open MPI Developers <de...@open-mpi.org>
Subject: Re: [OMPI devel] mpif.h on Intel bu
Hi Dave,
The issue is the way MPI_Sizeof is handled; it's implemented as a series of
interfaces that map the MPI_Sizeof call to the right function in the library. I
suspect this is needed because that function doesn't take a datatype argument
and instead infers this from the argument types.
Hi,
I just finished building 1.8.6 and master on our cluster and noticed that
for both, XRC support wasn't enabled because configure didn't detect the
IBV_SRQT_XRC declaration:
checking whether IBV_SRQT_XRC is declared... (cached) no
...
checking if ConnectX XRC support
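In case it's the same trap I've hit before: the "(cached)" means configure reused a stale answer rather than re-probing; a sketch of forcing a fresh check (the prefix is a placeholder):

```shell
# Remove autoconf's cached answers so the IBV_SRQT_XRC probe actually
# runs against the currently installed verbs headers:
rm -f config.cache
./configure --prefix=/apps/openmpi/1.8.6   # then re-check the XRC lines
```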