Hi,
Is there any reason why you do not build the latest 5.0.2 package?
Anyway, the issue could be related to an unknown filesystem.
Do you get a meaningful error if you manually run
/.../test/util/opal_path_nfs?
If not, can you share the output of
mount | cut -f3,5 -d' '
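(for reference, that command prints the mount point and filesystem type of
each mount, which is the information the opal_path_nfs test checks)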
Cheers,
Gilles
On
Greg,
If Open MPI was built with UCX, your jobs will likely use UCX (and the
shared memory provider) even if running on a single node.
You can
mpirun --mca pml ob1 --mca btl self,sm ...
if you want to avoid using UCX.
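If you prefer not to type the --mca options every time, the equivalent
environment variables should also work (a minimal sketch, assuming a
standard Open MPI install):
export OMPI_MCA_pml=ob1
export OMPI_MCA_btl=self,sm
mpirun ...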
What is a typical mpirun command line used under the hood by your "make
test"?
Christopher,
I do not think Open MPI explicitly asks SLURM which cores have been
assigned on each node.
So if you are planning to run multiple jobs on the same node, your best bet
is probably to have SLURM
use cpusets.
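For reference, a minimal sketch of the SLURM side (assuming a
cgroup-capable system; exact settings depend on your SLURM version):
# slurm.conf
TaskPlugin=task/cgroup
# cgroup.conf
ConstrainCores=yes
With this in place, each job only sees the cores SLURM allocated to it, so
concurrent jobs on the same node cannot step on each other's cores.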
Cheers,
Gilles
On Sat, Feb 24, 2024 at 7:25 AM Christopher Daley via users
Hi,
please open an issue on GitHub at https://github.com/open-mpi/ompi/issues
and provide the requested information.
If the compilation failed when configured with --enable-debug, please share
the logs.
the name of the WRF subroutine suggests the crash might occur in
MPI_Comm_split(),
if so,
Hi,
Please open a GitHub issue at https://github.com/open-mpi/ompi/issues and
provide the requested information.
Cheers,
Gilles
On Sat, Jan 27, 2024 at 12:04 PM Kook Jin Noh via users <
users@lists.open-mpi.org> wrote:
> Hi,
>
>
>
> I’m installing OpenMPI 5.0.1 on Archlinux 6.7.1. Everything
Luis,
you can pass the --bind-to hwthread option in order to bind to the first
thread of each core.
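You can combine it with --report-bindings to double check the result, e.g.
mpirun --bind-to hwthread --report-bindings -np 4 ./a.out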
Cheers,
Gilles
On Fri, Sep 8, 2023 at 8:30 PM Luis Cebamanos via users <
users@lists.open-mpi.org> wrote:
> Hello,
>
> Up to now, I have been using numerous ways of binding with wrappers
>
Aziz,
When using direct run (e.g. srun), OpenMPI has to interact with SLURM.
This is typically achieved via PMI2 or PMIx
You can
srun --mpi=list
to list the available options on your system
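The output typically looks like this (illustrative only, it depends on how
SLURM was built):
srun: MPI types are...
srun: pmi2
srun: pmix
srun: none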
if PMIx is available, you can
srun --mpi=pmix ...
if only PMI2 is available, you need to make sure Open
Luis,
That can happen if a component is linked with libnuma.so:
Open MPI will fail to open it and try to fall back on another one.
You can run ldd on the mca_*.so components in the /.../lib/openmpi directory
to figure out which is using libnuma.so and assess if it is needed or not.
Cheers,
Kurt,
I think Joachim was also asking for the command line used to launch your
application.
Since you are using Slurm and MPI_Comm_spawn(), it is important to
understand whether you are using mpirun or srun
FWIW, --mpi=pmix is a srun option. you can srun --mpi=list to find the
available
Open MPI 1.6.5 is an antique version and you should not expect any support
for it.
Instead, I suggest you try the latest one, rebuild your app and try again.
FWIW, that kind of error occurs when the MPI library does not match mpirun
That can happen when mpirun and libmpi.so come from different
Christof,
Open MPI switching to the internal PMIx is a bug I addressed in
https://github.com/open-mpi/ompi/pull/11704
Feel free to manually download and apply the patch, you will then need
recent autotools and run
./autogen.pl --force
Another option is to manually edit the configure file.
Look
Todd,
Similar issues were also reported when there is Network Address Translation
(NAT) between hosts; that occurred when using a kvm/qemu virtual
machine running on the same host.
First you need to list the available interfaces on both nodes. Then try
to restrict to a single interface that is
Arun,
First, Open MPI selects a pml for **all** the MPI tasks (for example,
pml/ucx or pml/ob1)
Then, if pml/ob1 ends up being selected, a btl component (e.g. btl/uct,
btl/vader) is used for each pair of MPI tasks
(tasks on the same node will use btl/vader, tasks on different nodes will
use
Rob,
Do you invoke mpirun from **inside** the container?
IIRC, mpirun is generally invoked from **outside** the container, could
you try this if not already the case?
The error message is from SLURM, so this is really a SLURM vs
singularity issue.
What if you
srun -N 2 -n 2 hostname
You can pick one test, make it standalone, and open an issue on GitHub.
How does (vanilla) Open MPI compare to your vendor Open MPI based library?
Cheers,
Gilles
On Wed, Jan 11, 2023 at 10:20 PM Dave Love via users <
users@lists.open-mpi.org> wrote:
> Gilles Gouaillardet via user
Dave,
If there is a bug you would like to report, please open an issue at
https://github.com/open-mpi/ompi/issues and provide all the required
information
(in this case, it should also include the UCX library you are using and how
it was obtained or built).
Cheers,
Gilles
On Fri, Jan 6, 2023
Hi,
Simply add
btl = tcp,self
If the openib error message persists, try also adding
osc_rdma_btls = ugni,uct,ucp
or simply
osc = ^rdma
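(these lines go in an MCA parameter file, e.g. $HOME/.openmpi/mca-params.conf
or the system-wide etc/openmpi-mca-params.conf of your install)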
Cheers,
Gilles
On 11/29/2022 5:16 PM, Gestió Servidors via users wrote:
Hi,
If I run “mpirun --mca btl tcp,self --mca allow_ib 0 -n 12
Hi Eric,
Currently, Open MPI does not provide specific support for CephFS.
MPI-IO is either implemented by ROMIO (imported from MPICH, it does not
support CephFS today)
or the "native" ompio component (that also does not support CephFS today).
A proof of concept for CephFS in ompio might
Arham,
It should be balanced: the default mapping is to allocate NUMA packages
round robin.
you can
mpirun --report-bindings -n 28 true
to have Open MPI report the bindings
or
mpirun --tag-output -n 28 grep Cpus_allowed_list /proc/self/status
to have each task report which physical cpus it is allowed to run on.
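With the default mapping you would expect something like the following
(illustrative output; the exact cpu ranges depend on your topology):
[1,0]<stdout>:Cpus_allowed_list:	0-13
[1,1]<stdout>:Cpus_allowed_list:	14-27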
"-machinefile" (vs.
a copy-and-pasted "em dash")?
--
Jeff Squyres
jsquy...@cisco.com
From: users on behalf of Gilles Gouaillardet via
users
Sent: Sunday, November 13, 2022 9:18 PM
To: Open MPI Users
Cc: Gilles Gouaillardet
Subject: Re: [OMPI users
There is a typo in your command line.
You should use --mca (minus minus) instead of -mca
Also, you can try --machinefile instead of -machinefile
Cheers,
Gilles
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:
–mca
On Mon, Nov
Chris,
Did you double check libopen-rte.so.40 and libopen-pal.so.40 are installed
in /mnt/software/o/openmpi/4.1.4-ct-test/lib?
If they are not present, it means your install is busted and you should try
to reinstall it.
Cheers,
Gilles
On Sat, Nov 5, 2022 at 3:42 AM Chris Taylor via users <
Volker,
https://ntq1982.github.io/files/20200621.html (mentioned in the ticket)
suggests that patching the generated configure file can do the trick.
We already patch the generated configure file in autogen.pl (in the
patch_autotools_output subroutine), so I guess that could be enhanced
to
Lucas,
the number of MPI tasks started by mpirun is either
- explicitly passed via the command line (e.g. mpirun -np 2306 ...)
- equal to the number of available slots, and this value is either
a) retrieved from the resource manager (such as a SLURM allocation)
b) explicitly set in a
Matias,
Assuming you run one MPI task per unikernel, and two unikernels share
nothing,
it means that inter-node communication cannot be performed via shared
memory or kernel feature
(such as xpmem or knem). That also implies communications are likely using
the loopback interface
which is much
> cout << "intercommunicator interClient=" << interClient << endl;
>
> After connection from a third party client it returns "c403" (in hex).
>
> Both 8406 and c403 are negative integer in dec.
>
> I don't know if it is "norm
Sorry if I did not make my intent clear.
I was basically suggesting to hack the Open MPI and PMIx wrappers to
hostname() and remove the problematic underscores to make the regx
components a happy panda again.
Cheers,
Gilles
- Original Message -
> I think the files suggested by Gilles
Patrick,
you will likely also need to apply the same hack to opal_net_get_hostname()
in opal/util/net.c
Cheers,
Gilles
On Thu, Jun 16, 2022 at 7:30 PM Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:
> Patrick,
>
> I am not sure Open MPI can do that out of the
Patrick,
I am not sure Open MPI can do that out of the box.
Maybe hacking pmix_net_get_hostname() in
opal/mca/pmix/pmix3x/pmix/src/util/net.c
can do the trick.
Cheers,
Gilles
On Thu, Jun 16, 2022 at 4:24 PM Patrick Begou via users <
users@lists.open-mpi.org> wrote:
> Hi all,
>
> we are
Scott,
I am afraid this test is inconclusive since stdout is processed by mpirun.
What if you
mpirun -np 1 touch /tmp/xyz
abort (since it will likely hang) and
ls -l /tmp/xyz
In my experience on mac, this kind of hang can happen if you are running a
firewall and/or the IP of your host does
Alois,
Thanks for the report.
FWIW, I am not seeing any errors on my Mac with Open MPI from brew (4.1.3)
How many MPI tasks are you running?
Can you please confirm you can reproduce the error with
mpirun -np ./mpi_test_suite -d MPI_TYPE_MIX_ARRAY -c
0 -t collective
Also, can you try the same
You can first double check you call
MPI_Init_thread(..., MPI_THREAD_MULTIPLE, ...)
and that the provided level is MPI_THREAD_MULTIPLE as you requested.
Cheers,
Gilles
On Fri, Apr 22, 2022, 21:45 Angel de Vicente via users <
users@lists.open-mpi.org> wrote:
> Hello,
>
> I'm running out of ideas, and
Cici,
I do not think the Intel C compiler is able to generate native code for the
M1 (aarch64).
The best case scenario is it would generate code for x86_64 and then
Rosetta would be used to translate it to aarch64 code,
and this is a very downgraded solution.
So if you really want to stick to
Ernesto,
Not directly.
But you can use MPI_Comm_split_type(..., MPI_COMM_TYPE_SHARED, ...) and then
MPI_Comm_size(...) on the "returned" communicator.
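A minimal sketch of that suggestion (standard MPI, nothing Open MPI
specific):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm node_comm;
    int node_size;

    MPI_Init(&argc, &argv);
    /* group together the tasks that can share memory, i.e. those on the same node */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    /* the size of the "returned" communicator is the number of tasks on this node */
    MPI_Comm_size(node_comm, &node_size);
    printf("%d tasks on my node\n", node_size);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}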
Cheers,
Gilles
On Sun, Apr 3, 2022 at 5:52 AM Ernesto Prudencio via users <
users@lists.open-mpi.org> wrote:
> Thanks,
>
>
>
> Ernesto.
>
>
Ernesto,
MPI_Bcast() has no barrier semantic.
It means the root rank can return after the message is sent (kind of eager
send) and before it is received by other ranks.
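In other words, if your algorithm needs all ranks to be past the broadcast
at a given point, you have to add an explicit MPI_Barrier() after
MPI_Bcast().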
Cheers,
Gilles
On Sat, Apr 2, 2022, 09:33 Ernesto Prudencio via users <
users@lists.open-mpi.org> wrote:
> I have an
Patrick,
In the worst case scenario, requiring MPI_THREAD_MULTIPLE support can
disable some fast interconnect
and make your app fall back on IPoIB or similar. And in that case, Open MPI
might prefer a suboptimal
IP network, which can impact the overall performance even more.
Which threading
Ernesto.
>
>
>
> *From:* users *On Behalf Of *Gilles
> Gouaillardet via users
> *Sent:* Monday, March 14, 2022 2:22 AM
> *To:* Open MPI Users
> *Cc:* Gilles Gouaillardet
> *Subject:* Re: [OMPI users] [Ext] Re: Call to MPI_Allreduce() returning
> value 15
>
>
is not an issue on situation 3, when I explicitly point the
>runtime mpi to be 4.0.3 compiled with INTEL (even though I compiled the
>application and openmpi 4.1.2 with GNU, and I link the application with
>openmpi 4.1.2)
>
>
>
> Best,
>
>
>
> Ernesto.
&g
Ernesto,
the coll/tuned module (that should handle collective subroutines by
default) has a known issue when matching but non-identical signatures are
used:
for example, one rank uses one vector of n bytes, and an other rank uses n
bytes.
Is there a chance your application might use this pattern?
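For example (a hypothetical sketch of the problematic pattern, not taken
from your application): both calls describe 16 bytes, so the signatures
match, but they are not identical (rank obtained via MPI_Comm_rank()):

char buf[16];
if (rank == 0) {
    /* rank 0: one element of a contiguous datatype covering 16 bytes */
    MPI_Datatype vec;
    MPI_Type_contiguous(16, MPI_BYTE, &vec);
    MPI_Type_commit(&vec);
    MPI_Allreduce(MPI_IN_PLACE, buf, 1, vec, MPI_BOR, MPI_COMM_WORLD);
    MPI_Type_free(&vec);
} else {
    /* other ranks: 16 elements of MPI_BYTE */
    MPI_Allreduce(MPI_IN_PLACE, buf, 16, MPI_BYTE, MPI_BOR, MPI_COMM_WORLD);
}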
Angel,
Infiniband detection likely fails before checking expanded verbs.
Please compress and post the full configure output
Cheers,
Gilles
On Fri, Feb 18, 2022 at 12:02 AM Angel de Vicente via users <
users@lists.open-mpi.org> wrote:
> Hi,
>
> I'm trying to compile the latest OpenMPI version
Hari,
What does
ldd solver.exe
(or whatever your solver exe file is called) report?
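If the output contains a line such as (illustrative)
libmpi.so.40 => not found
then the runtime linker cannot locate the Open MPI library, and you likely
need to add its lib directory to LD_LIBRARY_PATH.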
Cheers,
Gilles
On Mon, Jan 31, 2022 at 2:09 PM Hari Devaraj via users <
users@lists.open-mpi.org> wrote:
> Hello,
>
> I am trying to run a FEA solver exe file.
> I get this error message:
>
> error while
Thanks Ralph,
Now I get what you had in mind.
Strictly speaking, you are making the assumption that Open MPI performance
matches the system MPI performance.
This is generally true for common interconnects and/or those that feature
providers for libfabric or UCX, but not so for "exotic"
al MPI implementations may
> simplify the mechanics of setting it up, but it is definitely not required.
>
>
> On Jan 26, 2022, at 8:55 PM, Gilles Gouaillardet via users <
> users@lists.open-mpi.org> wrote:
>
> Brian,
>
> FWIW
>
> Keep in mind that when
Brian,
FWIW
Keep in mind that when running a container on a supercomputer, it is
generally recommended to use the supercomputer MPI implementation
(fine tuned and with support for the high speed interconnect) instead of
the one of the container (generally a vanilla MPI with basic
support for TCP
You need a way for your processes to exchange information so MPI_Init()
can work.
One option is to have your custom launcher implement a PMIx server
https://pmix.github.io
If you choose this path, you will likely want to use the Open PMIx
reference implementation
https://openpmix.github.io
gives the error
> mentioned below.
>
> Best,
> Matthias
>
> On 25.01.22 at 07:17, Gilles Gouaillardet via users wrote:
> > Matthias,
> >
> > do you run the MPI application with mpirun or srun?
> >
> > The error log suggests you are using srun, and SL
Matthias,
do you run the MPI application with mpirun or srun?
The error log suggests you are using srun, and SLURM only provides PMI
support.
If this is the case, then you have three options:
- use mpirun
- rebuild Open MPI with PMI support as Ralph previously explained
- use SLURM PMIx:
rank, n, m, v_glob);
and also resize rtype so the second element starts at v_glob[3][0] => upper
bound = (3*sizeof(int))
By the way, since this question is not Open MPI specific, sites such as
Stack Overflow are a better fit.
Cheers,
Gilles
On Thu, Dec 16, 2021 at 6:46 PM Gilles
Jonas,
Assuming v_glob is what you expect, you will need to
`MPI_Type_create_resized()` the received type so the block received
from process 1 will be placed at the right position (v_glob[3][1] => upper
bound = ((4*3+1) * sizeof(int))
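A sketch of the idea (hypothetical datatypes, assuming a 4x3 row-major
integer array gathered column block by column block):

MPI_Datatype col, rtype;
/* one column of a 4x3 row-major integer array: 4 blocks of 1 int, stride 3 */
MPI_Type_vector(4, 1, 3, MPI_INT, &col);
/* shrink the extent to a single int so consecutive blocks interleave */
MPI_Type_create_resized(col, 0, (MPI_Aint)sizeof(int), &rtype);
MPI_Type_commit(&rtype);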
Cheers,
Gilles
On Thu, Dec 16, 2021 at 6:33 PM Jonas
Kurt,
Assuming you built Open MPI with tm support (default if tm is detected at
configure time, but you can configure --with-tm to have it abort if tm
support is not found), you should not need to use a hostfile.
As a workaround, I would suggest you try to
mpirun --map-by node -np 21 ...
Hi Ben,
have you tried
export OMPI_MCA_common_ucx_opal_mem_hooks=1
Cheers,
Gilles
On Mon, Nov 1, 2021 at 9:22 PM bend linux4ms.net via users <
users@lists.open-mpi.org> wrote:
> Ok, I am a newbie supporting an HPC project and learning about MPI.
>
> I have the following portion of a
Matt,
did you build the same Open MPI 4.1.1 from an official tarball with the
previous NAG Fortran?
did you run autogen.pl (--force) ?
Just to be sure, can you rerun the same test with the previous NAG version?
When using static libraries, you can try manually linking with
-lopen-orted-mpir
Carl,
I opened https://github.com/open-mpi/ompi/issues/9444 to specifically track
the issue related to the op/avx component
TL;DR
nvhpc compilers can compile AVX512 intrinsics (so far so good), but do not
define at least one of these macros
__AVX512BW__
__AVX512F__
__AVX512VL__
and Open MPI is
Ray,
there is a typo, the configure option is
--enable-mca-no-build=op-avx
Cheers,
Gilles
- Original Message -
Added --enable-mca-no-build=op-avx to the configure line. Still dies in
the same place.
CCLD mca_op_avx.la
Ray,
note there is a bug in nvc compilers since 21.3
(it has been reported and is documented at
https://github.com/open-mpi/ompi/issues/9402)
For the time being, I suggest you use gcc, g++ and nvfortran
FWIW, the AVX2 issue is likely caused by nvc **not** defining some macros
(that are both
ly for preliminary
> > testing, and does not merit re-enabling these providers in
> > this notebook.
> >
> > Thank you very much for the clarification.
> >
> > Regards,
> > Jorge.
> >
> > - Original Message -
> >> De: "Gilles Gouaillar
Hi Jeff,
Here is a sample file I used some time ago (some definitions might be
missing though ...)
In order to automatically generate this file - this is a bit of a
chicken-and-egg problem -
you can run
configure -C
on the RISC-V node. It will generate a config.cache file.
Then
],0]) is on host: unknown!
BTLs attempted: vader tcp self
Your MPI job is now going to abort; sorry.
[fsc08:465159] [[45369,2],27] ORTE_ERROR_LOG: Unreachable in file
dpm/dpm.c at line 493
On Thu, 2021-08-26 at 14:30 +0900, Gilles Gouaillardet via users wrote:
Franco,
I am surprised UCX
Franco,
I am surprised UCX gets selected since there is no Infiniband network.
There used to be a bug that led UCX to be selected on shm/tcp systems, but
it has been fixed. You might want to give a try to the latest versions of
Open MPI
(4.0.6 or 4.1.1)
Meanwhile, try to
mpirun --mca pml ^ucx
to evaluate them and
report the performance numbers.
On 7/20/2021 11:00 PM, Dave Love via users wrote:
Gilles Gouaillardet via users writes:
One motivation is packaging: a single Open MPI implementation has to be
built, that can run on older x86 processors (supporting only SSE) and the
latest
One motivation is packaging: a single Open MPI implementation has to be
built, that can run on older x86 processors (supporting only SSE) and the
latest ones (supporting AVX512). The op/avx component will select at
runtime the most efficient implementation for vectorized reductions.
On Mon, Jul
Hi Jeff,
Assuming you did **not** explicitly configure Open MPI with
--disable-dlopen, you can try
mpirun --mca pml ob1 --mca btl vader,self ...
Cheers,
Gilles
On Thu, Jun 24, 2021 at 5:08 AM Jeff Hammond via users <
users@lists.open-mpi.org> wrote:
> I am running on a single node and do not
Jorge,
pml/ucx used to be selected when no fast interconnect was detected
(since ucx provides drivers for both TCP and shared memory).
These providers are now disabled by default, so unless your machine
has a supported fast interconnect (such as Infiniband),
pml/ucx cannot be used out of the box
Howard,
I have a recollection of a similar issue that only occurs with the
latest flex (that requires its own library to be passed to the linker).
I cannot remember if this was a flex packaging issue, or if we ended up
recommending to downgrade flex to
a known-to-work version.
The issue
takes place.
> I also request you to plan for an early 4.1.1rc2 release at least by
June 2021.
>
> With Regards,
> S. Biplab Raut
>
> -Original Message-
> From: Gilles Gouaillardet
> Sent: Thursday, April 1, 2021 8:31 AM
> To: Raut, S Biplab
> Subject:
Michael,
orted is able to find its dependencies on the Intel runtime on the
host where you sourced the environment.
However, it is unlikely able to do it on a remote host
For example
ssh ... ldd `which orted`
will likely fail.
An option is to use -rpath (and add the path to the Intel runtime).
Luis,
this file is never compiled when an external hwloc is used.
Please open a github issue and include all the required information
Cheers,
Gilles
On Tue, Mar 23, 2021 at 5:44 PM Luis Cebamanos via users
wrote:
>
> Hello,
>
> Compiling OpenMPI 4.0.5 with Intel 2020 I came across this
Matt,
you can either
mpirun --mca btl self,vader ...
or
export OMPI_MCA_btl=self,vader
mpirun ...
you may also add
btl = self,vader
in your /etc/openmpi-mca-params.conf
and then simply
mpirun ...
Cheers,
Gilles
On Fri, Mar 19, 2021 at 5:44 AM Matt Thompson via users
wrote:
>
> Prentice,
Anthony,
Did you make sure you can compile a simple fortran program with
gfortran? and gcc?
Please compress and attach both openmpi-config.out and config.log, so
we can diagnose the issue.
Cheers,
Gilles
On Mon, Mar 8, 2021 at 6:48 AM Anthony Rollett via users
wrote:
>
> I am trying to
On top of XPMEM, try to also force btl/vader with
mpirun --mca pml ob1 --mca btl vader,self ...
On Fri, Mar 5, 2021 at 8:37 AM Nathan Hjelm via users
wrote:
>
> I would run the v4.x series and install xpmem if you can
> (http://github.com/hjelmn/xpmem). You will need to build with
>
yes, you need to (re)build Open MPI from source in order to try this trick.
On 2/26/2021 3:55 PM, LINUS FERNANDES via users wrote:
No change.
What do you mean by running configure?
Are you expecting me to build OpenMPI from source?
On Fri, 26 Feb 2021, 11:16 Gilles Gouaillardet via users
>>>> Errno==13 is EACCESS, which generically translates to "permission denied".
>>>> Since you're running as root, this suggests that something outside of
>>>> your local environment (e.g., outside of that immediate layer of
s which I
> obviously can't on Termux since it doesn't support OpenJDK.
>
> On Thu, 25 Feb 2021, 13:37 Gilles Gouaillardet via users,
> wrote:
>>
>> Is SELinux running on ArchLinux under Termux?
>>
>> On 2/25/2021 4:36 PM, LINUS FERNANDES via users wrote:
Is SELinux running on ArchLinux under Termux?
On 2/25/2021 4:36 PM, LINUS FERNANDES via users wrote:
Yes, I did not receive this in my inbox since I set to receive digest.
ifconfig output:
dummy0: flags=195 mtu 1500
inet6 fe80::38a0:1bff:fe81:d4f5 prefixlen 64 scopeid 0x20
Can you run
ifconfig
or
ip addr
in both Termux and ArchLinux for Termux?
On 2/25/2021 2:00 PM, LINUS FERNANDES via users wrote:
Why do I see the following error messages when executing mpirun on
ArchLinux for Termux?
The same program executes on Termux without any glitches.
Diego,
IIRC, you now have to build your gfortran 10 apps with
-fallow-argument-mismatch
Cheers,
Gilles
- Original Message -
Dear OPENMPI users,
I'd like to notify you of a strange issue that arose right after
installing a new up-to-date version of Linux (Kubuntu 20.10, with
This is not an Open MPI question, and hence not a fit for this mailing
list.
But here we go:
first, try
cmake -DGMX_MPI=ON ...
if it fails, try
cmake -DGMX_MPI=ON -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx ...
Cheers,
Gilles
- Original Message -
Hi, MPI
Max,
at configure time, Open MPI detects the *compiler* capabilities.
In your case, your compiler can emit AVX512 code.
(and fwiw, the tests are only compiled and never executed)
Then at *runtime*, Open MPI detects the *CPU* capabilities.
In your case, it should not invoke the functions
packages?
>
>
>
> Do we know if this was definitely fixed in v4.1.x?
>
>
> > On Feb 4, 2021, at 7:46 AM, Gilles Gouaillardet via users
> > wrote:
> >
> > Martin,
> >
> > this is a connectivity issue reported by the btl/tcp component.
> >
Martin,
this is a connectivity issue reported by the btl/tcp component.
You can try restricting the IP interface to a subnet known to work
(and with no firewall) between both hosts
mpirun --mca btl_tcp_if_include 192.168.0.0/24 ...
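(alternatively, you can blacklist the interfaces known to be problematic,
e.g. mpirun --mca btl_tcp_if_exclude virbr0,lo ...)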
If the error persists, you can
mpirun --mca
Diego,
the mpirun command line starts 2 MPI tasks, but the error log mentions
rank 56, so unless there is a copy/paste error, this is highly
suspicious.
I invite you to check the filesystem usage on this node, and make sure
there is a similar amount of available space in /tmp and /dev/shm (or
Passant,
unless this is a copy paste error, the last error message reads plus
zero three, which is clearly an unknown switch
(plus uppercase o three is a known one)
At the end of the configure, make sure Fortran bindings are generated.
If the link error persists, you can
ldd
Dave,
On 1/19/2021 2:13 AM, Dave Love via users wrote:
Generally it's not surprising if there's a shortage
of effort when outside contributions seem unwelcome. I've tried to
contribute several times. The final attempt wasted two or three days,
after being encouraged to get the port of
over TCP, where we use the socket timeout to prevent deadlocks. As you
> already did quite a few communicator duplications and other collective
> communications before you see the timeout, we need more info about this. As
> Gilles indicated, having the complete output might help. What is
Sajid,
I believe this is a Spack issue and Open MPI cannot do anything about it.
(long story short, `module load openmpi-xyz` does not set the
environment for the (spack) external `xpmem` library.)
I updated the spack issue with some potential workarounds you might
want to give a try.
Cheers,
Daniel,
Can you please post the full error message and share a reproducer for
this issue?
Cheers,
Gilles
On Fri, Jan 8, 2021 at 10:25 PM Daniel Torres via users
wrote:
>
> Hi all.
>
> Actually I'm implementing an algorithm that creates a process grid and
> divides it into row and column
Vineet,
probably *not* what you expect, but I guess you can try
$ cat host-file
host1 slots=3
host2 slots=3
host3 slots=3
$ mpirun -hostfile host-file -np 2 ./EXE1 : -np 1 ./EXE2 : -np 2
./EXE1 : -np 1 ./EXE2 : -np 2 ./EXE1 : -np 1 ./EXE2
Cheers,
Gilles
On Mon, Dec 21, 2020 at 10:26 PM
the memory used by rank 0 before (blue) and after (red) the patch.
>
> Thanks
>
> Patrick
>
>
> On 10/12/2020 at 10:15, Gilles Gouaillardet via users wrote:
>
> Patrick,
>
>
> First, thank you very much for sharing the reproducer.
>
>
> Yes, please open
binaries for the 4 runs and OpenMPI 3.1 (same behavior with 4.0.5).
The code is in attachment. I'll try to check type deallocation as soon
as possible.
Patrick
On 04/12/2020 at 01:34, Gilles Gouaillardet via users wrote:
Patrick,
based on George's idea, a simpler check is to retrieve
r Omnipath based. I will have to investigate too but not sure it
is the same problem.
Patrick
On 04/12/2020 at 01:34, Gilles Gouaillardet via users wrote:
Patrick,
based on George's idea, a simpler check is to retrieve the Fortran
index via the (standard) MPI_Type_c2f() function
after
Patrick,
based on George's idea, a simpler check is to retrieve the Fortran index
via the (standard) MPI_Type_c2f() function
after you create a derived datatype.
If the index keeps growing forever even after you MPI_Type_free(), then
this clearly indicates a leak.
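A quick sketch of that check (plain MPI; the printed index should not keep
increasing across iterations if datatypes are correctly freed):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    for (int i = 0; i < 10; i++) {
        MPI_Datatype t;
        MPI_Type_contiguous(4, MPI_INT, &t);
        MPI_Type_commit(&t);
        /* the Fortran index of the datatype we just created */
        printf("iteration %d: index %d\n", i, (int)MPI_Type_c2f(t));
        MPI_Type_free(&t);
    }
    MPI_Finalize();
    return 0;
}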
Unfortunately, this
>
> I was tracking this problem for several weeks but not looking in the
> right direction (testing NFS server I/O, network bandwidth.)
>
> I think we will now move definitively to modern OpenMPI implementations.
>
> Patrick
>
> On 03/12/2020 at 09:06, Gilles
Patrick,
In recent Open MPI releases, the default component for MPI-IO is ompio
(and no more romio)
unless the file is on a Lustre filesystem.
You can force romio with
mpirun --mca io ^ompio ...
Cheers,
Gilles
On 12/3/2020 4:20 PM, Patrick Bégou via users wrote:
Hi,
I'm using an
Dean,
That typically occurs when some nodes have multiple interfaces, and
several nodes have a similar IP on a private/unused interface.
I suggest you explicitly restrict the interface Open MPI should be using.
For example, you can
mpirun --mca btl_tcp_if_include eth0 ...
Cheers,
Gilles
On
Paul,
a "slot" is explicitly defined in the error message you copy/pasted:
"If none of a hostfile, the --host command line parameter, or an RM is
present, Open MPI defaults to the number of processor cores"
The error message also lists 4 ways on how you can move forward, but
you should first
Hi Ognen,
MPI-IO is implemented by two components:
- ROMIO (from MPICH)
- ompio ("native" Open MPI MPI-IO, default component unless running on Lustre)
Assuming you want to add support for a new filesystem in ompio, first
step is to implement a new component in the fs framework
the framework is
Alan,
thanks for the report, I addressed this issue in
https://github.com/open-mpi/ompi/pull/8116
As a temporary workaround, you can apply the attached patch.
FWIW, f18 (shipped with LLVM 11.0.0) is still in development and uses
gfortran under the hood.
Cheers,
Gilles
On Wed, Oct 21, 2020 at
Hi Jorge,
If a firewall is running on your nodes, I suggest you disable it and try again
Cheers,
Gilles
On Wed, Oct 21, 2020 at 5:50 AM Jorge SILVA via users
wrote:
>
> Hello,
>
> I installed kubuntu20.4.1 with openmpi 4.0.3-0ubuntu in two different
> computers in the standard way. Compiling
Patrick,
Thanks for the report and the reproducer.
I was able to confirm the issue with python and Fortran, but
- I can only reproduce it with pml/ucx (read --mca pml ob1 --mca btl
tcp,self works fine)
- I can only reproduce it with bcast algorithm 8 and 9
As a workaround, you can keep using
John,
I am not sure you will get much help here with a kernel crash caused
by a tweaked driver.
About HPL, you are more likely to get better performance with P and Q
closer (e.g. 4x8 is likely better than 2x16 or 1x32).
Also, HPL might have better performance with one MPI task per node and
a