Zero copy does not work with non-contiguous datatypes (it would require
both processes to know the memory layout used by the peer). As long as the
memory layout described by the type can be seen as contiguous (even if it is
described otherwise), it should work just fine.
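To illustrate that last point (a minimal sketch, not from the original
thread): a vector type whose stride equals its blocklength describes a
perfectly contiguous buffer, even though it is not built with
MPI_Type_contiguous, so it stays eligible for the zero-copy path.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* 4 blocks of 8 doubles with a stride of 8: the described layout is
     * contiguous (256 bytes) even though the constructor is not
     * MPI_Type_contiguous. A stride of 9 would make it truly
     * non-contiguous and thus not eligible for zero copy. */
    MPI_Datatype t;
    MPI_Type_vector(4, 8, 8, MPI_DOUBLE, &t);
    MPI_Type_commit(&t);

    /* ... use t in point-to-point or one-sided communications ... */

    MPI_Type_free(&t);
    MPI_Finalize();
    return 0;
}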
George.
On Tue, Apr 23, 2024
All the examples work for me using ULFM ge87f595 compiled with
minimalistic options:
'--prefix=XXX --enable-picky --enable-debug --disable-heterogeneous
--enable-contrib-no-build=vt --enable-mpirun-prefix-by-default
--enable-mpi-ext=ftmpi --with-ft=mpi --with-pmi'.
I run using ipoib, so I
That's not for the MPI communications but for the process management part
(PRRTE/PMIX). If forcing the PTL to `lo` worked, it mostly indicates that
the shared memory in OMPI could be set up correctly.
George.
On Mon, Feb 5, 2024 at 3:47 PM John Hearns wrote:
> Stupid question... Why is
That would be something @Ralph Castain needs to look at, as he stated in a
previous discussion that `lo` was the default for PMIX, and we now have 2
reports stating otherwise.
George.
On Mon, Feb 5, 2024 at 3:15 PM John Haiducek wrote:
> Adding '--pmixmca ptl_tcp_if_include lo0' to the
OMPI seems unable to create a communication medium between your processes.
There are a few known issues on OSX; please read
https://github.com/open-mpi/ompi/issues/12273 for more info.
Can you provide the header of the ompi_info output? What I'm interested in
is the part about `Configure command
I think the root cause was that he expected the negative integer resulting
from the reduction to be the exit code of the application, and, as I
explained in my prior email, that's not how exit() works.
The exit() issue aside, MPI_Abort seems to be the right function for this
usage.
George.
On
Alex,
exit(status) does not make the full status available to the parent's wait();
instead it makes only the low 8 bits available to the parent, as an unsigned
value. This explains why small positive values seem to work correctly while
negative values do not (because of the 32-bit negative value representation in
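To illustrate the exit()/wait() behavior described above (a standalone POSIX
sketch, not MPI-specific): the parent only ever sees status & 0xFF, so
exit(-2) is reported as 254.

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0)
        exit(-2);              /* child: only the low 8 bits survive */

    int status;
    waitpid(pid, &status, 0);
    /* prints 254, i.e. (-2) & 0xFF, not -2 */
    printf("child exit code: %d\n", WEXITSTATUS(status));
    return 0;
}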
Alex,
How are your values "random" if you provide correct values? Even for
negative values you could use MIN to pick one value and return it. What is
the problem with `MPI_Abort`? It does seem to do what you want.
George.
On Tue, Jul 18, 2023 at 4:38 AM Alexander Stadik via users <
I can't replicate this in my setup, but I am not using the tar archive
from the OMPI website (I use the git tag). Can you do `ls -l
opal/datatype/.libs` in your build directory?
George.
On Wed, Jul 12, 2023 at 7:14 AM Elad Cohen via users <
users@lists.open-mpi.org> wrote:
> Hi Jeff, thanks
Some folks from ORNL did some studies on OMPI memory usage a few
years ago, but I am not sure if those studies are openly available. OMPI
manages all the MCA parameters, user-facing requests, unexpected messages,
and temporary buffers for collectives and IO. And those are, I might be
slightly
Brian,
OMPI does not have an official mechanism to report how much memory it
allocates. But there is hope:
1. We have a mechanism to help debug memory issues (OPAL_ENABLE_MEM_DEBUG).
You could enable it and then provide your own flavor of memory tracking in
opal/util/malloc.c
2. You can use a
Edgar is right, UCX_TLS has some role in the selection. You can see the
current selection by running `ucx_info -c`. In my case, UCX_TLS is set to
`all` somehow, and I had either a not-connected IB device or a GPU.
However, I did not set UCX_TLS manually, and I can't see it anywhere in my
system
The ucx PML should work just fine even in a single-node scenario. As Jeff
indicated, you need to move the MCA param `--mca pml ucx` before your
command.
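To illustrate (with `./my_app` standing in for the actual executable):
options placed after the executable are handed to the application rather
than parsed by mpirun, so use `mpirun --mca pml ucx -np 2 ./my_app` rather
than `mpirun -np 2 ./my_app --mca pml ucx`.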
George.
On Mon, Mar 6, 2023 at 9:48 AM Jeff Squyres (jsquyres) via users <
users@lists.open-mpi.org> wrote:
> If this run was on a single node,
Assuming a correct implementation, the described communication pattern
should work seamlessly.
Would it be possible to either share a reproducer or provide the execution
stack, by attaching a debugger to the deadlocked application, to see the
state of the different processes? I wonder if all
This error seems to be initiated from the PMIX regex framework. I'm not sure
exactly which one is used, but a good starting point is one of the files
in 3rd-party/openpmix/src/mca/preg/. Look for the generate_node_regex
function in the different components; one of them is raising the error.
There is a lot of FUD regarding the so-called optimizations for
neighborhood collectives. In general, they all converge toward creating a
globally consistent communication order. If the neighborhood topology is
regular, some parts of the globally consistent communication order can be
inferred, but
Michael,
As far as I know, none of the implementations of the
neighborhood collectives in OMPI are architecture-aware. The only 2
components that provide support for neighborhood collectives are basic (for
the blocking versions) and libnbc (for the non-blocking versions).
George.
On Wed, Jun
That is weird, but maybe it is not a deadlock, just very slow progress. In
the child, can you print fdmax and i in the do_child frame?
George.
On Thu, May 5, 2022 at 11:50 AM Scott Sayres via users <
users@lists.open-mpi.org> wrote:
> Jeff, thanks.
> from 1:
>
> (lldb) process attach --pid
frame #2: 0x00010784b418
> mca_odls_default.so`odls_default_fork_local_proc
> + 284
>
> frame #3: 0x0001002c7914
> libopen-rte.40.dylib`orte_odls_base_spawn_proc
> + 968
>
> frame #4: 0x0001003d96dc
> libevent_core-2.1.7.dylib`event_process_active_
I compiled a fresh copy of the 4.1.3 branch on my M1 laptop, and I can run
both MPI and non-MPI apps without any issues.
Try running `lldb mpirun -- -np 1 hostname` and, once it deadlocks, hit
CTRL+C to get back to the debugger and then `backtrace` to see where it is
waiting.
George.
On Wed,
lib
>
> "_opal_atomic_wmb", referenced from:
>
> import-atom in libopen-pal.dylib
>
> ld: symbol(s) not found for architecture x86_64
>
> make[2]: *** [opal_wrapper] Error 1
>
> make[1]: *** [all-recursive] Error 1
>
> make: *** [all-recursive] Error
1. I am not aware of any outstanding OMPI issues with the M1 chip that
would prevent OMPI from compiling and running efficiently in an M1-based
setup, assuming the compilation chain is working properly.
2. M1 supports x86 code via Rosetta, an app provided by Apple to ensure a
smooth transition
Vladimir,
A while back the best cluster monitoring tool was Ganglia (
http://ganglia.sourceforge.net/), but it has not been maintained for
several years. There are quite a few alternatives out there; I found
nightingale (https://github.com/didi/nightingale) to be simple to install
and use.
Good
Sajid,
`--bind-to-core` should have generated the same warning on OSX. Not sure
why this is happening, but I think the real bug here is the lack of a warning
when using the deprecated argument.
Btw, the current master does not even accept 'bind-to-core'; instead it
complains about 'unrecognized
OMPI cannot support process binding on OSX because, as the message
indicates, there is no OS API for process binding (at least none exposed to
user-land applications).
George.
On Thu, Mar 17, 2022 at 3:25 PM Sajid Ali via users <
users@lists.open-mpi.org> wrote:
> Hi OpenMPI-developers,
>
I see similar issues on platforms with multiple IP addresses, if some of
them are not fully connected. In general, specifying which interface OMPI
can use (with --mca btl_tcp_if_include x.y.z.t/s) solves the problem.
George.
On Wed, Mar 16, 2022 at 5:11 PM Mccall, Kurt E. (MSFC-EV41) via
There are two ways the MPI_Allreduce returns MPI_ERR_TRUNCATE:
1. it is propagated from one of the underlying point-to-point
communications, which means that at least one of the participants has an
input buffer with a larger size. I know you said the size is fixed, but it
only matters if all
Sorry, I should have been more precise in my answer. Topology information
is only used during neighborhood communications via the specialized API; in
all other cases the communicator behaves as a normal, fully connected
communicator.
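To make the distinction concrete, here is a minimal sketch (a ring
neighborhood invented for illustration): only the MPI_Neighbor_* call
consults the attached graph topology; every other operation treats the
communicator as fully connected.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Ring: each rank talks only to its left and right neighbors. */
    int neigh[2] = { (rank - 1 + size) % size, (rank + 1) % size };
    MPI_Comm ring;
    MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD, 2, neigh, MPI_UNWEIGHTED,
                                   2, neigh, MPI_UNWEIGHTED,
                                   MPI_INFO_NULL, 0, &ring);

    int sendbuf[2] = { rank, rank }, recvbuf[2];
    /* Only the MPI_Neighbor_* calls use the attached topology. */
    MPI_Neighbor_alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, ring);

    printf("rank %d got %d %d\n", rank, recvbuf[0], recvbuf[1]);
    MPI_Comm_free(&ring);
    MPI_Finalize();
    return 0;
}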
George.
On Tue, Feb 15, 2022 at 9:28 AM Neil Carlson
On Mon, Feb 14, 2022 at 6:33 PM Neil Carlson via users <
users@lists.open-mpi.org> wrote:
> I've been successful at using MPI_Dist_graph_create_adjacent to create a
> new communicator with graph topology, and using it with
> MPI_Neighbor_alltoallv. But I have a few questions:
>
> 1. Where can I
I am not sure I understand the comment about MPI_T.
Each network card has internal counters that can be gathered by any process
on the node. Similarly, some information is available from the switches,
but I always assumed that information is aggregated across all ongoing
jobs. But, merging the
Collecting data during execution is possible in OMPI either with an
external tool, such as mpiP, or with the internal SPC infrastructure. Take a
look at ./examples/spc_example.c or ./test/spc/spc_test.c to see how to use
this.
George.
On Fri, Feb 11, 2022 at 9:43 AM Bertini, Denis Dr. via users <
Jonas,
Section 5.1.6 in MPI 4.0 should give you a better idea about the
differences between size, extent, and true extent. There are also a few
examples in Section 5.1.14 on how to manipulate the datatype using the extent.
I think you will find Examples 5.13 to 5.16 of particular interest.
Best,
You are confusing the size and the extent of the datatype. The size (aka the
physical number of bytes described by the memory layout) would be
m*nloc*sizeof(type), while the extent relates to where you expect the
second element of the same type to start. If you do a resize, you will
incorporate
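A small sketch of the distinction (the 4x3 layout and names are purely
illustrative): MPI_Type_size reports the bytes actually described, while the
extent, here adjusted with MPI_Type_create_resized, controls where the next
element of the type is assumed to start.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int n = 4, m = 3;   /* one column of a 4x3 row-major matrix of doubles */
    MPI_Datatype col, col_resized;
    MPI_Type_vector(n, 1, m, MPI_DOUBLE, &col);
    /* Make the extent one double so consecutive "columns" start one
     * element apart; the size is unchanged by the resize. */
    MPI_Type_create_resized(col, 0, sizeof(double), &col_resized);
    MPI_Type_commit(&col_resized);

    int size;
    MPI_Aint lb, extent;
    MPI_Type_size(col_resized, &size);               /* n * sizeof(double) = 32 */
    MPI_Type_get_extent(col_resized, &lb, &extent);  /* sizeof(double)     = 8  */
    printf("size = %d bytes, extent = %ld bytes\n", size, (long)extent);

    MPI_Type_free(&col);
    MPI_Type_free(&col_resized);
    MPI_Finalize();
    return 0;
}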
The error message is self-explanatory: the application calls MPI_Recv with
an invalid tag. The MPI standard defines a valid tag as an integer between 0
and the value of the MPI_TAG_UB attribute on MPI_COMM_WORLD. At
this point it seems plausible this is an application issue.
Check that
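A quick way to check the bound on a given installation (a minimal sketch):
query the MPI_TAG_UB attribute and compare it against the tags the
application computes.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int *tag_ub, flag;
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &tag_ub, &flag);
    if (flag)
        printf("valid tags: 0 .. %d\n", *tag_ub);  /* at least 32767 per the standard */

    MPI_Finalize();
    return 0;
}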
You need to enable the monitoring PML in order to get access to the
pml_monitoring_messages_count MPI_T variable. For this you need to know
which PML you are currently using and add monitoring to the pml MCA
variable. As an example, if you use ob1 you should add the following to your
mpirun command "--mca
Hi Pierre,
MPI is allowed to pipeline the collective communications. This explains why
the MPI_Op takes the len of the buffers as an argument. Because your MPI_Op
ignores this length, it alters data outside the temporary buffer we use for
the segment. Other versions of the MPI_Allreduce
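For reference, a user-defined op must operate on exactly *len entries of the
segment it is handed, never on the full user buffer. A minimal sketch
(elementwise sum over doubles; the real reduction obviously differs):

#include <mpi.h>

/* Correct user-defined op: work on exactly *len entries of the segment
 * passed in, never on the full user-visible buffer. */
static void my_sum(void *in, void *inout, int *len, MPI_Datatype *dtype)
{
    (void)dtype;  /* this example handles MPI_DOUBLE only */
    double *a = (double *)in, *b = (double *)inout;
    for (int i = 0; i < *len; i++)
        b[i] += a[i];
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Op op;
    MPI_Op_create(my_sum, 1 /* commutative */, &op);

    double in[1024], out[1024];
    for (int i = 0; i < 1024; i++) in[i] = 1.0;
    /* Large messages may be pipelined, so my_sum can be called several
     * times per reduction, each time with a smaller *len. */
    MPI_Allreduce(in, out, 1024, MPI_DOUBLE, op, MPI_COMM_WORLD);

    MPI_Op_free(&op);
    MPI_Finalize();
    return 0;
}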
Carl,
AVX support was introduced in 4.1, which explains why you did not have such
issues before. What is your configure command in these 2 cases? Please
create an issue on GitHub and attach your config.log.
George.
On Fri, Feb 5, 2021 at 2:44 PM Carl Ponder via users <
> 4.- The hostfile.
> >
> > The duration of the delay is just a few seconds, about 3 ~ 4.
> >
> > Essentially, the first error message I get from a waiting process is
> "74: MPI_ERR_PROC_FAILED: Process Failure".
> >
> > Hope this information can
Daniel,
There are no timeouts in OMPI with the exception of the initial connection
over TCP, where we use the socket timeout to prevent deadlocks. As you
already did quite a few communicator duplications and other collective
communications before you see the timeout, we need more info about this.
> and hence rule out any memory
> leak that could be triggered by your fast interconnect.
>
> In any case, a reproducer will greatly help us debugging this issue.
>
> Cheers,
>
> Gilles
>
> On 12/4/2020 7:20 AM, George Bosilca via users
Patrick,
I'm afraid there is no simple way to check this. The main reason is that
OMPI uses handles for MPI objects, and these handles are not tracked by the
library; they are supposed to be provided by the user for each call. In
your case, as you already called MPI_Type_free on the datatype,
John,
There are many things in play in such an experiment. Plus, expecting linear
speedup even at the node level is certainly overly optimistic.
1. A single-core experiment has the full memory bandwidth, so you will
asymptotically reach the max flops. Adding more cores will increase the
memory
Diego,
I see nothing wrong with the way you create the datatype. In fact, this is
the perfect example of how to almost do it right in FORTRAN. The "almost" is
because your code is highly dependent on the -r8 compiler option (otherwise
the REAL in your type will not match the MPI_DOUBLE_PRECISION you
An application that relies on MPI eager buffers for correctness or
performance is an incorrect application. Among many other points, simply
because MPI implementations without support for eager are legit. Moreover,
these applications also miss the point on performance. Among the overheads
I am not
On Wed, Mar 25, 2020 at 4:49 AM Raut, S Biplab wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
>
>
> Dear George,
>
> Thank you the reply. But my question is more
> particularly on the message size from application side.
>
>
>
> Let’s say the application
Biplab,
The eager size is a constant for each BTL, and it represents the data that is
sent eagerly, together with the matching information, out of the entire
message. So, if the question is how much memory is needed to store all the
eager messages, then the answer will depend on the communication pattern of
your
On Mon, Mar 16, 2020 at 6:15 PM Konstantinos Konstantinidis via users <
users@lists.open-mpi.org> wrote:
> Hi, I have some questions regarding technical details of MPI collective
> communication methods and broadcast:
>
>- I want to understand when the number of receivers in a MPI_Bcast can
>
Martyn,
I don't know exactly what your code is doing, but based on your inquiry I
assume you are using MPI_BSEND multiple times and you run out of local
buffers.
The MPI standard does not mandate a wait until buffer space becomes
available, because that can lead to deadlocks (communication
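If the code is indeed relying on MPI_Bsend, the attached buffer has to be
sized for all messages that can be outstanding at once, including the
per-message MPI_BSEND_OVERHEAD. A minimal sketch (the message count and size
are placeholders):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Placeholder sizing: up to 16 outstanding messages of 4096 bytes. */
    int nmsg = 16, msg_bytes = 4096;
    int bufsize = nmsg * (msg_bytes + MPI_BSEND_OVERHEAD);
    char *buf = malloc(bufsize);
    MPI_Buffer_attach(buf, bufsize);

    /* ... MPI_Bsend calls; each one consumes buffer space until the
     * corresponding data has actually been delivered ... */

    MPI_Buffer_detach(&buf, &bufsize);  /* blocks until buffered sends complete */
    free(buf);
    MPI_Finalize();
    return 0;
}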
Hcoll will be present in many cases; you don't really want to skip them
all. I foresee 2 problems with the approach you propose:
- collective components are selected per communicator, so even if they will
not be used they are still loaded.
- from outside the MPI library you have little access to
If I'm not mistaken, hcoll is playing with opal_progress in a way that
conflicts with the blessed usage of progress in OMPI and prevents other
components from advancing and completing requests in a timely manner. The
impact is minimal for sequential applications using only blocking calls, but is
According to the error message, you are using MPICH, not Open MPI.
George.
On Tue, Jan 14, 2020 at 5:53 PM SOPORTE MODEMAT via users <
users@lists.open-mpi.org> wrote:
> Hello everyone.
>
>
>
> I would like somebody help me to figure out how can I make that the
> openmpi use the infiniband
This is going back to the fact that you, as a developer, are the best
placed to know exactly when asynchronous progress is needed for your
algorithm, so from that perspective you can provide that progress in the
most timely manner. One way to force MPI to make progress is to spawn
another thread
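A common way to do that (a hedged sketch, assuming MPI_THREAD_MULTIPLE is
available; all names are illustrative): a helper thread periodically calls
MPI_Test so the library gets a chance to advance an outstanding request
while the main thread computes. Run with at least 2 ranks.

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static MPI_Request req;

/* Helper thread: repeatedly call MPI_Test so the library can advance the
 * pending receive while the main thread is busy computing. */
static void *progress_thread(void *arg)
{
    int flag = 0;
    while (!flag) {
        MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
        usleep(100);
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank, buf = -1;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        pthread_t tid;
        MPI_Irecv(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        pthread_create(&tid, NULL, progress_thread, NULL);

        sleep(2);                 /* stand-in for a long computation */

        pthread_join(tid, NULL);  /* returns once the receive has completed */
        printf("received %d\n", buf);
    } else if (rank == 1) {
        buf = 42;
        MPI_Send(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}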
Martin,
The MPI standard does not mandate progress outside MPI calls, thus
implementations are free to provide, or not, asynchronous progress. Calling
MPI_Test provides the MPI implementation with an opportunity to progress
its internal communication queues. However, an implementation could try
Wonderful maybe but extremely unportable. Thanks but no thanks!
George.
On Wed, Nov 27, 2019 at 11:07 PM Zhang, Junchao wrote:
> Interesting idea. But doing MPI_THREAD_MULTIPLE has other side-effects. If
> MPI nonblocking calls could take an extra stream argument and work like a
> kernel
On Wed, Nov 27, 2019 at 5:02 PM Zhang, Junchao wrote:
> On Wed, Nov 27, 2019 at 3:16 PM George Bosilca
> wrote:
>
>> Short and portable answer: you need to sync before the Isend or you will
>> send garbage data.
>>
> Ideally, I want to formulate my code into a series of asynchronous "kernel
>
Short and portable answer: you need to sync before the Isend or you will
send garbage data.
Assuming you are willing to go for a less portable solution, you can get the
OMPI streams and add your kernels inside, so that the sequential order will
guarantee the correctness of your Isend. We have 2 hidden
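The portable ordering looks roughly like this (a sketch assuming a
CUDA-aware Open MPI and 2 ranks; cudaMemsetAsync stands in for the
application kernels): everything queued on the stream must have completed
before the buffer is handed to MPI_Isend.

#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    size_t n = 1 << 20;
    double *d_buf;
    cudaMalloc((void **)&d_buf, n * sizeof(double));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    if (rank == 0) {
        /* Stand-in for the application kernels enqueued on the stream. */
        cudaMemsetAsync(d_buf, 0, n * sizeof(double), stream);
        /* Portable rule: synchronize before handing the buffer to MPI,
         * otherwise the Isend may ship data the kernels have not written. */
        cudaStreamSynchronize(stream);

        MPI_Request req;
        MPI_Isend(d_buf, (int)n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(d_buf, (int)n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}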
Charles,
Having implemented some of the underlying collective algorithms, I am
puzzled by the need to force the sync to 1 to have things flowing. I would
definitely appreciate a reproducer so that I can identify (and hopefully
fix) the underlying problem.
Thanks,
George.
On Tue, Oct 29, 2019
Charles,
There is a known issue with calling collectives in a tight loop, due to the
lack of control flow at the network level. It results in a significant
slow-down that might appear as a deadlock to users. The workaround is to
enable the sync collective module, which will insert a fake
To completely disable UCX you need to disable the UCX PML and not only the
BTL. I would use "--mca pml ob1 --mca btl ^ucx --mca btl_openib_allow_ib 1".
As you have a gdb session on the processes you can try to break on some of
the memory allocation functions (malloc, realloc, calloc).
George.
Leo,
In a UMA system, having the displacement and/or recvcounts arrays in managed
GPU memory should work, but it will incur overheads for at least 2 reasons:
1. the MPI API arguments are checked for correctness (here recvcounts)
2. the collective algorithm part that executes on the CPU uses the
There is an ongoing discussion about this on issue #4067 (
https://github.com/open-mpi/ompi/issues/4067). Also, the mailing list
contains a few examples of how to tweak the collective algorithms to your
needs.
George.
On Thu, Jun 6, 2019 at 7:42 PM hash join via users
wrote:
> Hi all,
>
>
> I
Depending on the alignment of the different types there might be small
holes in the low-level headers we exchange between processes. It should not
be a concern for users.
valgrind should not stop on the first detected issue unless
--exit-on-first-error has been provided (the default value
Jon,
The configure AC_HEADER_STDC macro is considered obsolete [1] as most of
the OSes are STDC compliant nowadays. To have it fail on a recent
version of OSX is therefore unexpected. Moreover, many of the
OMPI developers work on OSX Mojave with the default compiler but with the