Re: [OMPI users] Not getting zero-copy with custom datatype

2024-04-23 Thread George Bosilca via users
zero copy does not work with non-contiguous datatypes (it would require both processes to know the memory layout used by the peer). As long as the memory layout described by the type can be seen as contiguous (even if described otherwise), it should work just fine. George. On Tue, Apr 23, 2024

Re: [OMPI users] ULFM only works on a single node???

2024-03-24 Thread George Bosilca via users
All the examples work for me using ULFM ge87f595 compiled with minimalistic options: '--prefix=XXX --enable-picky --enable-debug --disable-heterogeneous --enable-contrib-no-build=vt --enable-mpirun-prefix-by-default --enable-mpi-ext=ftmpi --with-ft=mpi --with-pmi'. I run using ipoib, so I

Re: [OMPI users] Homebrew-installed OpenMPI 5.0.1 can't run a simple test program

2024-02-05 Thread George Bosilca via users
That's not for the MPI communications but for the process management part (PRRTE/PMIX). If forcing the PTL to `lo` worked it mostly indicates that the shared memory in OMPI was able to be set up correctly. George. On Mon, Feb 5, 2024 at 3:47 PM John Hearns wrote: > Stupid question... Why is

Re: [OMPI users] Homebrew-installed OpenMPI 5.0.1 can't run a simple test program

2024-02-05 Thread George Bosilca via users
That would be something @Ralph Castain needs to be looking at as he declared in a previous discussion that `lo` was the default for PMIX and we now have 2 reports stating otherwise. George. On Mon, Feb 5, 2024 at 3:15 PM John Haiducek wrote: > Adding '--pmixmca ptl_tcp_if_include lo0' to the

Re: [OMPI users] Homebrew-installed OpenMPI 5.0.1 can't run a simple test program

2024-02-05 Thread George Bosilca via users
OMPI seems unable to create a communication medium between your processes. There are a few known issues on OSX; please read https://github.com/open-mpi/ompi/issues/12273 for more info. Can you provide the header of the ompi_info command? What I'm interested in is the part about `Configure command
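The header George asks for can be pulled out of `ompi_info` like this (a sketch; the grep pattern is an assumption about the label in your build's output):

```shell
# Print the summary header of ompi_info (version, prefix, etc.)
ompi_info | head -n 20

# Show just the configure line mentioned in the reply
ompi_info --all | grep -i "configure command"
```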

Re: [OMPI users] [EXT] Re: Error handling

2023-07-19 Thread George Bosilca via users
I think the root cause was that he expected the negative integer resulting from the reduction to be the exit code of the application, and as I explained in my prior email that's not how exit() works. The exit() issue aside, MPI_Abort seems to be the right function for this usage. George. On

Re: [OMPI users] [EXT] Re: Error handling

2023-07-19 Thread George Bosilca via users
Alex, exit(status) does not make status available to the parent process's wait(); instead it makes the low 8 bits available to the parent as an unsigned value. This explains why small positive values seem to work correctly while negative values do not (because of the 32 bits negative value representation in
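The 8-bit truncation George describes is easy to observe from Python (a sketch using the standard `subprocess` module; on POSIX systems any negative status is reduced modulo 256 by the time the parent sees it):

```python
import subprocess
import sys

# Child exits with -1; exit() keeps only the low 8 bits,
# so the parent observes the unsigned value 255, not -1.
r = subprocess.run([sys.executable, "-c", "import sys; sys.exit(-1)"])
print(r.returncode)  # 255 on POSIX

# Small positive statuses survive unchanged.
r2 = subprocess.run([sys.executable, "-c", "import sys; sys.exit(7)"])
print(r2.returncode)  # 7
```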

Re: [OMPI users] Error handling

2023-07-18 Thread George Bosilca via users
Alex, How are your values "random" if you provide correct values? Even for negative values you could use MIN to pick one value and return it. What is the problem with `MPI_Abort`? It does seem to do what you want. George. On Tue, Jul 18, 2023 at 4:38 AM Alexander Stadik via users <

Re: [OMPI users] OMPI compilation error in Making all datatypes

2023-07-12 Thread George Bosilca via users
I can't replicate this on my setting, but I am not using the tar archive from the OMPI website (I use the git tag). Can you do `ls -l opal/datatype/.libs` in your build directory? George. On Wed, Jul 12, 2023 at 7:14 AM Elad Cohen via users < users@lists.open-mpi.org> wrote: > Hi Jeff, thanks

Re: [OMPI users] Q: Getting MPI-level memory use from OpenMPI?

2023-04-17 Thread George Bosilca via users
Some folks from ORNL did some studies of OMPI memory usage a few years ago, but I am not sure if these studies are openly available. OMPI manages all the MCA parameters, user-facing requests, unexpected messages, and temporary buffers for collectives and IO. And those are, I might be slightly

Re: [OMPI users] Q: Getting MPI-level memory use from OpenMPI?

2023-04-17 Thread George Bosilca via users
Brian, OMPI does not have an official mechanism to report how much memory OMPI allocates. But, there is hope: 1. We have a mechanism to help debug memory issues (OPAL_ENABLE_MEM_DEBUG). You could enable it and then provide your own flavor of memory tracking in opal/util/malloc.c 2. You can use a

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread George Bosilca via users
Edgar is right, UCX_TLS has some role in the selection. You can see the current selection by running `ucx_info -c`. In my case, UCX_TLS is set to `all` somehow, and I had either a not-connected IB device or a GPU. However, I did not set UCX_TLS manually, and I can't see it anywhere in my system
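A sketch of inspecting and pinning the UCX transport selection (the transport list `self,sm` is an example for an intranode run; adjust it to your hardware):

```shell
# Show the effective UCX configuration, including UCX_TLS
ucx_info -c | grep TLS

# Restrict UCX to the self and shared-memory transports for this run
UCX_TLS=self,sm mpirun --mca pml ucx -np 2 ./my_app
```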

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread George Bosilca via users
ucx PML should work just fine even in a single-node scenario. As Jeff indicated, you need to move the MCA param `--mca pml ucx` before your command. George. On Mon, Mar 6, 2023 at 9:48 AM Jeff Squyres (jsquyres) via users < users@lists.open-mpi.org> wrote: > If this run was on a single node,
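Placement matters: MCA parameters belong to `mpirun`, so they must precede the executable (a sketch; `./my_app` is a placeholder name):

```shell
# Correct: the MCA parameter is parsed by mpirun
mpirun --mca pml ucx -np 2 ./my_app

# Wrong: anything after the executable is passed to my_app, not mpirun
mpirun -np 2 ./my_app --mca pml ucx
```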

Re: [OMPI users] Subcommunicator communications do not complete intermittently

2022-09-11 Thread George Bosilca via users
Assuming a correct implementation, the described communication pattern should work seamlessly. Would it be possible to either share a reproducer or provide the execution stack (by attaching a debugger to the deadlocked application) to see the state of the different processes? I wonder if all

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread George Bosilca via users
This error seems to be initiated from the PMIX regex framework. Not sure exactly which one is used, but a good starting point is in one of the files in 3rd-party/openpmix/src/mca/preg/. Look for the generate_node_regex function in the different components, one of them is raising the error.

Re: [OMPI users] Quality and details of implementation for Neighborhood collective operations

2022-06-08 Thread George Bosilca via users
There is a lot of FUD regarding the so-called optimizations for neighborhood collectives. In general, they all converge toward creating a globally consistent communication order. If the neighborhood topology is regular, some parts of the globally consistent communication order can be inferred, but

Re: [OMPI users] Quality and details of implementation for Neighborhood collective operations

2022-06-08 Thread George Bosilca via users
Michael, As far as I know none of the implementations of the neighborhood collectives in OMPI are architecture-aware. The only 2 components that provide support for neighborhood collectives are basic (for the blocking version) and libnbc (for the non-blocking versions). George. On Wed, Jun

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread George Bosilca via users
That is weird, but maybe it is not a deadlock, just very slow progress. In the child, can you print fdmax and i in the do_child frame? George. On Thu, May 5, 2022 at 11:50 AM Scott Sayres via users < users@lists.open-mpi.org> wrote: > Jeff, thanks. > from 1: > > (lldb) process attach --pid

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread George Bosilca via users
frame #2: 0x00010784b418 mca_odls_default.so`odls_default_fork_local_proc + 284 frame #3: 0x0001002c7914 libopen-rte.40.dylib`orte_odls_base_spawn_proc + 968 frame #4: 0x0001003d96dc libevent_core-2.1.7.dylib`event_process_active_

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread George Bosilca via users
I compiled a fresh copy of the 4.1.3 branch on my M1 laptop, and I can run both MPI and non-MPI apps without any issues. Try running `lldb mpirun -- -np 1 hostname` and once it deadlocks, do a CTRL+C to get back on the debugger and then `backtrace` to see where it is waiting. George. On Wed,
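The debugging recipe from the reply, spelled out as a sketch (assumes Xcode's lldb is installed; `hostname` is just a convenient non-MPI test program):

```shell
# Launch mpirun under lldb; everything after -- goes to mpirun
lldb mpirun -- -np 1 hostname

# Inside lldb:
#   run         # start mpirun and wait for the hang
#   (CTRL+C)    # interrupt to drop back into the debugger
#   bt all      # backtrace all threads to see where it is waiting
```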

Re: [OMPI users] help with M1 chip macOS openMPI installation

2022-04-22 Thread George Bosilca via users
lib "_opal_atomic_wmb", referenced from: import-atom in libopen-pal.dylib ld: symbol(s) not found for architecture x86_64 make[2]: *** [opal_wrapper] Error 1 make[1]: *** [all-recursive] Error 1 make: *** [all-recursive] Error

Re: [OMPI users] help with M1 chip macOS openMPI installation

2022-04-21 Thread George Bosilca via users
1. I am not aware of any outstanding OMPI issues with the M1 chip that would prevent OMPI from compiling and running efficiently in an M1-based setup, assuming the compilation chain is working properly. 2. M1 supports x86 code via Rosetta, an app provided by Apple to ensure a smooth transition

Re: [OMPI users] Monitoring an openmpi cluster.

2022-04-08 Thread George Bosilca via users
Vladimir, A while back the best cluster monitoring tool was Ganglia ( http://ganglia.sourceforge.net/), but it has not been maintained for several years. There are quite a few alternatives out there, I found nightingale (https://github.com/didi/nightingale) to be simple to install and use. Good

Re: [OMPI users] Regarding process binding on OS X with oversubscription

2022-03-17 Thread George Bosilca via users
Sajid, `--bind-to-core` should have generated the same warning on OSX. Not sure why this is happening, but I think the real bug here is the lack of warning when using the deprecated argument. Btw, the current master does not even accept 'bind-to-core', instead it complains about 'unrecognized

Re: [OMPI users] Regarding process binding on OS X with oversubscription

2022-03-17 Thread George Bosilca via users
OMPI cannot support process binding on OSX because, as the message indicates, there is no OS API for process binding (at least not exposed to the user-land applications). George. On Thu, Mar 17, 2022 at 3:25 PM Sajid Ali via users < users@lists.open-mpi.org> wrote: > Hi OpenMPI-developers, >

Re: [OMPI users] MPI_Intercomm_create error

2022-03-16 Thread George Bosilca via users
I see similar issues on platforms with multiple IP addresses, if some of them are not fully connected. In general, specifying which interface OMPI can use (with --mca btl_tcp_if_include x.y.z.t/s) solves the problem. George. On Wed, Mar 16, 2022 at 5:11 PM Mccall, Kurt E. (MSFC-EV41) via
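A sketch of the workaround (the CIDR range and interface name are examples; substitute the subnet or NIC that all your nodes actually share):

```shell
# Restrict the TCP BTL to one fully connected network
mpirun --mca btl_tcp_if_include 192.168.1.0/24 -np 4 ./my_app

# Alternatively, select by interface name
mpirun --mca btl_tcp_if_include eth0 -np 4 ./my_app
```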

Re: [OMPI users] Call to MPI_Allreduce() returning value 15

2022-03-09 Thread George Bosilca via users
There are two ways the MPI_Allreduce returns MPI_ERR_TRUNCATE: 1. it is propagated from one of the underlying point-to-point communications, which means that at least one of the participants has an input buffer with a larger size. I know you said the size is fixed, but it only matters if all

Re: [OMPI users] Where can a graph communicator be used?

2022-02-15 Thread George Bosilca via users
Sorry, I should have been more precise in my answer. Topology information is only used during neighborhood communications via the specialized API, in all other cases the communicator would behave as a normal, fully connected, communicator. George. On Tue, Feb 15, 2022 at 9:28 AM Neil Carlson

Re: [OMPI users] Where can a graph communicator be used?

2022-02-14 Thread George Bosilca via users
On Mon, Feb 14, 2022 at 6:33 PM Neil Carlson via users < users@lists.open-mpi.org> wrote: > I've been successful at using MPI_Dist_graph_create_adjacent to create a > new communicator with graph topology, and using it with > MPI_Neighbor_alltoallv. But I have a few questions: > > 1. Where can I

Re: [OMPI users] Using OSU benchmarks for checking Infiniband network

2022-02-11 Thread George Bosilca via users
I am not sure I understand the comment about MPI_T. Each network card has internal counters that can be gathered by any process on the node. Similarly, some information is available from the switches, but I always assumed that information is aggregated across all ongoing jobs. But, merging the

Re: [OMPI users] Using OSU benchmarks for checking Infiniband network

2022-02-11 Thread George Bosilca via users
Collecting data during execution is possible in OMPI either with an external tool, such as mpiP, or the internal infrastructure, SPC. Take a look at ./examples/spc_example.c or ./test/spc/spc_test.c to see how to use this. George. On Fri, Feb 11, 2022 at 9:43 AM Bertini, Denis Dr. via users <

Re: [OMPI users] unexpected behavior when combining MPI_Gather and MPI_Type_vector

2021-12-16 Thread George Bosilca via users
Jonas, The section 5.1.6 in MPI 4.0 should give you a better idea about the differences between size, extent and true extent. There are also a few examples in Section 5.1.14 on how to manipulate the datatype using extent. I think you should find Examples 5.13 to 5.16 of particular interest. Best,

Re: [OMPI users] unexpected behavior when combining MPI_Gather and MPI_Type_vector

2021-12-16 Thread George Bosilca via users
You are confusing the size and extent of the datatype. The size (aka the physical number of bytes described by the memory layout) would be m*nloc*sizeof(type), while the extent will be related to where you expect the second element of the same type to start. If you do resize, you will incorporate
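The size/extent distinction can be checked with plain arithmetic (a pure-Python sketch of the standard formulas for a vector type; no MPI calls involved, and the function name is ours):

```python
def vector_size_and_extent(count, blocklen, stride, elem_bytes):
    """Size and default extent (in bytes) of an MPI vector type.

    size   = bytes actually described by the layout
    extent = span from the first described element to one past the
             last, which is where MPI assumes the *next* item starts
             unless the type is resized.
    """
    size = count * blocklen * elem_bytes
    extent = ((count - 1) * stride + blocklen) * elem_bytes
    return size, extent

# 3 blocks of 2 doubles, stride of 4 doubles:
print(vector_size_and_extent(3, 2, 4, 8))  # (48, 80)
```

The gap between 48 and 80 is exactly why a gather of such types overlaps or strides unexpectedly until the extent is resized.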

Re: [OMPI users] MPI_ERR_TAG: invalid tag

2021-09-19 Thread George Bosilca via users
The error message is self-explanatory: the application calls MPI_Recv with an invalid tag. The MPI standard defines a valid tag as an integer between 0 and the value of the MPI_TAG_UB attribute on MPI_COMM_WORLD. At this point it seems plausible this is an application issue. Check that

Re: [OMPI users] Question about MPI_T

2021-08-17 Thread George Bosilca via users
You need to enable the monitoring PML in order to get access to the pml_monitoring_messages_count MPI_T. For this you need to know what PML you are currently using and add monitoring to the pml MCA variable. As an example if you use ob1 you should add the following to your mpirun command "--mca

Re: [OMPI users] Allreduce with Op

2021-03-13 Thread George Bosilca via users
Hi Pierre, MPI is allowed to pipeline the collective communications. This explains why the MPI_Op takes the len of the buffers as an argument. Because your MPI_Op ignores this length it alters data outside the temporary buffer we use for the segment. Other versions of the MPI_Allreduce

Re: [OMPI users] AVX errors building OpenMPI 4.1.0

2021-02-05 Thread George Bosilca via users
Carl, AVX support was introduced in 4.1 which explains why you did not have such issues before. What is your configure command in these 2 cases ? Please create an issue on github and attach your config.log. George. On Fri, Feb 5, 2021 at 2:44 PM Carl Ponder via users <

Re: [OMPI users] Timeout in MPI_Bcast/MPI_Barrier?

2021-01-11 Thread George Bosilca via users
> 4.- The hostfile. The duration of the delay is just a few seconds, about 3 ~ 4. Essentially, the first error message I get from a waiting process is "74: MPI_ERR_PROC_FAILED: Process Failure". Hope this information can

Re: [OMPI users] Timeout in MPI_Bcast/MPI_Barrier?

2021-01-08 Thread George Bosilca via users
Daniel, There are no timeouts in OMPI with the exception of the initial connection over TCP, where we use the socket timeout to prevent deadlocks. As you already did quite a few communicator duplications and other collective communications before you see the timeout, we need more info about this.

Re: [OMPI users] MPI_type_free question

2020-12-04 Thread George Bosilca via users
and hence rule out any memory leak that could be triggered by your fast interconnect. In any case, a reproducer will greatly help us debugging this issue. Cheers, Gilles On 12/4/2020 7:20 AM, George Bosilca via users

Re: [OMPI users] MPI_type_free question

2020-12-03 Thread George Bosilca via users
Patrick, I'm afraid there is no simple way to check this. The main reason being that OMPI use handles for MPI objects, and these handles are not tracked by the library, they are supposed to be provided by the user for each call. In your case, as you already called MPI_Type_free on the datatype,

Re: [OMPI users] Vader - Where to Look for Shared Memory Use

2020-07-22 Thread George Bosilca via users
John, There are many things in play in such an experiment. Plus, expecting linear speedup even at the node level is certainly overly optimistic. 1. A single core experiment has full memory bandwidth, so you will asymptotically reach the max flops. Adding more cores will increase the memory

Re: [OMPI users] Error with MPI_GET_ADDRESS and MPI_TYPE_CREATE_RESIZED?

2020-05-17 Thread George Bosilca via users
Diego, I see nothing wrong with the way you create the datatype. In fact, this is the perfect example of how to almost do it right in FORTRAN. The "almost" is because your code is highly dependent on the -r8 compiler option (otherwise the REAL in your type will not match the MPI_DOUBLE_PRECISION you

Re: [OMPI users] Regarding eager limit relationship to send message size

2020-03-26 Thread George Bosilca via users
An application that relies on MPI eager buffers for correctness or performance is an incorrect application, among many other reasons simply because MPI implementations without support for eager are legit. Moreover, these applications also miss the point on performance. Among the overheads I am not

Re: [OMPI users] Regarding eager limit relationship to send message size

2020-03-25 Thread George Bosilca via users
On Wed, Mar 25, 2020 at 4:49 AM Raut, S Biplab wrote: > [AMD Official Use Only - Internal Distribution Only] > > > > Dear George, > > Thank you the reply. But my question is more > particularly on the message size from application side. > > > > Let’s say the application

Re: [OMPI users] Regarding eager limit relationship to send message size

2020-03-24 Thread George Bosilca via users
Biplab, The eager is a constant for each BTL, and it represents the data that is sent eagerly, along with the matching information, out of the entire message. So, if the question is how much memory is needed to store all the eager messages, then the answer will depend on the communication pattern of your
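The per-BTL eager constants can be inspected and tuned through MCA parameters; a sketch for the TCP BTL (other BTLs expose analogous `*_eager_limit` parameters, and 65536 is just an example value in bytes):

```shell
# Show the eager limit of the TCP BTL
ompi_info --param btl tcp --level 9 | grep eager_limit

# Raise it for one run
mpirun --mca btl_tcp_eager_limit 65536 -np 2 ./my_app
```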

Re: [OMPI users] Limits of communicator size and number of parallel broadcast transmissions

2020-03-17 Thread George Bosilca via users
On Mon, Mar 16, 2020 at 6:15 PM Konstantinos Konstantinidis via users < users@lists.open-mpi.org> wrote: > Hi, I have some questions regarding technical details of MPI collective > communication methods and broadcast: > >- I want to understand when the number of receivers in a MPI_Bcast can >

Re: [OMPI users] Fault in not recycling bsend buffer ?

2020-03-17 Thread George Bosilca via users
Martyn, I don't know exactly what your code is doing, but based on your inquiry I assume you are using MPI_BSEND multiple times and you run out of local buffers. The MPI standard does not mandate a wait until buffer space becomes available, because that can lead to deadlocks (communication

Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-04 Thread George Bosilca via users
Hcoll will be present in many cases; you don't really want to skip them all. I foresee 2 problems with the approach you propose: - collective components are selected per communicator, so even if they will not be used they are still loaded. - from outside the MPI library you have little access to

Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-03 Thread George Bosilca via users
If I'm not mistaken, hcoll is playing with the opal_progress in a way that conflicts with the blessed usage of progress in OMPI and prevents other components from advancing and timely completing requests. The impact is minimal for sequential applications using only blocking calls, but is

Re: [OMPI users] HELP: openmpi is not using the specified infiniband interface !!

2020-01-14 Thread George Bosilca via users
According to the error message you are using MPICH not Open MPI. George. On Tue, Jan 14, 2020 at 5:53 PM SOPORTE MODEMAT via users < users@lists.open-mpi.org> wrote: > Hello everyone. > > > > I would like somebody help me to figure out how can I make that the > openmpi use the infiniband

Re: [OMPI users] Non-blocking send issue

2020-01-02 Thread George Bosilca via users
This is going back to the fact that you, as a developer, are the best placed to know exactly when asynchronous progress is needed for your algorithm, so from that perspective you can provide that progress in the most timely manner. One way to force MPI to do progress, is to spawn another thread

Re: [OMPI users] Non-blocking send issue

2019-12-31 Thread George Bosilca via users
Martin, The MPI standard does not mandate progress outside MPI calls, thus implementations are free to provide, or not, asynchronous progress. Calling MPI_Test provides the MPI implementation with an opportunity to progress its internal communication queues. However, an implementation could try

Re: [OMPI users] CUDA mpi question

2019-11-28 Thread George Bosilca via users
Wonderful maybe but extremely unportable. Thanks but no thanks! George. On Wed, Nov 27, 2019 at 11:07 PM Zhang, Junchao wrote: > Interesting idea. But doing MPI_THREAD_MULTIPLE has other side-effects. If > MPI nonblocking calls could take an extra stream argument and work like a > kernel

Re: [OMPI users] CUDA mpi question

2019-11-27 Thread George Bosilca via users
On Wed, Nov 27, 2019 at 5:02 PM Zhang, Junchao wrote: > On Wed, Nov 27, 2019 at 3:16 PM George Bosilca > wrote: > >> Short and portable answer: you need to sync before the Isend or you will >> send garbage data. >> > Ideally, I want to formulate my code into a series of asynchronous "kernel >

Re: [OMPI users] CUDA mpi question

2019-11-27 Thread George Bosilca via users
Short and portable answer: you need to sync before the Isend or you will send garbage data. Assuming you are willing to go for a less portable solution you can get the OMPI streams and add your kernels inside, so that the sequential order will guarantee correctness of your isend. We have 2 hidden

Re: [OMPI users] Program hangs when MPI_Bcast is called rapidly

2019-10-29 Thread George Bosilca via users
Charles, Having implemented some of the underlying collective algorithms, I am puzzled by the need to force the sync to 1 to have things flowing. I would definitely appreciate a reproducer so that I can identify (and hopefully) fix the underlying problem. Thanks, George. On Tue, Oct 29, 2019

Re: [OMPI users] Program hangs when MPI_Bcast is called rapidly

2019-10-29 Thread George Bosilca via users
Charles, There is a known issue with calling collectives in a tight loop, due to lack of flow control at the network level. It results in a significant slow-down that might appear as a deadlock to users. The way to work around this is to enable the sync collective module, which will insert a fake
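A sketch of enabling the sync collective module mentioned in the reply (parameter names as in recent OMPI releases; verify them for your version with `ompi_info --param coll sync --level 9`):

```shell
# Insert a barrier before every collective (the "sync to 1" setting)
mpirun --mca coll_sync_priority 100 --mca coll_sync_barrier_before 1 \
       -np 4 ./my_app
```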

Re: [OMPI users] growing memory use from MPI application

2019-06-19 Thread George Bosilca via users
To completely disable UCX you need to disable the UCX MTL and not only the BTL. I would use "--mca pml ob1 --mca btl ^ucx --mca btl_openib_allow_ib 1". As you have a gdb session on the processes, you can try to break on some of the memory allocation functions (malloc, realloc, calloc). George.
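The command line from the reply, assembled for reference (`./my_app` is a placeholder; the flags themselves are the ones George quotes):

```shell
# Fall back to the ob1 PML, disable the UCX BTL, and keep the
# openib BTL allowed to drive the InfiniBand hardware directly
mpirun --mca pml ob1 --mca btl ^ucx --mca btl_openib_allow_ib 1 \
       -np 2 ./my_app
```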

Re: [OMPI users] Can displs in Scatterv/Gatherv/etc be a GPU array for CUDA-aware MPI?

2019-06-11 Thread George Bosilca via users
Leo, In a UMA system having the displacement and/or recvcounts arrays on managed GPU memory should work, but it will incur overheads for at least 2 reasons: 1. the MPI API arguments are checked for correctness (here recvcounts) 2. the collective algorithm part that executes on the CPU uses the

Re: [OMPI users] Open questions on MPI_Allreduce background implementation

2019-06-08 Thread George Bosilca via users
There is an ongoing discussion about this on issue #4067 ( https://github.com/open-mpi/ompi/issues/4067). Also, the mailing list contains a few examples of how to tweak the collective algorithms to your needs. George. On Thu, Jun 6, 2019 at 7:42 PM hash join via users wrote: > Hi all, > > > I

Re: [OMPI users] OMPI 4.0.1 valgrind error on simple MPI_Send()

2019-04-30 Thread George Bosilca via users
Depending on the alignment of the different types there might be small holes in the low-level headers we exchange between processes. It should not be a concern for users. valgrind should not stop on the first detected issue unless --exit-on-first-error has been provided (the default value
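Open MPI ships a valgrind suppression file for exactly these benign low-level-header reports; a sketch of using it (`$OMPI_PREFIX` stands for your install prefix):

```shell
# Suppress known-benign Open MPI warnings during a valgrind run
mpirun -np 2 valgrind \
       --suppressions=$OMPI_PREFIX/share/openmpi/openmpi-valgrind.supp \
       ./my_app
```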

Re: [OMPI users] 3.0.4, 4.0.1 build failure on OSX Mojave with LLVM

2019-04-24 Thread George Bosilca via users
Jon, The configure AC_HEADER_STDC macro is considered obsolete [1] as most OSes are STDC compliant nowadays. To have it fail on a recent version of OSX is therefore unexpected. Moreover, many of the OMPI developers work on OSX Mojave with the default compiler but with the