Re: [OMPI users] Binding to thread 0

2023-09-11 Thread Nathan Hjelm via users
Isn't this a case for --map-by core --bind-to hwthread? Because you want to map each process by core but bind to the first hwthread. From the looks of it your process is both binding and mapping by hwthread now. -Nathan On Sep 11, 2023, at 10:20 AM, Luis Cebamanos via users wrote: @Gilles @Jeff

Re: [OMPI users] MPI_Get is slow with structs containing padding

2023-03-30 Thread Nathan Hjelm via users
That is exactly the issue. Part of the reason I have argued against MPI_SHORT_INT usage in RMA is that even though it is padded due to type alignment, we are still not allowed to operate on the bits between the short and the int. We can correct that one in the standard by adding the same languag
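For illustration, a small standalone C check (not from the thread) of the layout being described: MPI_SHORT_INT corresponds to a short followed by an int, and on typical ABIs alignment inserts padding bytes between the two that RMA/accumulate operations are still not allowed to touch.

    #include <stdio.h>
    #include <stddef.h>

    /* Layout corresponding to MPI_SHORT_INT: a short followed by an int. */
    struct short_int { short value; int index; };

    int main(void)
    {
        /* On common ABIs the int is 4-byte aligned, so its offset is 4 while
         * sizeof(short) is 2: two padding bytes sit between the members, and
         * MPI may not modify them when operating on the pair. */
        printf("sizeof(short)=%zu offset(index)=%zu padding=%zu\n",
               sizeof(short), offsetof(struct short_int, index),
               offsetof(struct short_int, index) - sizeof(short));
        return 0;
    }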

Re: [OMPI users] MPI_Get is slow with structs containing padding

2023-03-30 Thread Nathan Hjelm via users
Yes. This is absolutely normal. When you give MPI non-contiguous data it has to break it down into one operation per contiguous region. If you have a non-RDMA network this can lead to very poor performance. With RDMA networks it will also be much slower than a contiguous get but lower overhead

Re: [OMPI users] Newbie With Issues

2021-03-30 Thread Nathan Hjelm via users
I find it bizarre that icc is looking for a C++ library. That aside, if I remember correctly Intel's compilers do not provide a C++ stdlib implementation but instead rely on the one from gcc. You need to verify that libstdc++ is installed on the system. On Ubuntu/Debian this can be installed wit

Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?

2021-03-04 Thread Nathan Hjelm via users
I would run the v4.x series and install xpmem if you can (http://github.com/hjelmn/xpmem). You will need to build with --with-xpmem=/path/to/xpmem to use xpmem; otherwise vader will default to using CMA. This will provide the best possible performance. -Nathan >

Re: [OMPI users] Help with One-Sided Communication: Works in Intel MPI, Fails in Open MPI

2020-02-24 Thread Nathan Hjelm via users
The error is from btl/vader. CMA is not functioning as expected. It might work if you set btl_vader_single_copy_mechanism=none. Performance will suffer though. It would be worth understanding why process_readv is failing. Can you send a simple reproducer? -Nathan > On Feb 24, 2020, at 2:59 PM

Re: [OMPI users] OpenMPI slowdown in latency bound application

2019-08-28 Thread Nathan Hjelm via users
Is this overall runtime or solve time? The former is essentially meaningless as it includes all the startup time (launch, connections, etc). Especially since we are talking about seconds here. -Nathan > On Aug 28, 2019, at 9:10 AM, Cooper Burns via users > wrote: > > Peter, > > It looks lik

Re: [OMPI users] How is the rank determined (Open MPI and Podman)

2019-07-22 Thread Nathan Hjelm via users
Just add it to the existing modex. -Nathan > On Jul 22, 2019, at 12:20 PM, Adrian Reber via users > wrote: > > I have most of the code ready, but I still have troubles doing > OPAL_MODEX_RECV. I am using the following lines, based on the code from > orte/test/mpi/pmix.c: > > OPAL_MODEX_SEND_V

Re: [OMPI users] How is the rank determined (Open MPI and Podman)

2019-07-21 Thread Nathan Hjelm via users
Patches are always welcome. What would be great is a nice big warning that CMA support is disabled because the processes are in different namespaces. Ideally all MPI processes should be in the same namespace to ensure the best performance. -Nathan > On Jul 21, 2019, at 2:53 PM, Adrian Reber v

Re: [OMPI users] undefined reference error related to ucx

2019-06-26 Thread Nathan Hjelm via users
Unless you are using OSHMEM I do not recommend using UCX on a Cray. You will likely get better performance with the built-in uGNI support. -Nathan > On Jun 25, 2019, at 1:51 AM, Passant A. Hafez via users > wrote: > > Thanks Gilles! > > The thing is I'm having this error ud_iface.c:271 UCX

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Nathan Hjelm via users
THAT is a good idea. When using Omnipath we see an issue with stale files in /dev/shm if the application exits abnormally. I don't know if UCX uses that space as well. -Nathan On June 20, 2019 at 11:05 AM, Joseph Schuchart via users wrote: Noam, Another idea: check for stale files in /de

Re: [OMPI users] Latencies of atomic operations on high-performance networks

2019-05-09 Thread Nathan Hjelm via users
 > On May 9, 2019, at 12:37 AM, Joseph Schuchart via users > wrote: > > Nathan, > > Over the last couple of weeks I made some more interesting observations > regarding the latencies of accumulate operations on both Aries and InfiniBand > systems: > > 1) There seems to be a sig

Re: [OMPI users] Latencies of atomic operations on high-performance networks

2019-05-09 Thread Nathan Hjelm via users
and_op + MPI_NO_OP is 2x that of MPI_Fetch_and_op + MPI_SUM on > 64bit values, roughly matching the latency of 32bit compare-exchange > operations. > > All measurements were done using Open MPI 3.1.2 with > OMPI_MCA_osc_rdma_acc_single_intrinsic=true. Is that behavior expected as >

Re: [OMPI users] Issues compiling HPL with OMPIv4.0.0

2019-04-03 Thread Nathan Hjelm via users
Gilles is correct. If mpicc is showing errors like those in your original email then it is not invoking a C compiler. C does not have any concept of try or catch. No modern C compiler will complain about a variable named “try” as it is not a reserved keyword in the C language. Example: foo.c:

Re: [OMPI users] mpi_comm_dup + mpi_comm_group Issue

2019-04-02 Thread Nathan Hjelm via users
That is perfectly valid. The MPI processes that make up the group are all part of comm world. I would file a bug with Intel MPI. -Nathan > On Apr 2, 2019, at 7:11 AM, Stella Paronuzzi > wrote: > > Good afternoon, I am attaching a simple fortran code that: > calls the MPI_INIT > duplicates th
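A minimal C sketch (the reporter's reproducer is Fortran and not shown here) of the pattern the reply calls valid: the group extracted from a duplicate of MPI_COMM_WORLD contains only processes that are members of comm world.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        MPI_Comm  dup;
        MPI_Group group;

        /* Duplicate comm world and extract its group; every process in
         * 'group' is also part of MPI_COMM_WORLD, so using the group with
         * communicator-creation calls is legal. */
        MPI_Comm_dup(MPI_COMM_WORLD, &dup);
        MPI_Comm_group(dup, &group);

        /* ... e.g. MPI_Comm_create(dup, group, &newcomm) ... */

        MPI_Group_free(&group);
        MPI_Comm_free(&dup);
        MPI_Finalize();
        return 0;
    }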

Re: [OMPI users] Using strace with Open MPI on Cray

2019-03-30 Thread Nathan Hjelm via users
Add --mca btl ^tcp to your mpirun command line. The TCP BTL shouldn't be used on a Cray. > On Mar 30, 2019, at 2:00 PM, Christoph Niethammer wrote: > > Short update: > > The polled file descriptor is related to a socket, which I identified to be > the local tcp btl connection ... > On a Lustre file sys

Re: [OMPI users] error "unacceptable operand for unary &" for openmpi-master-201903260242-dfbc144 on Linux with Sun C

2019-03-26 Thread Nathan Hjelm via users
This really looks like a compiler bug. There is no & @ osc_pt2pt.h line 579. There is one at line 577 but there is no “unacceptable operand” on that line. If I have time this week I will try to find a workaround but it might be worth filing a bug with Oracle and seeing what they say. -Nathan > On

Re: [OMPI users] Best way to send on mpi c, architecture dependent data type

2019-03-14 Thread Nathan Hjelm via users
Why not just use C99 stdint? That gives you fixed-size types. -Nathan > On Mar 14, 2019, at 9:38 AM, George Reeke wrote: > > On Wed, 2019-03-13 at 22:10 +, Sergio None wrote: >> Hello. >> >> >> I'm using OpenMPI 3.1.3 on x64 CPU and two ARMv8( Raspberry pi 3). >> >> >> But i'm having s
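A minimal sketch of what the suggestion looks like in practice (assuming a mixed x86-64/ARMv8 job as in the thread): the C99 fixed-width types have matching MPI datatypes, so both architectures agree on the element size.

    #include <mpi.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* int64_t is 8 bytes on both x86-64 and ARMv8, unlike 'long', whose
         * size can differ between ABIs; MPI_INT64_T is its matching type. */
        int64_t value = (0 == rank) ? 42 : 0;
        MPI_Bcast(&value, 1, MPI_INT64_T, 0, MPI_COMM_WORLD);

        printf("rank %d got %lld\n", rank, (long long) value);
        MPI_Finalize();
        return 0;
    }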

Re: [OMPI users] Segfault with OpenMPI 4 and dynamic window

2019-02-16 Thread Nathan Hjelm via users
Probably not. I think this is now fixed. Might be worth trying master to verify. > On Feb 16, 2019, at 7:01 AM, Bart Janssens wrote: > > Hi Gilles, > > Thanks, that works (I had to put quotes around the ^rdma). Should I file a > github issue? > > Cheers, > > Bart >> On 16 Feb 2019, 14:05 +

[OMPI users] Fwd: Minimum time between MPI_Bcast or MPI_Reduce calls?

2019-01-18 Thread Nathan Hjelm via users
Since neither bcast nor reduce acts as a barrier it is possible to run out of resources if either of these calls (or both) are used in a tight loop. The sync coll component exists for this scenario. You can enable it by adding the following to mpirun (or setting these variables through the env

Re: [OMPI users] Increasing OpenMPI RMA win attach region count.

2019-01-09 Thread Nathan Hjelm via users
If you need to support more attachments you can set the value of that variable either through the environment (OMPI_MCA_osc_rdma_max_attach) or on the mpirun command line (--mca osc_rdma_max_attach). Keep in mind that each attachment may use an underlying hardware resource that may be easy to exhaust (h
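For context, a sketch (illustrative names and sizes, not from the thread) of the dynamic-window pattern the parameter limits: each concurrently attached region counts against osc_rdma_max_attach.

    #include <mpi.h>
    #include <stdlib.h>

    #define NREGIONS 64   /* illustrative; each attach may consume a hardware resource */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        MPI_Win win;
        MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        /* Every concurrently attached region counts against the limit. */
        void *regions[NREGIONS];
        for (int i = 0; i < NREGIONS; ++i) {
            regions[i] = malloc(4096);
            MPI_Win_attach(win, regions[i], 4096);
        }

        /* ... RMA traffic targeting the attached regions ... */

        for (int i = 0; i < NREGIONS; ++i) {
            MPI_Win_detach(win, regions[i]);
            free(regions[i]);
        }

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }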

Re: [OMPI users] Querying/limiting OpenMPI memory allocations

2018-12-20 Thread Nathan Hjelm via users
How many nodes are you using? How many processes per node? What kind of processor? Open MPI version? 25 GB is several orders of magnitude more memory than should be used except at extreme scale (1M+ processes). Also, how are you calculating memory usage? -Nathan > On Dec 20, 2018, at 4:49 AM,

Re: [OMPI users] Hang in mpi on 32-bit

2018-11-26 Thread Nathan Hjelm via users
Can you try configuring with --disable-builtin-atomics and see if that fixes the issue for you? -Nathan > On Nov 26, 2018, at 9:11 PM, Orion Poplawski wrote: > > Hello - > > We are starting to see some mpi processes "hang" (really cpu spin and never > complete) on 32 bit architectures on Fed

Re: [OMPI users] [Open MPI Announce] Open MPI 4.0.0 Released

2018-11-14 Thread Nathan Hjelm via users
I really need to update that wording. It has been a while and the code seems to have stabilized. It’s quite safe to use and supports some of the latest kernel versions. -Nathan > On Nov 13, 2018, at 11:06 PM, Bert Wesarg via users > wrote: > > Dear Takahiro, > On Wed, Nov 14, 2018 at 5:38 AM

Re: [OMPI users] Latencies of atomic operations on high-performance networks

2018-11-08 Thread Nathan Hjelm via users
previously tested with 3.1.3 on the IB cluster, which ran fine. If I use the same version I run into the same problem on both systems (with --mca btl_openib_allow_ib true --mca osc_rdma_acc_single_intrinsic true). I have not tried using UCX for this. Joseph On 11/8/18 1:20 PM, Nathan Hjelm via users

Re: [OMPI users] Latencies of atomic operations on high-performance networks

2018-11-08 Thread Nathan Hjelm via users
reply, setting osc_rdma_acc_single_intrinsic=true does the trick for both shared and exclusive locks and brings it down to <2us per operation. I hope that the info key will make it into the next version of the standard, I certainly have use for it :) Cheers, Joseph On 11/6/18 12:13 PM, Nathan Hjelm via users w

Re: [OMPI users] Latencies of atomic operations on high-performance networks

2018-11-06 Thread Nathan Hjelm via users
All of this is completely expected. Due to the requirements of the standard it is difficult to make use of network atomics even for MPI_Compare_and_swap (MPI_Accumulate and MPI_Get_accumulate spoil the party). If you want MPI_Fetch_and_op to be fast set this MCA parameter: osc_rdma_acc_sing
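A minimal sketch of the call the advice targets (window creation omitted, names assumed): the code itself does not change; the speedup comes from running with osc_rdma_acc_single_intrinsic=true so osc/rdma can map the single-element accumulate onto a network atomic.

    #include <mpi.h>
    #include <stdint.h>

    /* Assumes 'win' exposes one int64_t per rank; run with
     * --mca osc_rdma_acc_single_intrinsic true (as in the thread) to allow
     * the single-element fetch-and-op to use hardware atomics. */
    void fetch_add_one(MPI_Win win, int target, int64_t *result)
    {
        const int64_t one = 1;

        MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
        MPI_Fetch_and_op(&one, result, MPI_INT64_T, target, 0, MPI_SUM, win);
        MPI_Win_unlock(target, win);
    }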

Re: [OMPI users] [version 2.1.5] invalid memory reference

2018-10-11 Thread Nathan Hjelm via users
Those features (MPI_LB/MPI_UB/MPI_Type_struct) were removed in MPI-3.0. It is fairly straightforward to update the code to be MPI-3.0 compliant. MPI_Type_struct -> MPI_Type_create_struct MPI_LB/MPI_UB -> MPI_Type_create_resized Example: types[0] = MPI_LB; disp[0] = my_lb; lens[0] = 1; types[1
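A sketch of the MPI-3.0 replacement pattern, with an assumed int+double payload and with my_lb/my_extent standing in for whatever bounds the removed MPI_LB/MPI_UB markers used to encode:

    #include <mpi.h>
    #include <stddef.h>

    struct elem { int i; double d; };   /* illustrative payload */

    MPI_Datatype make_type(MPI_Aint my_lb, MPI_Aint my_extent)
    {
        int          lens[2]  = { 1, 1 };
        MPI_Aint     disps[2] = { offsetof(struct elem, i), offsetof(struct elem, d) };
        MPI_Datatype types[2] = { MPI_INT, MPI_DOUBLE };
        MPI_Datatype tmp, newtype;

        /* MPI_Type_struct -> MPI_Type_create_struct */
        MPI_Type_create_struct(2, lens, disps, types, &tmp);
        /* MPI_LB/MPI_UB -> MPI_Type_create_resized */
        MPI_Type_create_resized(tmp, my_lb, my_extent, &newtype);
        MPI_Type_commit(&newtype);
        MPI_Type_free(&tmp);
        return newtype;
    }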

Re: [OMPI users] [open-mpi/ompi] vader compile issue (#5814)

2018-10-02 Thread Nathan Hjelm via users
Definitely a compiler bug. I opened a PR to work around it and posted a question on the Oracle forums. -Nathan On Oct 02, 2018, at 12:48 AM, Siegmar Gross wrote: Hi Jeff, hi Nathan, the compilers (Sun C 5.15, Sun C 5.14, Sun C 5.13) don't like the code. loki tmp 110 cc -V cc: Studio 12.6 Sun C 5.15 Linu

Re: [OMPI users] [open-mpi/ompi] vader compile issue (#5814)

2018-10-02 Thread Nathan Hjelm via users
hmm. Add #include to the test and try it again. -Nathan > On Oct 2, 2018, at 12:41 AM, Siegmar Gross > wrote: > > Hi Jeff, hi Nathan, > > the compilers (Sun C 5.15, Sun C 5.14, Sun C 5.13) don't like the code. > > loki tmp 110 cc -V > cc: Studio 12.6 Sun C 5.15 Linux_i386 2017/05/30 > lok

Re: [OMPI users] pt2pt osc required for single-node runs?

2018-09-06 Thread Nathan Hjelm via users
You can either move to MPI_Win_allocate or try the v4.0.x snapshots. I will look at bringing the btl/vader support for osc/rdma back to v3.1.x. osc/pt2pt will probably never become truly thread safe. -Nathan On Sep 06, 2018, at 08:34 AM, Joseph Schuchart wrote: All, I installed Open MPI 3.1
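For reference, a minimal sketch (names assumed) of the MPI_Win_allocate route: let MPI allocate the window memory itself instead of wrapping a user buffer with MPI_Win_create.

    #include <mpi.h>

    /* Instead of malloc() + MPI_Win_create(), let the library place the
     * window memory so the osc component can back it however it prefers. */
    MPI_Win make_window(MPI_Aint nbytes, MPI_Comm comm, double **base)
    {
        MPI_Win win;
        MPI_Win_allocate(nbytes, sizeof(double), MPI_INFO_NULL, comm,
                         base, &win);
        return win;
    }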

Re: [OMPI users] MPI_MAXLOC problems

2018-08-28 Thread Nathan Hjelm via users
Yup. That is the case for all composed datatypes, which is what the tuple types are: predefined composed datatypes. -Nathan On Aug 28, 2018, at 02:35 PM, "Jeff Squyres (jsquyres) via users" wrote: I think Gilles is right: remember that datatypes like MPI_2DOUBLE_PRECISION are actually 2 valu

Re: [OMPI users] know which CPU has the maximum value

2018-08-10 Thread Nathan Hjelm via users
) plus the terrible names. If I could kill them in MPI-4 I would. > On Aug 10, 2018, at 9:47 AM, Diego Avesani wrote: > > Dear all, > I have just implemented MAXLOC, why should they go away? > it seems working pretty well. > > thanks > > Diego > > >> O

Re: [OMPI users] know which CPU has the maximum value

2018-08-10 Thread Nathan Hjelm via users
The problem is that minloc and maxloc need to go away. Better to use a custom op. > On Aug 10, 2018, at 9:36 AM, George Bosilca wrote: > > You will need to create a special variable that holds 2 entries, one for the > max operation (with whatever type you need) and an int for the rank of the > pro
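A sketch of the custom-op alternative George describes, using a two-entry struct holding the value and the owning rank (all names here are illustrative):

    #include <mpi.h>
    #include <stddef.h>

    typedef struct { double val; int rank; } maxloc_t;

    /* Reduction function: keep the entry with the larger value. */
    static void maxloc_op(void *in_v, void *inout_v, int *len, MPI_Datatype *dt)
    {
        maxloc_t *in = in_v, *inout = inout_v;
        for (int i = 0; i < *len; ++i)
            if (in[i].val > inout[i].val)
                inout[i] = in[i];
    }

    void reduce_maxloc(double value, MPI_Comm comm, maxloc_t *result)
    {
        int rank;
        MPI_Comm_rank(comm, &rank);

        /* Matching MPI datatype for maxloc_t. */
        int          lens[2]  = { 1, 1 };
        MPI_Aint     disps[2] = { offsetof(maxloc_t, val), offsetof(maxloc_t, rank) };
        MPI_Datatype types[2] = { MPI_DOUBLE, MPI_INT }, dtype;
        MPI_Type_create_struct(2, lens, disps, types, &dtype);
        MPI_Type_commit(&dtype);

        MPI_Op op;
        MPI_Op_create(maxloc_op, 1 /* commutative */, &op);

        maxloc_t local = { value, rank };
        MPI_Allreduce(&local, result, 1, dtype, op, comm);

        MPI_Op_free(&op);
        MPI_Type_free(&dtype);
    }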

Re: [OMPI users] Asynchronous progress in 3.1

2018-08-06 Thread Nathan Hjelm via users
It depends on the interconnect you are using. Some transports have async progress support but others do not. -Nathan On Aug 06, 2018, at 11:29 AM, "Palmer, Bruce J" wrote: Hi,   Is there anything that can be done to boost asynchronous progress for MPI RMA operations in OpenMPI 3.1? I’m try

Re: [OMPI users] local communicator and crash of the code

2018-08-03 Thread Nathan Hjelm via users
If you are trying to create a communicator containing all node-local processes then use MPI_Comm_split_type. > On Aug 3, 2018, at 12:24 PM, Diego Avesani wrote: > > Deal all, > probably I have found the error. > Let's me check. Probably I have not properly set-up colors. > > Thanks a lot, >
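A minimal sketch of the call being suggested: splitting by MPI_COMM_TYPE_SHARED yields, on each node, a communicator containing exactly that node's processes.

    #include <mpi.h>

    /* Build a communicator containing only the processes on this node. */
    MPI_Comm node_local_comm(MPI_Comm comm)
    {
        MPI_Comm node_comm;
        MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0 /* keep rank order */,
                            MPI_INFO_NULL, &node_comm);
        return node_comm;
    }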

Re: [OMPI users] Seg fault in opal_progress

2018-07-13 Thread Nathan Hjelm via users
Please give master a try. This looks like another signature of running out of space for shared memory buffers. -Nathan > On Jul 13, 2018, at 6:41 PM, Noam Bernstein > wrote: > > Just to summarize for the list. With Jeff’s prodding I got it generating > core files with the debug (and mem-deb

Re: [OMPI users] Seg fault in opal_progress

2018-07-11 Thread Nathan Hjelm via users
Might be also worth testing a master snapshot and see if that fixes the issue. There are a couple of fixes being backported from master to v3.0.x and v3.1.x now. -Nathan On Jul 11, 2018, at 03:16 PM, Noam Bernstein wrote: On Jul 11, 2018, at 11:29 AM, Jeff Squyres (jsquyres) via users wro

Re: [OMPI users] MPI_Ialltoallv

2018-07-06 Thread Nathan Hjelm via users
No, that's a bug. Please open an issue on GitHub and we will fix it shortly. Thanks for reporting this issue. -Nathan > On Jul 6, 2018, at 8:08 AM, Stanfield, Clyde > wrote: > > We are using MPI_Ialltoallv for an image processing algorithm. When doing > this we pass in an MPI_Type_contiguous

Re: [OMPI users] Verbose output for MPI

2018-07-04 Thread Nathan Hjelm via users
--mca pmix_base_verbose 100 > On Jul 4, 2018, at 9:15 AM, Maksym Planeta > wrote: > > Hello, > > I have troubles figuring out how can I configure verbose output properly. > There is a call to pmix_output_verbose in > opal/mca/pmix/pmix3x/pmix/src/mca/ptl/tcp/ptl_tcp.c in function try_connect

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 3.1.0 Lock Up on POWER9 w/ CUDA9.2

2018-07-03 Thread Nathan Hjelm via users
Found this issue. PR #5374 fixes it. Will make its way into the v3.0.x and v3.1.x release series. -Nathan On Jul 02, 2018, at 02:36 PM, Nathan Hjelm wrote: The result should be the same with v3.1.1. I will investigate on our Coral test systems. -Nathan On Jul 02, 2018, at 02:23 PM, "Hammo