Re: [OMPI users] Binding to thread 0

2023-09-11 Thread Nathan Hjelm via users
Isn't this a case for --map-by core --bind-to hwthread? Because you want to map each process by core but bind to the first hwthread. From the looks of it your process is both binding and mapping by hwthread now. -Nathan On Sep 11, 2023, at 10:20 AM, Luis Cebamanos via users wrote: @Gilles @Jeff

Re: [OMPI users] MPI_Get is slow with structs containing padding

2023-03-30 Thread Nathan Hjelm via users
That is exactly the issue. Part of the reason I have argued against MPI_SHORT_INT usage in RMA is because even though it is padded due to type alignment we are still not allowed to operate on the bits between the short and the int. We can correct that one in the standard by adding the same languag

Re: [OMPI users] MPI_Get is slow with structs containing padding

2023-03-30 Thread Nathan Hjelm via users
Yes. This is absolutely normal. When you give MPI non-contiguous data it has to break it down into one operation per contiguous region. If you have a non-RDMA network this can lead to very poor performance. With RDMA networks it will also be much slower than a contiguous get but lower overhead

Re: [OMPI users] Newbie With Issues

2021-03-30 Thread Nathan Hjelm via users
I find it bizarre that icc is looking for a C++ library. That aside if I remember correctly intel's compilers do not provide a C++ stdlib implementation but instead rely on the one from gcc. You need to verify that libstdc++ is installed on the system. On Ubuntu/debian this can be installed wit

Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?

2021-03-04 Thread Nathan Hjelm via users
I would run the v4.x series and install xpmem if you can (http://github.com/hjelmn/xpmem). You will need to build with --with-xpmem=/path/to/xpmem to use xpmem otherwise vader will default to using CMA. This will provide the best possible performance. -Nathan >

Re: [OMPI users] Help with One-Sided Communication: Works in Intel MPI, Fails in Open MPI

2020-02-24 Thread Nathan Hjelm via users
The error is from btl/vader. CMA is not functioning as expected. It might work if you set btl_vader_single_copy_mechanism=none Performance will suffer though. It would be worth understanding why process_readv is failing. Can you send a simple reproducer? -Nathan > On Feb 24, 2020, at 2:59 PM

Re: [OMPI users] OpenMPI slowdown in latency bound application

2019-08-28 Thread Nathan Hjelm via users
Is this overall runtime or solve time? The former is essentially meaningless as it includes all the startup time (launch, connections, etc). Especially since we are talking about seconds here. -Nathan > On Aug 28, 2019, at 9:10 AM, Cooper Burns via users > wrote: > > Peter, > > It looks lik

Re: [OMPI users] How is the rank determined (Open MPI and Podman)

2019-07-22 Thread Nathan Hjelm via users
>>> namespace ID of the other process, but the function would then just >>>> return OPAL_ERROR a bit earlier instead of as a result of >>>> process_vm_{read,write}v(). Nothing would really change. >>>> >>>> A better place for the check would be mca_b

Re: [OMPI users] How it the rank determined (Open MPI and Podman)

2019-07-21 Thread Nathan Hjelm via users
Patches are always welcome. What would be great is a nice big warning that CMA support is disabled because the processes are on different namespaces. Ideally all MPI processes should be on the same namespace to ensure the best performance. -Nathan > On Jul 21, 2019, at 2:53 PM, Adrian Reber v

Re: [OMPI users] undefined reference error related to ucx

2019-06-26 Thread Nathan Hjelm via users
Unless you are using OSHMEM I do not recommend using UCX on a Cray. You will likely get better performance with the built-in uGNI support. -Nathan > On Jun 25, 2019, at 1:51 AM, Passant A. Hafez via users > wrote: > > Thanks Gilles! > > The thing is I'm having this error > ud_iface.c:271 UCX

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Nathan Hjelm via users
THAT is a good idea. When using Omnipath we see an issue with stale files in /dev/shm if the application exits abnormally. I don't know if UCX uses that space as well. -Nathan On June 20, 2019 at 11:05 AM, Joseph Schuchart via users wrote: Noam, Another idea: check for stale files in /de

Re: [OMPI users] Latencies of atomic operations on high-performance networks

2019-05-09 Thread Nathan Hjelm via users
 > On May 9, 2019, at 12:37 AM, Joseph Schuchart via users > wrote: > > Nathan, > > Over the last couple of weeks I made some more interesting observations > regarding the latencies of accumulate operations on both Aries and InfiniBand > systems: > > 1) There seems to be a sig

Re: [OMPI users] Latencies of atomic operations on high-performance networks

2019-05-09 Thread Nathan Hjelm via users
and_op + MPI_NO_OP is 2x that of MPI_Fetch_and_op + MPI_SUM on > 64bit values, roughly matching the latency of 32bit compare-exchange > operations. > > All measurements were done using Open MPI 3.1.2 with > OMPI_MCA_osc_rdma_acc_single_intrinsic=true. Is that behavior expected as >

Re: [OMPI users] Issues compiling HPL with OMPIv4.0.0

2019-04-03 Thread Nathan Hjelm via users
Gilles is correct. If mpicc is showing errors like those in your original email then it is not invoking a C compiler. C does not have any concept of try or catch. No modern C compiler will complain about a variable named “try” as it is not a reserved keyword in the C language. Example: foo.c:

Re: [OMPI users] mpi_comm_dup + mpi_comm_group Issue

2019-04-02 Thread Nathan Hjelm via users
That is perfectly valid. The MPI processes that make up the group are all part of comm world. I would file a bug with Intel MPI. -Nathan > On Apr 2, 2019, at 7:11 AM, Stella Paronuzzi > wrote: > > Good afternoon, I am attaching a simple fortran code that: > calls the MPI_INIT > duplicates th

Re: [OMPI users] Using strace with Open MPI on Cray

2019-03-30 Thread Nathan Hjelm via users
Add --mca btl ^tcp to your mpirun command line. It shouldn't be used on a Cray. > On Mar 30, 2019, at 2:00 PM, Christoph Niethammer wrote: > > Short update: > > The polled file descriptor is related to a socket, which I identified to be > the local tcp btl connection ... > On a Lustre file sys

Re: [OMPI users] error "unacceptable operand for unary &" for openmpi-master-201903260242-dfbc144 on Linux with Sun C

2019-03-26 Thread Nathan Hjelm via users
This really looks like a compiler bug. There is no & @ osc_pt2pt.h line 579. There is one at line 577 but there is no “unacceptable operand” on that line. If I have time this week I will try to find a workaround but it might be worth filing a bug with Oracle and see what they say. -Nathan > On

Re: [OMPI users] Best way to send on mpi c, architecture dependent data type

2019-03-14 Thread Nathan Hjelm via users
Why not just use C99 stdint? That gives you fixed-size types. -Nathan > On Mar 14, 2019, at 9:38 AM, George Reeke wrote: > > On Wed, 2019-03-13 at 22:10 +, Sergio None wrote: >> Hello. >> >> >> I'm using OpenMPI 3.1.3 on x64 CPU and two ARMv8( Raspberry pi 3). >> >> >> But i'm having s
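
A minimal sketch of the suggestion above, assuming a mixed x86-64/ARMv8 job where the width of long may differ between ABIs: pair the C99 fixed-width types with their matching MPI datatypes so every rank agrees on the wire size.

    #include <mpi.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* int64_t is 8 bytes on both x86-64 and ARMv8, unlike long on some ABIs */
        int64_t value = (0 == rank) ? 42 : 0;

        /* MPI_INT64_T matches int64_t exactly, so no size mismatch across nodes */
        MPI_Bcast(&value, 1, MPI_INT64_T, 0, MPI_COMM_WORLD);

        printf("rank %d: value = %lld\n", rank, (long long) value);
        MPI_Finalize();
        return 0;
    }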

Re: [OMPI users] Segfault with OpenMPI 4 and dynamic window

2019-02-16 Thread Nathan Hjelm via users
Probably not. I think this is now fixed. Might be worth trying master to verify. > On Feb 16, 2019, at 7:01 AM, Bart Janssens wrote: > > Hi Gilles, > > Thanks, that works (I had to put quotes around the ^rdma). Should I file a > github issue? > > Cheers, > > Bart >> On 16 Feb 2019, 14:05 +

[OMPI users] Fwd: Minimum time between MPI_Bcast or MPI_Reduce calls?

2019-01-18 Thread Nathan Hjelm via users
Since neither bcast nor reduce acts as a barrier it is possible to run out of resources if either of these calls (or both) are used in a tight loop. The sync coll component exists for this scenario. You can enable it by adding the following to mpirun (or setting these variables through the env

Re: [OMPI users] Increasing OpenMPI RMA win attach region count.

2019-01-09 Thread Nathan Hjelm via users
If you need to support more attachments you can set the value of that variable either by setting: Environment: OMPI_MCA_osc_rdma_max_attach mpirun command line: --mca osc_rdma_max_attach Keep in mind that each attachment may use an underlying hardware resource that may be easy to exhaust (h

Re: [OMPI users] Querying/limiting OpenMPI memory allocations

2018-12-20 Thread Nathan Hjelm via users
How many nodes are you using? How many processes per node? What kind of processor? Open MPI version? 25 GB is several orders of magnitude more memory than should be used except at extreme scale (1M+ processes). Also, how are you calculating memory usage? -Nathan > On Dec 20, 2018, at 4:49 AM,

Re: [OMPI users] Hang in mpi on 32-bit

2018-11-26 Thread Nathan Hjelm via users
Can you try configuring with --disable-builtin-atomics and see if that fixes the issue for you? -Nathan > On Nov 26, 2018, at 9:11 PM, Orion Poplawski wrote: > > Hello - > > We are starting to see some mpi processes "hang" (really cpu spin and never > complete) on 32 bit architectures on Fed

Re: [OMPI users] [Open MPI Announce] Open MPI 4.0.0 Released

2018-11-14 Thread Nathan Hjelm via users
I really need to update that wording. It has been awhile and the code seems to have stabilized. It’s quite safe to use and supports some of the latest kernel versions. -Nathan > On Nov 13, 2018, at 11:06 PM, Bert Wesarg via users > wrote: > > Dear Takahiro, > On Wed, Nov 14, 2018 at 5:38 AM

Re: [OMPI users] Latencies of atomic operations on high-performance networks

2018-11-08 Thread Nathan Hjelm via users
previously tested with 3.1.3 on the IB cluster, which ran fine. If I use the same version I run into the same problem on both systems (with --mca btl_openib_allow_ib true --mca osc_rdma_acc_single_intrinsic true). I have not tried using UCX for this. Joseph On 11/8/18 1:20 PM, Nathan Hjelm via users

Re: [OMPI users] Latencies of atomic operations on high-performance networks

2018-11-08 Thread Nathan Hjelm via users
reply, setting osc_rdma_acc_single_intrinsic=true does the trick for both shared and exclusive locks and brings it down to <2us per operation. I hope that the info key will make it into the next version of the standard, I certainly have use for it :) Cheers, Joseph On 11/6/18 12:13 PM, Nathan Hjelm via users w

Re: [OMPI users] Latencies of atomic operations on high-performance networks

2018-11-06 Thread Nathan Hjelm via users
All of this is completely expected. Due to the requirements of the standard it is difficult to make use of network atomics even for MPI_Compare_and_swap (MPI_Accumulate and MPI_Get_accumulate spoil the party). If you want MPI_Fetch_and_op to be fast set this MCA parameter: osc_rdma_acc_sing
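
A minimal sketch of the fetch-and-op pattern this parameter accelerates. The counter example is hypothetical; the parameter is passed on the mpirun command line as shown in the comment.

    /* mpirun --mca osc_rdma_acc_single_intrinsic true -np 4 ./fetch_add */
    #include <mpi.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int64_t *counter;
        MPI_Win win;
        MPI_Win_allocate(sizeof(int64_t), sizeof(int64_t), MPI_INFO_NULL,
                         MPI_COMM_WORLD, &counter, &win);
        *counter = 0;
        MPI_Barrier(MPI_COMM_WORLD);      /* counters initialized everywhere */

        int64_t one = 1, previous;
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
        /* atomic fetch-and-add on rank 0's counter */
        MPI_Fetch_and_op(&one, &previous, MPI_INT64_T, 0, 0, MPI_SUM, win);
        MPI_Win_unlock(0, win);

        printf("rank %d saw counter value %lld\n", rank, (long long) previous);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }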

Re: [OMPI users] [version 2.1.5] invalid memory reference

2018-10-11 Thread Nathan Hjelm via users
Those features (MPI_LB/MPI_UB/MPI_Type_struct) were removed in MPI-3.0. It is fairly straightforward to update the code to be MPI-3.0 compliant. MPI_Type_struct -> MPI_Type_create_struct MPI_LB/MPI_UB -> MPI_Type_create_resized Example: types[0] = MPI_LB; disp[0] = my_lb; lens[0] = 1; types[1
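
A sketch of the MPI-3.0 replacement pattern described above; the struct layout here is made up for illustration.

    #include <mpi.h>
    #include <stddef.h>

    typedef struct { int tag; double value; } item_t;  /* illustrative layout */

    static MPI_Datatype make_item_type(void)
    {
        int          lens[2]  = { 1, 1 };
        MPI_Aint     disps[2] = { offsetof(item_t, tag), offsetof(item_t, value) };
        MPI_Datatype types[2] = { MPI_INT, MPI_DOUBLE };
        MPI_Datatype tmp, item;

        MPI_Type_create_struct(2, lens, disps, types, &tmp);
        /* lb = 0, extent = sizeof(item_t): replaces the removed MPI_LB/MPI_UB markers */
        MPI_Type_create_resized(tmp, 0, sizeof(item_t), &item);
        MPI_Type_free(&tmp);
        MPI_Type_commit(&item);
        return item;
    }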

Re: [OMPI users] [open-mpi/ompi] vader compile issue (#5814)

2018-10-02 Thread Nathan Hjelm via users
Definitely a compiler bug. I opened a PR to work around it and posted a question on the Oracle forums. -Nathan On Oct 02, 2018, at 12:48 AM, Siegmar Gross wrote: Hi Jeff, hi Nathan, the compilers (Sun C 5.15, Sun C 5.14, Sun C 5.13) don't like the code. loki tmp 110 cc -V cc: Studio 12.6 Sun C 5.15 Linu

Re: [OMPI users] [open-mpi/ompi] vader compile issue (#5814)

2018-10-02 Thread Nathan Hjelm via users
hmm. Add #include to the test and try it again. -Nathan > On Oct 2, 2018, at 12:41 AM, Siegmar Gross > wrote: > > Hi Jeff, hi Nathan, > > the compilers (Sun C 5.15, Sun C 5.14, Sun C 5.13) don't like the code. > > loki tmp 110 cc -V > cc: Studio 12.6 Sun C 5.15 Linux_i386 2017/05/30 > lok

Re: [OMPI users] pt2pt osc required for single-node runs?

2018-09-06 Thread Nathan Hjelm via users
You can either move to MPI_Win_allocate or try the v4.0.x snapshots. I will look at bringing the btl/vader support for osc/rdma back to v3.1.x. osc/pt2pt will probably never become truly thread safe. -Nathan On Sep 06, 2018, at 08:34 AM, Joseph Schuchart wrote: All, I installed Open MPI 3.1

Re: [OMPI users] MPI_MAXLOC problems

2018-08-28 Thread Nathan Hjelm via users
Yup. That is the case for all composed datatypes, which is what the tuple types are. Predefined composed datatypes. -Nathan On Aug 28, 2018, at 02:35 PM, "Jeff Squyres (jsquyres) via users" wrote: I think Gilles is right: remember that datatypes like MPI_2DOUBLE_PRECISION are actually 2 valu

Re: [OMPI users] know which CPU has the maximum value

2018-08-10 Thread Nathan Hjelm via users
) plus the terrible names. If I could kill them in MPI-4 I would. > On Aug 10, 2018, at 9:47 AM, Diego Avesani wrote: > > Dear all, > I have just implemented MAXLOC, why should they go away? > it seems working pretty well. > > thanks > > Diego > > >> O

Re: [OMPI users] know which CPU has the maximum value

2018-08-10 Thread Nathan Hjelm via users
The problem is minloc and maxloc need to go away. Better to use a custom op. > On Aug 10, 2018, at 9:36 AM, George Bosilca wrote: > > You will need to create a special variable that holds 2 entries, one for the > max operation (with whatever type you need) and an int for the rank of the > pro
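
A minimal sketch of the custom-op alternative to MPI_MAXLOC that Nathan alludes to; the struct, datatype, and function names are illustrative, not from the thread.

    #include <mpi.h>
    #include <stddef.h>

    typedef struct { double value; int rank; } valrank_t;

    /* keeps the larger value; ties go to the lower rank (commutative, associative) */
    static void vr_max(void *in, void *inout, int *len, MPI_Datatype *dtype)
    {
        valrank_t *a = in, *b = inout;
        for (int i = 0; i < *len; i++)
            if (a[i].value > b[i].value ||
                (a[i].value == b[i].value && a[i].rank < b[i].rank))
                b[i] = a[i];
        (void) dtype;
    }

    void who_has_max(double local, MPI_Comm comm, valrank_t *out)
    {
        int          lens[2]  = { 1, 1 };
        MPI_Aint     disps[2] = { offsetof(valrank_t, value), offsetof(valrank_t, rank) };
        MPI_Datatype types[2] = { MPI_DOUBLE, MPI_INT }, tmp, pair;
        MPI_Op       op;
        valrank_t    mine;

        mine.value = local;
        MPI_Comm_rank(comm, &mine.rank);

        MPI_Type_create_struct(2, lens, disps, types, &tmp);
        MPI_Type_create_resized(tmp, 0, sizeof(valrank_t), &pair);
        MPI_Type_free(&tmp);
        MPI_Type_commit(&pair);

        MPI_Op_create(vr_max, 1 /* commutative */, &op);
        MPI_Allreduce(&mine, out, 1, pair, op, comm);

        MPI_Op_free(&op);
        MPI_Type_free(&pair);
    }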

Re: [OMPI users] Asynchronous progress in 3.1

2018-08-06 Thread Nathan Hjelm via users
It depends on the interconnect you are using. Some transports have async progress support but others do not. -Nathan On Aug 06, 2018, at 11:29 AM, "Palmer, Bruce J" wrote: Hi,   Is there anything that can be done to boost asynchronous progress for MPI RMA operations in OpenMPI 3.1? I’m try

Re: [OMPI users] local communicator and crash of the code

2018-08-03 Thread Nathan Hjelm via users
If you are trying to create a communicator containing all node-local processes then use MPI_Comm_split_type. > On Aug 3, 2018, at 12:24 PM, Diego Avesani wrote: > > Deal all, > probably I have found the error. > Let's me check. Probably I have not properly set-up colors. > > Thanks a lot, >
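
A minimal sketch of the MPI_Comm_split_type call in question (no colors to compute by hand):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* all ranks that can share memory (i.e. on the same node) land in node_comm */
        MPI_Comm node_comm;
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0 /* key */,
                            MPI_INFO_NULL, &node_comm);

        int world_rank, node_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Comm_rank(node_comm, &node_rank);
        printf("world rank %d is node-local rank %d\n", world_rank, node_rank);

        MPI_Comm_free(&node_comm);
        MPI_Finalize();
        return 0;
    }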

Re: [OMPI users] Seg fault in opal_progress

2018-07-13 Thread Nathan Hjelm via users
Please give master a try. This looks like another signature of running out of space for shared memory buffers. -Nathan > On Jul 13, 2018, at 6:41 PM, Noam Bernstein > wrote: > > Just to summarize for the list. With Jeff’s prodding I got it generating > core files with the debug (and mem-deb

Re: [OMPI users] Seg fault in opal_progress

2018-07-11 Thread Nathan Hjelm via users
Might be also worth testing a master snapshot and see if that fixes the issue. There are a couple of fixes being backported from master to v3.0.x and v3.1.x now. -Nathan On Jul 11, 2018, at 03:16 PM, Noam Bernstein wrote: On Jul 11, 2018, at 11:29 AM, Jeff Squyres (jsquyres) via users wro

Re: [OMPI users] MPI_Ialltoallv

2018-07-06 Thread Nathan Hjelm via users
No, that's a bug. Please open an issue on github and we will fix it shortly. Thanks for reporting this issue. -Nathan > On Jul 6, 2018, at 8:08 AM, Stanfield, Clyde > wrote: > > We are using MPI_Ialltoallv for an image processing algorithm. When doing > this we pass in an MPI_Type_contiguous

Re: [OMPI users] Verbose output for MPI

2018-07-04 Thread Nathan Hjelm via users
--mca pmix_base_verbose 100 > On Jul 4, 2018, at 9:15 AM, Maksym Planeta > wrote: > > Hello, > > I have troubles figuring out how can I configure verbose output properly. > There is a call to pmix_output_verbose in > opal/mca/pmix/pmix3x/pmix/src/mca/ptl/tcp/ptl_tcp.c in function try_connect

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 3.1.0 Lock Up on POWER9 w/ CUDA9.2

2018-07-03 Thread Nathan Hjelm via users
Found this issue. PR #5374 fixes it. Will make its way into the v3.0.x and v3.1.x release series. -Nathan On Jul 02, 2018, at 02:36 PM, Nathan Hjelm wrote: The result should be the same with v3.1.1. I will investigate on our Coral test systems. -Nathan On Jul 02, 2018, at 02:23 PM

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 3.1.0 Lock Up on POWER9 w/ CUDA9.2

2018-07-02 Thread Nathan Hjelm
WER9 with GCC 7.2.0 and CUDA9.2. S. -- Si Hammond Scalable Computer Architectures Sandia National Laboratories, NM, USA [Sent from remote connection, excuse typos] On 6/16/18, 10:10 PM, "Nathan Hjelm" wrote:     Try the latest nightly tarball for v3.1.x. Should be fixed. On Jun 1

Re: [OMPI users] OpenMPI 3.1.0 Lock Up on POWER9 w/ CUDA9.2

2018-06-16 Thread Nathan Hjelm
Try the latest nightly tarball for v3.1.x. Should be fixed. > On Jun 16, 2018, at 5:48 PM, Hammond, Simon David via users > wrote: > > The output from the test in question is: > > Single thread test. Time: 0 s 10182 us 10 nsec/poppush > Atomics thread finished. Time: 0 s 169028 us 169 nsec/po

Re: [OMPI users] error building openmpi-master-201806060243-64a5baa on Linux with Sun C

2018-06-06 Thread Nathan Hjelm
The bindings in v3.1.0 are incorrect. They are missing the asynchronous attribute. That will be fixed in v3.1.1. > On Jun 6, 2018, at 12:06 PM, Siegmar Gross > wrote: > > Hi Jeff, > >> I asked some Fortran gurus, and they don't think that there >> is any restriction on having ASYNCHRONOUS

Re: [OMPI users] error building openmpi-master-201806060243-64a5baa on Linux with Sun C

2018-06-06 Thread Nathan Hjelm
I put in a PR to "fix" this but if you look at the standard it has both intent(in) and asynchronous. Might be a compiler problem? -Nathan > On Jun 6, 2018, at 5:11 AM, Siegmar Gross > wrote: > > Hi, > > I've tried to install openmpi-master-201806060243-64a5baa on my "SUSE Linux > Enterprise

Re: [OMPI users] Bad file descriptor segmentation fault on an MPI4py program

2018-05-31 Thread Nathan Hjelm
This is a known bug due to the incorrect (or incomplete) documentation for Linux CMA. I believe it is fixed in 2.1.3. -Nathan On May 31, 2018, at 02:43 PM, Konstantinos Konstantinidis wrote: Consider matrices A: s x r and B: s x t. In the attached file, I am doing matrix multiplication in a distribut

Re: [OMPI users] MPI Windows: performance of local memory access

2018-05-24 Thread Nathan Hjelm
PR is up https://github.com/open-mpi/ompi/pull/5193 -Nathan > On May 24, 2018, at 7:09 AM, Nathan Hjelm wrote: > > Ok, thanks for testing that. I will open a PR for master changing the default > backing location to /dev/shm on linux. Will be PR’d to v3.0.x and v3.1.x. > >

Re: [OMPI users] MPI Windows: performance of local memory access

2018-05-24 Thread Nathan Hjelm
come an issue on other systems. > > Cheers > Joseph > > On 05/23/2018 02:26 PM, Nathan Hjelm wrote: >> Odd. I wonder if it is something affected by your session directory. It >> might be worth moving the segment to /dev/shm. I don’t expect it will have >> an imp

Re: [OMPI users] MPI Windows: performance of local memory access

2018-05-23 Thread Nathan Hjelm
te: > > I tested with Open MPI 3.1.0 and Open MPI 3.0.0, both compiled with GCC 7.1.0 > on the Bull Cluster. I only ran on a single node but haven't tested what > happens if more than one node is involved. > > Joseph > > On 05/23/2018 02:04 PM, Nathan Hjelm wr

Re: [OMPI users] MPI Windows: performance of local memory access

2018-05-23 Thread Nathan Hjelm
What Open MPI version are you using? Does this happen when you run on a single node or multiple nodes? -Nathan > On May 23, 2018, at 4:45 AM, Joseph Schuchart wrote: > > All, > > We are observing some strange/interesting performance issues in accessing > memory that has been allocated throug

Re: [OMPI users] MPI-3 RMA on Cray XC40

2018-05-17 Thread Nathan Hjelm
e addresses seem to be "special"). The write-after-free in > MPI_Finalize seems suspicious though. I cannot say whether that causes the > memory corruption I am seeing but I thought I report it. I will dig further > into this to try to figure out what causes the crashes (they

Re: [OMPI users] MPI cartesian grid : cumulate a scalar value through the procs of a given axis of the grid

2018-05-14 Thread Nathan Hjelm
Still looks to me like MPI_Scan is what you want. Just need three additional communicators (one for each direction). With a recursive doubling MPI_Scan implementation it is O(log n) compared to O(n) in time. > On May 14, 2018, at 8:42 AM, Pierre Gubernatis > wrote: > > Thank you to all of yo
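
A minimal sketch of the MPI_Scan approach, assuming the grid is already a 3-D Cartesian communicator so MPI_Cart_sub can produce the per-axis communicators (MPI_Comm_split on a hand-computed color works equally well):

    #include <mpi.h>

    /* inclusive running sum of 'scal' along the x axis of a 3-D Cartesian grid */
    double axis_prefix_sum(MPI_Comm cart_comm, double scal)
    {
        int remain_dims[3] = { 1, 0, 0 };  /* keep x, collapse y and z */
        MPI_Comm axis_comm;
        MPI_Cart_sub(cart_comm, remain_dims, &axis_comm);

        double running = 0.0;
        MPI_Scan(&scal, &running, 1, MPI_DOUBLE, MPI_SUM, axis_comm);  /* O(log n) */

        MPI_Comm_free(&axis_comm);
        return running;
    }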

Re: [OMPI users] peformance abnormality with openib and tcp framework

2018-05-13 Thread Nathan Hjelm
I see several problems 1) osu_latency only works with two procs. 2) You explicitly excluded shared memory support by specifying only self and openib (or tcp). If you want to just disable tcp or openib use --mca btl ^tcp or --mca btl ^openib Also, it looks like you have multiple ports active that

Re: [OMPI users] MPI-3 RMA on Cray XC40

2018-05-09 Thread Nathan Hjelm
ph > > On 05/08/2018 05:34 PM, Nathan Hjelm wrote: >> Looks like it doesn't fail with master so at some point I fixed this bug. >> The current plan is to bring all the master changes into v3.1.1. This >> includes a number of bug fixes. >> -Nathan >>

Re: [OMPI users] MPI-3 RMA on Cray XC40

2018-05-08 Thread Nathan Hjelm
test program is attached. Best Joseph On 05/08/2018 02:56 PM, Nathan Hjelm wrote: I will take a look today. Can you send me your test program? -Nathan On May 8, 2018, at 2:49 AM, Joseph Schuchart wrote: All, I have been experimenting with using Open MPI 3.1.0 on our Cray XC40 (Haswell-based

Re: [OMPI users] MPI-3 RMA on Cray XC40

2018-05-08 Thread Nathan Hjelm
I will take a look today. Can you send me your test program? -Nathan > On May 8, 2018, at 2:49 AM, Joseph Schuchart wrote: > > All, > > I have been experimenting with using Open MPI 3.1.0 on our Cray XC40 > (Haswell-based nodes, Aries interconnect) for multi-threaded MPI RMA. > Unfortunately

Re: [OMPI users] User-built OpenMPI 3.0.1 segfaults when storing into an atomic 128-bit variable

2018-05-03 Thread Nathan Hjelm
sus 8B alignment. > Open-MPI “wastes” 4B relative to MPICH for every handle on I32LP64 systems. > The internal state associated with MPI allocations - particularly windows - > is bigger than 8B. I recall ptmalloc uses something like 32B per heap > allocation. > > Jeff >

Re: [OMPI users] User-built OpenMPI 3.0.1 segfaults when storing into an atomic 128-bit variable

2018-05-03 Thread Nathan Hjelm
That is probably it. When there are 4 ranks there are 4 int64’s just before the user data (for PSCW). With 1 rank we don’t even bother, its just malloc (16-byte aligned). With any other odd number of ranks the user data is after an odd number of int64’s and is 8-byte aligned. There is no require

Re: [OMPI users] MPI cartesian grid : cumulate a scalar value through the procs of a given axis of the grid

2018-05-02 Thread Nathan Hjelm
MPI_Scan/MPI_Exscan are easy to forget but really useful. -Nathan > On May 2, 2018, at 7:21 AM, Peter Kjellström wrote: > > On Wed, 02 May 2018 06:32:16 -0600 > Nathan Hjelm wrote: > > > Hit send before I finished. If each proc along the axis needs the > > partial

Re: [OMPI users] MPI cartesian grid : cumulate a scalar value through the procs of a given axis of the grid

2018-05-02 Thread Nathan Hjelm
Hit send before I finished. If each proc along the axis needs the partial sum (ie proc j gets sum for i = 0 -> j-1 SCAL[j]) then MPI_Scan will do that. > On May 2, 2018, at 6:29 AM, Nathan Hjelm wrote: > > MPI_Reduce would do this. I would use MPI_Comm_split to make an axis comm

Re: [OMPI users] MPI cartesian grid : cumulate a scalar value through the procs of a given axis of the grid

2018-05-02 Thread Nathan Hjelm
MPI_Reduce would do this. I would use MPI_Comm_split to make an axis comm then use reduce with the root being the last rank in the axis comm. > On May 2, 2018, at 6:11 AM, John Hearns via users > wrote: > > Also my inner voice is shouting that there must be an easy way to express > this in J

Re: [OMPI users] Error in hello_cxx.cc

2018-04-23 Thread Nathan Hjelm
Two things. 1) 1.4 is extremely old and you will not likely get much help with it, and 2) the c++ bindings were deprecated in MPI-2.2 (2009) and removed in MPI-3.0 (2012) so you probably want to use the C bindings instead. -Nathan > On Apr 23, 2018, at 8:14 PM, Amir via users wrote: > > Yes,

Re: [OMPI users] Invalid rank despite com size large enough

2018-04-13 Thread Nathan Hjelm
Err. MPI_Comm_remote_size. > On Apr 13, 2018, at 7:41 AM, Nathan Hjelm wrote: > > Try using MPI_Comm_remotr_size. As this is an intercommunicator that will > give the number of ranks for send/recv. > >> On Apr 13, 2018, at 7:34 AM, Florian Lindner wrote: >> &

Re: [OMPI users] Invalid rank despite com size large enough

2018-04-13 Thread Nathan Hjelm
Try using MPI_Comm_remotr_size. As this is an intercommunicator that will give the number of ranks for send/recv. > On Apr 13, 2018, at 7:34 AM, Florian Lindner wrote: > > Hello, > > I have this piece of code > > PtrRequest MPICommunication::aSend(double *itemsToSend, int size, int > rankRe
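
A minimal sketch of the check Nathan describes (the correct name, per the follow-up above, is MPI_Comm_remote_size):

    #include <mpi.h>

    /* a rank must lie in the remote group to be a valid send/recv peer on an
     * intercommunicator */
    int valid_destination(MPI_Comm comm, int rank)
    {
        int is_inter = 0, limit = 0;
        MPI_Comm_test_inter(comm, &is_inter);
        if (is_inter)
            MPI_Comm_remote_size(comm, &limit);  /* size of the remote group */
        else
            MPI_Comm_size(comm, &limit);         /* intracommunicator: local group size */
        return rank >= 0 && rank < limit;
    }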

Re: [OMPI users] mpi send/recv pair hangin

2018-04-10 Thread Nathan Hjelm
Using icc will not change anything unless there is a bug in the gcc version. I personally never build Open MPI with icc as it is slow and provides no benefit over gcc these days. I do, however, use ifort for the Fortran bindings. -Nathan > On Apr 10, 2018, at 5:56 AM, Reuti wrote: > > >>> Am

Re: [OMPI users] libmpi_cxx.so doesn't exist in lib path when installing 3.0.1

2018-04-07 Thread Nathan Hjelm
The MPI C++ bindings were deprecated more than 10 years ago and were removed from the standard 6 years ago. They have been disabled by default for a couple of years in Open MPI. They will likely be removed in Open MPI 4.0. You should migrate your code to use the C bindings. -Nathan > On Apr 7,

Re: [OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0

2018-04-05 Thread Nathan Hjelm
Honestly, this is a configuration issue with the openib btl. There is no reason to keep eager RDMA, nor is there a reason to pipeline RDMA. I haven't found an app where either of these "features" helps you with infiniband. You have the right idea with the parameter changes but Howard is

Re: [OMPI users] Exhausting QPs?

2018-03-13 Thread Nathan Hjelm
Yalla works because MXM defaults to using unconnected datagrams (I don’t think it uses RC unless you ask). Is this a fully connected algorithm? I ask because (3584 - 28) * 28 * 3 (default number of QPs/remote process in btl/openib) = 298704 > 262144. This is the problem with RC. Mellanox solved

Re: [OMPI users] Concerning the performance of the one-sided communications

2018-02-16 Thread Nathan Hjelm
How was the latency? That is the best metric to use because osc/pt2pt does put aggregation. Makes the result of osu_put_bw relatively garbage. > On Feb 16, 2018, at 5:24 PM, Jeff Hammond wrote: > > > >> On Fri, Feb 16, 2018 at 8:52 AM, Nathan Hjelm wrote: >> It

Re: [OMPI users] Concerning the performance of the one-sided communications

2018-02-16 Thread Nathan Hjelm
It depends on the transport used. If there is a high-performance network (Cray Aries, Infiniband, etc) then the progress is handled by the hardware. For other networks (Infinipath, Omnipath, TCP, etc) there are options. For TCP you can set:  --mca btl_tcp_progress_thread 1 No such option curr

Re: [OMPI users] Possible memory leak in opal_free_list_grow_st

2017-12-04 Thread Nathan Hjelm
Have you opened a bug report on github? Typically you will get much better turnaround on issues when reported there. I, for one, don’t have time to check the mailing list for bugs but I do regularly check the bug tracker. Assign the bug to me when it is open. -Nathan > On Dec 4, 2017, at 9:32

Re: [OMPI users] Progress issue with dynamic windows

2017-11-01 Thread Nathan Hjelm
Hmm, though I thought we also make calls to opal_progress () in your case (calling MPI_Win_lock on self). Open a bug on github and I will double-check. > On Nov 1, 2017, at 9:54 PM, Nathan Hjelm wrote: > > This is a known issue when using osc/pt2pt. The only way to get progress is &g

Re: [OMPI users] Progress issue with dynamic windows

2017-11-01 Thread Nathan Hjelm
This is a known issue when using osc/pt2pt. The only way to get progress is to enable (it is not on by default) it at the network level (btl). How this is done depends on the underlying transport. -Nathan > On Nov 1, 2017, at 9:49 PM, Joseph Schuchart wrote: > > All, > > I came across what I

Re: [OMPI users] Error building openmpi on Raspberry pi 2

2017-09-27 Thread Nathan Hjelm
Open MPI does not officially support ARM in the v2.1 series. Can you download a nightly tarball from https://www.open-mpi.org/nightly/master/ and see if it works for you? -Nathan > On Sep 26, 2017, at 7:32 PM, Faraz Hussain wrote: > > I am receiving the make errors below on my pi 2: > > pi@p

Re: [OMPI users] --enable-builtin-atomics

2017-08-01 Thread Nathan Hjelm
So far only cons. The gcc and sync builtin atomics provide slower performance on x86-64 (and possibly other platforms). I plan to investigate this as part of the investigation into requiring C11 atomics from the C compiler. -Nathan > On Aug 1, 2017, at 10:34 AM, Dave Love wrote: > > What are

Re: [OMPI users] Remote progress in MPI_Win_flush_local

2017-06-23 Thread Nathan Hjelm
This is not the intended behavior. Please open a bug on github. -Nathan On Jun 23, 2017, at 08:21 AM, Joseph Schuchart wrote: All, We employ the following pattern to send signals between processes: ``` int com_rank, root = 0; // allocate MPI window MPI_Win win = allocate_win(); // do some co

Re: [OMPI users] MPI_CANCEL for nonblocking collective communication

2017-06-09 Thread Nathan Hjelm
MPI 3.1 5.12 is pretty clear on the matter: "It is erroneous to call MPI_REQUEST_FREE or MPI_CANCEL for a request associated with a nonblocking collective operation." -Nathan > On Jun 9, 2017, at 5:33 AM, Markus wrote: > > Dear MPI Users and Maintainers, > > I am using openMPI in version 1.
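
A minimal sketch of the only legal completion path for a nonblocking collective request (the helper name is illustrative):

    #include <mpi.h>

    void broadcast_async(int *buf, int count, MPI_Comm comm)
    {
        MPI_Request req;
        MPI_Ibcast(buf, count, MPI_INT, 0 /* root */, comm, &req);
        /* ... overlap independent computation here ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* never MPI_Cancel or MPI_Request_free */
    }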

Re: [OMPI users] "undefined reference to `MPI_Comm_create_group'" error message when using Open MPI 1.6.2

2017-06-08 Thread Nathan Hjelm
MPI_Comm_create_group is an MPI-3.0+ function. 1.6.x is MPI-2.1. You can use the macros MPI_VERSION and MPI_SUBVERSION to check the MPI version. You will have to modify your code if you want it to work with older versions of Open MPI. -Nathan On Jun 08, 2017, at 03:59 AM, Arham Amouie via us
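
A minimal sketch of the version guard described above. The MPI-2 fallback shown is an assumption about what the caller wants, since MPI_Comm_create is collective over the whole communicator rather than just the group.

    #include <mpi.h>

    int make_group_comm(MPI_Comm comm, MPI_Group group, int tag, MPI_Comm *newcomm)
    {
    #if MPI_VERSION >= 3
        /* group-collective: only members of 'group' need to call it */
        return MPI_Comm_create_group(comm, group, tag, newcomm);
    #else
        (void) tag;  /* no tag in the MPI-2 interface */
        /* collective over all of comm */
        return MPI_Comm_create(comm, group, newcomm);
    #endif
    }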

Re: [OMPI users] Tuning vader for MPI_Wait Halt?

2017-06-07 Thread Nathan Hjelm
but my desktop does not have it. So, perhaps not XPMEM related? Matt On Mon, Jun 5, 2017 at 1:00 PM, Nathan Hjelm wrote: Can you provide a reproducer for the hang? What kernel version are you using? Is xpmem installed? -Nathan On Jun 05, 2017, at 10:53 AM, Matt Thompson wrote: OMPI Users,

Re: [OMPI users] Tuning vader for MPI_Wait Halt?

2017-06-05 Thread Nathan Hjelm
Can you provide a reproducer for the hang? What kernel version are you using? Is xpmem installed? -Nathan On Jun 05, 2017, at 10:53 AM, Matt Thompson wrote: OMPI Users, I was wondering if there is a best way to "tune" vader to get around an intermittent MPI_Wait halt?  I ask because I rece

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread Nathan Hjelm
Add --mca btl self,vader -Nathan > On May 19, 2017, at 1:23 AM, Gabriele Fatigati wrote: > > Oh no, by using two procs: > > > findActiveDevices Error > We found no active IB device ports > findActiveDevices Error > We found no active IB device ports > --

Re: [OMPI users] How to use MPI_Win_attach() (or how to specify the 'displ' on a remote process)

2017-05-04 Thread Nathan Hjelm
This behavior is clearly specified in the standard. From MPI 3.1 § 11.2.4: In the case of a window created with MPI_WIN_CREATE_DYNAMIC, the target_disp for all RMA functions is the address at the target; i.e., the effective window_base is MPI_BOTTOM and the disp_unit is one. For dynamic windows, the
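
A minimal sketch of the addressing rule quoted above: the target publishes the address returned by MPI_Get_address and the origin passes it as target_disp (buffer contents here are illustrative).

    #include <mpi.h>
    #include <stdlib.h>

    void dynamic_window_get(MPI_Comm comm)
    {
        int rank;
        MPI_Comm_rank(comm, &rank);

        MPI_Win win;
        MPI_Win_create_dynamic(MPI_INFO_NULL, comm, &win);

        double *local = malloc(sizeof(double));
        *local = (double) rank;
        MPI_Win_attach(win, local, sizeof(double));

        MPI_Aint addr;
        MPI_Get_address(local, &addr);

        /* every rank learns rank 0's attached address */
        MPI_Aint target_addr = addr;
        MPI_Bcast(&target_addr, 1, MPI_AINT, 0, comm);

        double value;
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
        /* disp_unit is 1 and the base is MPI_BOTTOM, so pass the raw address */
        MPI_Get(&value, 1, MPI_DOUBLE, 0, target_addr, 1, MPI_DOUBLE, win);
        MPI_Win_unlock(0, win);

        MPI_Win_detach(win, local);
        MPI_Win_free(&win);
        free(local);
    }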

Re: [OMPI users] How to Free Memory Allocated with MPI_Win_allocate()?

2017-04-24 Thread Nathan Hjelm
You don't. The memory is freed when the window is freed by MPI_Win_free (). See MPI-3.1 § 11.2.5 -Nathan On Apr 24, 2017, at 11:41 AM, Benjamin Brock wrote: How are we meant to free memory allocated with MPI_Win_allocate()?  The following crashes for me with OpenMPI 1.10.6: #include #inclu
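
A minimal sketch of the ownership rule: the buffer returned by MPI_Win_allocate lives and dies with the window.

    #include <mpi.h>

    void window_lifetime(MPI_Comm comm)
    {
        double *base;
        MPI_Win win;

        MPI_Win_allocate(1024 * sizeof(double), sizeof(double),
                         MPI_INFO_NULL, comm, &base, &win);

        /* ... RMA epochs using win / base ... */

        MPI_Win_free(&win);   /* this also frees the memory behind 'base' */
        /* free(base);  <-- wrong: the window owned that memory */
    }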

Re: [OMPI users] Passive target sync. support

2017-04-03 Thread Nathan Hjelm
certain flags to enable the hardware put/get support? Sebastian On 03 Apr 2017, at 18:02, Nathan Hjelm wrote: On Apr 03, 2017, at 08:36 AM, Sebastian Rinke wrote: Dear all, I’m using passive target sync. in my code and would like to know how well it is supported in Open MPI. In particular

Re: [OMPI users] Passive target sync. support

2017-04-03 Thread Nathan Hjelm
On Apr 03, 2017, at 08:36 AM, Sebastian Rinke wrote: Dear all, I’m using passive target sync. in my code and would like to know how well it is supported in Open MPI. In particular, the code is some sort of particle tree code that uses a distributed tree and every rank gets non-local tree no

Re: [OMPI users] openib/mpi_alloc_mem pathology

2017-03-07 Thread Nathan Hjelm
If this is with 1.10.x or older run with --mca memory_linux_disable 1. There is a bad interaction between ptmalloc2 and psm2 support. This problem is not present in v2.0.x and newer. -Nathan > On Mar 7, 2017, at 10:30 AM, Paul Kapinos wrote: > > Hi Dave, > > >> On 03/06/17 18:09, Dave Love

Re: [OMPI users] MPI_THREAD_MULTIPLE: Fatal error in MPI_Win_flush

2017-02-19 Thread Nathan Hjelm
You can not perform synchronization at the same time as communication on the same target. This means if one thread is in MPI_Put/MPI_Get/MPI_Accumulate (target) you can’t have another thread in MPI_Win_flush (target) or MPI_Win_flush_all(). If your program is doing that it is not a valid MPI pr

Re: [OMPI users] openmpi single node jobs using btl openib

2017-02-07 Thread Nathan Hjelm
That backtrace shows we are registering MPI_Alloc_mem memory with verbs. This is expected behavior but it doesn’t show the openib btl being used for any communication. I am looking into an issue on an OmniPath system where just initializing the openib btl causes performance problems even if it is

Re: [OMPI users] rdmacm and udcm failure in 2.0.1 on RoCE

2016-12-14 Thread Nathan Hjelm
Can you configure with --enable-debug and run with --mca btl_base_verbose 100 and provide the output? It may indicate why neither udcm nor rdmacm are available. -Nathan > On Dec 14, 2016, at 2:47 PM, Dave Turner wrote: >

Re: [OMPI users] Follow-up to Open MPI SC'16 BOF

2016-11-23 Thread Nathan Hjelm
Integration is already in the 2.x branch. The problem is the way we handle the info key is a bit of a hack. We currently pull out one info key and pass it down to the mpool as a string. Ideally we want to just pass the info object so each mpool can define its own info keys. That requires the inf

Re: [OMPI users] OpenMPI + InfiniBand

2016-11-01 Thread Nathan Hjelm
UDCM does not require IPoIB. It should be working for you. Can you build Open MPI with --enable-debug and run with -mca btl_base_verbose 100 and create a gist with the output. -Nathan On Nov 01, 2016, at 07:50 AM, Sergei Hrushev wrote: I haven't worked with InfiniBand for years, but I do be

Re: [OMPI users] OS X + Xcode 8 : dyld: Symbol not found: _clock_gettime

2016-10-03 Thread Nathan Hjelm
I didn't think we even used clock_gettime() on Linux in 1.10.x. A quick check of the git branch confirms that. ompi-release git:(v1.10) ✗ find . -name '*.[ch]' | xargs grep clock_gettime ompi-release git:(v1.10) ✗ -Nathan On Oct 03, 2016, at 10:50 AM, George Bosilca wrote: This function is n

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread Nathan Hjelm
FWIW it works fine for me on my MacBook Pro running 10.12 with Open MPI 2.0.1 installed through homebrew: ✗ brew -v Homebrew 1.0.0 (git revision c3105; last commit 2016-09-22) Homebrew/homebrew-core (git revision 227e; last commit 2016-09-22) ✗ brew info openmpi open-mpi: stable 2.0.1 (bottled)

Re: [OMPI users] Java-OpenMPI returns with SIGSEGV

2016-09-14 Thread Nathan Hjelm
This error was the result of a typo which caused an incorrect range check when the compare-and-swap was on a memory region less than 8 bytes away from the end of the window. We never caught this because in general no apps create a window as small as that MPICH test (4 bytes). We are adding the

Re: [OMPI users] Java-OpenMPI returns with SIGSEGV

2016-09-14 Thread Nathan Hjelm
We have a new high-speed component for RMA in 2.0.x called osc/rdma. Since the component is doing direct rdma on the target we are much more strict about the ranges. osc/pt2pt doesn't bother checking at the moment. Can you build Open MPI with --enable-debug and add -mca osc_base_verbose 100 to

Re: [OMPI users] Regression: multiple memory regions in dynamic windows

2016-08-25 Thread Nathan Hjelm
Fixed on master. The fix will be in 2.0.2 but you can apply it to 2.0.0 or 2.0.1: https://github.com/open-mpi/ompi/commit/e53de7ecbe9f034ab92c832330089cf7065181dc.patch -Nathan On Aug 25, 2016, at 07:31 AM, Joseph Schuchart wrote: Gilles, Thanks for your fast reply. I did some last minute changes to th

Re: [OMPI users] Regression: multiple memory regions in dynamic windows

2016-08-25 Thread Nathan Hjelm
There is a bug in the code that keeps the dynamic regions sorted. Should have it fixed shortly. -Nathan On Aug 25, 2016, at 07:46 AM, Christoph Niethammer wrote: Hello, The Error is not 100% reproducible for me every time but seems to disappear entirely if one excludes -mca osc ^rdma or -mc

Re: [OMPI users] Problems with mpirun in openmpi-1.8.1 and -2.0.0

2016-08-23 Thread Nathan Hjelm
Might be worth trying with --mca btl_openib_cpc_include udcm   and see if that works. -Nathan On Aug 23, 2016, at 02:41 AM, "Juan A. Cordero Varelaq" wrote: Hi Gilles, If I run it like this: mpirun --mca btl ^openib,usnic --mca pml ob1 --mca btl_sm_use_knem 0 -np 5 myscript.sh it works fine

Re: [OMPI users] Forcing TCP btl

2016-07-19 Thread Nathan Hjelm
You probably will also want to run with -mca pml ob1 to make sure mxm is not in use. The combination should be sufficient to force tcp usage. -Nathan > On Jul 18, 2016, at 10:50 PM, Saliya Ekanayake wrote: > > Hi, > > I read in a previous thread > (https://www.open-mpi.org/community/lists/us

Re: [OMPI users] Error with Open MPI 2.0.0: error obtaining device attributes for mlx5_0 errno says Cannot allocate memory

2016-07-13 Thread Nathan Hjelm
As of 2.0.0 we now support experimental verbs. It looks like one of the calls is failing: #if HAVE_DECL_IBV_EXP_QUERY_DEVICE device->ib_exp_dev_attr.comp_mask = IBV_EXP_DEVICE_ATTR_RESERVED - 1; if(ibv_exp_query_device(device->ib_dev_context, &device->ib_exp_dev_attr)){ BTL_ERROR(
