[OMPI users] Error When Using Open MPI SHMEM with UCX

2019-04-23 Thread Benjamin Brock via users
And, to provide more details, I'm using a fresh vanilla build of Open MPI
4.0.1 with UCX 1.5.1 (`./configure --with-ucx=$DIR/ucx-1.5.1`).

Ben

[OMPI users] Error When Using Open MPI SHMEM with UCX

2019-04-23 Thread Benjamin Brock via users
I get the following error when trying to run SHMEM programs using UCX.

[xiii@shini dir]$ oshrun -n 1 ./target/debug/main
[1556046469.890238] [shini:19769:0]sys.c:619  UCX  ERROR
shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not
permitted, please check shared memory limits by 'ipcs -l'
[1556046469.895859] [shini:19769:0]sys.c:619  UCX  ERROR
shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not
permitted, please check shared memory limits by 'ipcs -l'
[1556046469.899577] [shini:19769:0]sys.c:619  UCX  ERROR
shmget(size=270532608 flags=0xfb0) for user allocation failed: Operation
not permitted, please check shared memory limits by 'ipcs -l'

As far as I can tell, the programs I'm running are not actually resource
constrained, and `ipcs -l` seems to indicate there's plenty of available
shared memory.  Other than this error, my code seems to run normally.

[xiii@shini dir]$ ipcs -l
-- Messages Limits 
max queues system wide = 32000
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384

-- Shared Memory Limits 
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398509481980
min seg size (bytes) = 1

-- Semaphore Limits 
max number of arrays = 32000
max semaphores per array = 32000
max semaphores system wide = 102400
max ops per semop call = 500
semaphore max value = 32767

Do you know what's causing this / if I need to worry about it / how I can
fix this error?
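For what it's worth, here's a standalone sketch I can use to poke at this (my
own decoding of flags=0xfb0 as IPC_CREAT | IPC_EXCL | SHM_HUGETLB | 0660, not
anything taken from UCX itself):

/* Standalone sketch (not UCX code): try the same shmget() call the error
 * reports.  If my decoding of flags=0xfb0 is right, this asks for a
 * hugepage-backed SysV segment rather than an ordinary one. */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000   /* Linux value; not always exposed by the headers */
#endif

int main(void) {
  size_t size = 2097152;                                  /* size from the error */
  int flags = IPC_CREAT | IPC_EXCL | SHM_HUGETLB | 0660;  /* = 0xfb0 */

  int id = shmget(IPC_PRIVATE, size, flags);
  if (id < 0) {
    printf("shmget failed: %s\n", strerror(errno));
    return 1;
  }
  shmctl(id, IPC_RMID, NULL);   /* clean up the segment immediately */
  printf("shmget succeeded\n");
  return 0;
}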

Thanks,

Ben

[OMPI users] rmaps_base_oversubscribe Option in Open MPI 4.0

2019-01-25 Thread Benjamin Brock
I used to be able to (e.g. in Open MPI 3.1) put the line

rmaps_base_oversubscribe = true

in my `openmpi-mca-params.conf`, and this would enable oversubscription by
default.  In 4.0.0, it appears that this option doesn't work anymore, and I
have to use `--oversubscribe`.
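For reference, these are the forms I'm comparing (the --mca spelling is my
guess at an equivalent command-line form, which I haven't verified on 4.0.0;
./my_app stands in for the real binary):

# What I have to do now:
mpirun --oversubscribe -n 4 ./my_app

# What I'd expect to be equivalent, given the parameter name:
mpirun --mca rmaps_base_oversubscribe true -n 4 ./my_app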

Am I missing something, or has the parameter name changed?

Ben

Re: [OMPI users] RDMA over Ethernet in Open MPI - RoCE on AWS?

2018-09-26 Thread Benjamin Brock
In case anyone comes across this thread in an attempt to get RDMA over
Ethernet working on AWS, here's the conclusion I came to:

There are two kinds of NICs exposed to VMs on AWS:

  - Intel 82599 VF
    - This NIC is old and does not support RoCE or iWARP.
    - It's a virtualized view of an actual Intel 82599 Ethernet Controller.
    - On C3, C4, D2, I2, M4 (excluding m4.16xlarge), and R3 instances.

  - Elastic Network Adapter (ENA)
    - This is a virtual NIC, and as far as I can tell it has no RoCE or
      iWARP driver.
    - I'm not sure which hardware NICs this is virtualizing, but without an
      ENA RoCE driver, AWS VMs will not be able to use RoCE, regardless of
      whether the hardware NIC supports it.
    - On C5, C5d, F1, G3, H1, I3, m4.16xlarge, M5, M5d, P2, P3, R4, R5,
      R5d, T3, X1, X1e, and z1d instances.

Ben

Re: [OMPI users] RDMA over Ethernet in Open MPI - RoCE on AWS?

2018-09-11 Thread Benjamin Brock
Thanks for your response.

One question: why would RoCE still require host processing of every
packet?  I thought the point was that some server-class Ethernet NICs can
handle RDMA requests directly.  Or am I misunderstanding RoCE, or how Open
MPI's RoCE transport works?

Ben

[OMPI users] RDMA over Ethernet in Open MPI - RoCE on AWS?

2018-09-06 Thread Benjamin Brock
I'm setting up a cluster on AWS, which will have a 10Gb/s or 25Gb/s
Ethernet network.  Should I expect to be able to get RoCE to work in Open
MPI on AWS?

More generally, what optimizations and performance tuning can I do to an
Open MPI installation to get good performance on an Ethernet network?
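To make the question concrete, here's the kind of knob-turning I have in mind
(the parameter names are my guesses at the usual TCP BTL knobs and the values
are made up; nothing here is benchmarked):

mpirun -n 64 --mca btl tcp,vader,self \
    --mca btl_tcp_if_include eth0 \
    --mca btl_tcp_sndbuf 4194304 --mca btl_tcp_rcvbuf 4194304 \
    ./my_app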

My codes use a lot of random access AMOs and asynchronous block transfers,
so it seems to me like setting up RDMA over Ethernet would be essential to
getting good performance, but I can't seem to find much information about
it online.

Any pointers you have would be appreciated.

Ben

Re: [OMPI users] Are MPI datatypes guaranteed to be compile-time constants?

2018-09-06 Thread Benjamin Brock
Thanks for the responses--from what you've said, it seems like MPI types
are indeed not guaranteed to be compile-time constants.

However, I worked with the people at IBM, and it seems like the difference
in behavior was caused by the IBM compiler, not the IBM Spectrum MPI
implementation.

Ben

[OMPI users] Are MPI datatypes guaranteed to be compile-time constants?

2018-09-04 Thread Benjamin Brock
Are MPI datatypes like MPI_INT and MPI_CHAR guaranteed to be compile-time
constants?  Is this defined by the MPI standard, or in the Open MPI
implementation?

I've written some template code where MPI datatypes are constexpr members,
which requires that they be known at compile time.  This works in Open MPI
and MPICH, but not in Spectrum MPI--I'm not sure what they've done that
breaks this, but I'm trying to file a bug and any information you have
about whether these are compile-time constants would be useful.
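For concreteness, the pattern is roughly this (a trimmed-down sketch with
names of my own choosing, not the actual library code):

#include <mpi.h>

// Sketch of the pattern: map C++ types to MPI datatypes through constexpr
// members, which requires the MPI datatype handles to be usable in
// constant expressions.
template <typename T> struct mpi_type;
template <> struct mpi_type<int>    { static constexpr MPI_Datatype value = MPI_INT; };
template <> struct mpi_type<char>   { static constexpr MPI_Datatype value = MPI_CHAR; };
template <> struct mpi_type<double> { static constexpr MPI_Datatype value = MPI_DOUBLE; };

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int x = 42;
  // The datatype handle is selected at compile time from the argument type.
  MPI_Bcast(&x, 1, mpi_type<int>::value, 0, MPI_COMM_WORLD);
  MPI_Finalize();
  return 0;
}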

Ben

Re: [OMPI users] Using OpenSHMEM with Shared Memory

2018-02-07 Thread Benjamin Brock
Here's what I get with those environment variables:

https://hastebin.com/ibimipuden.sql

I'm running Arch Linux (but with OpenMPI/UCX installed from source as
described in my earlier message).

Ben

[OMPI users] Using OpenSHMEM with Shared Memory

2018-02-06 Thread Benjamin Brock
How can I run an OpenSHMEM program just using shared memory?  I'd like to
use OpenMPI to run SHMEM programs locally on my laptop.

I understand that the old SHMEM component (Yoda?) was taken out and that
UCX is now required.  I have a build of OpenMPI with UCX, built following
directions I found on a GitHub page.

When I try to just run `shmemrun`, I get a complaint about not having any
spml components available.

[xiii@shini kmer_hash]$ shmemrun -np 2 ./kmer_generic_hash
--
No available spml components were found!

This means that there are no components of this type installed on your
system or all the components reported that they could not be used.

This is a fatal error; your SHMEM process is likely to abort.  Check the
output of the "ompi_info" command and ensure that components of this
type are available on your system.  You may also wish to check the
value of the "component_path" MCA parameter and ensure that it has at
least one directory that contains valid MCA components.
--
[shini:16341] SPML ikrit cannot be selected
[shini:16342] SPML ikrit cannot be selected
[shini:16336] 1 more process has sent help message help-oshmem-memheap.txt
/ find-available:none-found
[shini:16336] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
help / error messages


I tried fiddling with the MCA command-line settings, but didn't have any
luck.  Is it possible to do this?  Can anyone point me to some
documentation?
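Concretely, I was trying things along these lines (the spellings are my
guesses at the right MCA selection, so they may well be wrong):

shmemrun -np 2 --mca spml ucx ./kmer_generic_hash
shmemrun -np 2 --mca spml ucx --mca spml_base_verbose 100 ./kmer_generic_hash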

Thanks,

Ben

[OMPI users] Oversubscribing When Running Locally

2018-01-24 Thread Benjamin Brock
Recently, when I try to run something locally with OpenMPI with more than
two ranks (I have a dual-core machine), I get the friendly message

--
There are not enough slots available in the system to satisfy the 3 slots
that were requested by the application:
  ./kmer_generic_hash

Either request fewer slots for your application, or make more slots
available
for use.
--

Why is oversubscription now disabled by default when running without a
hostfile?  And how can I turn this off?  Is the recommended way to do this
editing /etc/openmpi/openmpi-default-hostfile?
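If editing the default hostfile is the way, I assume it would look something
like this (slot count made up for my dual-core laptop):

# /etc/openmpi/openmpi-default-hostfile
localhost slots=4

Or is the intended answer just to keep passing the flag explicitly, i.e.
`mpirun --oversubscribe -n 3 ./kmer_generic_hash`?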

I'm using default OpenMPI 3.0.0 on Arch Linux.

Cheers,

Ben

Re: [OMPI users] OMPI users] Compiling Open MPI for Cross-Compilation

2017-12-17 Thread Benjamin Brock
Yeah, I just noticed that Open MPI was giving me all x86_64 binaries with
the configuration flags

./configure --host=riscv64-unknown-linux --enable-static --disable-shared
--disable-dlopen --enable-mca-no-build=patcher-overwrite
--prefix=/home/ubuntu/src/ben-build/openmpi

and was very confused.

I thought that the expected behavior for configure scripts was that setting
--host=[myarch]-linux-gnu tells the build system to use the
[myarch]-linux-gnu-[gcc,g++,etc.] compilers when building software that will
run on the host arch.  This is what other software I've encountered (e.g.
MPICH) does.

Basically, I want the `orterun` binaries to be compiled with
riscv64-unknown-linux-[gcc,g++] compilers and the `mpi[cc,c++]` binaries to
be compiled with x86_64-linux-gnu-[gcc,g++] and configured to *use*
riscv64-unknown-linux-[gcc,g++] when compiling programs (because, like
`orterun`, programs compiled with `mpi[cc,c++]` will run on the host arch).

Can I get this by just setting [CC,CXX]=riscv64-unknown-linux-[gcc,g++],
etc.?
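i.e., something like the following (compiler names assumed from the usual
riscv64-unknown-linux-* toolchain prefixes; I haven't tried this yet):

./configure --build=x86_64-linux-gnu --host=riscv64-unknown-linux \
    CC=riscv64-unknown-linux-gcc CXX=riscv64-unknown-linux-g++ \
    --enable-static --disable-shared \
    --prefix=/home/ubuntu/src/ben-build/openmpi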

Ben

Re: [OMPI users] Compiling Open MPI for Cross-Compilation

2017-12-16 Thread Benjamin Brock
I have the same error with

./configure --host=riscv64-unknown-linux --build=x86_64-linux-gnu
--enable-static
--disable-shared --prefix=/home/ubuntu/src/ben-build/openmpi


Ben

On Sat, Dec 16, 2017 at 4:50 PM, Benjamin Brock <br...@cs.berkeley.edu>
wrote:

> > try removing the --target option.
>
> With the configure line
>
> ./configure --host=riscv64-unknown-linux --enable-static --disable-shared
> --prefix=/home/ubuntu/src/ben-build/openmpi
>
> It successfully configures, but I now get the error
>
> /home/xiii/Downloads/openmpi-3.0.0/opal/.libs/libopen-pal.a(patcher_overwrite_module.o):
> In function `mca_patcher_overwrite_patch_address':
> patcher_overwrite_module.c:(.text+0x89): undefined reference to
> `mca_patcher_overwrite_apply_patch'
> collect2: error: ld returned 1 exit status
> Makefile:1844: recipe for target 'orte-clean' failed
>
> When trying to compile.
>
> Ben
>

Re: [OMPI users] Compiling Open MPI for Cross-Compilation

2017-12-16 Thread Benjamin Brock
> try removing the --target option.

With the configure line

./configure --host=riscv64-unknown-linux --enable-static --disable-shared
--prefix=/home/ubuntu/src/ben-build/openmpi

It successfully configures, but I now get the error

/home/xiii/Downloads/openmpi-3.0.0/opal/.libs/libopen-pal.a(patcher_overwrite_module.o):
In function `mca_patcher_overwrite_patch_address':
patcher_overwrite_module.c:(.text+0x89): undefined reference to
`mca_patcher_overwrite_apply_patch'
collect2: error: ld returned 1 exit status
Makefile:1844: recipe for target 'orte-clean' failed

When trying to compile.

Ben

[OMPI users] Compiling Open MPI for Cross-Compilation

2017-12-15 Thread Benjamin Brock
I'd like to run Open MPI on a cluster of RISC-V machines.  These machines
have pretty weak cores, so I need to cross-compile.  I'd like to do this:

Machine 1, which is x86_64-linux-gnu, compiles programs for machine 2.

Machine 2, which is riscv64-unknown-linux, will run these programs.

It seems to me like the correct configure line for this might be:

./configure --host=riscv64-unknown-linux --target=x86_64-linux-gnu
--enable-static --disable-shared --prefix=/home/ubuntu/src/ben-build/openmpi


However, this yields an error:

configure: WARNING: *** The Open MPI configure script does not support
--program-prefix, --program-suffix or --program-transform-name. Users are
recommended to instead use --prefix with a unique directory and make
symbolic links as desired for renaming.
configure: error: *** Cannot continue


Any tips?  Will it be possible for me to cross-compile this way with Open
MPI?

Ben

Re: [OMPI users] [EXTERNAL] Re: Using shmem_int_fadd() in OpenMPI's SHMEM

2017-11-21 Thread Benjamin Brock
> What version of Open MPI are you trying to use?

Open MPI 2.1.1-2 as distributed by Arch Linux.

> Also, could you describe something about your system.

This is all in shared memory on a MacBook Pro; no networking involved.

The seg fault with the code example above looks like this:

[xiii@shini kmer_hash]$ g++ minimal.cpp -o minimal `shmemcc --showme:link`
[xiii@shini kmer_hash]$ !shm
shmemrun -n 2 ./minimal
[shini:08284] *** Process received signal ***
[shini:08284] Signal: Segmentation fault (11)
[shini:08284] Signal code: Address not mapped (1)
[shini:08284] Failing at address: 0x18
[shini:08284] [ 0] /usr/lib/libpthread.so.0(+0x11da0)[0x7f06fb763da0]
[shini:08284] [ 1] /usr/lib/openmpi/openmpi/mca_spml_yoda.so(mca_spml_yoda_get+0x7da)[0x7f06e0eef0aa]
[shini:08284] [ 2] /usr/lib/openmpi/openmpi/mca_atomic_basic.so(atomic_basic_lock+0xb2)[0x7f06e08d90d2]
[shini:08284] [ 3] /usr/lib/openmpi/openmpi/mca_atomic_basic.so(mca_atomic_basic_fadd+0x4a)[0x7f06e08d949a]
[shini:08284] [ 4] /usr/lib/openmpi/liboshmem.so.20(shmem_int_fadd+0x90)[0x7f06fc5a7660]
[shini:08284] [ 5] ./minimal(+0x94f)[0x55a5cde7e94f]
[shini:08284] [ 6] /usr/lib/libc.so.6(__libc_start_main+0xea)[0x7f06fb3baf6a]
[shini:08284] [ 7] ./minimal(+0x80a)[0x55a5cde7e80a]
[shini:08284] *** End of error message ***
--
shmemrun noticed that process rank 1 with PID 0 on node shini exited on
signal 11 (Segmentation fault).
--

Cheers,

Ben

[OMPI users] Using shmem_int_fadd() in OpenMPI's SHMEM

2017-11-20 Thread Benjamin Brock
What's the proper way to use shmem_int_fadd() in OpenMPI's SHMEM?

A minimal example seems to seg fault:

#include <stdio.h>
#include <stdlib.h>

#include <shmem.h>

int main(int argc, char **argv) {
  shmem_init();
  const size_t shared_segment_size = 1024;
  void *shared_segment = shmem_malloc(shared_segment_size);

  int *arr = (int *) shared_segment;
  int *local_arr = (int *) malloc(sizeof(int) * 10);

  if (shmem_my_pe() == 1) {
shmem_int_fadd((int *) shared_segment, 1, 0);
  }
  shmem_barrier_all();

  return 0;
}


Where am I going wrong here?  This sort of thing works in Cray SHMEM.

Ben Brock

Re: [OMPI users] MPI_Accumulate() Blocking?

2017-05-04 Thread Benjamin Brock
Is there any way to issue simultaneous MPI_Accumulate() requests to
different targets, then?  I need to update a distributed array, and this
serializes all of the communication.
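To be concrete, this is the pattern I'm after (a sketch with made-up sizes;
whether the accumulates actually overlap is exactly what I'm unsure about):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  int rank, nprocs;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

  const int chunk = 1024;                       /* made-up chunk size */
  int *base;
  MPI_Win win;
  MPI_Win_allocate(chunk * sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &base, &win);

  int *sendbuf = (int *) calloc(chunk, sizeof(int));
  MPI_Win_lock_all(MPI_MODE_NOCHECK, win);

  /* Queue one accumulate per remote target... */
  for (int target = 0; target < nprocs; ++target) {
    if (target == rank) continue;
    MPI_Accumulate(sendbuf, chunk, MPI_INT, target,
                   0, chunk, MPI_INT, MPI_SUM, win);
  }
  /* ...and (ideally) pay for remote completion only once, here. */
  MPI_Win_flush_all(win);

  MPI_Win_unlock_all(win);
  MPI_Win_free(&win);
  free(sendbuf);
  MPI_Finalize();
  return 0;
}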

Ben

On Thu, May 4, 2017 at 5:53 AM, Marc-André Hermanns <
m.a.herma...@fz-juelich.de> wrote:

> Dear Benjamin,
>
> as far as I understand the MPI standard, RMA operations are non-blocking
> in the sense that you need to complete them with a separate call
> (flush/unlock/...).
>
> I cannot find the place in the standard right now, but I think an
> implementation is allowed to either buffer RMA requests or block until
> the RMA operation can be initiated, and the user should not assume
> either.  I have seen both behaviors across implementations in the past.
>
> For your second question, yes, flush is supposed to block until remote
> completion of the operation.
>
> That said, I seem to recall that Open-MPI 1.x did not support
> asynchronous target-side progress for passive-target synchronization
> (which is used in your benchmark example), so the behavior you
> observed is to some extent expected.
>
> Cheers,
> Marc-Andre
>
>
>
> On 04.05.2017 01:25, Benjamin Brock wrote:
> > MPI_Accumulate() is meant to be non-blocking, and MPI will block until
> > completion when an MPI_Win_flush() is called, correct?
> >
> > In this (https://hastebin.com/raw/iwakacadey) microbenchmark,
> > MPI_Accumulate() seems to be blocking for me in OpenMPI 1.10.6.
> >
> > I'm seeing timings like
> >
> > [brock@nid00622 junk]$ mpirun -n 4 ./junk
> > Write: 0.499229 rq, 0.18 fl; Read: 0.463764 rq, 0.35 fl
> > Write: 0.464914 rq, 0.12 fl; Read: 0.419703 rq, 0.24 fl
> > Write: 0.499686 rq, 0.14 fl; Read: 0.422557 rq, 0.23 fl
> > Write: 0.437960 rq, 0.15 fl; Read: 0.396530 rq, 0.23 fl
> >
> > Meaning up to half a second is being spent issuing requests, but
> > almost no time is spent in flushes.  The time spent in requests scales
> > with the size of the messages, but the time spent in flushes stays the
> > same.
> >
> > I'm compiling this with mpicxx acc.cpp -o acc -std=gnu++11 -O3.
> >
> > Any suggestions?  Am I using this incorrectly?
> >
> > Ben
> >
>
> --
> Marc-Andre Hermanns
> Jülich Aachen Research Alliance,
> High Performance Computing (JARA-HPC)
> Jülich Supercomputing Centre (JSC)
>
> Wilhelm-Johnen-Str.
> 52425 Jülich
> Germany
>
> Phone: +49 2461 61 2509 | +49 241 80 24381
> Fax: +49 2461 80 6 99753
> www.jara.org/jara-hpc
> email: m.a.herma...@fz-juelich.de
>

[OMPI users] MPI_Accumulate() Blocking?

2017-05-03 Thread Benjamin Brock
MPI_Accumulate() is meant to be non-blocking, and MPI will block until
completion when an MPI_Win_flush() is called, correct?

In this (https://hastebin.com/raw/iwakacadey) microbenchmark,
MPI_Accumulate() seems to be blocking for me in OpenMPI 1.10.6.
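Roughly, the benchmark has the following shape (a simplified sketch with
made-up sizes, not the exact code from the link; the real one also times
reads): it times the MPI_Accumulate() calls separately from the
MPI_Win_flush() that completes them.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  int rank, nprocs;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

  const int count = 4096;          /* made-up message size */
  const int iters = 100;
  int *base;
  int *buf = (int *) calloc(count, sizeof(int));
  MPI_Win win;
  MPI_Win_allocate(count * sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &base, &win);
  MPI_Win_lock_all(MPI_MODE_NOCHECK, win);

  int target = (rank + 1) % nprocs;
  double t0 = MPI_Wtime();
  for (int i = 0; i < iters; ++i)
    MPI_Accumulate(buf, count, MPI_INT, target, 0, count, MPI_INT,
                   MPI_SUM, win);
  double t_rq = MPI_Wtime() - t0;   /* "rq": time spent issuing requests */

  t0 = MPI_Wtime();
  MPI_Win_flush(target, win);
  double t_fl = MPI_Wtime() - t0;   /* "fl": time spent in the flush */

  printf("Write: %f rq, %f fl\n", t_rq, t_fl);

  MPI_Win_unlock_all(win);
  MPI_Win_free(&win);
  free(buf);
  MPI_Finalize();
  return 0;
}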

I'm seeing timings like

[brock@nid00622 junk]$ mpirun -n 4 ./junk
Write: 0.499229 rq, 0.18 fl; Read: 0.463764 rq, 0.35 fl
Write: 0.464914 rq, 0.12 fl; Read: 0.419703 rq, 0.24 fl
Write: 0.499686 rq, 0.14 fl; Read: 0.422557 rq, 0.23 fl
Write: 0.437960 rq, 0.15 fl; Read: 0.396530 rq, 0.23 fl

Meaning up to half a second is being spent issuing requests, but almost no
time is spent in flushes.  The time spent in requests scales with the size
of the messages, but the time spent in flushes stays the same.

I'm compiling this with mpicxx acc.cpp -o acc -std=gnu++11 -O3.

Any suggestions?  Am I using this incorrectly?

Ben

[OMPI users] How to Free Memory Allocated with MPI_Win_allocate()?

2017-04-24 Thread Benjamin Brock
How are we meant to free memory allocated with MPI_Win_allocate()?  The
following crashes for me with OpenMPI 1.10.6:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  int n = 1000;
  int *a;

  MPI_Win win;
  MPI_Win_allocate(n*sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &a, &win);

  /* Why does the following crash? */
  MPI_Free_mem(a);

  MPI_Finalize();

  return 0;
}

Any suggestions?

Ben