Re: [OMPI users] Seg error when using v5.0.1

2024-01-30 Thread Joseph Schuchart via users

Hello,

This looks like memory corruption. Do you have more details on what your 
app is doing? I don't see any MPI calls inside the call stack. Could you 
rebuild Open MPI with debug information enabled (by adding 
`--enable-debug` to configure)? If this error occurs on singleton runs 
(1 process) then you can easily attach gdb to it to get a better stack 
trace. Also, valgrind may help pin down the problem by telling you which 
memory block is being free'd here.


Thanks
Joseph

On 1/30/24 07:41, afernandez via users wrote:

Hello,
I upgraded one of the systems to v5.0.1 and have compiled everything 
exactly as dozens of previous times with v4. I wasn't expecting any 
issue (and the compilations didn't report anything out of ordinary) 
but running several apps has resulted in error messages such as:

Backtrace for this error:
#0  0x7f7c9571f51f in ???
        at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#1  0x7f7c957823fe in __GI___libc_free
        at ./malloc/malloc.c:3368
#2  0x7f7c93a635c3 in ???
#3  0x7f7c95f84048 in ???
#4  0x7f7c95f1cef1 in ???
#5  0x7f7c95e34b7b in ???
#6  0x6e05be in ???
#7  0x6e58d7 in ???
#8  0x405d2c in ???
#9  0x7f7c95706d8f in __libc_start_call_main
        at ../sysdeps/nptl/libc_start_call_main.h:58
#10  0x7f7c95706e3f in __libc_start_main_impl
        at ../csu/libc-start.c:392
#11  0x405d64 in ???
#12  0x in ???
OS is Ubuntu 22.04, OpenMPI was built with GCC 13.2, and before 
building OpenMPI, I had previously built the hwloc (2.10.0) library at 
/usr/lib/x86_64-linux-gnu. Maybe I'm missing something pretty basic, 
but the problem seems to be related to memory allocation.

Thanks.





Re: [OMPI users] A make error when build openmpi-5.0.0 using the gcc 14.0.0 (experimental) compiler

2023-12-19 Thread Joseph Schuchart via users
Thanks for the report Jorge! I opened a ticket to track the build issues 
with GCC-14: https://github.com/open-mpi/ompi/issues/12169


Hopefully we will have Open MPI building with GCC-14 before it is released.

Cheers,
Joseph

On 12/17/23 06:03, Jorge D'Elia via users wrote:

Hi there,

I already overcame this problem: simply by using the gcc version (GCC) 
13.2.1 that comes with the Fedora 39 distribution, the openmpi build is 
now fine again,

as it (almost) always is.

Greetings.
Jorge.

--
Jorge D'Elia via users  escribió:


Hi,

On a x86_64-pc-linux-gnu machine with Fedora 39:

$ uname -a
Linux amaral 6.6.6-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Dec 11 
17:29:08 UTC 2023 x86_64 GNU/Linux


and using:

$ gcc --version
gcc (GCC) 14.0.0 20231216 (experimental)

we tried to upgrade to the openmpi distribution:

41968409 Dec 16 09:15 openmpi-5.0.0.tar.gz

using the configuration flags (already used in previous versions of 
openmpi):


$ ../configure --enable-ipv6 --enable-sparse-groups --enable-mpi-ext 
--enable-oshmem --with-libevent=internal --with-hwloc=internal 
--with-ucx --with-pmix=internal --without-libfabric 
--prefix=${PREFIX} 2>&1 | tee configure.eco


$ make -j4 all 2>&1 | tee make-all.eco

but, today, we have the following make error:

...
/home/bigpack/openmpi-paq/openmpi-5.0.0/3rd-party/openpmix/include/pmix_deprecated.h:851:32: 
error: passing argument 2 of ‘PMIx_Data_buffer_unload’ from 
incompatible pointer type [-Wincompatible-pointer-types]

PMIx_Data_buffer_unload(b, &(d), &(s)) void **

We attached the configure.echo and make-all.echo files in a *.tgz 
compressed file.


Please, any clue on how to fix this? Thanks in advance.

Regards.
Jorge D'Elia.--






Re: [OMPI users] MPI_Get is slow with structs containing padding

2023-03-30 Thread Joseph Schuchart via users

Hi Antoine,

That's an interesting result. I believe the problem with datatypes with 
gaps is that MPI is not allowed to touch the gaps. My guess is that for 
the RMA version of the benchmark the implementation either has to revert 
back to an active message packing the data at the target and sending it 
back or (which seems more likely in your case) transfer each object 
separately and skip the gaps. Without more information on your setup 
(using UCX?) and the benchmark itself (how many elements? what does the 
target do?) it's hard to be more precise.


A possible fix would be to drop the MPI datatype for the RMA use and 
transfer the vector as a whole, using MPI_BYTE. I think there is also a 
way to modify the upper bound of the MPI type to remove the gap, using 
MPI_TYPE_CREATE_RESIZED. I expect that that will allow MPI to touch the 
gap and transfer the vector as a whole. I'm not sure about the details 
there, maybe someone can shed some light.
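
For reference, a rough sketch of the resized-type idea, untested and only 
for illustration (elem_t mirrors the {double, int} struct from the original 
post; whether the implementation then treats the buffer as one contiguous 
block is implementation-specific):

```
#include <mpi.h>
#include <stddef.h>

typedef struct { double d; int i; } elem_t;   /* 12 bytes of data + 4 bytes of padding */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* describe the two members... */
    int          blocklens[2] = { 1, 1 };
    MPI_Aint     displs[2]    = { offsetof(elem_t, d), offsetof(elem_t, i) };
    MPI_Datatype types[2]     = { MPI_DOUBLE, MPI_INT };
    MPI_Datatype tmp, padded_type;

    MPI_Type_create_struct(2, blocklens, displs, types, &tmp);
    /* ...and force lower bound 0 and extent sizeof(elem_t), so that
       consecutive vector elements line up, trailing gap included */
    MPI_Type_create_resized(tmp, 0, sizeof(elem_t), &padded_type);
    MPI_Type_commit(&padded_type);
    MPI_Type_free(&tmp);

    /* padded_type can now be used in MPI_Put/MPI_Get/MPI_Sendrecv calls */

    MPI_Type_free(&padded_type);
    MPI_Finalize();
    return 0;
}
```

The MPI_BYTE route is simpler still: describe the whole vector as 
count * sizeof(elem_t) bytes, which certainly lets the implementation move 
it in one piece, at the cost of type checking and heterogeneous-system 
support.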


HTH
Joseph

On 3/30/23 18:34, Antoine Motte via users wrote:


Hello everyone,

I recently had to code an MPI application where I send std::vector 
contents in a distributed environment. In order to try different 
approaches I coded both 1-sided and 2-sided point-to-point 
communication schemes, the first one uses MPI_Window and MPI_Get, the 
second one uses MPI_SendRecv.


I had a hard time figuring out why my implementation with MPI_Get was 
between 10 and 100 times slower, and I finally found out that MPI_Get 
is abnormally slow when one tries to send custom datatypes including 
padding.


Here is a short example attached, where I send a struct {double, int} 
(12 bytes of data + 4 bytes of padding) vs a struct {double, int, int} 
(16 bytes of data, 0 bytes of padding) with both MPI_SendRecv and 
MPI_Get. I got these results:


mpirun -np 4 ./compareGetWithSendRecv
{double, int} SendRecv : 0.0303547 s
{double, int} Get : 1.9196 s
{double, int, int} SendRecv : 0.0164659 s
{double, int, int} Get : 0.0147757 s

I ran it with both Open MPI 4.1.2 and Intel MPI 2021.6 and got 
the same results.


Is this result normal? Do I have any solution other than adding 
garbage at the end of the struct or at the end of the MPI_Datatype to 
avoid padding?


Regards,

Antoine Motte





Re: [OMPI users] Tracing of openmpi internal functions

2022-11-16 Thread Joseph Schuchart via users

Arun,

You can use a small wrapper script like this one to store the perf data 
in separate files:


```
$ cat perfwrap.sh
#!/bin/bash
exec perf record -o perf.data.$OMPI_COMM_WORLD_RANK "$@"
```

Then do `mpirun -n <N> ./perfwrap.sh ./a.out` to run all processes under 
perf. You can also select a subset of processes based on 
$OMPI_COMM_WORLD_RANK.


HTH,
Joseph


On 11/16/22 09:24, Chandran, Arun via users wrote:

Hi Jeff,

Thanks, I will check flamegraphs.

Sample generation with perf could be a problem. I don't think I can do 'mpirun -np <N> 
perf record ...' and get
the sampling done on all the cores and store each core's data (perf.data) 
separately to analyze it. Is it possible to do?

I came to know that AMD uProf supports individual sample collection for MPI apps 
running on multiple cores; I need to investigate this further.

--Arun

From: users  On Behalf Of Jeff Squyres 
(jsquyres) via users
Sent: Monday, November 14, 2022 11:34 PM
To: users@lists.open-mpi.org
Cc: Jeff Squyres (jsquyres) ; arun c 

Subject: Re: [OMPI users] Tracing of openmpi internal functions



Open MPI uses plug-in modules for its implementations of the MPI collective 
algorithms.  From that perspective, once you understand that infrastructure, 
it's exactly the same regardless of whether the MPI job is using intra-node or 
inter-node collectives.

We don't have much in the way of detailed internal function call tracing inside 
Open MPI itself, due to performance considerations.  You might want to look 
into flamegraphs, or something similar...?

--
Jeff Squyres
mailto:jsquy...@cisco.com

From: users  on behalf of arun c via users 

Sent: Saturday, November 12, 2022 9:46 AM
To: mailto:users@lists.open-mpi.org 
Cc: arun c 
Subject: [OMPI users] Tracing of openmpi internal functions
  
Hi All,


I am new to openmpi and trying to learn the internals (source code
level) of data transfer during collective operations. At first, I will
limit it to intra-node (between cpu cores, and sockets) to minimize
the scope of learning.

What are the best options (Looking for only free and open methods) for
tracing the openmpi code? (say I want to execute alltoall collective
and trace all the function calls and event callbacks that happened
inside the libmpi.so on all the cores)

Linux kernel has something called ftrace, it gives a neat call graph
of all the internal functions inside the kernel with time, is
something similar available?

--Arun




Re: [OMPI users] MPI_THREAD_MULTIPLE question

2022-09-10 Thread Joseph Schuchart via users

Timesir,

It sounds like you're using the 4.0.x or 4.1.x release. The one-sided 
components were cleaned up in the upcoming 5.0.x release and the 
component in question (osc/pt2pt) was removed. You could also try to 
compile Open MPI 4.0.x/4.1.x against UCX and use osc/ucx (by passing 
`--mca osc ucx` to mpirun). In either case (using UCX or switching to 
5.0.x) you should be able to run MPI RMA codes without requiring an 
RDMA-capable network.


HTH
Joseph

On 9/10/22 06:55, mrlong336 via users wrote:

mpirun reports the following error:

The OSC pt2pt component does not support MPI_THREAD_MULTIPLE in this 
release.

Workarounds are to run on a single node, or to use a system with an RDMA
capable network such as Infiniband.

Does this error mean that the network must support RDMA if it wants 
to run distributed? Will Gigabit/10 Gigabit Ethernet work?



Best regards,

Timesir
mrlong...@gmail.com






[OMPI users] 1st Future of MPI RMA Workshop: Call for Short Talks and Participation

2022-05-29 Thread Joseph Schuchart via users

[Apologies if you got multiple copies of this email.]

*1st Future of MPI RMA Workshop (FoRMA'22)*

https://mpiwg-rma.github.io/forma22/

The MPI RMA Working Group is organizing a workshop aimed at gathering 
inputs from users and implementors of MPI RMA with past experiences and 
ideas for improvements, as well as hardware vendors to discuss future 
developments of high-performance network hardware relevant to MPI RMA. 
The goal is to evaluate the current design of MPI RMA, to learn from the 
current state of practice, and to start the process of rethinking the 
design of one-sided communication in the MPI standard.


The workshop will be held entirely virtually on *June 16 & 17*, and the 
results and contributions will be combined into a joint white paper.


*Call for Short Talks*

We are seeking input from users of MPI RMA willing to share their 
experiences, results, and lessons learned in short talks of 10-15 
minutes. No stand-alone publications are required. In particular, the 
following topics for contributions are of interest:


- Successful and unsuccessful attempts of using MPI RMA to improve the 
communication efficiency of applications and middle-ware;
- Challenges in implementing and porting applications and middle-ware on 
top of MPI RMA;

- Features that are missing from MPI RMA.

If interested, please send a short email with the title of the talk 
to schuch...@icl.utk.edu.


*Call for Participation*

We encourage everyone interested in one-sided communication models in 
general and MPI RMA in particular to join this two day workshop and 
contribute to its success through questions and comments. We encourage a 
lively and open exchange of ideas and discussions between the speakers 
and the audience. The connection information will be posted before the 
workshop at https://mpiwg-rma.github.io/forma22/. Please direct any 
questions to schuch...@icl.utk.edu.



*Registration*

To register for the workshop (and receive the access link) please use 
the registration site at 
https://tennessee.zoom.us/meeting/register/tJ0qduChrDgsGNdEG3MQeLB-DH3lZ6r-DZww. 
All participation is free.



*Organizing committee*

Joseph Schuchart, University of Tennessee, Knoxville
James Dinan, Nvidia Inc.
Bill Gropp, University of Illinois Urbana-Champaign



Re: [OMPI users] Check equality of a value in all MPI ranks

2022-02-17 Thread Joseph Schuchart via users

Hi Niranda,

A pattern I have seen in several places is to allreduce the pair p = 
{-x,x} with MPI_MIN or MPI_MAX. If in the resulting pair p[0] == -p[1], 
then everyone has the same value. If not, at least one rank had a 
different value. Example:


```
bool is_same(int x) {
  int p[2];
  p[0] = -x;
  p[1] = x;
  MPI_Allreduce(MPI_IN_PLACE, p, 2, MPI_INT, MPI_MIN, MPI_COMM_WORLD);
  return (p[0] == -p[1]);
}
```

HTH,
Joseph

On 2/17/22 16:40, Niranda Perera via users wrote:

Hi all,

Say I have some int `x`. I want to check if all MPI ranks get the same 
value for `x`. What's a good way to achieve this using MPI collectives?


The simplest I could think of is, broadcast rank0's `x`, do the 
comparison, and allreduce-LAND the comparison result. This requires 
two collective operations.

```python
...
x = ... # each rank may produce different values for x
x_bcast = comm.bcast(x, root=0)
all_equal = comm.allreduce(x==x_bcast, op=MPI.LAND)
if not all_equal:
   raise Exception()
...
```
Is there a better way to do this?


--
Niranda Perera
https://niranda.dev/
@n1r44 





Re: [OMPI users] Using OSU benchmarks for checking Infiniband network

2022-02-11 Thread Joseph Schuchart via users
I am not aware of anything similar in Open MPI. Maybe OSU-INAM can work 
with other MPI implementations? Would be worth investigating...


Joseph

On 2/11/22 06:54, Bertini, Denis Dr. wrote:


Hi Joseph

Looking at the MVAPICH i noticed that, in this MPI implementation

an Infiniband Network Analysis and Profiling Tool is provided:


OSU-INAM


Is there something equivalent using openMPI ?

Best

Denis



*From:* users  on behalf of Joseph 
Schuchart via users 

*Sent:* Tuesday, February 8, 2022 4:02:53 PM
*To:* users@lists.open-mpi.org
*Cc:* Joseph Schuchart
*Subject:* Re: [OMPI users] Using OSU benchmarks for checking 
Infiniband network

Hi Denis,

Sorry if I missed it in your previous messages but could you also try
running a different MPI implementation (MVAPICH) to see whether Open MPI
is at fault or the system is somehow to blame for it?

Thanks
Joseph

On 2/8/22 03:06, Bertini, Denis Dr. via users wrote:
>
> Hi
>
> Thanks for all these informations !
>
>
> But i have to confess that in this multi-tuning-parameter space,
>
> i got somehow lost.
>
> Furthermore it is somtimes mixing between user-space and kernel-space.
>
> I have only possibility to act on the user space.
>
>
> 1) So i have on the system max locked memory:
>
>                         - ulimit -l unlimited (default )
>
>   and i do not see any warnings/errors related to that when 
launching MPI.

>
>
> 2) I tried differents algorithms for MPI_all_reduce op.  all showing
> drop in
>
> bw for size=16384
>
>
> 4) I disable openIB ( no RDMA, ) and used only TCP, and i noticed
>
> the same behaviour.
>
>
> 3) i realized that increasing the so-called warm up parameter  in the
>
> OSU benchmark (argument -x 200 as default) the discrepancy.
>
> At the contrary putting lower threshold ( -x 10 ) can increase this BW
>
> discrepancy up to factor 300 at message size 16384 compare to
>
> message size 8192 for example.
>
> So does it means that there are some caching effects
>
> in the internode communication?
>
>
> From my experience, to tune parameters is a time-consuming and 
cumbersome

>
> task.
>
>
> Could it also be the problem is not really on the openMPI
> implemenation but on the
>
> system?
>
>
> Best
>
> Denis
>
> 
> *From:* users  on behalf of Gus
> Correa via users 
> *Sent:* Monday, February 7, 2022 9:14:19 PM
> *To:* Open MPI Users
> *Cc:* Gus Correa
> *Subject:* Re: [OMPI users] Using OSU benchmarks for checking
> Infiniband network
> This may have changed since, but these used to be relevant points.
> Overall, the Open MPI FAQ have lots of good suggestions:
> https://www.open-mpi.org/faq/
> some specific for performance tuning:
> https://www.open-mpi.org/faq/?category=tuning
> https://www.open-mpi.org/faq/?category=openfabrics
>
> 1) Make sure you are not using the Ethernet TCP/IP, which is widely
> available in compute nodes:
> mpirun  --mca btl self,sm,openib  ...
>
> https://www.open-mpi.org/faq/?category=tuning#selecting-components
>
> However, this may have changed lately:
> https://www.open-mpi.org/faq/?category=tcp#tcp-auto-disable
> 2) Maximum locked memory used by IB and their system limit. Start
> here:
> 
https://www.open-mpi.org/faq/?category=openfabrics#limiting-registered-memory-usage

> 3) The eager vs. rendezvous message size threshold. I wonder if it may
> sit right where you see the latency spike.
> https://www.open-mpi.org/faq/?category=all#ib-locked-pages-user
> 4) Processor and memory locality/affinity and binding (please check
> the current options and syntax)
> https://www.open-mpi.org/faq/?category=tuning#using-paffinity-v1.4
>
> On Mon, Feb 7, 2022 at 11:01 AM Benson Muite via users
>  wrote:
>
> Following https://www.open-mpi.org/doc/v3.1/man1/mpirun.1.php
>
> mpirun --verbose --display-map
>
> Have you tried newer OpenMPI versions?
>
> Do you get similar behavior for the osu_reduce and osu_gather
> benchmarks?
>
> Typically internal buffer sizes as well as your hardware will affect
> performance. Can you give specifications similar to what is
> available at:
> http://mvapich.cse.ohio-state.edu/performance/collectives/
> where the operating system, switch, node type and memory are
> indicated.
>
> If you need good performance, may want to also specify the algorithm
> used. You can find some of the parameters you can tune using:
>
> ompi_info --all
>
> A particular helpful parameter is:
>
> MCA coll tuned: para

Re: [OMPI users] Using OSU benchmarks for checking Infiniband network

2022-02-08 Thread Joseph Schuchart via users

Hi Denis,

Sorry if I missed it in your previous messages but could you also try 
running a different MPI implementation (MVAPICH) to see whether Open MPI 
is at fault or the system is somehow to blame for it?


Thanks
Joseph

On 2/8/22 03:06, Bertini, Denis Dr. via users wrote:


Hi

Thanks for all these informations !


But I have to confess that in this multi-tuning-parameter space,

I got somewhat lost.

Furthermore, it sometimes mixes user-space and kernel-space concerns.

I only have the possibility to act on the user space.


1) So I have on the system max locked memory:

                        - ulimit -l unlimited (default)

  and I do not see any warnings/errors related to that when launching MPI.


2) I tried different algorithms for the MPI_Allreduce op, all showing a 
drop in

bw for size=16384


4) I disabled openib (no RDMA) and used only TCP, and I noticed

the same behaviour.


3) I realized that increasing the so-called warm-up parameter in the

OSU benchmark (argument -x, 200 as default) reduces the discrepancy.

On the contrary, putting a lower threshold (-x 10) can increase this BW

discrepancy up to a factor of 300 at message size 16384 compared to

message size 8192, for example.

So does it mean that there are some caching effects

in the internode communication?


From my experience, tuning parameters is a time-consuming and cumbersome

task.


Could it also be that the problem is not really in the openMPI 
implementation but in the

system?


Best

Denis


*From:* users  on behalf of Gus 
Correa via users 

*Sent:* Monday, February 7, 2022 9:14:19 PM
*To:* Open MPI Users
*Cc:* Gus Correa
*Subject:* Re: [OMPI users] Using OSU benchmarks for checking 
Infiniband network

This may have changed since, but these used to be relevant points.
Overall, the Open MPI FAQ have lots of good suggestions:
https://www.open-mpi.org/faq/
some specific for performance tuning:
https://www.open-mpi.org/faq/?category=tuning
https://www.open-mpi.org/faq/?category=openfabrics

1) Make sure you are not using the Ethernet TCP/IP, which is widely 
available in compute nodes:

mpirun  --mca btl self,sm,openib  ...

https://www.open-mpi.org/faq/?category=tuning#selecting-components

However, this may have changed lately: 
https://www.open-mpi.org/faq/?category=tcp#tcp-auto-disable
2) Maximum locked memory used by IB and their system limit. Start 
here: 
https://www.open-mpi.org/faq/?category=openfabrics#limiting-registered-memory-usage
3) The eager vs. rendezvous message size threshold. I wonder if it may 
sit right where you see the latency spike.

https://www.open-mpi.org/faq/?category=all#ib-locked-pages-user
4) Processor and memory locality/affinity and binding (please check 
the current options and syntax)

https://www.open-mpi.org/faq/?category=tuning#using-paffinity-v1.4

On Mon, Feb 7, 2022 at 11:01 AM Benson Muite via users 
 wrote:


Following https://www.open-mpi.org/doc/v3.1/man1/mpirun.1.php

mpirun --verbose --display-map

Have you tried newer OpenMPI versions?

Do you get similar behavior for the osu_reduce and osu_gather
benchmarks?

Typically internal buffer sizes as well as your hardware will affect
performance. Can you give specifications similar to what is
available at:
http://mvapich.cse.ohio-state.edu/performance/collectives/
where the operating system, switch, node type and memory are
indicated.

If you need good performance, may want to also specify the algorithm
used. You can find some of the parameters you can tune using:

ompi_info --all

A particular helpful parameter is:

MCA coll tuned: parameter "coll_tuned_allreduce_algorithm" (current
value: "ignore", data source: default, level: 5 tuner/detail,
type: int)
                           Which allreduce algorithm is used. Can be
locked down to any of: 0 ignore, 1 basic linear, 2 nonoverlapping
(tuned
reduce + tuned bcast), 3 recursive doubling, 4 ring, 5 segmented ring
                           Valid values: 0:"ignore",
1:"basic_linear",
2:"nonoverlapping", 3:"recursive_doubling", 4:"ring",
5:"segmented_ring", 6:"rabenseifner"
           MCA coll tuned: parameter
"coll_tuned_allreduce_algorithm_segmentsize" (current value: "0",
data
source: default, level: 5 tuner/detail, type: int)

For OpenMPI 4.0, there is a tuning program [2] that might also be
helpful.

[1]

https://stackoverflow.com/questions/36635061/how-to-check-which-mca-parameters-are-used-in-openmpi
[2] https://github.com/open-mpi/ompi-collectives-tuning

On 2/7/22 4:49 PM, Bertini, Denis Dr. wrote:
> Hi
>
> When i repeat i always got the huge discrepancy at the
>
> message size of 16384.
>
> May be there is a way to run mpi in verbose mode in order
>
> to further investigate this behaviour?
>
> Best
>
> Denis
>
>
   

Re: [OMPI users] OpenMPI and maker - Multiple messages

2021-02-18 Thread Joseph Schuchart via users

Thomas,

The post you are referencing suggests to run

mpiexec -mca btl ^openib -n 40 maker -help

but you are running

mpiexec -mca btl ^openib -N 5 gcc --version

which will run 5 instances of GCC. The output you're seeing is totally 
to be expected.


I don't think anyone here can help you with getting maker to work 
correctly with MPI. I suggest you first check whether maker is actually 
configured to use MPI. The maker-based test (as suggested in the post) 
failing does not necessarily mean that Open MPI isn't working; it might 
also happen if maker is not correctly configured.


One way to check whether Open MPI is working correctly on your system is 
to use a simple MPI program that prints the world communicator size and 
rank. Any MPI hello world program you find online should do.
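
For instance, something as small as this will do (any hello-world found 
online is equivalent):

```
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```

If `mpiexec -n 4 ./a.out` prints four different ranks out of a size of 4 
(rather than four copies of "rank 0 of 1"), Open MPI itself is working and 
the problem is on the maker side.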


Cheers
Joseph

On 2/18/21 1:39 PM, Thomas Eylenbosch via users wrote:

Hello

We are trying to run 
maker (http://www.yandell-lab.org/software/maker.html) in combination 
with OpenMPI


But when we are trying to submit a job with the maker and openmpi,

We see the following error in the log file:

--Next Contig--

#-

Another instance of maker is processing *this* contig!!

SeqID: chrA10

Length: 17398227

#-

According to

http://gmod.827538.n3.nabble.com/Does-maker-support-muti-processing-for-a-single-long-fasta-sequence-using-openMPI-td4061342.html

We have to run the following command mpiexec -mca btl ^openib -n 40 
maker -help


“If you get a single help message then everything is fine.  If you get 
40 help messages, then MPI is not communicating correctly.”


We are using the following command to demonstrate what is going wrong:

mpiexec -mca btl ^openib -N 5 gcc --version

gcc (GCC) 10.2.0

Copyright (C) 2020 Free Software Foundation, Inc.

This is free software; see the source for copying conditions.  There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

gcc (GCC) 10.2.0

Copyright (C) 2020 Free Software Foundation, Inc.

This is free software; see the source for copying conditions.  There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

gcc (GCC) 10.2.0

Copyright (C) 2020 Free Software Foundation, Inc.

This is free software; see the source for copying conditions.  There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

gcc (GCC) 10.2.0

Copyright (C) 2020 Free Software Foundation, Inc.

This is free software; see the source for copying conditions.  There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

gcc (GCC) 10.2.0

Copyright (C) 2020 Free Software Foundation, Inc.

This is free software; see the source for copying conditions.  There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

So we are getting the message 5 times, which means OpenMPI is not 
correctly installed on our cluster?


We are using EasyBuild to build/install our OpenMPI module. ( with the 
default OpenMPI easyblock module)


Best regards / met vriendelijke groeten*
Thomas Eylenbosch*
DevOps Engineer (OnSite), Gluo N.V.

*Currently available at BASF Belgium Coordination Center CommV*
Email: thomas.eylenbo...@partners.basf.com 

Postal Address: BASF Belgium Coordination Center CommV, Technologiepark 
101, 9052 Gent Zwijnaarde, Belgium


BASF Belgium Coordination Center CommV

Scheldelaan 600, 2040 Antwerpen, België

RPR Antwerpen (afd. Antwerpen)

BTW BE0862.390.376

_www.basf.be _

Deutsche Bank AG

IBAN: BE43 8262 8044 4801

BIC: DEUTBEBE

Information on data protection can be found here: 
https://www.basf.com/global/en/legal/data-protection-at-basf.html




Re: [OMPI users] Issue with MPI_Get_processor_name() in Cygwin

2021-02-09 Thread Joseph Schuchart via users

Martin,

The name argument to MPI_Get_processor_name is a character string of 
length at least MPI_MAX_PROCESSOR_NAME, which in OMPI is 256. You are 
providing a character string of length 200, so OMPI is free to write 
past the end of your string and into some of your stack variables, hence 
you are "losing" the values of rank and size. The issue should be gone 
if you write `char hostName[MPI_MAX_PROCESSOR_NAME];`


Cheers
Joseph

On 2/9/21 9:14 PM, Martín Morales via users wrote:

Hello,

I have what could be a memory corruption with 
MPI_Get_processor_name() in Cygwin.

I'm using OMPI 4.1.0; I tried also on Linux (same OMPI version) but 
there isn't an issue there.

Below is an example of a trivial spawn operation. It has 2 scripts: 
spawned and spawner.

In the spawned script, if I move the MPI_Get_processor_name() line 
below MPI_Comm_size() I lose the values of rank and size.

In fact, I declared some other variables in the `int hostName_len, rank, 
size;` line and I lost them too.

Regards,

Martín

---

Spawned:

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char ** argv){
    int hostName_len, rank, size;
    MPI_Comm parentcomm;
    char hostName[200];

    MPI_Init( NULL, NULL );
    MPI_Comm_get_parent( &parentcomm );
    MPI_Get_processor_name(hostName, &hostName_len);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (parentcomm != MPI_COMM_NULL) {
        printf("I'm the spawned h: %s  r/s: %i/%i\n", hostName, rank, size);
    }

    MPI_Finalize();
    return 0;
}

Spawner:

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char ** argv){
    int processesToRun;
    MPI_Comm intercomm;

    if(argc < 2 ){
        printf("Processes number needed!\n");
        return 0;
    }
    processesToRun = atoi(argv[1]);

    MPI_Init( NULL, NULL );
    printf("Spawning from parent:...\n");
    MPI_Comm_spawn( "./spawned", MPI_ARGV_NULL, processesToRun,
                    MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    MPI_Finalize();
    return 0;
}



Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-22 Thread Joseph Schuchart via users

Hi Jorge,

Can you try to get a stack trace of mpirun using the following command 
in a separate terminal?


sudo gdb -batch -ex "thread apply all bt" -p $(ps -C mpirun -o pid= | 
head -n 1)


Maybe that will give some insight where mpirun is hanging.

Cheers,
Joseph

On 10/21/20 9:58 PM, Jorge SILVA via users wrote:

Hello Jeff,

The program is not executed; it seems to wait for something to connect to 
(why ctrl-C twice?)


jorge@gcp26:~/MPIRUN$ mpirun -np 1 touch /tmp/foo
^C^C

jorge@gcp26:~/MPIRUN$ ls -l /tmp/foo
ls: cannot access '/tmp/foo': No such file or directory

no file  is created..

In fact, my question was whether there are differences in mpirun usage 
between these versions... The


mpirun -help

gives a different output as expected, but I  tried a lot of options 
without any success.



Le 21/10/2020 à 21:16, Jeff Squyres (jsquyres) a écrit :
There's huge differences between Open MPI v2.1.1 and v4.0.3 (i.e., 
years of development effort); it would be very hard to categorize them 
all; sorry!


What happens if you

    mpirun -np 1 touch /tmp/foo

(Yes, you can run non-MPI apps through mpirun)

Is /tmp/foo created?  (i.e., did the job run, and mpirun is somehow 
not terminating)




On Oct 21, 2020, at 12:22 PM, Jorge SILVA via users 
mailto:users@lists.open-mpi.org>> wrote:


Hello Gus,

 Thank you for your answer. Unfortunately my problem is much more 
basic. I didn't try to run the program on both computers, but just 
to run something on one computer. I just installed the new OS and 
openmpi on two different computers, in the standard way, with the 
same result.


For example:

In kubuntu20.4.1 LTS with openmpi 4.0.3-0ubuntu

jorge@gcp26:~/MPIRUN$ cat hello.f90
 print*,"Hello World!"
end
jorge@gcp26:~/MPIRUN$ mpif90 hello.f90 -o hello
jorge@gcp26:~/MPIRUN$ ./hello
 Hello World!
jorge@gcp26:~/MPIRUN$ mpirun -np 1 hello <---here  the program hangs 
with no output

^C^Cjorge@gcp26:~/MPIRUN$

The mpirun task sleeps with no output, and only twice ctrl-C ends the 
execution  :


jorge   5540  0.1  0.0 44768  8472 pts/8    S+   17:54   0:00 
mpirun -np 1 hello


In kubuntu 18.04.5 LTS with openmpi 2.1.1, of course, the same 
program gives


jorge@gcp30:~/MPIRUN$ cat hello.f90
 print*, "Hello World!"
 END
jorge@gcp30:~/MPIRUN$ mpif90 hello.f90 -o hello
jorge@gcp30:~/MPIRUN$ ./hello
 Hello World!
jorge@gcp30:~/MPIRUN$ mpirun -np 1 hello
 Hello World
jorge@gcp30:~/MPIRUN$


Even just typing mpirun hangs without the usual error message.

Are there any changes between the two versions of openmpi that I 
missed? Is some package missing for mpirun?


Thank you again for your help

Jorge


Le 21/10/2020 à 00:20, Gus Correa a écrit :

Hi Jorge

You may have an active firewall protecting either computer or both,
and preventing mpirun to start the connection.
Your /etc/hosts file may also not have the computer IP addresses.
You may also want to try the --hostfile option.
Likewise, the --verbose option may also help diagnose the problem.

It would help if you send the mpirun command line, the hostfile (if 
any),

error message if any, etc.


These FAQs may help diagnose and solve the problem:

https://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
https://www.open-mpi.org/faq/?category=running#mpirun-hostfile
https://www.open-mpi.org/faq/?category=running

I hope this helps,
Gus Correa

On Tue, Oct 20, 2020 at 4:47 PM Jorge SILVA via users 
mailto:users@lists.open-mpi.org>> wrote:


Hello,

I installed kubuntu20.4.1 with openmpi 4.0.3-0ubuntu in two
different
computers in the standard way. Compiling with mpif90 works, but
mpirun
hangs with no output on both systems. Even the mpirun command without
parameters hangs, and only typing ctrl-C twice can end the sleeping
program. Only the command

 mpirun --help

gives the usual output.

It seems to be something related to the terminal output, but the
command
worked well on Kubuntu 18.04. Is there a way to debug or fix this
problem (without re-compiling from sources, etc)? Is it a known
problem?

Thanks,

  Jorge




--
Jeff Squyres
jsquy...@cisco.com 



Re: [OMPI users] Limiting IP addresses used by OpenMPI

2020-09-01 Thread Joseph Schuchart via users

Charles,

What is the machine configuration you're running on? It seems that there 
are two MCA parameters for the tcp btl: btl_tcp_if_include and 
btl_tcp_if_exclude (see ompi_info for details). There may be other knobs 
I'm not aware of. If you're using UCX then my guess is that UCX has its 
own way to choose the network interface to be used...


Cheers
Joseph

On 9/1/20 9:35 PM, Charles Doland via users wrote:
Yes. It is not unusual to have multiple network interfaces on each host 
of a cluster. Usually there is a preference to use only one network 
interface on each host due to higher speed or throughput, or other 
considerations. It would be useful to be able to explicitly specify the 
interface to use for cases in which the MPI code does not select the 
preferred interface.


Charles Doland
charles.dol...@ansys.com 
(408) 627-6621  [x6621]

*From:* users  on behalf of John 
Hearns via users 

*Sent:* Tuesday, September 1, 2020 12:22 PM
*To:* Open MPI Users 
*Cc:* John Hearns 
*Subject:* Re: [OMPI users] Limiting IP addresses used by OpenMPI

*[External Sender]*

Charles, I recall using the I_MPI_NETMASK to choose which interface for 
MPI to use.

I guess you are asking the same question for OpenMPI?

On Tue, 1 Sep 2020 at 17:03, Charles Doland via users 
mailto:users@lists.open-mpi.org>> wrote:


Is there a way to limit the IP addresses or network interfaces used
for communication by OpenMPI? I am looking for something similar to
the I_MPI_TCP_NETMASK or I_MPI_NETMASK environment variables for
Intel MPI.

The OpenMPI documentation mentions the btl_tcp_if_include
and btl_tcp_if_exclude MCA options. These do not  appear to be
present, at least in OpenMPI v3.1.2. Is there another way to do
this? Or are these options supported in a different version?

Charles Doland
charles.dol...@ansys.com 
(408) 627-6621  [x6621]



Re: [OMPI users] Is the mpi.3 manpage out of date?

2020-08-31 Thread Joseph Schuchart via users

Andy,

Thanks for pointing this out. We have merged a fix that corrects that 
stale comment in master :)


Cheers
Joseph

On 8/25/20 8:36 PM, Riebs, Andy via users wrote:
In searching to confirm my belief that recent versions of Open MPI 
support the MPI-3.1 standard, I was a bit surprised to find this in the 
mpi.3 man page from the 4.0.2 release:


“The  outcome,  known  as  the MPI Standard, was first published in 
1993; its most recent version (MPI-2) was published in July 1997. Open 
MPI 1.2 includes all MPI 1.2-compliant and MPI 2-compliant routines.”


(For those who are manpage-averse, see < 
https://www.open-mpi.org/doc/v4.0/man3/MPI.3.php>.)


I’m willing to bet that y’all haven’t been sitting on your hands since 
Open MPI 1.2 was released!


Andy

--

Andy Riebs

andy.ri...@hpe.com

Hewlett Packard Enterprise

High Performance Computing Software Engineering

+1 404 648 9024



Re: [OMPI users] Silent hangs with MPI_Ssend and MPI_Irecv

2020-07-25 Thread Joseph Schuchart via users

Hi Sean,

Thanks for the report! I have a few questions/suggestions:

1) What version of Open MPI are you using?
2) What is your network? It sounds like you are on an IB cluster using 
btl/openib (which is essentially discontinued). Can you try the Open MPI 
4.0.4 release with UCX instead of openib (configure with --without-verbs 
and --with-ucx)?
3) If that does not help, can you boil your code down to a minimum 
working example? That would make it easier for people to try to 
reproduce what happens.
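
In case it helps, a hypothetical stand-alone skeleton of the pattern from 
the pseudocode below (dummy payloads, a full communication matrix, and all 
names chosen here purely for illustration) could look like this:

```
#include <mpi.h>
#include <stdlib.h>

#define MSG_SIZE (1 << 20)   /* assumption: "large" messages, 1 Mi ints each */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int *sendbuf = malloc(MSG_SIZE * sizeof(int));
    int *recvbuf = malloc((size_t)nprocs * MSG_SIZE * sizeof(int));
    MPI_Request *reqs = malloc(nprocs * sizeof(MPI_Request));
    for (int k = 0; k < MSG_SIZE; k++) sendbuf[k] = rank;

    for (int iter = 0; iter < 10000; iter++) {
        int nreq = 0;
        /* post receives from every other rank (stands in for commatrix_recv) */
        for (int i = 0; i < nprocs; i++) {
            if (i == rank) continue;
            MPI_Irecv(recvbuf + (size_t)i * MSG_SIZE, MSG_SIZE, MPI_INT,
                      i, 0, MPI_COMM_WORLD, &reqs[nreq++]);
        }
        /* synchronous sends: higher ranks first, then lower, as in the pseudocode */
        for (int i = rank + 1; i < nprocs; i++)
            MPI_Ssend(sendbuf, MSG_SIZE, MPI_INT, i, 0, MPI_COMM_WORLD);
        for (int i = 0; i < rank; i++)
            MPI_Ssend(sendbuf, MSG_SIZE, MPI_INT, i, 0, MPI_COMM_WORLD);

        MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);
        MPI_Barrier(MPI_COMM_WORLD);
    }

    free(sendbuf); free(recvbuf); free(reqs);
    MPI_Finalize();
    return 0;
}
```

If something close to this (with your message sizes and communication 
matrix) reproduces the hang, it gives everyone something concrete to run.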


Cheers
Joseph

On 7/24/20 11:34 PM, Lewis,Sean via users wrote:

Hi all,

I am encountering a silent hang involving MPI_Ssend and MPI_Irecv. The 
subroutine in question is called by each processor and is structured 
similar to the pseudo code below. The subroutine is successfully called 
several thousand times before the silent hang behavior manifests and 
never resolves. The hang will occur in nearly (but not exactly) the same 
spot for bit-wise identical tests. During the hang, all MPI ranks will 
be at the Line 18 Barrier except for two. One will be waiting at Line 
17, waiting for its Irecv to complete, and the other at one of the Ssend 
Line 9 or 14. This suggests that a MPI_Irecv never completes and a 
processor is indefinitely blocked in the Ssend unable to complete the 
transfer.


I’ve found similar discussion of this kind of behavior on the OpenMPI 
mailing list: 
https://www.mail-archive.com/users@lists.open-mpi.org/msg19227.html 
ultimately resolving in setting the mca parameter btl_openib_flags to 
304 or 305 (default 310): 
https://www.mail-archive.com/users@lists.open-mpi.org/msg19277.html. I 
have seen some promising behavior by doing the same. As the mailer 
suggests, this implies a problem with the RDMA protocols in infiniband 
for large messages.


I wanted to breathe life back into this conversation as the silent hang 
issue is particularly debilitating and confusing to me. 
Increasing/decreasing the number of processors used does not seem to 
alleviate the issue, and using MPI_Send results in the same behavior; 
perhaps a message has exceeded a memory limit? I am running a test now 
that reports the individual message sizes but I previously implemented a 
switch to check for buffer size discrepancies which is not triggered. In 
the meantime, has anyone run into similar issues or have thoughts as to 
remedies for this behavior?


1:  call MPI_BARRIER(…)

2:  do i = 1,nprocs

3:   if(commatrix_recv(i) .gt. 0) then ! Identify which procs to 
receive from via predefined matrix


4: call Mpi_Irecv(…)

5:   endif

6:   enddo

7:   do j = mype+1,nproc

8:   if(commatrix_send(j) .gt. 0) then ! Identify which procs to 
send to via predefined matrix


9:     MPI_Ssend(…)

10: endif

11: enddo

12: do j = 1,mype

13:  if(commatrix_send(j) .gt. 0) then ! Identify which procs to 
send to via predefined matrix


14:    MPI_Ssend(…)

15: endif

16: enddo

17: call MPI_Waitall(…) ! Wait for all Irecv to complete

18: call MPI_Barrier(…)

Cluster information:

30 processors

Managed by slurm

OS: Red Hat v. 7.7

Thank you for help/advice you can provide,

Sean

*Sean C. Lewis*

Doctoral Candidate

Department of Physics

Drexel University



Re: [OMPI users] MPI test suite

2020-07-24 Thread Joseph Schuchart via users

You may want to look into MTT: https://github.com/open-mpi/mtt

Cheers
Joseph

On 7/23/20 8:28 PM, Zhang, Junchao via users wrote:

Hello,
   Does OMPI have a test suite that can let me validate MPI 
implementations from other vendors?


   Thanks
--Junchao Zhang





Re: [OMPI users] Vader - Where to Look for Shared Memory Use

2020-07-22 Thread Joseph Schuchart via users

Hi John,

Depending on your platform the default behavior of Open MPI is to mmap a 
shared backing file that is either located in a session directory under 
/dev/shm or under $TMPDIR (I believe under Linux it is /dev/shm). You 
will find a set of files there that are used to back shared memory. They 
should be deleted automatically at the end of a run.


What symptoms are you experiencing and on what platform?

Cheers
Joseph

On 7/22/20 10:15 AM, John Duffy via users wrote:

Hi

I’m trying to investigate an HPL Linpack scaling issue on a single node, 
increasing from 1 to 4 cores.

Regarding single node messages, I think I understand that Open-MPI will select 
the most efficient mechanism, which in this case I think should be vader shared 
memory.

But when I run Linpack, ipcs -m gives…

-- Shared Memory Segments 
keyshmid  owner  perms  bytes  nattch status


And, ipcs -u gives…

-- Messages Status 
allocated queues = 0
used headers = 0
used space = 0 bytes

-- Shared Memory Status 
segments allocated 0
pages allocated 0
pages resident  0
pages swapped   0
Swap performance: 0 attempts 0 successes

-- Semaphore Status 
used arrays = 0
allocated semaphores = 0


Am I looking in the wrong place to see how/if vader is using shared memory? I’m 
wondering if a slower mechanism is being used.

My ompi_info includes...

MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.0.3)
MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.0.3)
MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.0.3)
MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.0.3)


Best wishes



Re: [OMPI users] Coordinating (non-overlapping) local stores with remote puts form using passive RMA synchronization

2020-06-02 Thread Joseph Schuchart via users

Hi Stephen,

Let me try to answer your questions inline (I don't have extensive 
experience with the separate model and from my experience most 
implementations support the unified model, with some exceptions):


On 5/31/20 1:31 AM, Stephen Guzik via users wrote:

Hi,

I'm trying to get a better understanding of coordinating 
(non-overlapping) local stores with remote puts when using passive 
synchronization for RMA.  I understand that the window should be locked 
for a local store, but can it be a shared lock?


Yes. There is no reason why that cannot be a shared lock.

In my example, each 
process retrieves and increments an index (indexBuf and indexWin) from a 
target process and then stores it's rank into an array (dataBuf and 
dataWin) at that index on the target.  If the target is local, a local 
store is attempted:


/* indexWin on indexBuf, dataWin on dataBuf */
std::vector<int> myvals(numProc);
const int one = 1;   /* increment for the fetch-and-op (name reconstructed) */
MPI_Win_lock_all(0, indexWin);
MPI_Win_lock_all(0, dataWin);
for (int tgtProc = 0; tgtProc != numProc; ++tgtProc)
  {
    MPI_Fetch_and_op(&one, &myvals[tgtProc], MPI_INT, tgtProc, 0, MPI_SUM, indexWin);
    MPI_Win_flush_local(tgtProc, indexWin);
    // Put our rank into the right location of the target
    if (tgtProc == procID)
      {
        dataBuf[myvals[procID]] = procID;
      }
    else
      {
        MPI_Put(&procID, 1, MPI_INT, tgtProc, myvals[tgtProc], 1, MPI_INT, dataWin);
      }
  }
MPI_Win_flush_all(dataWin);  /* Force completion and time synchronization */
MPI_Barrier(MPI_COMM_WORLD);
/* Proceed with local loads and unlock windows later */

I believe this is valid for a unified memory model but would probably 
fail for a separate model (unless a separate model very cleverly merges 
a private and public window?)  Is this understanding correct?  And if I 
instead use MPI_Put for the local write, then it should be valid for 
both memory models?


Yes, if you use RMA operations even on local memory it is valid for both 
memory models.


The MPI standard on page 455 (S3) states that "a store to process memory 
to a location in a window must not start once a put or accumulate update 
to that target window has started, until the put or accumulate update 
becomes visible in process memory." So there is no clever merging and it 
is up to the user to ensure that there are no puts and stores happening 
at the same time.




Another approach is specific locks.  I don't like this because it seems 
there are excessive synchronizations.  But if I really want to mix local 
stores and remote puts, is this the only way using locks?


/* indexWin on indexBuf, dataWin on dataBuf */
std::vector<int> myvals(numProc);
const int one = 1;   /* increment for the fetch-and-op (name reconstructed) */
for (int tgtProc = 0; tgtProc != numProc; ++tgtProc)
  {
    MPI_Win_lock(MPI_LOCK_SHARED, tgtProc, 0, indexWin);
    MPI_Fetch_and_op(&one, &myvals[tgtProc], MPI_INT, tgtProc, 0, MPI_SUM, indexWin);
    MPI_Win_unlock(tgtProc, indexWin);
    // Put our rank into the right location of the target
    if (tgtProc == procID)
      {
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, tgtProc, 0, dataWin);
        dataBuf[myvals[procID]] = procID;
        MPI_Win_unlock(tgtProc, dataWin);  /*(A)*/
      }
    else
      {
        MPI_Win_lock(MPI_LOCK_SHARED, tgtProc, 0, dataWin);
        MPI_Put(&procID, 1, MPI_INT, tgtProc, myvals[tgtProc], 1, MPI_INT, dataWin);
        MPI_Win_unlock(tgtProc, dataWin);
      }
  }
/* Proceed with local loads */

I believe this is also valid for both memory models?  An unlock must 
have followed the last access to the local window, before the exclusive 
lock is gained.  That should have synchronized the windows and another 
synchronization should happen at (A).  Is that understanding correct? 


That is correct for both memory models, yes. It is likely to be slower 
because locking and unlocking involves some effort. You are better off 
using put instead.


If you really want to use local stores you can check for the 
MPI_WIN_UNIFIED attribute and fall-back to using puts only for the 
separate model.


> If so, how does one ever get into a situation where MPI_Win_sync must 
be used?


You can think of a synchronization scheme where each process takes a 
shared lock on a window, stores data to a local location, calls 
MPI_Win_sync and signals to other processes that the data is now 
available, e.g., through a barrier or a send. In that case processes 
keep the lock and use some non-RMA synchronization instead.
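
For illustration only, in terms of the example above (all processes already 
holding a shared lock on dataWin via MPI_Win_lock_all), that scheme would 
look roughly like:

```
dataBuf[myvals[procID]] = procID;   /* local store into the window memory         */
MPI_Win_sync(dataWin);              /* reconcile private and public window copies */
MPI_Barrier(MPI_COMM_WORLD);        /* non-RMA signal: "my update is visible now" */
/* other processes may now read that location, e.g. with MPI_Get + MPI_Win_flush  */
```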




Final question.  In the first example, let's say there is a lot of 
computation in the loop and I want the MPI_Puts to immediately make 
progress.  Would it be sensible to follow the MPI_Put with a 
MPI_Win_flush_local to get things moving?  Or is it best to avoid any 
unnecessary synchronizations?


That is highly implementation-specific. Some implementations may buffer 
the puts and delay the transfer to the flush, some may initiate it 
immediately, and some may treat a local flush similar to a regular 
flush. I would not make any assumptions about the underlying 

Re: [OMPI users] RMA in openmpi

2020-04-27 Thread Joseph Schuchart via users

Hi Claire,

You cannot use MPI_Get (or any other RMA communication routine) on a 
window for which no access epoch has been started. MPI_Win_fence starts 
an active target access epoch, MPI_Win_lock[_all] start a passive target 
access epoch. Window locks are synchronizing in the sense that they 
provide a means for mutual exclusion if an exclusive lock is involved (a 
process holding a shared window lock allows for other processes to 
acquire shared locks but prevents them from taking an exclusive lock, 
and vice versa).


One common strategy is to call MPI_Win_lock_all on all processes to let 
all processes acquire a shared lock, which they hold until the end of 
the application run. Communication is then done using a combination of 
MPI_Get/MPI_Put/accumulate functions and flushes. As said earlier, you 
likely will need to take care of synchronization among the processes if 
they also modify data in the window.


Cheers
Joseph

On 4/27/20 12:14 PM, Claire Cashmore wrote:

Hi Joseph

Thank you for your reply. From what I had been reading I thought they were both called 
"synchronization calls" just that one was passive (lock) and one was active 
(fence), sorry if I've got confused!
So I'm asking: do I need either MPI_Win_fence or MPI_Win_lock/unlock in order to 
use one-sided calls, and is it not possible to use one-sided communication 
without them? So just a stand-alone MPI_Get, without the other calls before and 
after? It seems not from what you are saying, but I just wanted to confirm.

Thanks again

Claire

On 27/04/2020, 07:50, "Joseph Schuchart via users"  
wrote:

 Claire,

  > Is it possible to use the one-sided communication without combining
 it with synchronization calls?

 What exactly do you mean by "synchronization calls"? MPI_Win_fence is
 indeed synchronizing (basically flush+barrier) but MPI_Win_lock (and the
 passive target synchronization interface at large) is not. It does incur
 some overhead because the lock has to be taken somehow at some point.
 However, it does not require a matching call at the target to complete.

 You can lock a window using a (shared or exclusive) lock, initiate RMA
 operations, flush them to wait for their completion, and initiate the
 next set of RMA operations to flush later. None of these calls are
 synchronizing. You will have to perform your own synchronization at some
 point though to make sure processes read consistent data.

 HTH!
 Joseph


 On 4/24/20 5:34 PM, Claire Cashmore via users wrote:
 > Hello
 >
 > I was wondering if someone could help me with a question.
 >
 > When using RMA is there a requirement to use some type of
 > synchronization? When using one-sided communication such as MPI_Get the
 > code will only run when I combine it with MPI_Win_fence or
 > MPI_Win_lock/unlock. I do not want to use MPI_Win_fence as I’m using the
 > one-sided communication to allow some communication when processes are
 > not synchronised, so this defeats the point. I could use
 > MPI_Win_lock/unlock, however someone I’ve spoken to has said that I
 > should be able to use RMA without any synchronization calls, if so then
 > I would prefer to do this to reduce any overheads using MPI_Win_lock
 > every time I use the one-sided communication may produce.
 >
 > Is it possible to use the one-sided communication without combining it
 > with synchronization calls?
 >
 > (It doesn’t seem to matter what version of openmpi I use).
 >
 > Thank you
 >
 > Claire
 >



Re: [OMPI users] RMA in openmpi

2020-04-27 Thread Joseph Schuchart via users

Claire,

> Is it possible to use the one-sided communication without combining 
it with synchronization calls?


What exactly do you mean by "synchronization calls"? MPI_Win_fence is 
indeed synchronizing (basically flush+barrier) but MPI_Win_lock (and the 
passive target synchronization interface at large) is not. It does incur 
some overhead because the lock has to be taken somehow at some point. 
However, it does not require a matching call at the target to complete.


You can lock a window using a (shared or exclusive) lock, initiate RMA 
operations, flush them to wait for their completion, and initiate the 
next set of RMA operations to flush later. None of these calls are 
synchronizing. You will have to perform your own synchronization at some 
point though to make sure processes read consistent data.


HTH!
Joseph


On 4/24/20 5:34 PM, Claire Cashmore via users wrote:

Hello

I was wondering if someone could help me with a question.

When using RMA is there a requirement to use some type of 
synchronization? When using one-sided communication such as MPI_Get the 
code will only run when I combine it with MPI_Win_fence or 
MPI_Win_lock/unlock. I do not want to use MPI_Win_fence as I’m using the 
one-sided communication to allow some communication when processes are 
not synchronised, so this defeats the point. I could use 
MPI_Win_lock/unlock, however someone I’ve spoken to has said that I 
should be able to use RMA without any synchronization calls, if so then 
I would prefer to do this to reduce any overheads using MPI_Win_lock 
every time I use the one-sided communication may produce.


Is it possible to use the one-sided communication without combining it 
with synchronization calls?


(It doesn’t seem to matter what version of openmpi I use).

Thank you

Claire



[OMPI users] Question about UCX progress throttling

2020-02-07 Thread Joseph Schuchart via users
Today I came across the two MCA parameters osc_ucx_progress_iterations 
and pml_ucx_progress_iterations in Open MPI. My interpretation of the 
description is that in a loop such as below, progress in UCX is only 
triggered every 100 iterations (assuming opal_progress is only called 
once per MPI_Test call):


```
int flag = 0;
MPI_Request req;

while (!flag) {
  do_something_else();
  MPI_Test(&req, &flag);
}
```

Is that assumption correct? What is the reason behind this throttling? 
In combination with TAMPI, it appears that setting them to 1 yields a 
significant speedup. Is it safe to always set them to 1?


Thanks
Joseph
--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de


Re: [OMPI users] mpirun --output-filename behavior

2019-10-31 Thread Joseph Schuchart via users

On 10/30/19 2:06 AM, Jeff Squyres (jsquyres) via users wrote:


Oh, did the prior behavior *only* output to the file and not to 
stdout/stderr?  Huh.


I guess a workaround for that would be:

     mpirun  ... > /dev/null


Just to throw in my $0.02: I recently found that the output to 
stdout/stderr may not be desirable: in an application that writes a lot 
of log data to stderr on all ranks, stdout was significantly slower than 
the files I redirected stdio to (I ended up seeing the application 
complete in the file output while the terminal wasn't even halfway 
through). Redirecting stderr to /dev/null as Jeff suggests does not help 
much because the output first has to be sent to the head node.


Things got even worse when I tried to use the stdout redirection with 
DDT: it barfed at me for doing pipe redirection in the command 
specification! The DDT terminal is just really slow and made the whole 
exercise worthless.


Point to make: it would be nice to have an option to suppress the output 
on stdout and/or stderr when output redirection to file is requested. In 
my case, having stdout still visible on the terminal is desirable but 
having a way to suppress output of stderr to the terminal would be 
immensely helpful.


Joseph



--
Jeff Squyres
jsquy...@cisco.com 



[OMPI users] CPC only supported when the first QP is a PP QP?

2019-08-05 Thread Joseph Schuchart via users
I'm trying to run an MPI RMA application on an IB cluster and find that 
Open MPI is using the pt2pt rdma component instead of openib (or UCX). I 
tried getting some logs from Open MPI (current 3.1.x git):


```
$ mpirun -n 2 --mca btl_base_verbose 100 --mca osc_base_verbose 100 
--mca osc_rdma_verbose 100 ./a.out
[taurusi6608.taurus.hrsk.tu-dresden.de:06550] mca: base: 
components_open: opening osc components
[taurusi6608.taurus.hrsk.tu-dresden.de:06550] mca: base: 
components_open: found loaded component sm
[taurusi6608.taurus.hrsk.tu-dresden.de:06550] mca: base: 
components_open: component sm open function successful
[taurusi6608.taurus.hrsk.tu-dresden.de:06550] mca: base: 
components_open: found loaded component monitoring
[taurusi6608.taurus.hrsk.tu-dresden.de:06550] mca: base: 
components_open: found loaded component pt2pt
[taurusi6608.taurus.hrsk.tu-dresden.de:06550] mca: base: 
components_open: found loaded component rdma
[taurusi6606.taurus.hrsk.tu-dresden.de:08214] rdmacm CPC only supported 
when the first QP is a PP QP; skipped
[taurusi6606.taurus.hrsk.tu-dresden.de:08214] openib BTL: rdmacm CPC 
unavailable for use on mlx5_0:1; skipped
[taurusi6606.taurus.hrsk.tu-dresden.de:08214] [rank=0] openib: using 
port mlx5_0:1
[taurusi6606.taurus.hrsk.tu-dresden.de:08214] select: init of component 
openib returned success
[taurusi6606.taurus.hrsk.tu-dresden.de:08214] select: initializing btl 
component tcp
[taurusi6606.taurus.hrsk.tu-dresden.de:08214] btl: tcp: Searching for 
exclude address+prefix: 127.0.0.1 / 8
[taurusi6606.taurus.hrsk.tu-dresden.de:08214] btl: tcp: Found match: 
127.0.0.1 (lo)

```

Is there any information on what makes "rdmacm CPC unavailable for use"? 
I cannot make much sense of "rdmacm CPC only supported when the first QP 
is a PP QP"... Is this a configuration problem of the system? A problem 
with the software stack?


If I try the same using Open MPI 4.0.x it reports:
```
[taurusi6607.taurus.hrsk.tu-dresden.de:21681] Process is not bound: 
distance to device is 0.00

--
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   taurusi6606
  Local device: mlx5_0
--
[taurusi6606.taurus.hrsk.tu-dresden.de:09069] select: init of component 
openib returned failure

```

The message about rdmacm does not show up.

The system has mlx5 devices:

```
$ ~/opt/openmpi-v3.1.x/bin/mpirun -n 2 ibv_devices
device node GUID
--  
mlx5_0  08003800013c7507
device node GUID
--  
mlx5_0  08003800013c773b
```

Any help would be much appreciated!

Thanks,
Joseph
--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Joseph Schuchart via users

Noam,

Another idea: check for stale files in /dev/shm/ (or a subdirectory that 
looks like it belongs to UCX/OpenMPI) and SysV shared memory using `ipcs 
-m`.
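
For context, here is a toy C program (not Open MPI code, and the name `/demo_segment` is made up) showing why such files can outlive a job: a POSIX shared-memory object stays in /dev/shm until it is explicitly unlinked, even after the creating process exits.

```
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* Creates /dev/shm/demo_segment (name is arbitrary; link with -lrt on older glibc). */
    int fd = shm_open("/demo_segment", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, 4096) != 0) { perror("ftruncate"); return 1; }
    close(fd);
    /* No shm_unlink("/demo_segment") here: the file stays in /dev/shm after
     * this process exits, much like stale segments left behind by a crashed
     * or killed MPI job. */
    return 0;
}
```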


Joseph

On 6/20/19 3:31 PM, Noam Bernstein via users wrote:



On Jun 20, 2019, at 4:44 AM, Charles A Taylor wrote:


This looks a lot like a problem I had with OpenMPI 3.1.2.  I thought 
the fix was landed in 4.0.0 but you might
want to check the code to be sure there wasn’t a regression in 4.1.x. 
 Most of our codes are still running
3.1.2 so I haven’t built anything beyond 4.0.0 which definitely 
included the fix.


Unfortunately, 4.0.0 behaves the same.

One thing that I’m wondering if anyone familiar with the internals can 
explain is how you get a memory leak that isn’t freed when the program 
ends? Doesn’t that suggest that it’s something lower level, like maybe 
a kernel issue?


Noam


U.S. NAVAL RESEARCH LABORATORY

Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628  F +1 202 404 7546
https://www.nrl.navy.mil



Re: [OMPI users] Latencies of atomic operations on high-performance networks

2019-05-09 Thread Joseph Schuchart via users

Benson,

I just gave 4.0.1 a shot and the behavior is the same (the reason I'm 
stuck with 3.1.2 is a regression with `osc_rdma_acc_single_intrinsic` on 
4.0 [1]).


The IB cluster has both Mellanox ConnectX-3 (w/ Haswell CPU) and 
ConnectX-4 (w/ Skylake CPU) nodes, the effect is visible on both node types.


Joseph

[1] https://github.com/open-mpi/ompi/issues/6536

On 5/9/19 9:10 AM, Benson Muite via users wrote:

Hi,

Have you tried anything with OpenMPI 4.0.1?

What are the specifications of the Infiniband system you are using?

Benson

On 5/9/19 9:37 AM, Joseph Schuchart via users wrote:

Nathan,

Over the last couple of weeks I made some more interesting 
observations regarding the latencies of accumulate operations on both 
Aries and InfiniBand systems:


1) There seems to be a significant difference between 64bit and 32bit 
operations: on Aries, the average latency for compare-exchange on 
64bit values takes about 1.8us while on 32bit values it's at 3.9us, a 
factor of >2x. On the IB cluster, all of fetch-and-op, 
compare-exchange, and accumulate show a similar difference between 32 
and 64bit. There are no differences between 32bit and 64bit puts and 
gets on these systems.


2) On both systems, the latency for a single-value atomic load using 
MPI_Fetch_and_op + MPI_NO_OP is 2x that of MPI_Fetch_and_op + MPI_SUM 
on 64bit values, roughly matching the latency of 32bit 
compare-exchange operations.


All measurements were done using Open MPI 3.1.2 with 
OMPI_MCA_osc_rdma_acc_single_intrinsic=true. Is that behavior expected 
as well?


Thanks,
Joseph


On 11/6/18 6:13 PM, Nathan Hjelm via users wrote:


All of this is completely expected. Due to the requirements of the 
standard it is difficult to make use of network atomics even for 
MPI_Compare_and_swap (MPI_Accumulate and MPI_Get_accumulate spoil the 
party). If you want MPI_Fetch_and_op to be fast set this MCA parameter:


osc_rdma_acc_single_intrinsic=true

Shared lock is slower than an exclusive lock because there is an 
extra lock step as part of the accumulate (it isn't needed if there 
is an exclusive lock). When setting the above parameter you are 
telling the implementation that you will only be using a single count 
and we can optimize that with the hardware. The RMA working group is 
working on an info key that will essentially do the same thing.
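
To make the lock-mode distinction concrete, here is a minimal sketch (not part of the original message; `win` and `target` are assumed to be an existing RMA window and a valid rank):

```
#include <mpi.h>
#include <stdint.h>

/* Passive-target fetch-and-add under an exclusive lock: only this origin
 * may access the target's window during the epoch. */
void fetch_add_exclusive(MPI_Win win, int target)
{
    uint64_t val = 1, res;
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, target, 0, win);
    MPI_Fetch_and_op(&val, &res, MPI_UINT64_T, target, 0, MPI_SUM, win);
    MPI_Win_unlock(target, win);
}

/* The same operation under a shared lock: concurrent origins are allowed,
 * which is where the extra internal lock step mentioned above comes in. */
void fetch_add_shared(MPI_Win win, int target)
{
    uint64_t val = 1, res;
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
    MPI_Fetch_and_op(&val, &res, MPI_UINT64_T, target, 0, MPI_SUM, win);
    MPI_Win_unlock(target, win);
}
```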


Note the above parameter won't help you with IB if you are using UCX 
unless you set this (master only right now):


btl_uct_transports=dc_mlx5

btl=self,vader,uct

osc=^ucx


Though there may be a way to get osc/ucx to enable the same sort of 
optimization. I don't know.



-Nathan


On Nov 06, 2018, at 09:38 AM, Joseph Schuchart  
wrote:



All,

I am currently experimenting with MPI atomic operations and wanted to
share some interesting results I am observing. The numbers below are
measurements from both an IB-based cluster and our Cray XC40. The
benchmarks look like the following snippet:

```
if (rank == 1) {
  uint64_t res, val;
  for (size_t i = 0; i < NUM_REPS; ++i) {
    MPI_Fetch_and_op(&val, &res, MPI_UINT32_T, 0, 0, MPI_SUM, win);
    MPI_Win_flush(target, win);
  }
}
MPI_Barrier(MPI_COMM_WORLD);
```
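
The snippet uses `win`, `NUM_REPS`, and `target` without showing their setup; a self-contained sketch of how the surrounding code might look (an assumption for illustration, not the original benchmark) is:

```
#include <mpi.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REPS 1000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    const int target = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One 64-bit counter per rank, exposed through an RMA window. */
    uint64_t *baseptr;
    MPI_Win win;
    MPI_Win_allocate(sizeof(uint64_t), sizeof(uint64_t), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &baseptr, &win);
    *baseptr = 0;

    MPI_Win_lock_all(0, win);        /* shared lock on all targets */
    MPI_Barrier(MPI_COMM_WORLD);     /* ensure counters are initialized */

    if (rank == 1) {
        uint64_t res, val = 1;
        for (size_t i = 0; i < NUM_REPS; ++i) {
            MPI_Fetch_and_op(&val, &res, MPI_UINT64_T, target, 0, MPI_SUM, win);
            MPI_Win_flush(target, win);
        }
    }
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```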

Only rank 1 performs atomic operations, rank 0 waits in a barrier (I
have tried to confirm that the operations are done in hardware by
letting rank 0 sleep for a while and ensuring that communication
progresses). Of particular interest for my use-case is fetch_op but I am
including other operations here nevertheless:

* Linux Cluster, IB QDR *
average of 10 iterations

Exclusive lock, MPI_UINT32_T:
fetch_op: 4.323384us
compare_exchange: 2.035905us
accumulate: 4.326358us
get_accumulate: 4.334831us

Exclusive lock, MPI_UINT64_T:
fetch_op: 2.438080us
compare_exchange: 2.398836us
accumulate: 2.435378us
get_accumulate: 2.448347us

Shared lock, MPI_UINT32_T:
fetch_op: 6.819977us
compare_exchange: 4.551417us
accumulate: 6.807766us
get_accumulate: 6.817602us

Shared lock, MPI_UINT64_T:
fetch_op: 4.954860us
compare_exchange: 2.399373us
accumulate: 4.965702us
get_accumulate: 4.977876us

There are two interesting observations:
a) operations on 64bit operands generally seem to have lower latencies
than operations on 32bit
b) Using an exclusive lock leads to lower latencies

Overall, there is a factor of almost 3 between SharedLock+uint32_t and
ExclusiveLock+uint64_t for fetch_and_op, accumulate, and get_accumulate
(compare_exchange seems to be somewhat of an outlier).

* Cray XC40, Aries *
average of 10 iterations

Exclusive lock, MPI_UINT32_T:
fetch_op: 2.011794us
compare_exchange: 1.740825us
accumulate: 1.795500us
get_accumulate: 1.985409us

Exclusive lock, MPI_UINT64_T:
fetch_op: 2.017172us
compare_exchange: 1.846202us
accumulate: 1.812578us
get_accumulate: 2.005541us

Shared lock, MPI_UINT32_T:
fetch_op: 5.380455us
compare_exchange: 5.164458us
accumulate: 5.230184us
get_accumulate: 5.399722us

Shared lock, MPI_UINT64_T:
fetch_op: 5.415230us
compare_exchange: 1.855840us
accumulate: 5.212632us
get_accumulate: 5.396110us

Re: [OMPI users] Latencies of atomic operations on high-performance networks

2019-05-09 Thread Joseph Schuchart via users

Nathan,

Over the last couple of weeks I made some more interesting observations 
regarding the latencies of accumulate operations on both Aries and 
InfiniBand systems:


1) There seems to be a significant difference between 64bit and 32bit 
operations: on Aries, the average latency for compare-exchange on 64bit 
values takes about 1.8us while on 32bit values it's at 3.9us, a factor 
of >2x. On the IB cluster, all of fetch-and-op, compare-exchange, and 
accumulate show a similar difference between 32 and 64bit. There are no 
differences between 32bit and 64bit puts and gets on these systems.


2) On both systems, the latency for a single-value atomic load using 
MPI_Fetch_and_op + MPI_NO_OP is 2x that of MPI_Fetch_and_op + MPI_SUM on 
64bit values, roughly matching the latency of 32bit compare-exchange 
operations.
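
For reference, a sketch of the two variants compared in 2), assuming an existing window `win`, a passive-target epoch already open, and a valid `target` rank; MPI_NO_OP turns MPI_Fetch_and_op into an atomic read, MPI_SUM into an atomic fetch-and-add:

```
#include <mpi.h>
#include <stdint.h>

/* Atomic 64-bit load: the origin buffer is not accessed with MPI_NO_OP. */
uint64_t atomic_load_u64(MPI_Win win, int target)
{
    uint64_t res, unused = 0;
    MPI_Fetch_and_op(&unused, &res, MPI_UINT64_T, target, 0, MPI_NO_OP, win);
    MPI_Win_flush(target, win);
    return res;
}

/* Atomic 64-bit fetch-and-add. */
uint64_t atomic_fetch_add_u64(MPI_Win win, int target, uint64_t val)
{
    uint64_t res;
    MPI_Fetch_and_op(&val, &res, MPI_UINT64_T, target, 0, MPI_SUM, win);
    MPI_Win_flush(target, win);
    return res;
}
```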


All measurements were done using Open MPI 3.1.2 with 
OMPI_MCA_osc_rdma_acc_single_intrinsic=true. Is that behavior expected 
as well?


Thanks,
Joseph


On 11/6/18 6:13 PM, Nathan Hjelm via users wrote:


All of this is completely expected. Due to the requirements of the 
standard it is difficult to make use of network atomics even for 
MPI_Compare_and_swap (MPI_Accumulate and MPI_Get_accumulate spoil the 
party). If you want MPI_Fetch_and_op to be fast set this MCA parameter:


osc_rdma_acc_single_intrinsic=true

Shared lock is slower than an exclusive lock because there is an extra 
lock step as part of the accumulate (it isn't needed if there is an 
exclusive lock). When setting the above parameter you are telling the 
implementation that you will only be using a single count and we can 
optimize that with the hardware. The RMA working group is working on an 
info key that will essentially do the same thing.


Note the above parameter won't help you with IB if you are using UCX 
unless you set this (master only right now):


btl_uct_transports=dc_mlx5

btl=self,vader,uct

osc=^ucx


Though there may be a way to get osc/ucx to enable the same sort of 
optimization. I don't know.



-Nathan


On Nov 06, 2018, at 09:38 AM, Joseph Schuchart  wrote:


All,

I am currently experimenting with MPI atomic operations and wanted to
share some interesting results I am observing. The numbers below are
measurements from both an IB-based cluster and our Cray XC40. The
benchmarks look like the following snippet:

```
if (rank == 1) {
  uint64_t res, val;
  for (size_t i = 0; i < NUM_REPS; ++i) {
    MPI_Fetch_and_op(&val, &res, MPI_UINT32_T, 0, 0, MPI_SUM, win);
    MPI_Win_flush(target, win);
  }
}
MPI_Barrier(MPI_COMM_WORLD);
```

Only rank 1 performs atomic operations, rank 0 waits in a barrier (I
have tried to confirm that the operations are done in hardware by
letting rank 0 sleep for a while and ensuring that communication
progresses). Of particular interest for my use-case is fetch_op but I am
including other operations here nevertheless:

* Linux Cluster, IB QDR *
average of 10 iterations

Exclusive lock, MPI_UINT32_T:
fetch_op: 4.323384us
compare_exchange: 2.035905us
accumulate: 4.326358us
get_accumulate: 4.334831us

Exclusive lock, MPI_UINT64_T:
fetch_op: 2.438080us
compare_exchange: 2.398836us
accumulate: 2.435378us
get_accumulate: 2.448347us

Shared lock, MPI_UINT32_T:
fetch_op: 6.819977us
compare_exchange: 4.551417us
accumulate: 6.807766us
get_accumulate: 6.817602us

Shared lock, MPI_UINT64_T:
fetch_op: 4.954860us
compare_exchange: 2.399373us
accumulate: 4.965702us
get_accumulate: 4.977876us

There are two interesting observations:
a) operations on 64bit operands generally seem to have lower latencies
than operations on 32bit
b) Using an exclusive lock leads to lower latencies

Overall, there is a factor of almost 3 between SharedLock+uint32_t and
ExclusiveLock+uint64_t for fetch_and_op, accumulate, and get_accumulate
(compare_exchange seems to be somewhat of an outlier).

* Cray XC40, Aries *
average of 10 iterations

Exclusive lock, MPI_UINT32_T:
fetch_op: 2.011794us
compare_exchange: 1.740825us
accumulate: 1.795500us
get_accumulate: 1.985409us

Exclusive lock, MPI_UINT64_T:
fetch_op: 2.017172us
compare_exchange: 1.846202us
accumulate: 1.812578us
get_accumulate: 2.005541us

Shared lock, MPI_UINT32_T:
fetch_op: 5.380455us
compare_exchange: 5.164458us
accumulate: 5.230184us
get_accumulate: 5.399722us

Shared lock, MPI_UINT64_T:
fetch_op: 5.415230us
compare_exchange: 1.855840us
accumulate: 5.212632us
get_accumulate: 5.396110us


The difference between exclusive and shared lock is about the same as
with IB and the latencies for 32bit vs 64bit are roughly the same
(except for compare_exchange, it seems).

So my question is: is this to be expected? Is the higher latency when
using a shared lock caused by an internal lock being acquired because
the hardware operations are not actually atomic?

I'd be grateful for any insight on this.

Cheers,
Joseph

--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: