og here:
https://drive.google.com/drive/u/0/folders/0B6O-L5Y7BiGJfmQ4N2FpblBEcFNxaDZnaGpsUFFEUlotVWFjajR0UFFHNk5aYlhoSHVTWkU
On Tue, Apr 3, 2018 at 9:47 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
Can you please compress and attach your config.log ?
You might also want to double check you can compile *and* run a simple C
hello world program
Cheers,
Gilles
On 4/3/2018 1:06 PM, abhisek Mondal wrote:
Hi,
I need some help regarding compiling Open MPI 3.0. I have a perfectly
working C
I have gcc and gcc-c++ installed.
On Tue, Apr 3, 2018 at 10:08 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
This is the relevant part related to this error
configure:6620: checking whether we are cross compiling
configure:
Let me shed a different light on that.
Once in a while, I run Open MPI between x86_64 and sparcv9, and it works
quite well as far as I am concerned.
Note this is the master branch; I never tried older or release branches.
Note you likely need to configure Open MPI with
Supun,
did you configure Open MPI with --disable-dlopen ?
It was previously reported that this option disables the patcher (memory
registration),
which impacts performance negatively.
If yes, then I suggest you reconfigure (and rebuild) without this option
and see if it helps.
Cheers,
Charles,
are you saying that even if you
mpirun --mca pml ob1 ...
(i.e. force the ob1 component of the pml framework), the memory leak is
still present ?
As a side note, we strongly recommend avoiding
configure --with-FOO=/usr
instead,
configure --with-FOO
should be used (otherwise you will end
Andrei,
you can
mpirun --mca btl ^openib ...
in order to "disable" InfiniBand.
Cheers,
Gilles
On Mon, Nov 12, 2018 at 9:52 AM Andrei Berceanu
wrote:
>
> The node has an IB card, but it is a stand-alone node, disconnected from the
> rest of the cluster.
> I am using OMPI to communicate
I dug into the configury logic a bit and found
./configure --prefix=/usr/local/ --with-cuda --with-slurm
--with-pmi=/usr/local/slurm
should do the trick; if not, try
./configure --prefix=/usr/local/ --with-cuda --with-slurm
--with-pmi=/usr/local/slurm --with-pmi-libdir=/usr/local/slurm/lib64
In this case, some Open MPI plugins are missing some third party libraries,
so you would have to run ldd on all the plugins (i.e. the .so files) located
in /lib/openmpi
in order to spot any issue.
Cheers,
Gilles
On Thu, Oct 4, 2018 at 4:34 PM John Hearns via users
wrote:
>
> Michele one tip:
Jeff,
that could be a copy/paste error and/or an email client issue.
The syntax is
mpirun --mca variable value ...
(short hyphen, short hyphen, m, c, a)
The error message is about the missing —-mca executable
(long hyphen, short hyphen, m, c, a)
This is most likely the root cause of this
Matt,
There are two ways of using PMIx
- if you use mpirun, then the MPI app (i.e. the PMIx client) will talk
to the mpirun and orted daemons (i.e. the PMIx server)
- if you use SLURM srun, then the MPI app will directly talk to the
PMIx server provided by SLURM. (note you might have to srun
Eduardo,
By config.log, we mean the config.log automatically generated by your
configure command
(i.e. not the output of the configure command).
This is a huge file, so please compress it.
Cheers,
Gilles
this file should start with:
"This file contains any messages produced by compilers"
Hi,
Your understanding is incorrect :
"Attributes are local to the process and specific to the communicator
to which they are attached."
(per
https://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-1.1/node119.htm)
think an attribute is often a pointer, and really bad things can
Adam,
Are you using btl/tcp (i.e. plain TCP/IP) for internode communications
? Or are you using libfabric on top of the latest EC2 drivers ?
There is no flow control in btl/tcp, which means for example if all
your nodes send messages to rank 0, that can create a lot of
unexpected messages on
ranks at these
>> points. So, does your comment about using the coll/sync module apply in
>> this case? I'm not familiar with this module - is this something I specify
>> at OpenMPI compile time or a runtime option that
I enable?
>>
>> Thanks for the detailed help.
Hi,
Open MPI does not do this out of the box.
MPI profilers can achieve that. Some are commercial (ITAC from Intel,
MAP from Allinea/ARM) and some are free (Score-P).
Another option is to build your own mini-profiler with the PMPI interface.
You can find an example at
My understanding is that MPI tasks will be launched inside a singularity
container.
In a typical environment, mpirun spawns an instance of orted on each
node, and then each orted daemon (or mpirun on the local node) forks
the MPI tasks (a.out).
With singularity, orted would fork a
e MPI_Comm_dup() once with the help of attribute,
> instead of duplicating the MPI_COMM_WORLD whenever there is a sequential
> execution. Am I right?
>
> Best wishes!
>
> Gilles Gouaillardet wrote on Sunday, December 16, 2018 at 8:14 AM:
>>
>> Hi,
>>
>> Your understanding is i
Thanks for the report.
As far as I am concerned, this is a bug in the IMB benchmark, and I
issued a PR to fix that
https://github.com/intel/mpi-benchmarks/pull/11
Meanwhile, you can manually download and apply the patch at
https://github.com/intel/mpi-benchmarks/pull/11.patch
Cheers,
Thanks Mikhail,
You have a good point.
With the current semantics used in the IMB benchmark, this cannot be
equivalent to
an MPI_Reduce() of N bytes followed by an MPI_Scatterv() of N bytes.
So this is indeed a semantic question:
what should an MPI_Reduce_scatter() of N bytes be equivalent
Lothar,
it seems you did not configure Open MPI with --with-pmi=
If SLURM was built with PMIx support, then another option is to use that.
First, srun --mpi=list will show you the list of available MPI
modules, and then you could
srun --mpi=pmix_v2 ... MPI_Hellow
If you believe that should be
Folks,
sorry for the late follow-up. The config.log was indeed sent offline.
Here is the relevant part :
configure:294375: checking for required lustre data structures
configure:294394: pgcc -O -DNDEBUG -Iyes/include -c conftest.c
PGC-S-0040-Illegal use of symbol, u_int64_t
Jeff,
At first glance, a comment in the code suggests the rationale is to minimize
the number of allocations and hence the time spent registering the memory.
Cheers,
Gilles
Jeff Hammond wrote:
>Why is this allocated statically? I don't understand the difficulty of a
>dynamically allocates
Eduardo,
You have two options to use OmniPath
- "directly" via the psm2 mtl
mpirun --mca pml cm --mca mtl psm2 ...
- "indirectly" via libfabric
mpirun --mca pml cm --mca mtl ofi ...
I do invite you to try both. By explicitly requesting the mtl you will
avoid potential conflicts.
libfabric is used
Eduardo,
The first part of the configure command line is for an install in /usr, but
then there is '--prefix=/opt/openmpi/4.0.0' and this is very fishy.
You should also use '--with-hwloc=external'.
How many nodes are you running on and which interconnect are you using ?
What if you
mpirun --mca pml
Alan,
Can you please compress and post your config.log ?
My understanding of the mentioned commit is that it does not build the
reachable/netlink component if libnl version 1 is used (by third party libs
such as mxm).
I do not believe it should abort configure.
Cheers,
Gilles
On Saturday, September
Ben,
This is a bug that will be fixed in 4.0.1 (it is already fixed in the
v4.0.x branch)
Meanwhile, you can add
rmaps_base_mapping_policy=numa:OVERSUBSCRIBE
in your openmpi-mca-params.conf.
Note the default policy is to bind to NUMA domain if there are more than
two MPI tasks, and bind
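The line above goes into a plain "name = value" MCA parameter file. A sketch of what that file could look like (the path shown is a typical default location, not necessarily yours):

```
# <prefix>/etc/openmpi-mca-params.conf  (or ~/.openmpi/mca-params.conf)
# One "name = value" MCA parameter per line; '#' starts a comment.
rmaps_base_mapping_policy = numa:OVERSUBSCRIBE
```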
Patrick,
Does "no network is available" mean the lo interface (localhost 127.0.0.1)
is not even available ?
Cheers,
Gilles
On Monday, January 28, 2019, Patrick Bégou <
patrick.be...@legi.grenoble-inp.fr> wrote:
> Hi,
>
> I fall in a strange problem with OpenMPI 3.1 installed on a CentOS7
>
Siegmar,
Is this issue specific to the PGI compiler ?
What if you
ulimit -s
before invoking mpirun, is that good enough to work around the problem ?
Cheers,
Gilles
On Tue, Mar 26, 2019 at 6:32 PM Siegmar Gross
wrote:
>
> Hi,
>
> I've installed openmpi-v4.0.x-201903220241-97aa434 and
>
Carlos,
can you post a trimmed version of your code that evidences the issue ?
Keep in mind that if you want to write MPI code that is correct with respect to
the standard, you should assume MPI_Send() might block until a matching receive
is posted.
Cheers,
Gilles
Sent from my iPod
> On
Do not get fooled by the symlinks to opal_wrapper!
opal_wrapper checks how it is invoked (i.e. it checks argv[0] in main()), and
the behavior is different
if it is invoked as mpicc, mpiCC, mpifort and others.
If the error persists with mpicc, you can manually extract the mpicc
command line, and
Christoph,
I do not know how to fix this, but here are some suggestions/thoughts
- do you need the -f flag ? if not, just remove it
- what if you mpirun strace -o /dev/null ... ?
- if the former works, then you might want to redirect the strace
output to a local file (mpirun wrapper.sh, in which
t would
>>> allow Open MPI to use multiple ports similar to what iperf is doing?
>>>
>>> Thanks.
>>> -Adam
>>>
>>> On Mon, Jul 10, 2017 at 9:31 PM, Adam Sylvester
>>> wrote:
>>>
>>>> Thanks again Gilles. Ahh, better yet
this fix included in the PMIx 2.2.2
> https://github.com/pmix/pmix/releases/tag/v2.2.2 ?
>
> All the best,
>
> From: users on behalf of Gilles
> Gouaillardet
> Sent: Sunday, February 24, 2019 4:09 AM
> To:
ore Laboratory (KSL)
> King Abdullah University of Science and Technology
> Building 1, Al-Khawarizmi, Room 0123
> Mobile : +966 (0) 55-247-9568
> Mobile : +20 (0) 106-146-9644
> Office : +966 (0) 12-808-0367
>
>
> From: users on
Daniel,
On 3/4/2019 3:18 PM, Daniel Letai wrote:
So unless you have a specific reason not to mix both, you might also
give the internal PMIx a try.
Does this hold true for libevent too? Configure complains if libevent
for openmpi is different than the one used for the other tools.
I am
mon". Neither seem to pull from the correct path.
>
>
> Regards,
>
> Dani_L.
>
>
> On 2/24/19 3:09 AM, Gilles Gouaillardet wrote:
>
> Passant,
>
> you have to manually download and apply
> https://github.com/pmix/pmix/commit/2e2f4445b45eac5a3fcbd409c81
:
Sent from my iPhone
On 3 Mar 2019, at 16:31, Gilles Gouaillardet
wrote:
Daniel,
PMIX_MODEX and PMIX_INFO_ARRAY have been removed from PMIx 3.1.2, and
Open MPI 4.0.0 was not ready for this.
You can either use the internal PMIx (3.0.2), or try 4.0.1rc1 (with
the external PMIx 3.1.2
Hi,
PMIx has cross-version compatibility, so as long as the PMIx library
used by SLURM is compatible with the one (internal or external) used
by Open MPI, you should be fine.
If you want to minimize the risk of cross-version incompatibility,
then I encourage you to use the same (and hence
1.10.4 "make check" problems w/OpenMPI 3.1.3
FWIW I could observe some memory leaks on both mpirun and MPI task 0 with the
latest master branch.
So I guess mileage varies depending on available RAM and number of iterations.
Sent from my iPod
> On Mar 17, 2019, at 20:47, Riebs, Andy wrote:
>
> Thomas, your test case is somewhat similar
with
that?
Before that, when we tried to launch MPI apps directly with srun, we
got an error message saying Slurm lacked PMIx support, which is why
we proceeded with the installation.
All the best,
--
Passant
On Mar 12, 2019 6:53 AM, Gilles Gouaillardet wrote:
Passant,
I built a similar
anding the logs)
Cheers,
Gilles
On 3/12/2019 1:41 AM, Michael Di Domenico wrote:
On Mon, Mar 11, 2019 at 12:09 PM Gilles Gouaillardet
wrote:
You can force
mpirun --mca pml ob1 ...
And btl/vader (shared memory) will be used for intra node communications ...
unless MPI tasks are from dif
From: users on behalf of Gilles Gouaillardet
Sent: Tuesday, March 12, 2019 8:22 AM
To: users@lists.open-mpi.org
Michael,
You can
mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 --mca
mtl_base_verbose 10 ...
It might show that pml/cm and mtl/psm2 are used. In that case, then yes, the
OmniPath library is used even for intra-node communications. If this library is
optimized for intra-node,
Jeff,
The first location is indeed in ompi_coll_libnbc_iallreduce()
Lee Ann,
thanks for the bug report,
for the time being, can you please give the attached patch a try ?
Cheers,
Gilles
FWIW
NBC_Schedule_request() sets handle->tmpbuf = tmpbuf and calls
NBC_Start(handle, schedule)
Simone,
If you want to run a single MPI task, you can either
- mpirun -np 1 ./a.out (this is the most standard option)
- ./a.out (this is the singleton mode. Note a.out will fork an
orted daemon under the hood; this is necessary, for example, if your app
will call MPI_Comm_spawn().)
-
k which tests are being run by that.
> >
> > Edgar
> >
> >> -----Original Message-
> >> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles
> >> Gouaillardet
> >> Sent: Saturday, February 16, 2019 1:49 AM
> >
Thanks Bart,
I opened https://github.com/open-mpi/ompi/issues/6394 to track this
issue, and we should follow-up there from now.
FWIW, I added a more minimal example, and a possible fix.
Cheers,
Gilles
On 2/18/2019 12:43 AM, Bart Janssens wrote:
I just tried on master (commit
Jingchao,
The error message is not coming from Open MPI but from the PMIx component.
Adding the following line in pmix-mca-params.conf should do the trick:
mca_base_component_show_load_errors=0
Cheers,
Gilles
On 2/12/2019 7:31 AM, Jingchao Zhang wrote:
Hi,
We have both psm and psm2
Ryan,
Can you
export OMPI_MCA_io=^ompio
and try again after you made sure this environment variable is passed by srun
to the MPI tasks ?
We have identified and fixed several issues specific to the (default) ompio
component, so that could be a valid workaround until the next release.
Cheers,
Bart,
It looks like a bug that involves the osc/rdma component.
Meanwhile, you can
mpirun --mca osc ^rdma ...
Cheers,
Gilles
On Sat, Feb 16, 2019 at 8:43 PM b...@bartjanssens.org
wrote:
>
> Hi,
>
> Running the following test code on two processes:
>
> #include
> #include
> #include
>
>
This is on GPFS. I'll try it on XFS to see if it makes any difference.
On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:
Ryan,
What filesystem are you running on ?
Open MPI defaults to the ompio component, except on Lustre
filesystem where ROMIO is
Patrick,
The root cause is we do not include the localhost interface by default
for OOB communications.
You should be able to run with
mpirun --mca oob_tcp_if_include lo -np 4 hostname
Cheers,
Gilles
On 1/28/2019 11:02 PM, Patrick Bégou wrote:
Hi,
I fall in a strange problem with
will include this fix, meanwhile, you can
either remove the virbr0 interface
or use the workaround I previously described
Cheers,
Gilles
On 1/29/2019 1:56 PM, Gilles Gouaillardet wrote:
Patrick,
The root cause is we do not include the localhost interface by default
for OOB communications
Great point from David !
As a side note, you can
configure --enable-mpirun-prefix-by-default ... && make install
If you choose to do so, you will not have to set LD_LIBRARY_PATH since
it will be "built in" the Open MPI binaries/libraries (via the -rpath
linker option)
Cheers,
Gilles
On Sat,
Daniel,
If your MPI singleton will never call MPI_Comm_spawn(), then you can use the
isolated mode like this
OMPI_MCA_ess_singleton_isolated=true ./program
You can also save some ports by blacklisting the btl/tcp component
OMPI_MCA_ess_singleton_isolated=true OMPI_MCA_pml=ob1
Sergio,
I think your best option here is to install an aarch64 (i.e. 64-bit ARM)
distro on your Raspberry Pi 3.
/* FWIW, the default Raspberry Pi distro is 32-bit so it can run on all
Raspberry Pi models */
If you cannot/do not wish to go that way, make sure Open MPI is built
with
Dave,
Can you please post the configure command line of the Open MPI you are trying
to relocate ?
Cheers,
Gilles
Dave Love wrote:
>Reuti writes:
>
>> export OPAL_PREFIX=
>>
>> to point it to the new location of installation before you start `mpiexec`.
>
>Thanks; that's now familiar, and I
Passant,
UCX 1.6.0 is not yet officially released, and it seems Open MPI
(4.0.1) does not support it yet, and some porting is needed.
Cheers,
Gilles
On Tue, Jun 25, 2019 at 5:13 PM Passant A. Hafez via users
wrote:
>
> Hello,
>
>
> I'm trying to build ompi 4.0.1 with external ucx 1.6.0 but
Eric,
which version of Open MPI are you using ? How many hosts are in your hostsfile ?
The error message suggests this could be a bug within Open MPI, and a
potential workaround for you would be to try
mpirun -np 84 --hostfile hostsfile --mca routed direct ./openmpi_hello.c
You might also want to
Thanks Junchao,
I issued https://github.com/open-mpi/ompi/pull/6782 in order to fix this
(and the alltoallw variant as well)
Meanwhile, you can manually download and apply the patch at
https://github.com/open-mpi/ompi/pull/6782.patch
Cheers,
Gilles
On 6/28/2019 1:10 PM, Zhang,
Adrian,
the MPI application relies on some environment variables (they typically
start with OMPI_ and PMIX_).
The MPI application internally uses a PMIx client that must be able to
contact a PMIx server
(that is included in mpirun and the orted daemon(s) spawned on the
remote hosts).
t;--> Process # 0 of 2 is alive. ->test1
>>--> Process # 1 of 2 is alive. ->test2
>>
>> I need to tell Podman to mount /tmp from the host into the container, as
>> I am running rootless I also need to tell Podman to use the same user ID
&
Thanks for the report,
this is indeed a bug I fixed at https://github.com/open-mpi/ompi/pull/6790
meanwhile, you can manually download and apply the patch at
https://github.com/open-mpi/ompi/pull/6790.patch
Cheers,
Gilles
On 7/3/2019 1:30 AM, Gyevi-Nagy László via users wrote:
Hi,
I
Sebastian,
the PSM2 shared memory segment name is set by the PSM2 library and
my understanding is that Open MPI has no control over it.
If you believe the root cause of the crash is related to non unique PSM2
shared
memory segment name, I guess you should report this at
The root cause is configure cannot run a simple Fortran program
(see the relevant log below)
I suggest you
export LD_LIBRARY_PATH=/share/apps/gcc-5.4.0/lib64:$LD_LIBRARY_PATH
and then try again.
Cheers,
Gilles
configure:44254: checking Fortran value of selected_int_kind(4)
configure:44281:
John,
what if you move some parameters to CPPFLAGS and CXXCPPFLAGS (see the
new configure command line below)
Cheers,
Gilles
'/Users/cary/projects/ulixesall-llvm/builds/openmpi-4.0.1/nodl/../configure'
\
--prefix=/Volumes/GordianStorage/opt/contrib-llvm7_appleclang/openmpi-4.0.1-nodl
\
tps://github.com/openucx/ucx/issues/3336 that the UCX 1.6 might solve this
issue, so I tried the pre-release version to just check if it will.
All the best,
--
Passant
From: users on behalf of Gilles Gouaillardet via
users
Sent: Tuesday, June 25, 2019 11
ch compiler did you use to build Open MPI that fails to build your
>> test ?
>>
>>
>> Cheers,
>>
>> Gilles
>>
>> On Mon, Aug 19, 2019 at 6:49 PM Gilles Gouaillardet
>> wrote:
>> >
>> > Thanks,
>> >
>> > a
> size = this%size_dim(this%gi)*this%size_dim(this%gj)*cs3
> if(this%is_exchange_off) then
>call this%update_stats(size)
>this%bf(:,:,1:cs3) = cmplx(0.,0.)
> else
>call MPI_Irecv(this%bf(:,:,1:cs3),size,MPI_COMPLEX_TYPE,&
> t
One more thing ...
Your initial message mentioned a failure with gcc 8.2.0, but your
follow-up message mentions the LLVM compiler.
So which compiler did you use to build the Open MPI that fails to build your test ?
Cheers,
Gilles
On Mon, Aug 19, 2019 at 6:49 PM Gilles Gouaillardet
wrote:
>
>
Hi,
Can you please post a full but minimal example that evidences the issue?
Also please post your Open MPI configure command line.
Cheers,
Gilles
Sent from my iPod
> On Aug 19, 2019, at 18:13, Sangam B via users
> wrote:
>
> Hi,
>
> I get following error if the application is compiled
Thanks,
and your reproducer is ?
Cheers,
Gilles
On Mon, Aug 19, 2019 at 6:42 PM Sangam B via users
wrote:
>
> Hi,
>
> OpenMPI is configured as follows:
>
> export CC=`which clang`
> export CXX=`which clang++`
> export FC=`which flang`
> export F90=`which flang`
>
> ../configure
Carlos,
MPI_Isend() does not automatically free the buffer after it sends the message.
(It simply cannot do so, since the buffer might be pointing to a global
variable or to the stack.)
Can you please extract a reproducer from your program ?
Out of curiosity, what if you insert a (useless)
Juanchao,
Is the issue related to https://github.com/open-mpi/ompi/pull/4501 ?
Jeff,
you might have to configure with --enable-heterogeneous to evidence the
issue
Cheers,
Gilles
On 8/2/2019 4:06 AM, Jeff Squyres (jsquyres) via users wrote:
I am able to replicate the issue on a
Hi,
You need to
configure --with-pmi ...
Cheers,
Gilles
On August 8, 2019, at 11:28 PM, Jing Gong via users
wrote:
Hi,
Recently our Slurm system has been upgraded to 19.0.5. I tried to recompile
openmpi v3.0 due to the bug reported in
https://bugs.schedmd.com/show_bug.cgi?id=6993
that Podman is running rootless. I will continue to investigate, but now
I know where to look. Thanks!
Adrian
On Fri, Jul 12, 2019 at 06:48:59PM +0900, Gilles Gouaillardet via users wrote:
Adrian,
Can you try
mpirun --mca btl_vader_copy_mechanism none ...
Please double check the MCA
Joseph,
you can achieve this via an agent (and it works with DDT too)
For example, the nostderr script below redirects each MPI task's stderr
to /dev/null (so it is not forwarded to mpirun)
$ cat nostderr
#!/bin/sh
exec 2> /dev/null
exec "$@"
and then you can simply
$ mpirun --mca
Camille,
your program is only valid with an MPI library that features
MPI_SUBARRAYS_SUPPORTED,
and this is not (yet) the case in Open MPI.
A possible fix is to use an intermediate contiguous buffer
integer, allocatable, dimension(:,:,:,:) :: tmp
allocate( tmp(N,N,N,N) )
and then
Guido,
This error message is from MPICH and not Open MPI.
Make sure your environment is correct and the shared filesystem is mounted on
the compute nodes.
Cheers,
Gilles
Sent from my iPod
> On Dec 12, 2019, at 1:44, Guido granda muñoz via users
> wrote:
>
> Hi,
> after following the
Raymond,
In the case of UCX, you can
mpirun --mca pml_base_verbose 10 ...
If the pml/ucx component is used, then your app will run over UCX.
If the pml/ob1 component is used, then you can
mpirun --mca btl_base_verbose 10 ...
btl/self should be used for communications to itself.
if btl/uct
Orion,
thanks for the report.
I can confirm this is indeed an Open MPI bug.
FWIW, a workaround is to disable the fcoll/vulcan component.
That can be achieved by
mpirun --mca fcoll ^vulcan ...
or
OMPI_MCA_fcoll=^vulcan mpirun ...
I also noted the tst_parallel3 program crashes with the
Charles,
unless you expect yes or no answers, can you please post a simple
program that evidences
the issue you are facing ?
Cheers,
Gilles
On 10/29/2019 6:37 AM, Garrett, Charles via users wrote:
Does anyone have any idea why this is happening? Has anyone seen this
problem before?
Your gfortran command line strongly suggests your program is serial and
does not use MPI at all.
Consequently, mpirun will simply spawn 8 identical instances of the very
same program, and no speed up should be expected
(but you can expect some slow down and/or file corruption).
If you
via users
wrote:
Gilles,
Thanks for your suggestions! I just tried both of them, see below:
On 11/1/19 1:15 AM, Gilles Gouaillardet via users wrote:
> Joseph,
>
>
> you can achieve this via an agent (and it works with DDT too)
>
>
> For example, the nostderr script
Ludovic,
in order to figure out which interconnect is used, you can
mpirun --mca pml_base_verbose 10 --mca mtl_base_verbose 10 --mca
btl_base_verbose 10 ...
the output might be a bit verbose, so here are a few tips on how to
get it step by step
first, mpirun --mca pml_base_verbose 10 ...
in
Hi,
The log filenames suggest you are always running on a single node, is
that correct ?
Do you create the input file on the tmpfs once and for all, or before each run?
Can you please post your mpirun command lines?
If you did not bind the tasks, can you try again
mpirun --bind-to core ...
s on a resource:
Bind to:CORE
Node: compute-0
#processes: 2
#cpus: 1
You can override this protection by adding the "overload-allowed"
option to your binding directive.
—
I will solve this and get back to you soon.
Best regards,
Al
Thanks Jeff for the information and sharing the pointer.
FWIW, this issue typically occurs when libtool pulls the -pthread flag
from libhcoll.la that was compiled with a GNU compiler.
The simplest workaround is to remove libhcoll.la (so libtool simply
links with libhcoll.so and does not pull any
Soporte,
The error message is from MPICH!
If you intend to use Open MPI, fix your environment first
Cheers,
Gilles
Sent from my iPod
> On Jan 15, 2020, at 7:53, SOPORTE MODEMAT via users
> wrote:
>
> Hello everyone.
>
> I would like somebody help me to figure out how can I make that
Note there could be some NUMA-IO effect, so I suggest you compare
running every MPI task on socket 0, to running every MPI task on
socket 1 and so on, and then compare to running one MPI task per
socket.
Also, what performance do you measure?
- Is this something in line with the
that the
> "-Wl," before "-force_load” might be a workable solution, but I don’t
> understand what to change in order to accomplish this. Might you have some
> suggestions.
>
>
>
>
>
> On Apr 15, 2020, at 0:06, Gilles Gouaillardet via users
> wrote:
>
Sorry for your trouble.
>
>
>
> > On Apr 16, 2020, at 11:49, Gilles Gouaillardet via users
> > wrote:
> >
> > Paul,
> >
> > My ifort eval license on OSX has expired so I cannot test myself,
> > sorry about that.
> >
> > It has been rep
g.
> link_static_flag=""
>
> After running “make clean”, I ran make again and found the same error. Any
> ideas?
>
> FCLD libmpi_usempif08.la
> ld: library not found for -lhwloc
> make[2]: *** [libmpi_usempif08.la] Error 1
> make[1]: *** [all-recursive] Error
Collin,
Do you have any data to backup your claim?
As long as MPI-IO is used to perform file I/O, the Fortran bindings overhead
should be hardly noticeable.
Cheers,
Gilles
On April 6, 2020, at 23:22, Collin Strassburger via users
wrote:
Hello,
Just a quick comment on this; is your
David,
I suggest you rely on well established benchmarks such as IOR or iozone.
As already pointed by Edgar, you first need to make sure you are not
benchmarking your (memory) cache by comparing the bandwidth you measure vs the
performance you can expect from your hardware.
As a side note,
Jonathon,
GPFS is used by both the ROMIO component (that comes from MPICH) and the
fs/gpfs component that is used by ompio
(native Open MPI MPI-IO so to speak).
You should be able to disable both by running
ac_cv_header_gpfs_h=no configure --without-gpfs ...
Note that Open MPI is modular
Levi,
as a workaround, have you tried using mpirun instead of direct launch (e.g.
srun) ?
Note you are using pmix 1.2.5, so you likely want to srun --mpi=pmix_v1
Also, as reported by the logs
1. [nodeA:12838] OPAL ERROR: Error in file pmix3x_client.c at line 112
there is something