Hi,
FCoE is for storage, Ethernet is for the network.
I assume you can ssh into your nodes, which means you have a TCP/IP network
up and running.
I do not know the details of Cisco hardware, but you might be able to
use usnic (native btl or via libfabric) instead of the plain TCP/IP network.
Did you run
ulimit -c unlimited
before invoking mpirun ?
If your application can be run with only one task, you can try to run it
under gdb.
you will hopefully be able to see where the illegal instruction occurs.
since you are running on AMD processors, you have to make sure you are not
using an
you need to run the ulimit command before mpirun and on the same node.
if it still does not work, then you can use a wrapper.
instead of
mpirun a.out
you would do
mpirun a.sh
a.sh is a script such as:
#!/bin/sh
ulimit -c unlimited
exec a.out
the core is created in the current directory
Cheers,
Gilles
On Saturda
Hi,
Which version of Open MPI are you running ?
I noted that though you are asking for three nodes and one task per node, you
have been allocated only 2 nodes.
I do not know if this is related to this issue.
Note if you use the machinefile, a00551 has two slots (since it appears twice
in the machinefile)
a00551.science.domain
>[a00551.science.domain:04104] [[53688,0],0] plm:base:receive update proc
>state command from [[53688,0],1]
>[a00551.science.domain:04104] [[53688,0],0] plm:base:receive got
>update_proc_state for job [53688,1]
>[1,1]:a00551.science.domain
>[a00551.science.domai
>> [a00551.science.domain:04104] [[53688,0],0] complete_setup on job [53688,1]
>> [a00551.science.domain:04104] [[53688,0],0] plm:base:receive update proc
>> state command from [[53688,0],1]
>> [a00551.science.domain:04104] [[53688,0],0] plm:base
t;tm...*sigh*
>
>Thanks again!
>
>Oswin
>
>On 2016-09-07 16:21, Gilles Gouaillardet wrote:
>> Note the torque library will only show up if you configure'd with
>> --disable-dlopen. Otherwise, you can ldd
>> /.../lib/openmpi/mca_plm_tm.so
>>
>> C
up spawning an orted on its own node.
You might try ensuring that your machinefile is using the exact same name as
provided in your allocation
On Sep 7, 2016, at 7:06 AM, Gilles Gouaillardet
wrote:
Thanks for the logs
From what I see now, it looks like a00551 is running both mpirun and
6-09-07 14:41, Gilles Gouaillardet wrote:
Hi,
Which version of Open MPI are you running ?
I noted that though you are asking for three nodes and one task per node,
you have been allocated only 2 nodes.
I do not know if this is related to this issue.
Note if you use the machinefile, a00551 has two sl
n
options to do so. We give you lots and lots of knobs for just that reason.
On Sep 7, 2016, at 10:53 PM, Gilles Gouaillardet wrote:
Ralph,
there might be an issue within Open MPI.
on the cluster i used, hostname returns the FQDN, and $PBS_NODEFILE uses the
FQDN too.
my $PBS_NODEFILE ha
a range of apps.
Lesson to be learned: never, ever muddle around with a
system-generated file. If you want to modify where things go, then use
one or more of the mpirun options to do so. We give you lots and lots
of knobs for just that reason.
On Sep 7, 2016, at 10:53 PM, Gilles Gouaillardet
component tm
On 2016-09-08 10:18, Gilles Gouaillardet wrote:
Ralph,
I am not sure I am reading you correctly, so let me clarify.
I did not hack $PBS_NODEFILE for fun or profit; I was simply trying
to reproduce an issue I could not reproduce otherwise.
/* my job submitted with -l nodes=3
:receive got
>update_proc_state for job [34937,1]
>[a00551.science.domain:18889] [[34937,0],0] plm:base:receive got
>update_proc_state for vpid 2 state NORMALLY TERMINATED exit_code 0
>[a00551.science.domain:18889] [[34937,0],0] plm:base:receive done
>processing commands
>[a00551
mand from [[34937,0],2]
>[a00551.science.domain:18889] [[34937,0],0] plm:base:receive got
>update_proc_state for job [34937,1]
>[a00551.science.domain:18889] [[34937,0],0] plm:base:receive got
>update_proc_state for vpid 2 state NORMALLY TERMINATED exit_code 0
>[a00551.science.domai
_PATH)
Cheers,
Gilles
On 9/12/2016 3:59 PM, Mahmood Naderan wrote:
Hi,
Following the suggestion by Gilles Gouaillardet
(https://mail-archive.com/users@lists.open-mpi.org/msg29688.html), I
ran a configure command for a program like this
# ../Src/configure FC=/export/apps/siesta/openmpi-1.8.
That sounds good to me !
just to make it crystal clear ...
assuming you configure'd your Open MPI 1.8.8 with
--enable-mpirun-prefix-by-default
(and if you did not, i do encourage you to do so), then all you need is
to remove
/opt/openmpi/lib from your LD_LIBRARY_PATH
(e.g. you do *not* ha
Basically, it means libs will be linked with
-Wl,-rpath,/export/apps/siesta/openmpi-1.8.8/lib
so if you run a.out with an empty $LD_LIBRARY_PATH, then it will look
for the MPI libraries in
/export/apps/siesta/openmpi-1.8.8/lib
Cheers,
Gilles
On 9/12/2016 4:50 PM, Mahmood Naderan wrote:
Mahmood,
I was suggesting you (re)configure (i assume you did it) the Open MPI
1.8.8 installed in /export/apps/siesta/openmpi-1.8.8 with
--enable-mpirun-prefix-by-default
Cheers,
Gilles
On 9/12/2016 4:51 PM, Mahmood Naderan wrote:
> --enable-mpirun-prefix-by-default
What is that? Does t
Mahmood,
you need to manually remove /opt/openmpi/lib from your LD_LIBRARY_PATH
(or have your sysadmin do it if this is somehow done automatically)
the point of configuring with --enable-mpirun-prefix-by-default is you
do *not* need
to add /export/apps/siesta/openmpi-1.8.8/lib in your LD_LIBRA
Mahmood,
mpi_siesta is a siesta library, not an Open MPI library.
fwiw, you might want to try again from scratch with
MPI_INTERFACE=libmpi_f90.a
DEFS_MPI=-DMPI
in your arch.make
i do not think libmpi_f90.a is related to an OpenMPI library.
if you need some more support, please refer to the sies
That typically occurs if nwchem is linked with MPICH and you are using
OpenMPI mpirun.
At first, I recommend you double check your environment, and run
ldd nwchem
the very same Open MPI is used by everyone
Cheers,
Gilles
On Wednesday, September 14, 2016, abhisek Mondal
wrote:
> Hi,
> I'm on a s
Mahmood,
try to prepend /export/apps/siesta/openmpi-1.8.8/lib to your
$LD_LIBRARY_PATH
note this is not required when Open MPI is configure'd with
--enable-mpirun-prefix-by-default
Cheers,
Gilles
On Wednesday, September 14, 2016, Mahmood Naderan
wrote:
> Hi,
> Here is the problem with stat
he library paths to LD_LIBRARY_PATH, but I want
> to statically put the required libraries in the binary.
>
>
>
> Regards,
> Mahmood
>
>
>
> On Wed, Sep 14, 2016 at 4:44 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com
> > wrote:
>
Mahmood,
i meant
--disable-dlopen
i am not aware of a --disable-dl-dlopen option, which does not mean it does
not exist
Cheers,
Gilles
On Wednesday, September 14, 2016, Mahmood Naderan
wrote:
> Do you mean --disable-dl-dlopen? The last lines of configure are
>
> +++ Configuring MCA framework
Mahmood,
You can
gdb --pid=core.5383
And then
bt
An then
disas
And "scroll" until the current instruction
Iirc, there is a star at the beginning of this line
You can also try
show maps
Or
info maps
(I cannot remember the syntax...)
Btw, did you compile lapack and friends by yourself ?
Mahmood N
--core=... is the right syntax, sorry about that
No need to recompile with -g, binary is good enough here
Then you need to run
disas
in gdb, to disassemble the instruction at 0x08da76e
And then, still in gdb
info maps
or
show maps
To find out the library this instruction is coming from
OpenBLAS i
Ok, you can try this under gdb
info proc mapping
info registers
x /100x $rip
x /100x $eip
I remember you are running on AMD cpus, that is why Intel-only
instructions must be avoided
Cheers,
Gilles
On Thursday, September 15, 2016, Mahmood Naderan
wrote:
> disas command fails.
>
> Prog
if gcc is installed on your compute node, you can run
echo | gcc -v -E - 2>&1 | grep cc1
and look for the -march=xxx parameter
/* you might want to compare that with your frontend */
And/or you can run
grep family /proc/cpuinfo
on your compute node
Then
man gcc
on your front end node
From my gc
Mahmood,
-march=bdver1
should be ok on your nodes.
from the gcc command line, i was expecting -march=xxx, but it is
missing (your gcc might be a bit older for that)
note you have to recompile all your libs (openblas and friends) with
-march=bdver1
i guess your gdb is also a bit too old to suppor
Mahmood,
note you have to compile the source file that contains the snippet
with '-g -O0', and link with '-g -O0'
also, there was a typo in the gdb command,
please read "frame 1" instead of "frame #1"
Cheers,
Gilles
On Fri, Sep 16, 2016 at 12:53 P
Fred,
Can you try to configure with
--disable-nvml
This is an option that should be passed to embedded hwloc.
Cheers,
Gilles
On Tuesday, September 20, 2016, Fred Mioux wrote:
> Hello ,
>
>
>
> I compile OpenMPI on a machine which support CUDA and NVML, but I don’t
> want to include this in m
Brice,
Another option is to add a --with-hwloc-flags configure option to Open
MPI, and pass the value to the embedded hwloc configure.
We already do that for ROMIO (--with-io-romio-flags)
Cheers,
Gilles
On Tuesday, September 20, 2016, Brice Goglin wrote:
> Hello
> Assuming this NVML detection is
Cheers,
Gilles
Fred Mioux wrote:
>Thank you for your answer.
>
>I already tried this but it doesn't work.
>
>
>Regards
>
>Fred Mioux
>
>
>2016-09-20 16:50 GMT+02:00 Gilles Gouaillardet :
>
>Fred,
>
>
>Can you try to configure with
>
>--d
That was a kind of brainstorming
This is not implemented and cannot work, see my previous message
Cheers,
Gilles
Fred Mioux wrote:
>Thank you for your answer, I will try it.
>
>
>Regards,
>
>Fred Mioux
>
>
>2016-09-20 17:02 GMT+02:00 Gilles Gouaillardet :
>
>
Justin,
i do not see this error on my laptop
which version of OS X are you running ?
can you try to
TMPDIR=/tmp mpirun -n 1
Cheers,
Gilles
On Thu, Sep 22, 2016 at 7:21 PM, Nathan Hjelm wrote:
> FWIW it works fine for me on my MacBook Pro running 10.12 with Open MPI 2.0.1
> installed through
ustin
On Thu, Sep 22, 2016 at 7:33 AM, Gilles Gouaillardet
wrote:
Justin,
i do not see this error on my laptop
which version of OS X are you running ?
can you try to
TMPDIR=/tmp mpirun -n 1
Cheers,
Gilles
On Thu, Sep 22, 2016 at 7:21 PM, Nathan Hjelm wrote:
FWIW it works fine for me
Marcin,
You can also try to exclude the public subnet(s) (e.g. 1.2.3.0/24) and the
loopback interface instead of em4 that does not exist on the compute nodes.
Or you can include only the private subnet(s) that are common to frontend
and compute nodes
Cheers,
Gilles
On Saturday, September 24, 20
Mahmood,
The node is defined in the PBS config, however it is not part of the
allocation (e.g. job) so it cannot be used, and hence the error message.
In your PBS script, you do not need -np nor -host parameters to your mpirun
command.
Open MPI mpirun will automatically detect it is launched from
Hi,
I can see this error happening if you configure with --disable-dlopen
--with-pmi
In opal/mca/pmix/s?/pmix_s?.c, you can try to add the static keyword before
OBJ_CLASS_INSTANCE(pmi_opcaddy_t, ...)
Or you can update the files to use unique class names (probably safer...)
Cheers,
Gilles
On Wed
tter: @PenguinHPC/
On Tue, Sep 27, 2016 at 11:51 AM, Gilles Gouaillardet
mailto:gilles.gouaillar...@gmail.com>>
wrote:
Hi,
I can see this error happening if you configure with
--disable-dlopen --with-pmi
In opal/mca/pmix/s?/pmix_s?.c, you can try to
Hi,
I do not expect spawn can work with direct launch (e.g. srun)
Do you have PSM (e.g. Infinipath) hardware ? That could be linked to the
failure
Can you please try
mpirun --mca pml ob1 --mca btl tcp,sm,self -np 1 --hostfile my_hosts
./manager 1
and see if it helps ?
Note if you have the poss
gt;
> On Sep 29, 2016, at 6:06 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com
> > wrote:
>
> Hi,
>
> I do not expect spawn can work with direct launch (e.g. srun)
>
> Do you have PSM (e.g. Infinipath) hardware ? That could be linked to the
> failure
Rick,
can you please provide some more information :
- Open MPI version
- interconnect used
- number of tasks / number of nodes
- does the hang occur in the first MPI_Bcast of 8000 bytes ?
note there is a known issue if you MPI_Bcast with different but matching
signatures
(e.g. some tas
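For illustration only, here is a minimal sketch of what "different but
matching signatures" means (the counts and the derived datatype are made up
for the example): the root broadcasts 4 MPI_INT while the other ranks receive
one element of a contiguous datatype of 4 ints, so the type signatures match
even though count and datatype differ per rank.

#include <mpi.h>

int main(int argc, char *argv[])
{
    int buf[4] = {0, 0, 0, 0};
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* the root broadcasts 4 MPI_INT ... */
        MPI_Bcast(buf, 4, MPI_INT, 0, MPI_COMM_WORLD);
    } else {
        /* ... the other ranks receive 1 element of a contiguous datatype of
           4 ints: same type signature, different count/datatype */
        MPI_Datatype four_ints;
        MPI_Type_contiguous(4, MPI_INT, &four_ints);
        MPI_Type_commit(&four_ints);
        MPI_Bcast(buf, 1, four_ints, 0, MPI_COMM_WORLD);
        MPI_Type_free(&four_ints);
    }
    MPI_Finalize();
    return 0;
}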
David,
i guess you would have expected the default mapping/binding scheme is
core instead of sockets
iirc, we decided *not* to bind to cores by default because it is "safer"
if you simply
OMP_NUM_THREADS=8 mpirun -np 2 a.out
then, a default mapping/binding scheme by core means the OpenMP t
manner of problems with binding when using
cpusets/cgroups.
-- bennet
On Thu, Sep 29, 2016 at 9:52 PM, Gilles Gouaillardet wrote:
David,
i guess you would have expected the default mapping/binding scheme is core
instead of sockets
iirc, we decided *not* to bind to cores by default bec
/open-mpi/ompi/pull/2135.patch, or by using the
latest open-mpi from homebrew.
On Fri, Sep 23, 2016 at 11:15 AM, Gilles Gouaillardet wrote:
> Justin,
>
>
> the root cause could be the length of $TMPDIR that might cause some path
> being truncated.
>
> you can check that by
; dest = 0;
>
> source = 0;
>
> std::cout << "Task " << rank << "
> sending." << std::endl;
>
> MPI_Bcast(inmsg,bufsize,
> MPI_BYTE,rank,MPI_COMM_WORL
Mahmood,
iirc, a related bug was fixed in v2.0.0
Can you please update to 2.0.1 and try again ?
Cheers,
Gilles
On Saturday, October 1, 2016, Mahmood Naderan wrote:
> Hi,
> Here is the bizarre behavior of the system and hope that someone can
> clarify is this related to OMPI or not.
>
> When I
Rick,
I do not think ompi_server is required here.
Can you please post a trimmed version of your client and server, and your two
mpirun command lines.
You also need to make sure all ranks have the same root parameter when invoking
MPI_Comm_accept and MPI_Comm_connect
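Just to illustrate the point (hypothetical fragments, not your actual code):
every rank of the server passes the same root to MPI_Comm_accept, and every
rank of the client passes the same root to MPI_Comm_connect.

/* server side (fragment) */
char port_name[MPI_MAX_PORT_NAME];
MPI_Comm client;
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0) {
    MPI_Open_port(MPI_INFO_NULL, port_name);
    /* pass port_name to the client out of band (file, stdout, ...) */
}
/* all server ranks use the same root (0 here) */
MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);

/* client side (fragment), port_name obtained from the server */
MPI_Comm server;
/* all client ranks use the same root (0 here) */
MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);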
Cheers,
Gilles
"Marlborou
Christophe,
If I read between the lines, you had Open MPI running just fine, then you
upgraded Xcode and that broke Open MPI. Am I right so far ?
Did you build Open MPI by yourself, or did you get binaries from somewhere
(such as brew) ?
In the first case, you need to rebuild Open MPI.
(You have
> if(mpi_error)
>
> {
>
> ...
>
> }
>
> std::cout << "Established port name is " << port_name <<
> std::endl;
>
> s
les;
>
> The abort occurs somewhere between 30 and 60 seconds. Is
> there some configuration setting that could influence this?
>
>
>
> Rick
>
>
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org
> ] *On
> Behalf Of *Gilles Gouaillardet
> *Sent:* Tuesd
Edwin,
changes are summarized in the NEWS file
we used to have two github repositories, and they were "merged" recently
with github, you can list the closed PR for a given milestone
https://github.com/open-mpi/ompi-release/milestones?state=closed
then you can click on a milestone, and list
Juraj,
if i understand correctly, the "master" task calls MPI_Init(), and then
fork&exec matlab.
In some cases (lack of hardware support), fork cannot even work. but
let's assume it is fine for now.
Then, if I read between the lines, matlab calls a mexFunction that calls MPI_Init().
As far as i a
Andre,
this issue has already been reported and fixed
(the fix will be available in v2.0.2)
Meanwhile, you can manually apply the fix available at
https://github.com/open-mpi/ompi/commit/78f74e58d09a3958043a0c70861b26664def3bb3.patch
Cheers,
Gilles
On 10/7/2016 8:44 AM, Andre Kessler w
George,
I tried to mimic this with the latest v1.10, and failed to reproduce
the error.
At first, I recommend you try the latest v1.10 (1.10.4) or even 2.0.1.
An unusable stack trace can sometimes be caused by unloaded modules,
so if the issue persists, you might want to try rebuilding Op
Mark,
My understanding is that shell meta expansion occurs once on the first node, so
from an Open MPI point of view, you really invoke
mpirun echo node0
I suspect
mpirun echo 'Hello from $(hostname)'
Is what you want to do
I do not know about
mpirun echo 'Hello from $HOSTNAME'
$HOSTNAME might be
FWIW.
mpicxx does two things:
1) it uses the C++ compiler (e.g. g++)
2) if Open MPI was configured with the (deprecated) C++ bindings (e.g.
--enable-mpi-cxx), it links with
the Open MPI C++ library that contains the bindings.
IIRC, Open MPI v1.10 does build C++ bindings by default, but v2.0 does
n
Limin,
It seems the libpsm2 provided by CentOS 7 is a bit too old:
all symbols are prefixed with psm_, whereas Open MPI expects them to be
prefixed with psm2_
I am afraid your only option is to manually install the latest libpsm2
and then configure again with your psm2 install dir
Cheers,
Gilles
Out of curiosity, why do you specify both --hostfile and -H ?
Do you observe the same behavior without --hostfile ~/.mpihosts ?
Also, do you have at least 4 cores on both A.lan and B.lan ?
Cheers,
Gilles
On Sunday, October 16, 2016, MM wrote:
> Hi,
>
> openmpi 1.10.3
>
> this call:
>
> mpirun
Rick,
In my understanding, sensorgroup is a group with only task 1
Consequently, sensorComm is
- similar to MPI_COMM_SELF on task 1
- MPI_COMM_NULL on other tasks, and hence the barrier fails
I suggest you double check sensorgroup is never MPI_GROUP_EMPTY
and add a test not to call MPI_Barrier on
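Something along those lines (a sketch using the names from this thread; the
content of sensor_ranks is made up):

MPI_Group world_group, sensorgroup;
MPI_Comm sensorComm;
int sensor_ranks[1] = {1};

/* build a group containing only task 1, then a communicator from it */
MPI_Comm_group(MPI_COMM_WORLD, &world_group);
MPI_Group_incl(world_group, 1, sensor_ranks, &sensorgroup);
MPI_Comm_create(MPI_COMM_WORLD, sensorgroup, &sensorComm);

/* MPI_Comm_create returns MPI_COMM_NULL on tasks that are not in sensorgroup,
   so guard the barrier */
if (sensorComm != MPI_COMM_NULL) {
    MPI_Barrier(sensorComm);
}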
Rick,
I re-read the MPI standard and was unable to figure out if, on tasks other
than task 1, sensorgroup is MPI_GROUP_EMPTY or a group containing only task 1
(A group that does not contain the current task makes little sense to me,
but I do not see any reason why this group has to be MPI_GROUP_EMPTY)
Regardless
dispatcher will use the 2 comm groups to coordinate activity. I tried
> adding the dispatcher to the sensorList comm group, but I get an error
> saying “invalid task”.
>
>
>
> Rick
>
>
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org
> ] *On
> Behalf
Sean,
if I understand correctly, you built a libtransport_mpi.so library that
depends on Open MPI, and your main program dlopens libtransport_mpi.so.
in this case, and at least for the time being, you need to use
RTLD_GLOBAL in your dlopen flags.
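For example (a sketch, assuming the library name from your description):

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* RTLD_GLOBAL makes the Open MPI symbols pulled in by libtransport_mpi.so
       visible to the components Open MPI dlopens later */
    void *handle = dlopen("libtransport_mpi.so", RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    /* ... dlsym() the transport entry points, MPI_Init(), and so on ... */
    dlclose(handle);
    return 0;
}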
Cheers,
Gilles
On 10/18/2016 4:53 AM,
Hi,
can you please give the patch below a try ?
Cheers,
Gilles
diff --git a/ompi/tools/wrappers/ompi_wrapper_script.in
b/ompi/tools/wrappers/ompi_wrapper_script.in
index d87649f..b66fec3 100644
--- a/ompi/tools/wrappers/ompi_wrapper_script.in
+++ b/ompi/tools/wrappers/ompi_wrapper_script.in
vtfilter*
>
>lrwxrwxrwx 1 nmahesh nmahesh 8 Oct 18 15:34 vtfiltergen -> vtfilter*
>
>lrwxrwxrwx 1 nmahesh nmahesh 12 Oct 18 15:34 vtfiltergen-mpi ->
>vtfilter-mpi*
>
>-rwxr-xr-x 1 nmahesh nmahesh 3100359 Oct 18 15:34 vtfilter-mpi*
>
>lrwxrwxrwx 1
You can try
configure --host=arm... CC=gcc_cross_compiler CXX=g++_cross_compiler
On Tuesday, October 18, 2016, Mahesh Nanavalla <
mahesh.nanavalla...@gmail.com> wrote:
> Hi all,
>
> How to cross compile *openmpi *for* arm *on* x86_64 pc.*
>
> *Kindly provide configure options for above...*
>
> Th
or
> must be in the same group. Is my understanding incorrect?
>
>
>
> Thanks
>
> Rick
>
>
>
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles
> Gouaillardet
> Sent: Monday, October 17, 2016 10:38 AM
>
>
> To: Open MPI Users
; > --host=arm-openwrt-linux-muslgnueabi
>>> > > --enable-script-wrapper-compilers
>>> > > --disable-mpi-fortran
>>> > > --enable-shared
>>> > > --disable-dlopen
>>> > >
>>> > > it's configured ,make &
*--disable-vt *while configuring the openmpi-1.10.3 ,
>
> the size of the installation directory reduced 70MB to 9MB.
>
> will it effect anything?
>
>
> On Wed, Oct 19, 2016 at 4:06 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com
> > wrote:
>
>> vt
Rick,
if you are using an inter-communicator, please refer to the man page (the
root process of the root group has to use MPI_ROOT as the root argument)
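For example, with MPI_Bcast on an inter-communicator (a generic sketch;
in_root_group, i_am_the_root and intercomm are hypothetical):

int buf[4];
if (in_root_group) {
    /* in the group that contains the root, the root itself passes MPI_ROOT
       and the other local processes pass MPI_PROC_NULL */
    int root_arg = i_am_the_root ? MPI_ROOT : MPI_PROC_NULL;
    MPI_Bcast(buf, 4, MPI_INT, root_arg, intercomm);
} else {
    /* in the remote group, pass the rank of the root within the root group */
    MPI_Bcast(buf, 4, MPI_INT, 0, intercomm);
}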
Cheers,
Gilles
On 10/21/2016 3:36 PM, George Bosilca wrote:
Rick,
There are few requirements to use any communicator in a collective
operation (including MPI_R
to ask - are there any new solutions or investigations in
> this problem?
>
> Cheers,
>
> Matus Dobrotka
>
> 2016-07-19 15:23 GMT+02:00 Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com
> >:
>
>> my bad for the confusion,
>>
>> I misr
Thanks Nicolas,
master has been patched, and i will make the PR for the release branches
Cheers,
Gilles
On 10/23/2016 8:28 PM, Nicolas Joly wrote:
Hi,
Just noticed a typo in MPI_Info_get_nkeys/MPI_Info_get_nthkey man
pages. The last cross-reference is 'MPI_Info_get_valueln' where it
shou
Justin,
iirc, NVML is only used by hwloc (e.g. not by CUDA) and there is no real
benefit for having that.
as a workaround, you can
export enable_nvml=no
and then configure && make install
Cheers,
Gilles
On 10/20/2016 12:49 AM, Jeff Squyres (jsquyres) wrote:
Justin --
Fair point. Can y
.
Could be shell variable such as HWLOC_DISABLE_NVML=yes for all our major
configured dependencies.
Brice
Le 24/10/2016 02:12, Gilles Gouaillardet a écrit :
Justin,
iirc, NVML is only used by hwloc (e.g. not by CUDA) and there is no
real benefit for having that.
as a workaround, you can
Jack,
It looks like a bug in vt configury
If you do not need vt, then you can configure with --disable-vt
(Fwiw, vt has been removed from Open MPI 2.0)
If you need vt, you might be lucky with
export with_cuda=no
configure ...
Cheers,
Gilles
Jack Stalnaker wrote:
>How do I prevent the build f
From the MPI 3.1 standard (page 338)
Rationale. The C bindings of MPI_ALLOC_MEM and MPI_FREE_MEM are similar
to the bindings for the malloc and free C library calls: a call to
MPI_Alloc_mem(..., &base) should be paired with a call to
MPI_Free_mem(base) (one
less level of indirection). Both
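In other words, a minimal sketch of the pairing described in that rationale:

#include <mpi.h>

int main(int argc, char *argv[])
{
    double *base;

    MPI_Init(&argc, &argv);
    /* like malloc/free: pass the address of the pointer to MPI_Alloc_mem ... */
    MPI_Alloc_mem(1000 * sizeof(double), MPI_INFO_NULL, &base);
    /* ... and the pointer itself to MPI_Free_mem */
    MPI_Free_mem(base);
    MPI_Finalize();
    return 0;
}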
Jeff,
Out of curiosity, did you compile the Fortran test program with -O0 ?
Cheers,
Gilles
Tom Rosmond wrote:
>Jeff,
>
>Thanks for looking at this. I know it isn't specific to Open-MPI, but it is a
>frustrating issue vis-a-vis MPI and Fortran. There are many very large MPI
>applications ar
Siegmar,
The fix is in the pipe.
Meanwhile, you can download it at
https://github.com/open-mpi/ompi/pull/2295.patch
Cheers,
Gilles
Siegmar Gross wrote:
>Hi,
>
>I tried to install openmpi-v2.0.1-130-gb3a367d on my "SUSE Linux
>Enterprise Server 12.1 (x86_64)" with Sun C 5.14 beta. Unfortunatel
Tom,
Regardless of the (lack of) memory model in Fortran, there is an error in
testmpi3.f90
shar_mem is declared as an integer, and hence is not in the shared memory.
i attached my version of testmpi3.f90, which behaves just like the C
version,
at least when compiled with -g -O0 and with Ope
Sergei,
is there any reason why you configure with --with-verbs-libdir=/usr/lib ?
as far as I understand, --with-verbs should be enough, and neither /usr/lib
nor /usr/local/lib should ever be used on the configure command line
(and btw, are you running on a 32-bit system ? should the 64-bit
libs be in /
Did you strip the libraries already ?
the script will show the list of frameworks and components used by MPI
helloworld.
from that, you can deduce a list of components that are not required,
exclude them via the configure command line, and rebuild a trimmed Open MPI.
note this is pretty pa
Mahmood,
did you build Open MPI as a static only library ?
i guess the -ldl position is wrong. your link command line should be
mpifort -O3 -o xCbtest blacstest.o btprim.o tools.o Cbt.o
../../libscalapack.a -ldl
you can manually
mpifort -O3 -o xCbtest --showme blacstest.o btprim.o tools.
til
I don't see what you said after "-lopen-pal". Is that OK?
Regards,
Mahmood
On Fri, Nov 4, 2016 at 10:23 AM, Gilles Gouaillardet
mailto:gil...@rist.or.jp>> wrote:
Mahmood,
did you build Open MPI as a static only library ?
i guess the -ldl position is
On Fri, Nov 4, 2016 at 11:02 AM, Gilles Gouaillardet
mailto:gil...@rist.or.jp>> wrote:
Yes, that is a problem :-(
you might want to reconfigure with
--enable-static --disable-shared --disable-dlopen
and see if it helps
or you can simply manually edit
/opt/o
You might have to remove -ldl from the scalapack makefile
If it still does not work, can you please post
mpirun --showme ...
output ?
Cheers,
Gilles
On Friday, November 4, 2016, Mahmood Naderan wrote:
> Hi Gilles,
> I noticed that /opt/openmpi-2.0.1/share/openmpi/mpifort-wrapper-data.txt
> is
As long as you run 3 MPI tasks, both options will produce the same mapping.
If you want to run up to 12 tasks, then --map-by node is the way to go
Mahesh Nanavalla wrote:
>s...
>
>
>Thanks for responding me.
>
>i have solved that as below by limiting slots in hostfile
>
>
>root@OpenWrt:~#
Hi,
note your printf line is missing.
if you printf l_prev, then the valgrind error occurs in all variants
at first glance, it looks like a false positive, and i will investigate it
Cheers,
Gilles
On Sat, Nov 5, 2016 at 7:59 PM, Yvan Fournier wrote:
> Hello,
>
> I have observed what seems to
les
On Sat, Nov 5, 2016 at 9:48 PM, Gilles Gouaillardet
wrote:
> Hi,
>
> note your printf line is missing.
> if you printf l_prev, then the valgrind error occurs in all variants
>
> at first glance, it looks like a false positive, and i will investigate it
>
>
> Cheers,
&
so it seems we took some shortcuts in pml/ob1
the attached patch (for the v1.10 branch) should fix this issue
Cheers
Gilles
On Sat, Nov 5, 2016 at 10:08 PM, Gilles Gouaillardet
wrote:
> that really looks like a bug
>
> if you rewrite your program with
>
> MPI_Sendrecv
Hi,
master() never MPI_Bcast(dojob=2, c...), hence the hang
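In other words, every MPI_Bcast on the slave side must be matched by the
master, including the final one that carries the stop value. A sketch (the
structure and names are made up, only the dojob = 2 stop value comes from
your code):

/* slaves */
int dojob;
do {
    MPI_Bcast(&dojob, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (dojob == 1) {
        /* do one piece of work */
    }
} while (dojob != 2);

/* master: broadcast dojob = 1 for each piece of work, and when done ... */
int dojob = 2;
MPI_Bcast(&dojob, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* ... tell the slaves to stop */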
Cheers,
Gilles
On Tuesday, November 8, 2016, Baris Kececi via users <
users@lists.open-mpi.org> wrote:
> Hi friends,
> I'm trying to write a simple parallel master/slave program.
> Here in my program the task of master is to distribut
Joseph,
thanks for the report, this is a real memory leak.
i fixed it in master, and the fix is now being reviewed.
meanwhile, you can manually apply the patch available at
https://github.com/open-mpi/ompi/pull/2418.patch
Cheers,
Gilles
On Tue, Nov 15, 2016 at 1:52 AM, Joseph Schuchart wrote:
Julien,
first, make sure you are using the Open MPI wrapper
which mpifort
should be /usr/lib/openmpi/bin if i understand correctly
then make sure you exported your LD_LIBRARY_PATH *after* you prepended
the path to Open MPI lib
in your .bashrc you can either
LD_LIBRARY_PATH=/usr/lib/openmpi/lib:$LD
002b5a23666000)
> libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1
> (0x2b5a238a)
> libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7
> (0x2b5a23aac000)
>
>
> Thanks.
>
>
>
> Julien
>
>
>
> 2016-11-15 23:52 GMT-05:00 Gille
Hi,
With ddt, you can do offline debugging just to get where the program crashes
ddt -n 8 --offline a.out ...
You might also want to try the reverse connect feature
Cheers,
Gilles
"Beheshti, Mohammadali" wrote:
>Hi Gus,
>
>Thank you very much for your prompt response. The myjob.sh script is as
Hi,
at first, you might want to remove the paths related to the Intel MPI runtime
(e.g. /opt/intel/composer_xe_2013_sp1.2.144/mpirt/lib/intel64) from
your environment
if you are using bash, double check you
export LD_LIBRARY_PATH
(otherwise echo $LD_LIBRARY_PATH and the linker will see different things)
then
Sebastian,
The error message is pretty self-explanatory
/usr/mpi/gcc/openmpi-1.8.8/bin/orted is missing on your compute nodes.
it seems you are using /usr/mpi/gcc/openmpi-1.8.8/bin/mpirun on your
frontend node
(e.g. the node on which mpirun is invoked)
but Open MPI was not updated on some nodes l
Yann,
this is a bug that was previously reported, and the fix is pending on
review.
meanwhile, you can manually apply the patch available at
https://github.com/open-mpi/ompi/pull/2418
Cheers,
Gilles
On 11/18/2016 9:34 PM, Yann Jobic wrote:
Hi,
I'm using valgrind 3.12 with openmpi 2.
properly free'd at the end. Any chance this can be
filtered using the suppression file?
Cheers,
Joseph
On 11/15/2016 04:39 PM, Gilles Gouaillardet wrote:
Joseph,
thanks for the report, this is a real memory leak.
i fixed it in master, and the fix is now being reviewed.
meanwhile, yo
Christoph,
out of curiosity, could you try to
mpirun --mca coll ^tuned ...
and see if it helps ?
Cheers,
Gilles
On Tue, Nov 22, 2016 at 7:21 PM, Christof Koehler
wrote:
> Hello again,
>
> I tried to replicate the situation on the workstation at my desk,
> running ubuntu 14.04 (gcc 4.8.4) and