Have you checked your ssh between nodes?
Also, how is your PATH set up?
There is a difference between interactive and non-interactive login sessions.
A. Construct a hosts file and run mpirun by hand
B. Use modules rather than .bashrc files
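To illustrate the point about session types, here is a minimal sketch that compares where a command resolves in a non-interactive bash shell and in a login shell. A command run over ssh normally gets a non-interactive, non-login shell, so a PATH set only in login startup files may be missing there. The helper name where_is is made up for this example, and mpirun is only the default command to look up.

```shell
# Report where a command resolves in a given kind of bash session.
# $1: bash flags (-c for non-interactive, -lc for a login shell); $2: command name.
where_is() {
    bash "$1" "command -v $2" 2>/dev/null || echo "(not found)"
}

cmd="${1:-mpirun}"
echo "non-interactive shell: $(where_is -c "$cmd")"
echo "login shell:           $(where_is -lc "$cmd")"
```

If the two lines differ, the PATH is probably being set in a file that only login shells read.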
On Thu, 25 Jul 2019, 18:00 David Laidlaw
Not really a relevant reply, however Nomad has task drivers for Docker and
I'm not sure if it would be easier to set up an MPI environment with Nomad
On Thu, 11 Jul 2019 at 11:08, Adrian Reber via
Errr.. you have dropped caches? echo 3 > /proc/sys/vm/drop_caches
On Thu, 20 Jun 2019 at 15:59, Yann Jobic via users
> Le 6/20/2019 à 3:31 PM, Noam Bernstein via users a écrit :
> >> On Jun 20, 2019, at 4:44 AM, Charles A Taylor >> > wrote:
The kernel using memory is why I suggested running slabtop, to see the
kernel slab allocations.
Clearly I was barking up the wrong tree there...
On Thu, 20 Jun 2019 at 14:41, Jeff Squyres (jsquyres) via users <
> On Jun 20, 2019, at 9:31 AM, Noam Bernstein via
Noam, it may be a stupid question. Could you try running slabtop as
the program executes?
Also 'watch cat /proc/meminfo' is a good diagnostic.
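As a rough sketch of the meminfo check, the snippet below pulls only the kernel slab counters out of /proc/meminfo, which is less noisy than watching the whole file. The function name slab_usage is invented for this example; the field names Slab, SReclaimable and SUnreclaim are the ones the Linux kernel reports.

```shell
# Print the kernel slab counters (in kB) from a meminfo-formatted file.
# Defaults to /proc/meminfo when no file is given.
slab_usage() {
    awk '/^(Slab|SReclaimable|SUnreclaim):/ { print $1, $2, $3 }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    slab_usage
fi
```

Running it under watch while the job executes shows whether the slab totals keep growing.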
On Wed, 19 Jun 2019 at 18:32, Noam Bernstein via users <
> Hi - we’re having a weird problem with OpenMPI on
I would do the normal things. Log into those nodes. Run dmesg and look at
Look at the Slurm log on the node and look for the job ending.
Also look at the sysstat files and see if there was a lot of memory being
On Wed, 17 Apr
Sorry if I am being stupid, Serdar might also have to set the location for
the includes by setting MPI_INC
On Wed, 23 Jan 2019 at 14:47, Ralph H Castain wrote:
> Your PATH and LD_LIBRARY_PATH setting is incorrect. You installed OMPI
> into $HOME/openmpi, so you should have done:
On that system please tell us what these return:
On Wed, 10 Oct 2018 at 12:49, John Hearns wrote:
> Noam, what does ompi_info say - specifically which BTLs are available?
> Stupid question though - this is a single system with no connection to a
Noam, what does ompi_info say - specifically which BTLs are available?
Stupid question though - this is a single system with no connection to a switch?
You probably don't have an OpenSM subnet manager running then - could
that be the root cause?
On Wed, 10 Oct 2018 at 09:53, Dave Love wrote:
> still have one error message left from MPI:
> mca_base_component_repository_open: unable to open mca_btl_openib:
> libibverbs.so.1: cannot open shared object file: No such file or directory
> Please let me know if you have any suggestions.
> May you please tell me how to check whether the batch system
Michele one tip: log into a compute node using ssh and as your own username.
If you use the Modules environment then load the modules you use in
the job script
then use the ldd utility to check if you can load all the libraries
in the code.io executable
Actually you are better to submit a
Oleg, I can build the latest master branch of OpenMPI in WSL
I can give it a try with 3.1.2 if that is any help to you?
Linux Johns-Spectre 4.4.0-17134-Microsoft #285-Microsoft Thu Aug 30
17:31:00 PST 2018 x86_64 x86_64 x86_64 GNU/Linux
apt-get install gfortran
Oleg, I have a Windows 10 system and could help by testing this also.
But I have to say - it will be quicker just to install VirtualBox and
a CentOS VM. Or an Ubuntu VM.
You can then set up a small test network of VMs using the VirtualBox
HostOnly network for tests of your MPI code.
On Wed, 19
Ben, ping me off list. I know the guy who heads the HPC Solutions
Architect team for AWS and an AWS Solutions Architect here in the UK.
On Fri, 7 Sep 2018 at 03:11, Benjamin Brock wrote:
> I'm setting up a cluster on AWS, which will have a 10Gb/s or 25Gb/s Ethernet
> network. Should I expect
I am sorry but you have different things here. PBS is a resource allocation
system. It will reserve the use of a compute server, or several compute
servers, for you to run your parallel job on. PBS can launch the MPI job -
there are several mechanisms for launching parallel jobs.
MPI is an
Forgive me for chipping in here. There is definitely a momentum behind the
ARM architecture in HPC.
However it seems to me that there are a lot of architectures under the
Does anyone have a simplified guide to what they all mean?
On 30 May 2018 at 02:26, Gilles Gouaillardet
n open the opensmd
> service because it seems unnecessary in this situation. Can this be the
> reason why IB performs poorer?
> Interconnection details are in the attachment.
> Best Regards,
> Xie Bin
> John Hearns via users <email@example.com
Xie Bin, I do hate to ask this. You say "in a two-node cluster (IB
Does that mean that you have no IB switch, and that there is a single IB
cable joining up these two servers?
If so please run:
ibstatus
ibhosts
ibdiagnet
I am trying to check if the IB fabric is
One very, very stupid question here. This arose over on the Slurm list
Those hostnames look like quite generic names, ie they are part of an HPC
Do they happen to have independent home directories for your userid?
Could that possibly make a difference to the MPI launcher?
Ankita, looks like your program is not launching correctly.
I would try the following:
Define two hosts in a machinefile. Use mpirun -np 2 -machinefile machinefile date
i.e. can you use mpirun just to run the command 'date'?
Secondly compile up and try to run an MPI 'Hello World' program
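The first check could look roughly like the sketch below. It assumes the -machinefile option of Open MPI's mpirun is what was meant, and the host names node01 and node02 are placeholders for two nodes of your own cluster. The command is printed rather than executed so it can be inspected first.

```shell
# Write a two-line machinefile. node01 and node02 are hypothetical host names.
machinefile="${TMPDIR:-/tmp}/machinefile.$$"
printf '%s\n' node01 node02 > "$machinefile"

# Build the test command: run plain 'date' on two slots, no MPI code involved.
cmd="mpirun -np 2 -machinefile $machinefile date"
echo "$cmd"
```

If 'date' prints once per host, the launcher and ssh setup are working and the problem lies in the MPI program itself.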
On 9 May 2018 at
Peter is correct. We need to find out what K is.
But we may never find out https://en.wikipedia.org/wiki/The_Trial
It would be fun if we could get some real-world dimensions here and some
What range of numbers are these also?
On 2 May 2018 at 15:21, Peter Kjellström
Pierre, I may not be able to help you directly. But I had better stop
listening to the voices.
Mail me off list please.
This might do the trick using Julia
On 2 May 2018 at 14:11, John Hearns wrote:
> Also my inner voice is
Also my inner voice is shouting that there must be an easy way to express
this in Julia
OK, these are not the same stepwise cumulative operations that you want,
but the idea is close.
ps. Note to self - stop listening
Peter, how large are your models, ie how many cells in each direction?
Something inside of me is shouting that if the models are small enough then
MPI is not the way here.
Assuming use of a Xeon processor there should be some AVX instructions
which can do this.
This is rather out of date, but is
> so total 64 processes.
> On Wed, Apr 25, 2018 at 2:57 PM, John Hearns via users <
> firstname.lastname@example.org> wrote:
>> I do not see much wrong with that.
>> However nodes=4 ppn=2 makes 8 processes in all.
>> You are using mpirun -np 64
3:13 PM, Ankita m <ankitamait...@gmail.com> wrote:
>> i have 16 cores per one node. I usually use 4 node each node has 16 cores
>> so total 64 processes.
>> On Wed, Apr 25, 2018 at 2:57 PM, John Hearns via users <
>> email@example.com> wrote:
I do not see much wrong with that.
However nodes=4 ppn=2 makes 8 processes in all.
You are using mpirun -np 64
Actually it is better practice to use the PBS supplied environment
variables during the job, rather than hard-wiring 64
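A minimal sketch of that practice, assuming a PBS/Torque style batch system that sets PBS_NODEFILE to a file with one line per allocated slot. With nodes=4 and ppn=2 that file would have 8 lines. The helper count_slots and the program name ./my_app are made up for this example.

```shell
# Count the allocated slots: one non-empty line per slot in the node file.
count_slots() {
    grep -c . "$1"
}

if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NP=$(count_slots "$PBS_NODEFILE")
    # Printed rather than executed; ./my_app is a placeholder program name.
    echo "mpirun -np $NP -machinefile $PBS_NODEFILE ./my_app"
else
    echo "PBS_NODEFILE is not set; run this inside a PBS job"
fi
```

This way the process count always follows the resource request instead of a hard-wired number.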
I don't have access to a PBS cluster from my desk at the
Ankita, this is a problem with your batch queuing system. Do you know which
batch system you are using on this cluster?
Can you share with us what command you use to submit a job?
Also please do not share your teamviewer password with us. I doubt this is
of much use to anyone, but...
On 25 April
> 900 University Avenue, Riverside, CA 92521
> On Tue, Mar 20, 2018 at 10:46 AM, John Hearns via users <
> firstname.lastname@example.org> wrote:
>> "It does not handle more recent improvements such as Intel's turbo
>> mode and the proce
"It does not handle more recent improvements such as Intel's turbo
mode and the processor performance inhomogeneity that comes with it."
I guess it is easy enough to disable Turbo mode in the BIOS though.
On 20 March 2018 at 17:48, Kaiming Ouyang wrote:
> I think the problem
Abhisek ... Gilles asked which program you are trying to run, and how it
was linked with OpenMPI
Also please realise that you do not HAVE to use the openmpi packages
provided by your linux distribution.
It is perfectly OK to download, compile and install another version.
On 13 October 2017 at
ps. Before you do the reboot of a compute node, have you run 'ibdiagnet' ?
On 28 September 2017 at 11:17, John Hearns wrote:
> Google turns this up:
> On 28 September 2017 at 01:26, Ludovic Raess
Google turns this up:
On 28 September 2017 at 01:26, Ludovic Raess wrote:
> we have a issue on our 32 nodes Linux cluster regarding the usage of Open
> MPI in a Infiniband dual-rail configuration (2 IB
This might be of interest for ARM users:
On 27 September 2017 at 06:58, Gilles Gouaillardet <
> which OS are you running ?
> iirc, i faced similar issues,
Jeff, from what I read yesterday it is OpenMPI 2 , I am not sure of the
I do acknowledge that Mahmood reports that the Rocks 7 beta is available -
when I last used Rocks this was not available.
But still - look at something more up to date, such as OpenHPC.
There is nothing
Then let me add in my thoughts please.. Rocks is getting out of date.
Mahmood, I would imagine that you are not given the choice of installing
something more modern,
ie the place where you work has an existing Rocks cluster and is unwilling
to re-install it.
So what is wrong with using the
Gary, are you using Modules?
On 22 August 2017 at 02:04, Gilles Gouaillardet wrote:
> one option (as mentioned in the error message) is to configure Open MPI
> with --enable-orterun-prefix-by-default.
System image GUID: 0x248a0703005abb30
> Port 1:
> State: Down
> Physical state: Disabled
> Rate: 100
> Base lid: 0
> LMC: 0
> SM lid: 0
> Capability mask: 0x3c01
> Port GUID: 0x268a07fffe5abb31
> Link layer:
Boris, as Gilles says - first do some lower level checkouts of your
I suggest running:
and then as Gilles says 'ibstat' on each node
On 14 July 2017 at 03:58, Gilles Gouaillardet wrote:
> Open MPI should automatically
I may have asked this recently (if so sorry).
If anyone has worked with QoS settings with OpenMPI please ping me off list,
mpirun --mca btl_openib_ib_service_level N
> You can add "OMPI_MCA_plm=rsh OMPI_MCA_sec=^munge" to your environment
> On Jun 22, 2017, at 7:28 AM, John Hearns via users <
> email@example.com> wrote:
> Michael, try
> --mca plm_rsh_agent ssh
> I've been fooling wit
--mca plm_rsh_agent ssh
I've been fooling with this myself recently, in the context of a PBS cluster
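The same setting can also be given through the environment, since Open MPI reads MCA parameters from variables named OMPI_MCA_<parameter>. The sketch below sets it that way and lists what is currently exported; it only touches the environment and does not launch anything.

```shell
# Environment form of "--mca plm_rsh_agent ssh".
export OMPI_MCA_plm_rsh_agent=ssh

# Show every MCA parameter currently set through the environment.
env | grep '^OMPI_MCA_'
```

Putting the export in the job script keeps the mpirun command line short and applies the setting to every mpirun call in that job.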
On 22 June 2017 at 16:16, Michael Di Domenico
> is it possible to disable slurm/munge/psm/pmi(x) from the mpirun
> command line or (better) using
>>>> Hi GIlles,
>>>> using your command with one MPI procs I get:
>>>> findActiveDevices Error
>>>> We found no active IB
he error message.
>>> 2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet <gil...@rist.or.jp
>>> so it seems pml/pami
is an infiniband card available (!)
>>> i guess IBM folks will comment on that shortly.
>>> meanwhile, you do not need pami since you are running on a single node
>>> mpirun --mca pml ^pami ...
Remember that Infiniband is not Ethernet. You don't NEED to set up IPOIB
Two diagnostics please for you to run:
Let us please have the results of ibnetdiscover
On 19 May 2017 at 09:25, John Hearns wrote:
if the host 'smd' is acting as a cluster head node it is not a must for it
to have an Infiniband card.
So you should be able to run jobs across the other nodes, which have Qlogic
I may have something mixed up here, if so I am sorry.
If you want also to run jobs on the smd
> (if it does not work, can run and post the logs)
> mpirun --mca pml ^pami --mca pml_base_verbose 100 ...
> On 5/19/2017 4:01 PM, Gabriele Fatigati wrote:
>> Hi John,
>> Infiniband is not used, there is a s
Gabriele, please run 'ibv_devinfo'
It looks to me like you may have the physical interface cards in these
systems, but you do not have the correct drivers or libraries loaded.
I have had similar messages when using Infiniband on x86 systems - which
did not have libibverbs installed.
On 19 May
> MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v10.1.0)
> about mtl no information retrieve ompi_info
> 2017-05-18 14:13 GMT+02:00 John Hearns via users <firstname.lastname@example.org
>> Gabriele, as this is based on OpenMPI can
Gabriele, as this is based on OpenMPI can you run ompi_info
then look for the btl which are available and the mtl which are available?
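One way to do that filtering, as a sketch: the pattern assumes ompi_info prints component lines of the form "MCA btl: name (...)", and the function name list_btl_mtl is invented here. The pattern is anchored so that similar names such as fbtl are not matched.

```shell
# Keep only the btl and mtl component lines from ompi_info-style output on stdin.
list_btl_mtl() {
    grep -E '^[[:space:]]*MCA (btl|mtl):'
}

if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | list_btl_mtl || echo "no btl or mtl components reported"
else
    echo "ompi_info not found in PATH" >&2
fi
```

An empty mtl list alongside a populated btl list is itself useful information when working out which transport is in use.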
On 18 May 2017 at 14:10, Reuti wrote:
> > Am 18.05.2017 um 14:02 schrieb Gabriele Fatigati :
Ray, probably a stupid question but do you have the hwloc-devel package
And also the libxml2-devel package?
On 27 April 2017 at 21:54, Ray Sheppard wrote:
> Hi All,
> I have searched the mail archives because I think this issue was
> addressed earlier, but I can
this is not an answer to your question. However have you looked at
On 24 March 2017 at 08:54, Jordi Guitart wrote:
> Docker allows several containers running in the same host to share the
> same IPC namespace,
Mahmoud, you should look at the OpenHPC project.
On 15 December 2016 at 19:50, Mahmoud MIRZAEI wrote:
> May you please let me know if there is any procedure to install OpenMPI on
> CentOS in HPC?
can you run :
Lord help me for being so naive, but do you have a subnet manager running?
On 1 November 2016 at 06:40, Sergei Hrushev wrote:
> Hi Jeff !
> What does "ompi_info | grep openib" show?
> $ ompi_info | grep openib
Sorry - shoot down my idea. Over to someone else (me hides head in shame)
On 28 October 2016 at 11:28, Sergei Hrushev wrote:
> Sergei, what does the command "ibv_devinfo" return please?
>> I had a recent case like this, but on Qlogic hardware.
>> Sorry if I am mixing
Sergei, what does the command "ibv_devinfo" return please?
I had a recent case like this, but on Qlogic hardware.
Sorry if I am mixing things up.
On 28 October 2016 at 10:48, Sergei Hrushev wrote:
> Hello, All !
> We have a problem with OpenMPI version 1.10.2 on a
Thank you. That is helpful.
Could you run an 'ldd' on your executable, on one of the compute nodes if
I will not be able to solve your problem, but at least we now know what the
and can look at the libraries it is using.
On 2 September 2016 at 17:19, Mahmood Naderan
are you compiling and linking this application?
Or are you using an executable which someone else has prepared?
It would be very useful if we could know the application.
On 2 September 2016 at 16:35, Mahmood Naderan wrote:
> >Did you ran
> >ulimit -c unlimited
Mahmood, as Gilles says start by looking at how that application is compiled
Run 'ldd' on the executable and look closely at the libraries. Do this on
a compute node if you can.
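A small sketch of that check: ldd marks libraries it cannot resolve with "not found", so filtering for that string gives a quick answer. The function name filter_missing is made up for this example, and /bin/ls is only a stand-in default for the real executable.

```shell
# Keep only the lines of ldd output that report an unresolved library.
filter_missing() {
    grep 'not found'
}

exe="${1:-/bin/ls}"   # stand-in default; pass your own executable as the first argument
if command -v ldd >/dev/null 2>&1; then
    if ldd "$exe" 2>/dev/null | filter_missing; then
        echo "$exe has unresolved libraries (listed above)"
    else
        echo "$exe: all shared libraries resolved"
    fi
fi
```

Run it on a compute node with the same modules loaded as in the job script, since the login node may have libraries the compute nodes lack.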
There was a discussion on another mailing list recently about how to
fingerprint executables and see which
Hello Lachlan. I think Jeff Squyres will be along in a short while! He is
of course the expert on Cisco.
In the meantime a quick Google turns up: