Re: [OMPI users] job aborts "readv failed: Connection reset by peer"

2016-09-02 Thread John Hearns via users
Thank you. That is helpful. Could you run 'ldd' on your executable, on one of the compute nodes if possible? I will not be able to solve your problem, but at least we now know what the application is, and can look at the libraries it is using. On 2 September 2016 at 17:19, Mahmood Naderan

Re: [OMPI users] job aborts "readv failed: Connection reset by peer"

2016-09-02 Thread John Hearns via users
Mahmood, are you compiling and linking this application? Or are you using an executable which someone else has prepared? It would be very useful if we could know the application. On 2 September 2016 at 16:35, Mahmood Naderan wrote: > >Did you ran > >ulimit -c unlimited

Re: [OMPI users] New to (Open)MPI

2016-09-02 Thread John Hearns via users
Hello Lachlan. I think Jeff Squyres will be along in a short while! He is of course the expert on Cisco. In the meantime a quick Google turns up: http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/usnic/c/deployment/2_0_X/b_Cisco_usNIC_Deployment_Guide_For_Standalone_C-SeriesServers.html

Re: [OMPI users] job aborts "readv failed: Connection reset by peer"

2016-09-02 Thread John Hearns via users
Mahmood, as Gilles says, start by looking at how that application is compiled and linked. Run 'ldd' on the executable and look closely at the libraries. Do this on a compute node if you can. There was a discussion on another mailing list recently about how to fingerprint executables and see which
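A minimal sketch of the 'ldd' check suggested above, assuming a dynamically linked executable. The helper name missing_libs is made up for illustration; it only filters the "not found" lines that ldd prints for libraries the loader cannot resolve.

```shell
# Print the names of shared libraries that the dynamic loader cannot resolve.
# missing_libs is a hypothetical helper name, not part of Open MPI.
# Usage (on a compute node): missing_libs /path/to/your/executable
missing_libs() {
    # ldd marks unresolved libraries as "libfoo.so.N => not found"
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}
```

An empty result means every library resolved on that node; any name printed is worth comparing against the environment used at build time.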

Re: [OMPI users] OpenMPI + InfiniBand

2016-11-01 Thread John Hearns via users
Sergei, can you run: ibhosts, ibstat, ibdiagnet? Lord help me for being so naive, but do you have a subnet manager running? On 1 November 2016 at 06:40, Sergei Hrushev wrote: > Hi Jeff ! > > What does "ompi_info | grep openib" show? >> >> > $ ompi_info | grep openib >

Re: [OMPI users] OpenMPI + InfiniBand

2016-10-28 Thread John Hearns via users
Sorry - shoot down my idea. Over to someone else (me hides head in shame) On 28 October 2016 at 11:28, Sergei Hrushev wrote: > Sergei, what does the command "ibv_devinfo" return please? >> >> I had a recent case like this, but on Qlogic hardware. >> Sorry if I am mixing

Re: [OMPI users] OpenMPI + InfiniBand

2016-10-28 Thread John Hearns via users
Sergei, what does the command "ibv_devinfo" return please? I had a recent case like this, but on Qlogic hardware. Sorry if I am mixing things up. On 28 October 2016 at 10:48, Sergei Hrushev wrote: > Hello, All ! > > We have a problem with OpenMPI version 1.10.2 on a

Re: [OMPI users] install OpenMPI on CentOS in HPC

2016-12-18 Thread John Hearns via users
Mahmoud, you should look at the OpenHPC project. http://www.openhpc.community/ On 15 December 2016 at 19:50, Mahmoud MIRZAEI wrote: > Dears, > > May you please let me know if there is any procedure to install OpenMPI on > CentOS in HPC? > > Thanks. > Mahmoud > > > >

Re: [OMPI users] Communicating MPI processes running in Docker containers in the same host by means of shared memory?

2017-03-24 Thread John Hearns via users
Jordi, this is not an answer to your question. However have you looked at Singularity: http://singularity.lbl.gov/ On 24 March 2017 at 08:54, Jordi Guitart wrote: > Hello, > > Docker allows several containers running in the same host to share the > same IPC namespace,

Re: [OMPI users] Q: Basic invoking of InfiniBand with OpenMPI

2017-07-14 Thread John Hearns via users
Boris, as Gilles says, first do some lower-level checks of your Infiniband network. I suggest running ibdiagnet and ibhosts, and then, as Gilles says, 'ibstat' on each node. On 14 July 2017 at 03:58, Gilles Gouaillardet wrote: > Boris, > > > Open MPI should automatically

Re: [OMPI users] Q: Basic invoking of InfiniBand with OpenMPI

2017-07-17 Thread John Hearns via users
System image GUID: 0x248a0703005abb30 > Port 1: > State: Down > Physical state: Disabled > Rate: 100 > Base lid: 0 > LMC: 0 > SM lid: 0 > Capability mask: 0x3c01 > Port GUID: 0x268a07fffe5abb31 > Link layer:

Re: [OMPI users] Basic build trouble on RHEL7

2017-04-27 Thread John Hearns via users
Ray, probably a stupid question but do you have the hwloc-devel package installed? And also the libxml2-devel package? On 27 April 2017 at 21:54, Ray Sheppard wrote: > Hi All, > I have searched the mail archives because I think this issue was > addressed earlier, but I can

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread John Hearns via users
Michael, try --mca plm_rsh_agent ssh. I've been fooling with this myself recently, in the context of a PBS cluster. On 22 June 2017 at 16:16, Michael Di Domenico wrote: > is it possible to disable slurm/munge/psm/pmi(x) from the mpirun > command line or (better) using

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread John Hearns via users
@open-mpi.org> wrote: > You can add "OMPI_MCA_plm=rsh OMPI_MCA_sec=^munge" to your environment > > > On Jun 22, 2017, at 7:28 AM, John Hearns via users < > users@lists.open-mpi.org> wrote: > > Michael, try > --mca plm_rsh_agent ssh > > I've been fooling wit

[OMPI users] Openmpi with btl_openib_ib_service_level

2017-06-22 Thread John Hearns via users
I may have asked this recently (if so, sorry). If anyone has worked with QoS settings with OpenMPI please ping me off list, e.g. mpirun --mca btl_openib_ib_service_level N

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread John Hearns via users
t;>>> <mailto:g.fatig...@cineca.it>>: >>>> >>>> Hi GIlles, >>>> >>>> using your command with one MPI procs I get: >>>> >>>> findActiveDevices Error >>>> We found no active IB

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread John Hearns via users
he error message. >>> >>> >>> 2017-05-19 9:10 GMT+02:00 Gilles Gouaillardet <gil...@rist.or.jp >>> <mailto:gil...@rist.or.jp>>: >>> >>> Gabriele, >>> >>> >>> so it seems pml/pami

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread John Hearns via users
Gabriele, please run 'ibv_devinfo'. It looks to me like you may have the physical interface cards in these systems, but you do not have the correct drivers or libraries loaded. I have had similar messages when using Infiniband on x86 systems which did not have libibverbs installed. On 19 May

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-18 Thread John Hearns via users
Gabriele, as this is based on OpenMPI, can you run ompi_info, then look for the btl which are available and the mtl which are available? On 18 May 2017 at 14:10, Reuti wrote: > Hi, > > > On 18.05.2017 at 14:02, Gabriele Fatigati wrote: > > >
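One way to pull the available btl and mtl components out of ompi_info, sketched under the assumption that each component is reported on a line of the form "MCA <framework>: <component> (...)". The function name list_components is hypothetical.

```shell
# List the component names for one MCA framework from ompi_info output on stdin.
# list_components is a hypothetical helper; $1 is the framework, e.g. btl or mtl.
# Assumes lines shaped like: "MCA btl: tcp (MCA v2.1.0, API v3.0.0, ...)"
list_components() {
    awk -v fw="$1" '$1 == "MCA" && $2 == fw":" { print $3 }'
}

# Typical use on a node with Open MPI installed:
#   ompi_info | list_components btl
#   ompi_info | list_components mtl
```

Matching on the exact framework field avoids picking up similarly named frameworks such as fbtl when looking for btl.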

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-18 Thread John Hearns via users
0.1.0) > MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v10.1.0) > > > about mtl no information retrieve ompi_info > > > 2017-05-18 14:13 GMT+02:00 John Hearns via users <users@lists.open-mpi.org > >: > >> Gabriele, as this is based on OpenMPI can

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread John Hearns via users
ck > > (if it does not work, can run and post the logs) > > mpirun --mca pml ^pami --mca pml_base_verbose 100 ... > > > Cheers, > > > Gilles > > > On 5/19/2017 4:01 PM, Gabriele Fatigati wrote: > >> Hi John, >> Infiniband is not used, there is a s

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread John Hearns via users
Gilles, Allan, if the host 'smd' is acting as a cluster head node it is not essential for it to have an Infiniband card. So you should be able to run jobs across the other nodes, which have Qlogic cards. I may have something mixed up here, if so I am sorry. If you want also to run jobs on the smd

Re: [OMPI users] Many different errors with ompi version 2.1.1

2017-05-19 Thread John Hearns via users
Allan, remember that Infiniband is not Ethernet. You don't NEED to set up IPoIB interfaces. Two diagnostics for you to run, please: ibnetdiscover and ibdiagnet. Please let us have the results of ibnetdiscover. On 19 May 2017 at 09:25, John Hearns wrote: > Gilles,

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread John Hearns via users
is an infiniband card available (!) >>> >>> i guess IBM folks will comment on that shortly. >>> >>> >>> meanwhile, you do not need pami since you are running on a single node >>> >>> mpirun --mca pml ^pami ... >>> >>

Re: [OMPI users] mpif90 unable to find ibverbs

2017-09-14 Thread John Hearns via users
Then let me add in my thoughts please.. Rocks is getting out of date. Mahmood, I would imagine that you are not given the choice of installing something more modern, ie the place where you work has an existing Rocks cluster and is unwilling to re-install it. So what is wrong with using the

Re: [OMPI users] mpif90 unable to find ibverbs

2017-09-14 Thread John Hearns via users
Jeff, from what I read yesterday it is OpenMPI 2, I am not sure of the minor version. I do acknowledge that Mahmood reports that the Rocks 7 beta is available - when I last used Rocks this was not available. But still - look at something more up to date, such as OpenHPC. There is nothing

Re: [OMPI users] Fwd: MCA version error

2017-10-13 Thread John Hearns via users
Abhisek ... Gilles asked which program you are trying to run, and how it was linked with OpenMPI. Also please realise that you do not HAVE to use the openmpi packages provided by your Linux distribution. It is perfectly OK to download, compile and install another version. On 13 October 2017 at

Re: [OMPI users] Open MPI internal error

2017-09-28 Thread John Hearns via users
Google turns this up: https://groups.google.com/forum/#!topic/ulfm/OPdsHTXF5ls On 28 September 2017 at 01:26, Ludovic Raess wrote: > Hi, > > > we have a issue on our 32 nodes Linux cluster regarding the usage of Open > MPI in a Infiniband dual-rail configuration (2 IB

Re: [OMPI users] Open MPI internal error

2017-09-28 Thread John Hearns via users
ps. Before you do the reboot of a compute node, have you run 'ibdiagnet' ? On 28 September 2017 at 11:17, John Hearns wrote: > > Google turns this up: > https://groups.google.com/forum/#!topic/ulfm/OPdsHTXF5ls > > > On 28 September 2017 at 01:26, Ludovic Raess

Re: [OMPI users] Setting LD_LIBRARY_PATH for orted

2017-08-22 Thread John Hearns via users
Gary, are you using Modules? http://www.admin-magazine.com/HPC/Articles/Environment-Modules On 22 August 2017 at 02:04, Gilles Gouaillardet wrote: > Gary, > > > one option (as mentioned in the error message) is to configure Open MPI > with --enable-orterun-prefix-by-default.

Re: [OMPI users] Error building openmpi on Raspberry pi 2

2017-09-27 Thread John Hearns via users
This might be of interest for ARM users: https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc On 27 September 2017 at 06:58, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > Faraz, > > which OS are you running ? > > iirc, i faced similar issues,

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread John Hearns via users
One very, very stupid question here. This arose over on the Slurm list actually. Those hostnames look like quite generic names, ie they are part of an HPC cluster? Do they happen to have independent home directories for your userid? Could that possibly make a difference to the MPI launcher? On

Re: [OMPI users] peformance abnormality with openib and tcp framework

2018-05-14 Thread John Hearns via users
Xie Bin, I do hate to ask this. You say "in a two-node cluster (IB direcet-connected). " Does that mean that you have no IB switch, and that there is a single IB cable joining up these two servers? If so please run: ibstatus ibhosts ibdiagnet I am trying to check if the IB fabric is

Re: [OMPI users] peformance abnormality with openib and tcp framework

2018-05-15 Thread John Hearns via users
n open the opensmd > service because it seems unnecessary in this situation. Can this be the > reason why IB performs poorer? > > Interconnection details are in the attachment. > > > > Best Regards, > > Xie Bin > > > John Hearns via users <users@lists.open-mpi.o

Re: [OMPI users] problem

2018-05-09 Thread John Hearns via users
Ankita, looks like your program is not launching correctly. I would try the following: define two hosts in a machinefile. Use mpirun -np 2 machinefile date Ie can you use mpirun just to run the command 'date' Secondly compile up and try to run an MPI 'Hello World' program On 9 May 2018 at

Re: [OMPI users] need help finding mpi for Raspberry pi Raspian Streach

2018-05-30 Thread John Hearns via users
Forgive me for chipping in here. There is definitely a momentum behind the ARM architecture in HPC. However it seems to me that there are a lot of architectures under the 'ARM' umbrella. Does anyone have a simplified guide to what they all mean? On 30 May 2018 at 02:26, Gilles Gouaillardet

Re: [OMPI users] MPI cartesian grid : cumulate a scalar value through the procs of a given axis of the grid

2018-05-02 Thread John Hearns via users
Peter, how large are your models, ie how many cells in each direction? Something inside of me is shouting that if the models are small enough then MPI is not the way here. Assuming use of a Xeon processor there should be some AVX instructions which can do this. This is rather out of date, but is

Re: [OMPI users] MPI cartesian grid : cumulate a scalar value through the procs of a given axis of the grid

2018-05-02 Thread John Hearns via users
Also my inner voice is shouting that there must be an easy way to express this in Julia https://discourse.julialang.org/t/apply-reduction-along-specific-axes/3301/16 OK, these are not the same stepwise cumulative operations that you want, but the idea is close. ps. Note to self - stop listening

Re: [OMPI users] MPI cartesian grid : cumulate a scalar value through the procs of a given axis of the grid

2018-05-02 Thread John Hearns via users
Pierre, I may not be able to help you directly. But I had better stop listening to the voices. Mail me off list please. This might do the trick using Julia http://juliadb.org/latest/api/aggregation.html On 2 May 2018 at 14:11, John Hearns wrote: > Also my inner voice is

Re: [OMPI users] MPI cartesian grid : cumulate a scalar value through the procs of a given axis of the grid

2018-05-02 Thread John Hearns via users
Peter is correct. We need to find out what K is. But we may never find out https://en.wikipedia.org/wiki/The_Trial It would be fun if we could get some real-world dimensions here and some real-world numbers. What range of numbers are these also? On 2 May 2018 at 15:21, Peter Kjellström

Re: [OMPI users] MPI advantages over PBS

2018-08-25 Thread John Hearns via users
Diego, I am sorry but you have different things here. PBS is a resource allocation system. It will reserve the use of a compute server, or several compute servers, for you to run your parallel job on. PBS can launch the MPI job - there are several mechanisms for launching parallel jobs. MPI is an

Re: [OMPI users] RDMA over Ethernet in Open MPI - RoCE on AWS?

2018-09-07 Thread John Hearns via users
Ben, ping me off list. I know the guy who heads the HPC Solutions Architect team for AWS and an AWS Solutions Architect here in the UK. On Fri, 7 Sep 2018 at 03:11, Benjamin Brock wrote: > > I'm setting up a cluster on AWS, which will have a 10Gb/s or 25Gb/s Ethernet > network. Should I expect

Re: [OMPI users] Fwd: problem in cluster

2018-04-25 Thread John Hearns via users
Ankita, this is a problem with your batch queuing system. Do you know which batch system you are using on this cluster? Can you share with us what command you use to submit a job? Also please do not share your teamviewer password with us. I doubt this is of much use to anyone, but... On 25 April

Re: [OMPI users] Fwd: Fwd: problem in cluster

2018-04-25 Thread John Hearns via users
3:13 PM, Ankita m <ankitamait...@gmail.com> wrote: > >> i have 16 cores per one node. I usually use 4 node each node has 16 cores >> so total 64 processes. >> >> On Wed, Apr 25, 2018 at 2:57 PM, John Hearns via users < >> users@lists.open-mpi.org> wrote: >

Re: [OMPI users] Fwd: Fwd: problem in cluster

2018-04-25 Thread John Hearns via users
I do not see much wrong with that. However nodes=4 ppn=2 makes 8 processes in all. You are using mpirun -np 64. Actually it is better practice to use the PBS-supplied environment variables during the job, rather than hard-wiring 64. I don't have access to a PBS cluster from my desk at the
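A rough sketch of taking the process count from PBS instead of hard-wiring it, assuming the batch system writes one line per granted slot to the file named by $PBS_NODEFILE (so nodes=4 ppn=2 would give 8 lines). The helper pbs_slots and the program name my_app are placeholders.

```shell
# Count the slots the batch system granted, one line per slot in the node file.
# pbs_slots is a hypothetical helper; $1 is the node file path.
pbs_slots() {
    wc -l < "$1" | tr -d ' '
}

# Inside a job script this might be used as (my_app is a placeholder):
#   mpirun -np "$(pbs_slots "$PBS_NODEFILE")" ./my_app
```

If the request is later changed, the mpirun line follows it automatically rather than oversubscribing the allocated cores.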

Re: [OMPI users] Fwd: Fwd: problem in cluster

2018-04-25 Thread John Hearns via users
ores > so total 64 processes. > > On Wed, Apr 25, 2018 at 2:57 PM, John Hearns via users < > users@lists.open-mpi.org> wrote: > >> I do not see much wrong with that. >> However nodes=4 ppn=2 makes 8 processes in all. >> You are using mpirun -np 64 >

Re: [OMPI users] Old version openmpi 1.2 support infiniband?

2018-03-21 Thread John Hearns via users
California, Riverside > 900 University Avenue, Riverside, CA 92521 > > > On Tue, Mar 20, 2018 at 10:46 AM, John Hearns via users < > users@lists.open-mpi.org> wrote: > >> "It does not handle more recent improvements such as Intel's turbo >> mode and the proce

Re: [OMPI users] Old version openmpi 1.2 support infiniband?

2018-03-20 Thread John Hearns via users
"It does not handle more recent improvements such as Intel's turbo mode and the processor performance inhomogeneity that comes with it." I guess it is easy enough to disable Turbo mode in the BIOS though. On 20 March 2018 at 17:48, Kaiming Ouyang wrote: > I think the problem

Re: [OMPI users] Cannot run MPI code on multiple cores with PBS

2018-10-09 Thread John Hearns via users
libssl.so.0.9.8, I > still have one error message left from MPI: > > mca_base_component_repository_open: unable to open mca_btl_openib: > libibverbs.so.1: cannot open shared object file: No such file or directory > (ignored) > > Please let me know if you have any suggestions. >

Re: [OMPI users] no openmpi over IB on new CentOS 7 system

2018-10-10 Thread John Hearns via users
Noam, what does ompi_info say - specifically which BTLs are available? Stupid question though - this is a single system with no connection to a switch? You probably don't have an OpenSM subnet manager running then - could that be the root cause? On Wed, 10 Oct 2018 at 09:53, Dave Love wrote: > >

Re: [OMPI users] no openmpi over IB on new CentOS 7 system

2018-10-10 Thread John Hearns via users
On that system please tell us what these return: ibstat ibstatus sminfo ibdiagnet On Wed, 10 Oct 2018 at 12:49, John Hearns wrote: > > Noam, what does ompi_info say - specifically which BTLs are available? > Stupid question though - this is a single system with no connection to a > switch?
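The four checks above can be run in one pass. This is only a convenience sketch: the wrapper name run_ib_checks is made up, it assumes the tools come from the usual InfiniBand diagnostics packages, and some of them may need root.

```shell
# Run each InfiniBand diagnostic in turn, skipping tools that are not installed.
# run_ib_checks is a hypothetical wrapper name.
run_ib_checks() {
    for tool in ibstat ibstatus sminfo ibdiagnet; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "== $tool =="
            # Report a failure but carry on with the remaining tools.
            "$tool" || echo "$tool exited with status $?"
        else
            echo "== $tool: not installed =="
        fi
    done
}

# Example: run_ib_checks > ib_report.txt 2>&1
```

A failing sminfo is the one to look at first when no subnet manager is suspected.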

Re: [OMPI users] Cannot run MPI code on multiple cores with PBS

2018-10-04 Thread John Hearns via users
Michele, one tip: log into a compute node using ssh and as your own username. If you use the Modules environment then load the modules you use in the job script, then use the ldd utility to check if you can load all the libraries in the code.io executable. Actually you are better to submit a

Re: [OMPI users] Cannot run MPI code on multiple cores with PBS

2018-10-04 Thread John Hearns via users
PORT_URL="https://bugs.centos.org/; > > CENTOS_MANTISBT_PROJECT="CentOS-7" > CENTOS_MANTISBT_PROJECT_VERSION="7" > REDHAT_SUPPORT_PRODUCT="centos" > REDHAT_SUPPORT_PRODUCT_VERSION=“7" > > May you please tell me how to check whether the batch system

Re: [OMPI users] OpenMPI building fails on Windows Linux Subsystem(WLS).

2018-09-19 Thread John Hearns via users
Oleg, I have a Windows 10 system and could help by testing this also. But I have to say - it will be quicker just to install VirtualBox and a CentOS VM. Or an Ubuntu VM. You can then set up a small test network of VMs using the VirtualBox HostOnly network for tests of your MPI code. On Wed, 19

Re: [OMPI users] OpenMPI building fails on Windows Linux Subsystem(WLS).

2018-09-19 Thread John Hearns via users
Oleg, I can build the latest master branch of OpenMPI in WSL I can give it a try with 3.1.2 if that is any help to you? uname -a Linux Johns-Spectre 4.4.0-17134-Microsoft #285-Microsoft Thu Aug 30 17:31:00 PST 2018 x86_64 x86_64 x86_64 GNU/Linux apt-get upgrade apt-get install gfortran wget

Re: [OMPI users] Open MPI installation problem

2019-01-23 Thread John Hearns via users
Sorry if I am being stupid, Serdar might also have to set the location for the includes by setting MPI_INC On Wed, 23 Jan 2019 at 14:47, Ralph H Castain wrote: > Your PATH and LD_LIBRARY_PATH setting is incorrect. You installed OMPI > into $HOME/openmpi, so you should have done: > >

Re: [OMPI users] job termination

2019-04-17 Thread John Hearns via users
I would do the normal things. Log into those nodes. Run dmesg and look at /var/log/messages. Look at the Slurm log on the node and look for the job ending. Also look at the sysstat files and see if there was a lot of memory being used. http://sebastien.godard.pagesperso-orange.fr/ On Wed, 17 Apr

Re: [OMPI users] growing memory use from MPI application

2019-06-19 Thread John Hearns via users
Noam, it may be a stupid question. Could you try running slabtop as the program executes? 'watch cat /proc/meminfo' is also a good diagnostic. On Wed, 19 Jun 2019 at 18:32, Noam Bernstein via users < users@lists.open-mpi.org> wrote: > Hi - we’re having a weird problem with OpenMPI on
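To narrow /proc/meminfo down to the kernel slab figures that slabtop summarises, something like the following may help. The helper name slab_usage is hypothetical; it assumes the standard Linux field names Slab, SReclaimable and SUnreclaim.

```shell
# Print the slab-related lines from a meminfo-style file.
# slab_usage is a hypothetical helper; $1 defaults to /proc/meminfo.
slab_usage() {
    grep -E '^(Slab|SReclaimable|SUnreclaim):' "${1:-/proc/meminfo}"
}

# Example: sample every 5 seconds while the MPI job runs
#   while sleep 5; do date; slab_usage; done
```

A steadily growing SUnreclaim while the job runs would point at kernel-side allocations rather than the application heap.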

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread John Hearns via users
Errr.. you have dropped caches? echo 3 > /proc/sys/vm/drop_caches On Thu, 20 Jun 2019 at 15:59, Yann Jobic via users wrote: > Hi, > > On 6/20/2019 at 3:31 PM, Noam Bernstein via users wrote: > > > > > >> On Jun 20, 2019, at 4:44 AM, Charles A Taylor >> > wrote:

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread John Hearns via users
The kernel using memory is why I suggested running slabtop, to see the kernel slab allocations. Clearly I was barking up a wrong tree there... On Thu, 20 Jun 2019 at 14:41, Jeff Squyres (jsquyres) via users < users@lists.open-mpi.org> wrote: > On Jun 20, 2019, at 9:31 AM, Noam Bernstein via

Re: [OMPI users] How it the rank determined (Open MPI and Podman)

2019-07-11 Thread John Hearns via users
Not really a relevant reply, however Nomad has task drivers for Docker and Singularity https://www.hashicorp.com/blog/singularity-and-hashicorp-nomad-a-perfect-fit I'm not sure if it would be easier to set up an MPI environment with Nomad though. On Thu, 11 Jul 2019 at 11:08, Adrian Reber via

Re: [OMPI users] can't run MPI job under SGE

2019-07-25 Thread John Hearns via users
Have you checked your ssh between nodes? Also how is your PATH set up? There is a difference between interactive and non-interactive login sessions. I advise: A. Construct a hosts file and mpirun by hand. B. Use modules rather than .bashrc files. C. Slurm. On Thu, 25 Jul 2019, 18:00 David Laidlaw