Re: [OMPI users] OpenMPI + InfiniBand

2016-11-01 Thread Jeff Squyres (jsquyres)
On Nov 1, 2016, at 2:40 AM, Sergei Hrushev wrote: > > Yes, I tried to get this info already. > And I saw in the log that rdmacm wants an IP address on the port. > So my question in the topic start message was: > Is it enough for OpenMPI to have RDMA only, or should IPoIB also be > installed?

Re: [OMPI users] Redusing libmpi.so size....

2016-11-01 Thread Jeff Squyres (jsquyres)
by removing unwanted plugins. > > Here libmpi.so.12.0.3 size is 2.4MB. > > How can I know which plugins are included in the build of libmpi.so.12.0.3 > and how can I remove them? > > Thanks, > Mahesh N > > On Fri, Oct 28, 2016 at 7:09 PM, Jeff Squyres (jsquyres) <jsqu
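A hedged sketch of how one might inspect and exclude plugins at build time (the component names below only illustrate the framework-component syntax, they are not a recommendation):

    ompi_info                                                  # lists every MCA component compiled into this install
    ./configure --enable-mca-no-build=coll-ml,btl-usnic ...    # rebuild with selected components excluded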

Re: [OMPI users] OpenMPI + InfiniBand

2016-10-31 Thread Jeff Squyres (jsquyres)
What does "ompi_info | grep openib" show? Additionally, Mellanox provides alternate support through their MXM libraries, if you want to try that. If that shows that you have the openib BTL plugin loaded, try running with "mpirun --mca btl_base_verbose 100 ..." That will provide additional

Re: [OMPI users] Redusing libmpi.so size....

2016-10-28 Thread Jeff Squyres (jsquyres)
On Oct 28, 2016, at 8:12 AM, Mahesh Nanavalla wrote: > > i have configured as below for arm > > ./configure --enable-orterun-prefix-by-default > --prefix="/home/nmahesh/Workspace/ARM_MPI/openmpi" > CC=arm-openwrt-linux-muslgnueabi-gcc

Re: [OMPI users] redeclared identifier for openmpi-v2.0.1-130-gb3a367d witj Sun C on Linux

2016-10-27 Thread Jeff Squyres (jsquyres)
The fix for this was just merged (we had previously fixed it in the v2.x branch, but neglected to also put it on the v2.0.x branch) -- it should be in tonight's tarball: https://github.com/open-mpi/ompi/pull/2295 > On Oct 27, 2016, at 6:45 AM, Siegmar Gross >

Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-19 Thread Jeff Squyres (jsquyres)
Justin -- Fair point. Can you work with Sylvain Jeaugey (at Nvidia) to submit a pull request for this functionality? Thanks. > On Oct 18, 2016, at 2:26 PM, Justin Luitjens wrote: > > After looking into this a bit more it appears that the issue is I am building > on a

Re: [OMPI users] Openmpi 2.0.1 build --with-psm2 failed on CentOS 7.2

2016-10-11 Thread Jeff Squyres (jsquyres)
Limin -- Can you send the items listed here: https://www.open-mpi.org/community/help/ > On Oct 11, 2016, at 4:00 PM, Cabral, Matias A > wrote: > > Hi Limin, > > psm2_mq_irecv2 should be in libpsm2.so. I’m not quite sure how CentOS packs > it so I would

Re: [OMPI users] Crash during MPI_Finalize

2016-10-11 Thread Jeff Squyres (jsquyres)
On Oct 11, 2016, at 8:58 AM, George Reeke wrote: > > George B. et al, > --Is it normal to top-post on this list? I am following your > example but other lists I am on prefer bottom-posting. Stylistic note: we do both on this list. Specifically: there's no

Re: [OMPI users] centos 7.2 openmpi from repo, stdout issue

2016-10-05 Thread Jeff Squyres (jsquyres)
We did have some kind of stdout/stderr truncation issue a little while ago, but I don't remember what version it specifically affected. I would definitely update to at least Open MPI 1.10.4 (lots of bug fixes since 1.10.0). Better would be to update to Open MPI 2.0.1 -- that's the current

Re: [OMPI users] Question on using Github to see bugs fixed in past versions

2016-10-05 Thread Jeff Squyres (jsquyres)
Additionally: - When Open MPI migrated to github, we only brought over relevant open Trac tickets to Github. As such, many old 1.10 and 1.8 (and earlier) issues were not brought over. - Trac is still available in a read-only manner at https://svn.open-mpi.org/trac/ompi/report. > On Oct 5,

Re: [OMPI users] static linking MPI libraries with applications

2016-09-15 Thread Jeff Squyres (jsquyres)
If you want to build statically with verbs support, it's tricky. Per the FAQ: 37. I get bizarre linker warnings / errors / run-time faults when I try to compile my OpenFabrics MPI application statically. How do I fix this? Fully static linking is not for the weak, and is not recommended. But
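A rough sketch of the kind of build that FAQ entry discusses; additional, system-specific linker flags for libibverbs are usually still required, which is exactly what the FAQ covers:

    ./configure --prefix=/opt/openmpi-static --enable-static --disable-shared --with-verbs
    make install
    mpicc -static hello.c -o hello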

Re: [OMPI users] MPI libraries

2016-09-12 Thread Jeff Squyres (jsquyres)
It might be easier to not list the MPI library that Open MPI is using -- we have changed the name of this library over time (as you have noticed). The "mpifort" wrapper compiler will always pick up the right library name for you. > On Sep 12, 2016, at 1:44 PM, Mahmood Naderan
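A small sketch of letting the wrapper handle library names instead of hard-coding them (file names are placeholders):

    mpifort --showme:link      # show the libraries the wrapper would add
    mpifort -o app app.f90     # link without naming any MPI library explicitly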

Re: [OMPI users] OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-07 Thread Jeff Squyres (jsquyres)
You can also run: ompi_info | grep 'plm: tm' (note the quotes, because you need to include the space) If you see a line listing the TM PLM plugin, then you have Torque / PBS support built in to Open MPI. If you don't, then you don't. :-) > On Sep 7, 2016, at 11:01 AM, Gilles Gouaillardet >

Re: [OMPI users] Error in file runtime/orte_init.c

2016-09-02 Thread Jeff Squyres (jsquyres)
Did you, perchance, install open MPI v2.0.0 in the same directory tree that a prior version of open MPI was already installed? If so, open MPI may be trying to use plugins from the prior version of open MPI, which will be problematic. Sent from my phone. No type good. > On Sep 2, 2016, at
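One hedged way to avoid stale plugins is to keep each version in its own prefix (paths are examples), or to wipe the old tree before reusing a prefix:

    ./configure --prefix=/opt/openmpi-2.0.0 && make -j4 install
    # or, before reinstalling into an existing prefix:
    rm -rf /opt/openmpi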

Re: [OMPI users] job aborts "readv failed: Connection reset by peer"

2016-09-02 Thread Jeff Squyres (jsquyres)
Note that open MPI v2.0.0 is not ABI compatible with prior releases of open MPI. If you are trying to run an MPI executable created by a prior version of open MPI, you will need to recompile your application with open MPI v2.0.0. Sent from my phone. No type good. > On Sep 2, 2016, at 12:48

Re: [OMPI users] New to (Open)MPI

2016-09-02 Thread Jeff Squyres (jsquyres)
Greetings Lachlan. Yes, Gilles and John are correct: on Cisco hardware, our usNIC transport is the lowest latency / best HPC-performance transport. I'm not aware of any MPI implementation (including Open MPI) that has support for FC types of transports (including FCoE). I'll ping you

Re: [OMPI users] job aborts "readv failed: Connection reset by peer"

2016-09-02 Thread Jeff Squyres (jsquyres)
Also, the error message suggested that TCP is not the issue here -- the TCP hangups are likely because some other process exited unexpectedly. Indeed: - mpirun noticed that process rank 0 with PID 4989 on node compute-0-1 exited on signal 4 (Illegal instruction). - This might be the

Re: [OMPI users] Can't compile OpenMPI 2.0.0 on a CentOS 6 system

2016-09-01 Thread Jeff Squyres (jsquyres)
On Sep 1, 2016, at 2:05 PM, Sean Ahern wrote: > > In actuality, I stored off the source in our "third party" repo before I > built it. > > svn add openmpi-2.0.0 > svn commit > > When I grabbed that source back on the machine I wanted to build on, the > relative timestamps

Re: [OMPI users] Can't compile OpenMPI 2.0.0 on a CentOS 6 system

2016-09-01 Thread Jeff Squyres (jsquyres)
Sean Ahern <s...@ensight.com> wrote: > > Yep, that's it. > -Sean > > -- > Sean Ahern > Computational Engineering International > 919-363-0883 > > > On Thu, Sep 1, 2016 at 1:04 PM, Jeff Squyres (jsquyres) > <jsquy...@cisco.com> wrote: >> That's

Re: [OMPI users] Can't compile OpenMPI 2.0.0 on a CentOS 6 system

2016-09-01 Thread Jeff Squyres (jsquyres)
guys can help me track down the dependency problem. > > -Sean > > -- > Sean Ahern > Computational Engineering International > 919-363-0883 > > On Thu, Sep 1, 2016 at 11:56 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote: > Greetings Sean. >

Re: [OMPI users] Can't compile OpenMPI 2.0.0 on a CentOS 6 system

2016-09-01 Thread Jeff Squyres (jsquyres)
Greetings Sean. Yes, you are correct - when you build from the tarball, you should not need the GNU autotools. When tarball builds fail like this, it *usually* means that you are building in a network filesystem, and the time is not well synchronized between the machine on which you are
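If the timestamps really are the problem, a commonly used workaround (a sketch, and only approximate) is to touch the generated files in dependency order so the build does not try to re-run the GNU Autotools:

    cd openmpi-2.0.0
    touch aclocal.m4
    touch configure
    find . -name '*config.h.in' -exec touch {} +
    find . -name Makefile.in -exec touch {} +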

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-25 Thread Jeff Squyres (jsquyres)
The IOF fix PR for v2.0.1 was literally just merged a few minutes ago; it wasn't in last night's tarball. > On Aug 25, 2016, at 10:59 AM, r...@open-mpi.org wrote: > > ??? Weird - can you send me an updated output of that last test we ran? > >> On Aug 25, 2016, at 7:51 AM, Jingchao Zhang

Re: [OMPI users] mpi_f08 Question: set comm on declaration error, and other questions

2016-08-19 Thread Jeff Squyres (jsquyres)
On Aug 19, 2016, at 6:32 PM, Matt Thompson wrote: > > 2. The second one is a run-time assignment. You can do that between any > compatible entities, and so that works. > > Okay. This makes sense. I guess I was surprised that MPI_COMM_NULL wasn't a > constant (or parameter,

Re: [OMPI users] mpi_f08 Question: set comm on declaration error, and other questions

2016-08-19 Thread Jeff Squyres (jsquyres)
On Aug 19, 2016, at 2:30 PM, Matt Thompson wrote: > > I'm slowly trying to learn and transition to 'use mpi_f08'. So, I'm writing > various things and I noticed that this triggers an error: > > program hello_world >use mpi_f08 >implicit none >type(MPI_Comm) ::

Re: [OMPI users] OPENSHMEM ERROR with 2+ Distributed Machines

2016-08-17 Thread Jeff Squyres (jsquyres)
om> wrote: > > assuming you have an infiniband network, an other option is to install mxm > (mellanox proprietary but free library) and rebuild Open MPI. > pml/yalla will be used instead of ob1 and you should be just fine > > Cheers, > > Gilles > > On Tuesday, Augus

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-16 Thread Jeff Squyres (jsquyres)
On Aug 16, 2016, at 3:07 PM, Reuti wrote: > > Thx a bunch - that was it. Despite searching for a solution I found only > hints that didn't solve the issue. FWIW, we talk about this in the HACKING file, but I admit that's not necessarily the easiest place to find:

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-16 Thread Jeff Squyres (jsquyres)
On Aug 12, 2016, at 2:15 PM, Reuti wrote: > > I updated my tools to: > > autoconf-2.69 > automake-1.15 > libtool-2.4.6 > > but I am facing this with Open MPI's ./autogen.pl: > > configure.ac:152: error: possibly undefined macro: AC_PROG_LIBTOOL > > I recall seeing in

Re: [OMPI users] OPENSHMEM ERROR with 2+ Distributed Machines

2016-08-16 Thread Jeff Squyres (jsquyres)
On Aug 16, 2016, at 6:09 AM, Debendra Das wrote: > > As far as I understood I have to wait for version 2.0.1 to fix the issue.So > can you please give any idea about when 2.0.1 will be released. We had hoped to release it today, actually. :-\ But there's still a

[OMPI users] Open MPI mail archives now back online

2016-08-12 Thread Jeff Squyres (jsquyres)
mail-archive.com now has all of the old Open MPI mail archives online. Example: https://www.mail-archive.com/users@lists.open-mpi.org/ https://www.mail-archive.com/devel@lists.open-mpi.org/ Note that there are two different ways you can permalink to messages on mail-archive: 1. Take

Re: [OMPI users] www.open-mpi.org certificate error?

2016-07-31 Thread Jeff Squyres (jsquyres)
t name (e.g. www.open-mpi.org) and can > > contains wildcards (e.g. *.open-mpi.org) > > so if the first condition is met, then you should be able to reuse the > > certificate that was previously used at UI. > > > > makes sense ? > > > > Cheers, > >

[OMPI users] This list is migrating!

2016-07-19 Thread Jeff Squyres (jsquyres)
Short version = The server for this mailing list will be migrating sometime soon (the exact timing is not fully predictable). Three things you need to know: 1. We'll send a "This list is now closed for migration" last message when the migration starts 2. We'll send a "This list is

Re: [OMPI users] [OMPI devel] Change compiler

2016-07-18 Thread Jeff Squyres (jsquyres)
On Jul 18, 2016, at 4:06 PM, Emani, Murali wrote: > > I would like to know if there is Clang support for OpenMPI codebase. > > I am trying to change the underlying compiler from gcc to clang in > ‘configure' and ‘make all install’, I changed these values in Makefile in > root
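Rather than editing the generated Makefiles, the compilers are normally chosen when configure runs; a hedged sketch (the prefix is an example):

    ./configure CC=clang CXX=clang++ --prefix=/opt/openmpi-clang
    make all install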

Re: [OMPI users] how to build with memchecker using valgrind, preferable linux distro install of valgrind?

2016-07-14 Thread Jeff Squyres (jsquyres)
The key is that you need to specify --with-valgrind=valgrind_install_dir -- not the path to the valgrind executable. Additionally, there's a valgrind.h that you'll need to have in that tree. E.g., if you specify --with-valgrind=/opt/valgrind, then it expects to find
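A sketch of the corresponding configure line, using the /opt/valgrind example from the message (valgrind.h would presumably live under /opt/valgrind/include/valgrind/):

    ./configure --enable-memchecker --with-valgrind=/opt/valgrind ...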

Re: [OMPI users] MPI-RMA rget doesn't complete the communication after mpi_wait

2016-07-13 Thread Jeff Squyres (jsquyres)
I've filed https://github.com/open-mpi/ompi/issues/1869 to track the issue. > On Jul 13, 2016, at 5:23 AM, Alfio Lazzaro <alfio.lazz...@gmail.com> wrote: > > Hi Jeff, > thanks for your reply. We tried it and it still doesn't work... > > Alfio > > 2016-07-

Re: [OMPI users] MPI-RMA rget doesn't complete the communication after mpi_wait

2016-07-12 Thread Jeff Squyres (jsquyres)
Alfio -- We just released Open MPI v2.0.0, with lots of MPI RMA fixes. Would you mind testing there? > On Jul 12, 2016, at 1:33 PM, Alfio Lazzaro wrote: > > Dear OpenMPI developers, > we found a strange behavior when using MPI-RMA passive target and OpenMPI >

Re: [OMPI users] [OMPI devel] Class information in OpenMPI

2016-07-12 Thread Jeff Squyres (jsquyres)
Unfortunately, the Open MPI code base is quite large, and changes over time. There really is no overall diagram describing the entire code base, sorry. The OPAL-level doxygen docs are probably the best you'll get, but they're really only the utility classes in the portability layer. They

Re: [OMPI users] Need libmpi_f90.a

2016-07-11 Thread Jeff Squyres (jsquyres)
On Jul 11, 2016, at 3:25 PM, Mahmood Naderan wrote: > # ls -l libmpi* > -rw-r--r-- 1 root root 1029580 Jul 11 23:51 libmpi_mpifh.a > -rw-r--r-- 1 root root 17292 Jul 11 23:51 libmpi_usempi.a These are the two for v1.10.x. Sorry; one thing I wasn't clear on (I had

Re: [OMPI users] Need libmpi_f90.a

2016-07-10 Thread Jeff Squyres (jsquyres)
> On Jul 10, 2016, at 9:59 AM, Mahmood Naderan wrote: > > Hi, > I need libmpi_f90.a for building an application. I have manually compiled > 1.6.5 and 1.10.3 but that file is absent. Instead I see these > > openmpi-1.6.5/lib/libmpi_f90.la >

Re: [OMPI users] The ompi/mca/cool/sm will not be used on multi-nodes?

2016-06-30 Thread Jeff Squyres (jsquyres)
I actually wouldn't advise ml. It *was* being developed as a joint project between ORNL and Mellanox. I think that code eventually grew into what the "hcoll" Mellanox library currently is. As such, ml reflects kind of a middle point before hcoll became hardened into a real product. It has

Re: [OMPI users] Shared Libraries

2016-06-24 Thread Jeff Squyres (jsquyres)
On Jun 24, 2016, at 4:39 PM, Richard C. Wagner wrote: > > Then I try to compile the library file in 32-bit mode. The first command is: > > mpicc -fPIC -m32 -c libtest.c > > Then the second is: > > mpicc -shared -m32 -o libmpi.so libtest.o > > As you can see below, compiling

Re: [OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32) on openvz containers

2016-06-24 Thread Jeff Squyres (jsquyres)
ut of 2 > wall clock time = 0.02 > Hello world! from processor 1 (name=ct111 ) out of 2 > wall clock time = 0.01 > > Thanks a lot indeed! > > > Jeff Squyres (jsquyres) wrote on 24/06/16 16:08: >> On Jun 24, 2016, at 7:26 AM, kna...@gmail.com wrote

Re: [OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32) on openvz containers

2016-06-24 Thread Jeff Squyres (jsquyres)
On Jun 24, 2016, at 7:26 AM, kna...@gmail.com wrote: > >> mpirun --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include >> venet0:0 ... > > See if that works. > Jeff, thanks a lot for such prompt reply, detailed explanation and > suggestion! But unfortunately the error is still the

Re: [OMPI users] Why communication performance change with binding PEs?

2016-06-23 Thread Jeff Squyres (jsquyres)
On Jun 23, 2016, at 8:20 AM, Saliya Ekanayake wrote: > > I've got a quick question. Besides theses time sharing constraints, does > number of cores has any significance to MPI's communication decisions? Open MPI doesn't use the number of cores available to it in any

Re: [OMPI users] Shared Libraries

2016-06-23 Thread Jeff Squyres (jsquyres)
Greetings Richard. Yes, that certainly is unusual. :-) Here's my advice: - Configure Open MPI with the --disable-dlopen flag. This will slurp in all of Open MPI's plugins into the main library, and make things considerably simpler for you. - Build Open MPI in a 32 bit mode -- e.g., supply
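A hedged sketch of what such a configure line might look like (paths and flags are examples; --disable-mpi-fortran simply keeps a 32-bit build simpler if no Fortran is needed):

    ./configure --prefix=/opt/openmpi-32 --disable-dlopen --disable-mpi-fortran \
        CFLAGS=-m32 CXXFLAGS=-m32 LDFLAGS=-m32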

Re: [OMPI users] Fw: OpenSHMEM Runtime Error

2016-06-23 Thread Jeff Squyres (jsquyres)
Ryan -- Did you try the suggestions listed in the help message? > On Jun 23, 2016, at 1:24 AM, RYAN RAY wrote: > > > > From: "RYAN RAY" > Sent: Wed, 22 Jun 2016 14:32:33 > To: "users" > Subject: OpenSHMEM Runtime Error >

Re: [OMPI users] Continuous integration question...

2016-06-22 Thread Jeff Squyres (jsquyres)
On Jun 22, 2016, at 1:33 PM, Eric Chamberland wrote: > > I would like to compile and test our code each night with the "latest" openmpi > v2 release (or the nightly, if it is stable enough). Cool! > Just to ease the process, I would like to "wget" the latest archive

Re: [MTT users] Invalid mpi install id while reporting MTT

2016-06-21 Thread Jeff Squyres (jsquyres)
Abhishek -- Could you send the full output from your mtt client run with the --verbose flag enabled? If you'd prefer not to send it to the public list, send it directly to me and Josh Hursey (IBM). Thanks! > On Jun 21, 2016, at 6:48 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com>

Re: [OMPI users] Avoiding the memory registration costs by having memory always registered, is it possible with Linux ?

2016-06-21 Thread Jeff Squyres (jsquyres)
> On Jun 20, 2016, at 4:15 PM, Audet, Martin > wrote: > > But now since we have to live with memory registration issues, what changes > should be done to standard Linux distro so that Open MPI can best use a > recent Mellanox Infiniband network ? > > I guess

Re: [OMPI users] Avoiding the memory registration costs by having memory always registered, is it possible with Linux ?

2016-06-21 Thread Jeff Squyres (jsquyres)
On Jun 20, 2016, at 4:27 PM, Alex A. Granovsky wrote: > > Would be the use of mlockall helpful for this approach? That's an interesting idea; I didn't know about the existence of mlockall(MCL_FUTURE). It has a few drawbacks, of course (e.g., processes can't shrink),

Re: [MTT users] Invalid mpi install id while reporting MTT

2016-06-21 Thread Jeff Squyres (jsquyres)
Greetings Abhishek. You sent me your INI file in another email thread. In general, you need to run all the 5 phases. During the MPI install, for example, even if you have an "already installed" MPI (i.e., using the MPI Get module "AlreadyInstalled"), you still have to run that phase so that

Re: [OMPI users] Avoiding the memory registration costs by having memory always registered, is it possible with Linux ?

2016-06-18 Thread Jeff Squyres (jsquyres)
Greetings Martin. Such approaches have been discussed in the past. Indeed, I'm pretty sure that I've heard of some non-commodity systems / network stacks that do this kind of thing. Such approaches have not evolved in the commodity Linux space, however. This kind of support would need

Re: [OMPI users] Processes unable to communicate when using MPI_Comm_spawn on Windows

2016-06-09 Thread Jeff Squyres (jsquyres)
I think there were a few not-entirely-correct data points in this thread. Let me clarify a few things: 1. Yes, Open MPI suspended native Windows support a while back. Native windows support is simply not a popular use case, and therefore we couldn't justify spending the time on it (not the

Re: [OMPI users] openmpi-dev-4221-gb707d13: referenced symbol

2016-06-08 Thread Jeff Squyres (jsquyres)
Filed https://github.com/open-mpi/ompi/issues/1771 to track the issue. > On Jun 8, 2016, at 1:47 AM, George Bosilca wrote: > > Apparently Solaris 10 lacks support for strnlen. We should add it to our > configure and provide a replacement where needed. > > George. > > >

Re: [OMPI users] users Digest, Vol 3518, Issue 2

2016-06-01 Thread Jeff Squyres (jsquyres)
st at > users-ow...@open-mpi.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of users digest..." > > > Today's Topics: > > 1. Re: Firewall settings for MPI communication > (Jeff Squyres (jsqu

Re: [OMPI users] users Digest, Vol 3514, Issue 1

2016-06-01 Thread Jeff Squyres (jsquyres)
h subject or body 'help' to > users-requ...@open-mpi.org > > You can reach the person managing the list at > users-ow...@open-mpi.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of users digest..."

Re: [OMPI users] Firewall settings for MPI communication

2016-06-01 Thread Jeff Squyres (jsquyres)
In addition, you might want to consider upgrading to Open MPI v1.10.x (v1.6.x is fairly ancient). > On Jun 1, 2016, at 7:46 AM, Gilles Gouaillardet > wrote: > > which network are your VMs using for communications ? > if this is tcp, then you also have to specify

Re: [OMPI users] users Digest, Vol 3510, Issue 2

2016-05-26 Thread Jeff Squyres (jsquyres)
You're still intermingling your Open MPI and MPICH installations. You need to ensure to use the wrapper compilers and mpirun/mpiexec from the same MPI implementation. For example, if you use mpicc/mpifort from Open MPI to build your program, then you must use Open MPI's mpirun/mpiexec. If you
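A minimal sketch of keeping the toolchain consistent (install prefixes are examples):

    /opt/openmpi/bin/mpicc hello.c -o hello
    /opt/openmpi/bin/mpirun -np 4 ./hello     # mpirun from the same installation as the mpicc above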

Re: [OMPI users] fortran problem when mixing "use mpi" and "use mpi_f08" with gfortran 5

2016-05-25 Thread Jeff Squyres (jsquyres)
On May 21, 2016, at 12:17 PM, Andrea Negri wrote: > > Hi, in the last few days I ported my entire Fortran MPI code to "use > mpi_f08". You really did a great job with this interface. However, > since HDF5 still uses integers to handle communicators, I have a > module where

Re: [OMPI users] segmentation fault for slot-list and openmpi-1.10.3rc2

2016-05-24 Thread Jeff Squyres (jsquyres)
On May 24, 2016, at 7:19 AM, Siegmar Gross wrote: > > I don't see a difference for my spawned processes, because both functions will > "wait" until all pending operations have finished, before the object will be > destroyed. Nevertheless, perhaps my small

Re: [OMPI users] users Digest, Vol 3510, Issue 2

2016-05-24 Thread Jeff Squyres (jsquyres)
Doesn't Abaqus do its own environment setup? I.e., I'm *guessing* that you should be able to set your environment startup files (e.g., $HOME/.bashrc) to point your PATH / LD_LIBRARY_PATH to point to whichever MPI implementation you want, and Abaqus will do whatever it needs to a) be

Re: [OMPI users] problem about mpirun on two nodes

2016-05-23 Thread Jeff Squyres (jsquyres)
d and the error messages disappeared. I saw six > processes were running on each node, but now the all processes keep running > forever with 100% CPU usage. > > > -Original Message- > From: Jeff Squyres (jsquyres) <jsquy...@cisco.com> > To: Open MPI User's List &

Re: [OMPI users] problem about mpirun on two nodes

2016-05-23 Thread Jeff Squyres (jsquyres)
On May 21, 2016, at 11:31 PM, dour...@aol.com wrote: > > I encountered a problem about mpirun and SSH when using OMPI 1.10.0 compiled > with gcc, running on centos7.2. > When I execute mpirun on my 2 node cluster, I get the following errors pasted > below. > > [douraku@master home]$ mpirun -np

Re: [OMPI users] OpenMPI 1.6.5 on CentOS 7.1, silence ib-locked-pages?

2016-05-18 Thread Jeff Squyres (jsquyres)
On May 18, 2016, at 6:16 PM, Ryan Novosielski wrote: > > I’m pretty sure this is no longer relevant (having read Roland’s messages > about it from a couple of years ago now). Can you please confirm that for me, > and then let me know if there is any way that I can silence

[MTT users] MTT ompi-tests password

2016-05-18 Thread Jeff Squyres (jsquyres)
Folks -- In helping someone get up to speed with MTT yesterday, I updated an MTT sample .ini file and accidentally committed the ompiteam-mtt Github password to the public repository. :-( I have therefore just changed the password for the ompiteam-mtt account. Please contact me off-list for

Re: [OMPI users] Mpirun invocation only works in debug mode, hangs in "normal" mode.

2016-05-16 Thread Jeff Squyres (jsquyres)
I'm afraid I don't know what the difference is in systemd for ssh.socket vs. ssh.service, or why that would change Open MPI's behavior. One other thing to try is to mpirun non-MPI programs, like "hostname" and see if that works. This will help distinguish between problems with Open MPI's
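A sketch of the suggested sanity check (host names are placeholders):

    mpirun -np 2 --host nodeA,nodeB hostname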

Re: [OMPI users] Building vs packaging

2016-05-16 Thread Jeff Squyres (jsquyres)
+1 to everything so far. Also, look in your shell startup files (e.g., $HOME/.bashrc) to see if certain parts of it are not executed for non-interactive logins. A common mistake we see is a shell startup file like this: # ... do setup for all logins ... if (this is a non-interactive
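A hedged sketch of a $HOME/.bashrc that avoids that mistake (paths are examples): do the MPI setup before any "non-interactive, so return" guard:

    export PATH=/opt/openmpi/bin:$PATH
    export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
    case $- in
        *i*) ;;        # interactive shell: keep going
        *) return ;;   # non-interactive shell: stop here, but the MPI paths above are already set
    esac
    # ... interactive-only setup below ...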

Re: [MTT users] Request for SVN username & Password

2016-05-16 Thread Jeff Squyres (jsquyres)
Greetings Manu. It looks like you are using ancient / outdated information; we switched away from SVN to Git ~1.5 years ago. Do we still have some stale references to SVN somewhere, or are you working from old data? Are you looking at the MTT wiki? > On May 16, 2016, at 8:20 AM, Manu S.

Re: [OMPI users] One more (possible) bug report

2016-05-14 Thread Jeff Squyres (jsquyres)
You might want to try a pure TCP benchmark across this problematic NIC (e.g., NetpipeTCP or iperf). That will take MPI out of the equation and see if you are able to pass TCP traffic correctly. Make sure to test sizes both smaller and larger than your MTU. > On May 14, 2016, at 1:25 AM,
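A minimal iperf sketch (the host name is a placeholder); NetPIPE's NPtcp would additionally sweep message sizes, including around the MTU:

    iperf -s               # on the node with the problematic NIC
    iperf -c other-node    # on a peer node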

Re: [OMPI users] mpirun command won't run unless the firewalld daemon is disabled

2016-05-12 Thread Jeff Squyres (jsquyres)
ther devices (InfiniBand or RoCE switches) between the servers. >>> >>> I will have to ask a stupid question here but when you suggest that we open >>> the firewall to trust random TCP connections, how is that different from >>> disabling it? Is there some configuratio

Re: [OMPI users] mpirun command won't run unless the firewalld daemon is disabled

2016-05-12 Thread Jeff Squyres (jsquyres)
interop tests with at least two servers, sometimes more. We also >>> have other devices (InfiniBand or RoCE switches) between the servers. >>> >>> I will have to ask a stupid question here but when you suggest that we open >>> the firewall to trust random TCP conne

Re: [OMPI users] mpirun command won't run unless the firewalld daemon is disabled

2016-05-12 Thread Jeff Squyres (jsquyres)
We >>> run the interop tests with at least two servers, sometimes more. We also >>> have other devices (InfiniBand or RoCE switches) between the servers. >>> >>> I will have to ask a stupid question here but when you suggest that we open >>> the firewall to trust

Re: [OMPI users] mpirun command won't run unless the firewalld daemon is disabled

2016-05-10 Thread Jeff Squyres (jsquyres)
On May 10, 2016, at 5:05 PM, Gilles Gouaillardet wrote: > > I was basically suggesting you open a few ports to anyone (e.g. any IP > address), and Jeff suggests you open all ports to a few trusted IP addresses. +1 -- Jeff Squyres jsquy...@cisco.com For

Re: [OMPI users] mpirun command won't run unless the firewalld daemon is disabled

2016-05-10 Thread Jeff Squyres (jsquyres)
Open MPI generally needs to be able to communicate on random TCP ports between machines in the MPI job (and the machine where mpirun is invoked, if that is a different machine). You could also open your firewall to trust random TCP connections just between the servers in your cluster. > On
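A hedged firewalld sketch, assuming the cluster's private subnet is 10.0.0.0/24 and that traffic from those addresses can be trusted:

    firewall-cmd --permanent --zone=trusted --add-source=10.0.0.0/24
    firewall-cmd --reload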

Re: [MTT users] Python choice

2016-05-09 Thread Jeff Squyres (jsquyres)
Is it possible to give a friendly error message at run time if you accidentally run with Python 3.x? > On May 9, 2016, at 12:37 PM, Ralph Castain wrote: > > Hi folks > > As we look at the Python client, there is an issue with the supported Python > version. There was a

Re: [OMPI users] Isend, Recv and Test

2016-05-09 Thread Jeff Squyres (jsquyres)
On May 9, 2016, at 8:23 AM, Zhen Wang wrote: > > I have another question. I thought MPI_Test is a local call, meaning it > doesn't send/receive a message. Am I misunderstanding something? Thanks again. From the user's perspective, MPI_TEST is a local call, in that it checks to

Re: [OMPI users] No core dump in some cases

2016-05-07 Thread Jeff Squyres (jsquyres)
I'm afraid I don't know what a .btr file is -- that is not something that is controlled by Open MPI. You might want to look into your OS settings to see if it has some kind of alternate corefile mechanism...? > On May 6, 2016, at 8:58 PM, dpchoudh . wrote: > > Hello all

Re: [OMPI users] Isend, Recv and Test

2016-05-06 Thread Jeff Squyres (jsquyres)
On May 5, 2016, at 10:09 PM, Zhen Wang wrote: > > It's taking so long because you are sleeping for .1 second between calling > MPI_Test(). > > The TCP transport is only sending a few fragments of your message during each > iteration through MPI_Test (because, by definition,

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-06 Thread Jeff Squyres (jsquyres)
Ok, good. I asked that question because typically when we see errors like this, it is usually either a busted compiler installation or inadvertently mixing the run-times of multiple different compilers in some kind of incompatible way. Specifically, the mpifort (aka mpif90) application is a

Re: [OMPI users] Isend, Recv and Test

2016-05-05 Thread Jeff Squyres (jsquyres)
It's taking so long because you are sleeping for .1 second between calling MPI_Test(). The TCP transport is only sending a few fragments of your message during each iteration through MPI_Test (because, by definition, it has to return "immediately"). Other transports do better handing off

Re: [OMPI users] Segmentation Fault (Core Dumped) on mpif90 -v

2016-05-05 Thread Jeff Squyres (jsquyres)
Giacomo -- Are you able to run anything that is compiled by that Intel compiler installation? > On May 5, 2016, at 12:02 PM, Gus Correa wrote: > > Hi Giacomo > > Some programs fail with segmentation fault > because the stack size is too small. > [But others because

Re: [OMPI users] mpirun gives error when option '--hostfiles' or '--hosts' is used

2016-05-03 Thread Jeff Squyres (jsquyres)
--- > - > Again, i can call mpirun on triops from kraken und all squid_XX without a > problem... > > What could cause this problem? > > Thank You > Jody > > > On Tue, May 3, 2016 at 2:54 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com

Re: [OMPI users] Problem with 'orted: command not found'

2016-05-02 Thread Jeff Squyres (jsquyres)
profile just to be sure a moment before but it didn't work. > Still the same error. > > > > 2016-05-02 23:40 GMT+02:00 Jeff Squyres (jsquyres) <jsquy...@cisco.com>: > The key is this error: > > bash: orted: command not found > > Meaning: you need to set

Re: [OMPI users] Problem with 'orted: command not found'

2016-05-02 Thread Jeff Squyres (jsquyres)
The key is this error: bash: orted: command not found Meaning: you need to set your PATH and LD_LIBRARY_PATH properly for non-interactive logins. See https://www.open-mpi.org/faq/?category=running#adding-ompi-to-path. > On May 2, 2016, at 5:36 PM, Maciek Lewiński
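An alternative sketch that avoids editing startup files: tell mpirun where Open MPI lives on the remote nodes (the path, host names, and program are placeholders; configuring with --enable-orterun-prefix-by-default has the same effect):

    mpirun --prefix /opt/openmpi -np 4 --host nodeA,nodeB ./app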

Re: [OMPI users] OpenSHMEM + STM Linking Problem

2016-05-02 Thread Jeff Squyres (jsquyres)
stm.h is not a header file in either Open MPI or OpenSHMEM. Is that a TinySTM header file? If you're having a problem with compiling TinySTM applications, you should probably contact their support channels -- we don't know/can't help with that. Sorry. > On May 2, 2016, at 5:57 AM, RYAN

Re: [OMPI users] Cannot run a simple MPI program

2016-04-25 Thread Jeff Squyres (jsquyres)
On Apr 24, 2016, at 8:12 PM, Gilles Gouaillardet wrote: > > fwiw, once in a while, i > rm -rf /.../ompi_install_dir/lib/openmpi > to get rid of the removed modules If it helps, I usually install Open MPI into a tree all by itself, and then I can "rm -rf

Re: [OMPI users] runtime errors for openmpi-v1.10.2-142-g5cd9490

2016-04-20 Thread Jeff Squyres (jsquyres)
Thanks Siegmar; I posted this in https://github.com/open-mpi/ompi/issues/1569. > On Apr 20, 2016, at 1:14 PM, Siegmar Gross > wrote: > > Hi, > > I have built openmpi-v1.10.2-142-g5cd9490 on my machines > (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE

Re: [OMPI users] Error for openmpi-dev-3868-g178c97b with Solaris

2016-04-20 Thread Jeff Squyres (jsquyres)
Siegmar -- Thanks for the report. I filed https://github.com/open-mpi/ompi/issues/1565 to track the issue. > On Apr 20, 2016, at 4:48 AM, Siegmar Gross > wrote: > > Hi, > > I tried to build openmpi-dev-3868-g178c97b on my machines > (Solaris 10

Re: [OMPI users] MPI_Bcast implementations in OpenMPI

2016-04-19 Thread Jeff Squyres (jsquyres)
On Apr 15, 2016, at 9:18 AM, Dorier, Matthieu wrote: > > I'd like to know how OpenMPI implements MPI_Bcast. And if different > implementations are provided, how one is selected. This is a fairly complicated topic. This old paper is the foundation for how Open MPI works (it's
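A hedged sketch of how one might inspect and override the tuned broadcast algorithm (the algorithm index 4 is an arbitrary example, not a recommendation):

    ompi_info --param coll tuned --level 9 | grep bcast
    mpirun --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_bcast_algorithm 4 -np 16 ./app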

Re: [OMPI users] Build on FreeBSD

2016-04-19 Thread Jeff Squyres (jsquyres)
Thanks; I added this to https://github.com/open-mpi/ompi-release/pull/1079. I didn't know your Github ID, so I didn't @mention you in the comment. > On Apr 17, 2016, at 1:07 AM, dpchoudh . wrote: > > Hello all > > I understand that FreeBSD is not a supported platform, so

Re: [OMPI users] openib failover

2016-04-19 Thread Jeff Squyres (jsquyres)
On Apr 17, 2016, at 3:24 PM, dpchoudh . wrote: > > Hello all > > As I understand, the openib BTL supports NIC failover, but I am confused > about the scope of this support. Let me elaborate: > > 1. Is the failover support part of MPI specification? No. MPI doesn't make

Re: [OMPI users] Fw: LSF's LSB_PJL_TASK_GEOMETRY + OpenMPI 1.10.2

2016-04-19 Thread Jeff Squyres (jsquyres)
On Apr 18, 2016, at 7:08 PM, Farid Parpia wrote: > > I will try to put you in touch with someone in LSF development immediately. FWIW: It would be great if IBM could contribute the fixes to this. None of us have access to LSF resources, and IBM is a core contributor to Open

Re: [OMPI users] system call failed that shouldn't?

2016-04-14 Thread Jeff Squyres (jsquyres)
On Apr 14, 2016, at 12:27 PM, Tom Rosmond wrote: > > Gilles, > > Yes, that solved the problem. Thanks for the help. I assume this fix will > be in the next official release, i.e. 1.10.3? Yup! -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to:

Re: [OMPI users] Debugging help

2016-04-12 Thread Jeff Squyres (jsquyres)
On Apr 12, 2016, at 2:38 PM, dpchoudh . wrote: > > Hello all > > I am trying to set a breakpoint during the modex exchange process so I can > see the data being passed for different transport type. I assume that this is > being done in the context of orted since this is

Re: [OMPI users] libfabric verb provider for iWARP RNIC

2016-04-11 Thread Jeff Squyres (jsquyres)
On Apr 11, 2016, at 2:38 PM, dpchoudh . wrote: > > If the vendor of a new type of fabric wants to include support for OpenMPI, > then, as long as they can implement a libfabric provider, they can use the > OFI MTL without adding any code to the OpenMPI source tree itself.

Re: [OMPI users] Openmpi 1.10.x, mpirun and Slurm 15.08 problem

2016-04-01 Thread Jeff Squyres (jsquyres)
Ralph -- What's the state of PMI integration with SLURM in the v1.10.x series? (I haven't kept up with SLURM's recent releases to know if something broke between existing Open MPI releases and their new releases...?) > On Mar 31, 2016, at 4:24 AM, Tommi T wrote: > >

Re: [OMPI users] Existing and emerging interconnects for commodity PCs

2016-03-21 Thread Jeff Squyres (jsquyres)
+1 on what Gilles says. 10 years is too lengthy of a horizon to guarantee knowledge in the fast-moving tech sector. All you can do is make good estimates based on your requirements and budget today (and what you can estimate over the next few years). > On Mar 21, 2016, at 6:06 AM, Gilles

Re: [OMPI users] Why do I need a C++ linker while linking in MPI C code with CUDA?

2016-03-21 Thread Jeff Squyres (jsquyres)
On Mar 20, 2016, at 9:23 PM, Gilles Gouaillardet wrote: > > Durga, > > since the MPI C++ bindings are not required, you might want to > mpicc ... -lstdc++ > instead of > mpicxx ... I'm not sure I'd recommend that. Using the C++ compiler may do other

Re: [OMPI users] Why does 'self' needs to be explicitly mentioned?

2016-03-21 Thread Jeff Squyres (jsquyres)
On Mar 19, 2016, at 11:53 AM, dpchoudh . wrote: > > 1. Why 'self' needs to be explicitly mentioned when using the BTL > communication? Since it must always be there for MPI communication to work, > should it not be implicit? I am sure there is some architectural rationale
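A minimal sketch of an explicit BTL list; because send-to-self traffic always goes through the self BTL, it must appear whenever the list is given by hand (the program name is a placeholder):

    mpirun --mca btl self,sm,tcp -np 4 ./app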

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Jeff Squyres (jsquyres)
Michael -- Can you send all the information listed here? https://www.open-mpi.org/community/help/ (including the full output from the run with the PML/BTL/MTL/etc. verbosity) This will allow Matias to look through all the relevant info, potentially with fewer back-n-forth emails. Thanks!

Re: [OMPI users] How to link the statically compiled OpenMPI library ?

2016-03-17 Thread Jeff Squyres (jsquyres)
On Mar 17, 2016, at 10:54 AM, evelina dumitrescu wrote: > > hello, > > I unsuccessfully tried to link the statically compiled OpenMPI library. > I used for compilation: > > ./configure --enable-static -disable-shared > make -j 4 > make install > > When I try
