Re: [OMPI users] Issue about cm PML

2016-03-17 Thread Jeff Squyres (jsquyres)
Additionally, if you run ompi_info | grep psm Do you see the PSM MTL listed? To force the CM MTL, you can run: mpirun --mca pml cm ... That won't let any BTLs be selected (because only ob1 uses the BTLs). > On Mar 17, 2016, at 8:07 AM, Gilles Gouaillardet >
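
Putting those two suggestions together (the process count and binary name here are hypothetical):

    ompi_info | grep psm                      # is the PSM MTL built/installed?
    mpirun --mca pml cm -np 4 ./my_mpi_app    # force the CM PML (and thus the MTLs)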

Re: [OMPI users] Communication problem (on one node) when network interface is down

2016-03-11 Thread Jeff Squyres (jsquyres)
It's set by default in btl_tcp_if_exclude (because in most cases, you *do* want to exclude the loopback interface -- it's much slower than shared memory in these kinds of scenarios). But this value can certainly be overridden: mpirun --mca btl_tcp_if_exclude '' > On Mar 11, 2016, at 11:15
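
In command form, as a sketch (the application name is hypothetical):

    mpirun --mca btl_tcp_if_exclude '' -np 2 ./my_mpi_app   # empty exclude list: loopback stays usable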

Re: [OMPI users] What causes the pingpong bandwidth "hump" on SM?

2016-03-10 Thread Jeff Squyres (jsquyres)
I think the information was scattered across a few posts, but the union of them is correct: - it depends on the benchmark - yes, L1/L2/L3 cache sizes can have a huge effect. I.e., once the buffer size gets bigger than the cache size, it takes more time to get the message from main RAM -->
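
As an illustration of the effect, a minimal two-rank ping-pong sketch (not from the thread) that sweeps the message size; once the buffer outgrows the last-level cache, the measured bandwidth typically falls back toward main-RAM speed:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Run with exactly 2 ranks, e.g.: mpirun -np 2 ./pingpong */
    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (size_t len = 1024; len <= 16 * 1024 * 1024; len *= 2) {
            char *buf = malloc(len);
            double t0 = MPI_Wtime();
            for (int i = 0; i < 100; ++i) {
                if (rank == 0) {
                    MPI_Send(buf, (int)len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, (int)len, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                } else {
                    MPI_Recv(buf, (int)len, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    MPI_Send(buf, (int)len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
                }
            }
            if (rank == 0)   /* 2 transfers per iteration, 100 iterations */
                printf("%8zu bytes: %.1f MB/s\n", len,
                       2.0 * 100.0 * len / (MPI_Wtime() - t0) / 1.0e6);
            free(buf);
        }
        MPI_Finalize();
        return 0;
    }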

Re: [OMPI users] General Questions

2016-03-05 Thread Jeff Squyres (jsquyres)
I just realized that I replied directly to Matthew and not to the list. Let me add most of my reply to the thread here on the list, in case it's helpful to others. See my reply to Matthew, below. > On Mar 5, 2016, at 10:53 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote:

Re: [OMPI users] openmpi bug on mac os 10.11.3 ?

2016-03-05 Thread Jeff Squyres (jsquyres)
What version of Open MPI are you using? Can you send all the information listed here: https://www.open-mpi.org/community/help/ > On Mar 5, 2016, at 5:35 AM, Hans-Jürgen Greif > wrote: > > > > > Hello, > > on mac os 10.11.3 I have found an error: > >

Re: [OMPI users] Sending string causes memory errors

2016-03-03 Thread Jeff Squyres (jsquyres)
All of those valgrind reports below are from within your code -- not from within Open MPI. All Open MPI can do is pass the contents of your message properly; you can verify that it is being sent and received properly by checking the byte contents of your received array (e.g., assert that the
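
A sketch of the kind of check being suggested (buffer name and pattern are hypothetical): the sender fills the buffer with a known pattern and the receiver asserts every byte arrived intact:

    #include <assert.h>
    #include <mpi.h>

    /* Needs at least 2 ranks. */
    int main(int argc, char **argv)
    {
        enum { N = 1024 };
        char buf[N];
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            for (int i = 0; i < N; ++i)
                buf[i] = (char)(i % 127);           /* known pattern */
            MPI_Send(buf, N, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(buf, N, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for (int i = 0; i < N; ++i)
                assert(buf[i] == (char)(i % 127));  /* every byte intact */
        }
        MPI_Finalize();
        return 0;
    }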

Re: [OMPI users] Sending string causes memory errors

2016-03-02 Thread Jeff Squyres (jsquyres)
There's a bunch of places in OMPI where we don't initialize memory because we know it doesn't matter (e.g., in padding between unaligned struct members), but then that memory is accessed when writing the entire struct down a file descriptor or memcpy'ed elsewhere in memory...etc. It gets even

Re: [OMPI users] General Questions

2016-03-02 Thread Jeff Squyres (jsquyres)
On Mar 1, 2016, at 10:25 PM, dpchoudh . wrote: > > > I don't think the Open MPI TCP BTL will pass the SDP socket type when > creating sockets -- SDP is much lower performance than native verbs/RDMA. > You should use a "native" interface to your RDMA network instead (which

Re: [OMPI users] General Questions

2016-03-01 Thread Jeff Squyres (jsquyres)
you use depends on which kind of network you have). > Not sure if the only answer for this is a custom stack, API/kernel module. > > Do you have any input on the above mentioned things? > > On Tuesday, March 1, 2016 6:42 AM, Jeff Squyres (jsquyres) > <jsquy...@cisco.com> wro

Re: [OMPI users] General Questions

2016-03-01 Thread Jeff Squyres (jsquyres)
On Feb 29, 2016, at 6:48 PM, Matthew Larkin wrote: > > 1. I know OpenMPI supports ethernet, but where does it clearly state that? > - I see on the FAQ on the web page there is a whole list of network > interconnect, but how do I relate that to Ethernet network etc.? Open MPI

Re: [OMPI users] wrong library version for dependent open-rte lib when libtool relinks

2016-02-27 Thread Jeff Squyres (jsquyres)
On Feb 27, 2016, at 9:33 AM, Emmanuel Thomé wrote: > >>> dependency_libs=' -losmcomp -libverbs >>> /tmp/openmpi-1.10.2/orte/libopen-rte.la >>> /tmp/openmpi-1.10.2/opal/libopen-pal.la -lnuma -ldl -lrt -lm -lutil' >> >> Why does this not look good? > > Because my

Re: [OMPI users] wrong library version for dependent open-rte lib when libtool relinks

2016-02-27 Thread Jeff Squyres (jsquyres)
I got your log files and looked at them, but am replying earlier in the thread in order to give more specific answers. More below. > On Feb 27, 2016, at 5:42 AM, Emmanuel Thomé wrote: > > Hi, > > Thanks for your answer. > > I have no LD_LIBRARY_PATH. I am not sure

Re: [OMPI users] wrong library version for dependent open-rte lib when libtool relinks

2016-02-27 Thread Jeff Squyres (jsquyres)
om a system where I kept that file). > > /usr/lib/libosmcomp.la has no embedded rpath information. FWIW, this > .la file comes from the file > MLNX_OFED_LINUX-3.1-1.0.3-debian8.1-x86_64/DEBS/libopensm_4.6.0.MLNX20150830.c69ebab_amd64.deb > . > > On Sat, Feb 27, 2016 at 2:22 PM,

Re: [OMPI users] wrong library version for dependent open-rte lib when libtool relinks

2016-02-27 Thread Jeff Squyres (jsquyres)
On Feb 27, 2016, at 5:42 AM, Emmanuel Thomé wrote: > > Specifically, I have /usr/lib/libosmcomp.la. If I delete that file, > then no -L/usr/lib shows up in the relink command for libmpi; libtool > just emits -losmcomp alone, which is fine. Then the subsequent >

Re: [OMPI users] Problem when installing OpenMPI during make all install

2016-02-25 Thread Jeff Squyres (jsquyres)
Can you send all the information listed here: https://www.open-mpi.org/community/help/ > On Feb 25, 2016, at 9:09 PM, Tang Cheng Yee wrote: > > Hi all, > > I am new to the bioinformatics world. When I was trying to install OpenMPI, I > encountered the following

Re: [OMPI users] Adding a new BTL

2016-02-25 Thread Jeff Squyres (jsquyres)
Can you send the full output from autogen and configure? Also, this is probably better suited for the Devel list, since we're talking about OMPI internals. Sent from my phone. No type good. On Feb 25, 2016, at 2:06 PM, dpchoudh . > wrote: Hello

Re: [OMPI users] OMPI users] Fortran vs C reductions

2016-02-18 Thread Jeff Squyres (jsquyres)
George is correct that the Fortran bindings are optional, but the datatype declarations are not. I've raised the question in the MPI Forum Fortran Working Group: http://lists.mpi-forum.org/mpiwg-fortran/2016/02/1722.php -Original Message- From: Jed Brown

Re: [MTT users] MTT: Unescaped left brace in regex is deprecated

2016-02-18 Thread Jeff Squyres (jsquyres)
Filed a PR: https://github.com/open-mpi/mtt/pull/424 Can you have a look? -Original Message- From: Adrian Reber Reply: General user list for the MPI Testing Tool Date: February 17, 2016 at 12:14:07 PM To: mtt-us...@open-mpi.org

Re: [OMPI users] difference between OpenMPI - intel MPI -- how to understand where\why

2016-02-17 Thread Jeff Squyres (jsquyres)
-Original Message- From: Diego Avesani Reply: Open MPI Users Date: February 17, 2016 at 8:13:35 AM To: Open MPI Users Subject:  Re: [OMPI users] difference between OpenMPI - intel MPI -- how to understand where\why > I

Re: [OMPI users] readv failed How to debug?

2016-02-16 Thread Jeff Squyres (jsquyres)
-Original Message- From: JR Cary Reply: Open MPI Users Date: February 16, 2016 at 9:39:23 AM To: us...@open-mpi.org Subject:  Re: [OMPI users] readv failed How to debug? > Thanks, Gilles, > > Yes, this binary was built a few

Re: [OMPI users] readv failed How to debug?

2016-02-16 Thread Jeff Squyres (jsquyres)
John -- +1 on what Gilles said. The initial error says that a broadcast message was truncated. This likely indicates that someone is calling MPI_Bcast with a different size than its peers (it *could* indicate what Gilles mentioned about different-but-supposed-to-be-compatible-datatypes, but
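
A hedged sketch of the constraint being described (sizes hypothetical): every rank must pass a matching count and datatype to MPI_Bcast, or a peer that posted fewer elements than the root sent will report a truncation error:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int data[100] = {0};
        MPI_Init(&argc, &argv);
        /* Every rank passes the same count and (compatible) datatype.
         * If rank 0 broadcast 100 ints while some peer posted only 50,
         * that peer would report a truncation error like the one above. */
        MPI_Bcast(data, 100, MPI_INT, 0, MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }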

Re: [OMPI users] difference between OpenMPI - intel MPI -- how to understand where\why

2016-02-16 Thread Jeff Squyres (jsquyres)
On February 16, 2016 at 5:19:16 AM, Diego Avesani (diego.aves...@gmail.com) wrote: > Dear all, > > I have written a Fortran MPI code. > Usually, I compile it in MPI or in openMPI according to the cluster where > it runs. > Unfortunately, I get a completely different result and I do not know

Re: [OMPI users] Open MPI backwards incompatibility issue in statically linked program

2016-02-15 Thread Jeff Squyres (jsquyres)
Yes, that is correct: the Open MPI libraries used by all components (to include mpirun and all the individual MPI processes) must be compatible.  We usually advise users to use the same version across the board, and that will make things work. More specifically: it's not enough to statically

Re: [OMPI users] shared memory zero size segment

2016-02-10 Thread Jeff Squyres (jsquyres)
Peter -- Somewhere along the way, your attachment got lost. Could you re-send? Thanks. > On Feb 10, 2016, at 5:56 AM, Peter Wind wrote: > > Hi, > > Under fortran, MPI_Win_allocate_shared is called with a window size of zero > for some processes. > The output pointer is
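
The thread's code is Fortran; as a minimal C sketch of the same pattern (sizes hypothetical), some ranks ask for a zero-byte window and then query a peer's segment:

    #include <mpi.h>
    #include <stdio.h>

    /* All ranks must be on the same node for MPI_Win_allocate_shared. */
    int main(int argc, char **argv)
    {
        int rank, disp;
        double *base;
        MPI_Aint size;
        MPI_Win win;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        /* Only rank 0 contributes memory; everyone else asks for 0 bytes. */
        MPI_Aint mysize = (rank == 0) ? 100 * sizeof(double) : 0;
        MPI_Win_allocate_shared(mysize, sizeof(double), MPI_INFO_NULL,
                                MPI_COMM_WORLD, &base, &win);
        /* Zero-size ranks query rank 0's segment to get a usable pointer. */
        MPI_Win_shared_query(win, 0, &size, &disp, &base);
        printf("rank %d sees a segment of %ld bytes\n", rank, (long)size);
        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }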

Re: [OMPI users] OMPI users] Fortran vs C reductions

2016-02-08 Thread Jeff Squyres (jsquyres)
CC nor Clang support this, and it ends up throwing two error > messages when compiled (only one - the right one - when only preprocessed), > which might confuse the same folks that it is trying to help. > > Best, > > Jeff > > On Mon, Feb 8, 2016 at 5:14 AM, Jeff Squyres (jsqu

Re: [OMPI users] OMPI users] Fortran vs C reductions

2016-02-08 Thread Jeff Squyres (jsquyres)
The issue at hand is trying to help the user figure out that they have an Open MPI built without Fortran support. Perhaps we should improve the error reporting at run time to display something about the fact that you used a Fortran data type but have an OMPI that was compiled without Fortran

Re: [OMPI users] Fortran vs C reductions

2016-02-07 Thread Jeff Squyres (jsquyres)
On Feb 4, 2016, at 9:46 PM, Brian Taylor wrote: > > Thanks for the explanation, Jeff. I'm not surprised to hear that using a > Fortran type from C in this manner is potentially buggy and not portable. > However, assuming that the C and Fortran types are

Re: [OMPI users] Fortran vs C reductions

2016-02-04 Thread Jeff Squyres (jsquyres)
On Feb 4, 2016, at 12:02 PM, Brian Taylor wrote: > > I have a question about the standards compliance of OpenMPI. Is the > following program valid according to the MPI standard? > > #include > #include > > int main(int argc, char **argv) > { > int rank; >

Re: [OMPI users] New libmpi.so dependency on libibverbs.so?

2016-02-02 Thread Jeff Squyres (jsquyres)
On Feb 2, 2016, at 12:15 PM, Number Cruncher wrote: > > Thanks for the info. I'll probably go with insisting on libibverbs. > > Does seem a bit contrary to the very high modularity in OpenMPI that > essentially a Cisco-specific module introduces a libmpi

Re: [OMPI users] New libmpi.so dependency on libibverbs.so?

2016-02-02 Thread Jeff Squyres (jsquyres)
This functionality is there to overcome a bug in libibverbs (that prints a dire warning about Cisco usNIC devices not being supported). However, I can see how this additional linkage is undesirable. We can probably flip the default on this component to not build by default -- but leave it

Re: [OMPI users] MX replacement?

2016-02-02 Thread Jeff Squyres (jsquyres)
On Feb 2, 2016, at 9:00 AM, Dave Love wrote: > > Now that MX support has been dropped, is there an alternative for fast > Ethernet? There are several options for low latency ethernet, but they're all vendor-based solutions (e.g., my company's usNIC solution). Note

Re: [OMPI users] difference between OpenMPI - intel MPI mpi_waitall

2016-01-29 Thread Jeff Squyres (jsquyres)
On Jan 29, 2016, at 7:55 AM, Diego Avesani wrote: > > Dear all, Dear Jeff, Dear Gilles, > > I am sorry, probably I am being stubborn. > > In all my code I have > > CALL MPI_WAITALL(2,REQUEST,send_status_list,MPIdata%iErr) > > how can it become "3"? I don't know.

Re: [OMPI users] difference between OpenMPI - intel MPI mpi_waitall

2016-01-29 Thread Jeff Squyres (jsquyres)
You must have an error elsewhere in your code; as Gilles pointed out, the error message states that you are calling MPI_WAITALL with a first argument of 3: -- MPI_Waitall(271): MPI_Waitall(count=3, req_array=0x7445f0, status_array=0x744600) failed -- We can't really help you with problems
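
The thread's code is Fortran; as a C sketch of the constraint (neighbor exchange hypothetical), the count passed to MPI_Waitall must equal the number of request slots actually filled:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, sbuf, rbuf;
        MPI_Request reqs[2];            /* exactly two request slots...  */
        MPI_Status  stats[2];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        sbuf = rank;
        int right = (rank + 1) % size;
        int left  = (rank + size - 1) % size;
        MPI_Irecv(&rbuf, 1, MPI_INT, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&sbuf, 1, MPI_INT, right, 0, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, stats);    /* ...so the count must be 2     */
        MPI_Finalize();
        return 0;
    }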

Re: [OMPI users] Strange behaviour OpenMPI in Fortran

2016-01-25 Thread Jeff Squyres (jsquyres)
On Jan 25, 2016, at 7:21 AM, Dave Love wrote: > > You might expect the f90 module to reveal the error anyway. > Unfortunately which routines it covers depends on the compiler and OMPI > versions in a way I don't understand -- can someone explain? For > instance, it won't

Re: [OMPI users] Strange behaviour OpenMPI in Fortran

2016-01-22 Thread Jeff Squyres (jsquyres)
+1 If you're starting new code, try using the F08 MPI bindings. Type safety === good. > On Jan 22, 2016, at 10:44 AM, Jeff Hammond wrote: > > You will find the MPI Fortran 2008 bindings to be significantly better w.r.t. > MPI types. See e.g. MPI 3.1 section 17.2.5

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Jeff Squyres (jsquyres)
On Jan 21, 2016, at 7:40 AM, Eva wrote: > > Thanks Jeff. > > >>1. Can you create a small example to reproduce the problem? > > >>2. The TCP and verbs-based transports use different thresholds and > >>protocols, and can sometimes bring to light errors in the application >

Re: [OMPI users] MPI hangs on poll_device() with rdma

2016-01-21 Thread Jeff Squyres (jsquyres)
Can you create a small example to reproduce the problem? The TCP and verbs-based transports use different thresholds and protocols, and can sometimes bring to light errors in the application (e.g., the application is making assumptions that just happen to be true for TCP, but not necessarily

Re: [OMPI users] Global settings

2016-01-11 Thread Jeff Squyres (jsquyres)
On Jan 11, 2016, at 8:32 AM, Bennet Fauber wrote: > > We have an issue with binding to cores with some applications and the > default causes issues. We would, therefore, like to set the > equivalent of > > mpirun --bind-to none > > globally. I tried search for combinations
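
As a sketch of one global mechanism -- assuming this Open MPI version exposes the hwloc_base_binding_policy MCA parameter, and with the path depending on the install prefix -- the site-wide parameter file would be the file-based equivalent of "mpirun --bind-to none":

    # <prefix>/etc/openmpi-mca-params.conf
    hwloc_base_binding_policy = none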

Re: [OMPI users] Questions about non-blocking collective calls...

2015-12-17 Thread Jeff Squyres (jsquyres)
On Dec 17, 2015, at 1:39 PM, Eric Chamberland wrote: > > Just to be clear: we *always* call MPI_Wait. Now the question was about > *when* to do it. Ok. Remember that the various flavors of MPI_Test are acceptable, too. And it's ok to call
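
A sketch of the MPI_Test-based completion pattern, using MPI_Ibcast as an example nonblocking collective (not the poster's code):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int data = 42, done = 0;
        MPI_Request req;
        MPI_Init(&argc, &argv);
        MPI_Ibcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);
        while (!done) {
            /* Each MPI_Test call also gives the library a chance to
             * progress the collective if there is no async progression. */
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);
            /* ... overlap useful computation here ... */
        }
        /* Once MPI_Test has returned done == 1, no MPI_Wait is needed. */
        MPI_Finalize();
        return 0;
    }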

Re: [OMPI users] Questions about non-blocking collective calls...

2015-12-17 Thread Jeff Squyres (jsquyres)
On Dec 17, 2015, at 8:57 AM, Eric Chamberland wrote: > > But I would like to know if the MPI I am using is able to do message > progression or not: So how can an end-user like me know that? Does it > rely on hardware? Is there a #define by OpenMPI that

Re: [MTT users] Python client

2015-12-13 Thread Jeff Squyres (jsquyres)
+1 on both Ralph and Josh's comments. > On Dec 13, 2015, at 11:56 AM, Josh Hursey wrote: > > I think this is fine. If we do start to organize ourselves for a formal > release then we might want to move to pull requests to keep the branch stable > for a bit, but for now

Re: [MTT users] Actual releases?

2015-12-10 Thread Jeff Squyres (jsquyres)
+1. We did do a release branch at one point (which is likely completely stale), but we've never invested any release engineering efforts into MTT to make a packaged/perfect tarball -- mainly because, at least in the beginning, MTT was just for Open MPI core developers, and we were comfortable

Re: [OMPI users] OpenMPI library conflicts

2015-12-04 Thread Jeff Squyres (jsquyres)
On Dec 4, 2015, at 4:31 AM, Yilmaz, D. wrote: > > If you are switching/rebuilding the openmpi releases on your computer from time to > time, your latest openmpi build (not the latest release) which you are > trying to install cannot change the symbolic links of the openmpi

Re: [OMPI users] Bug in Fortran-module MPI of OpenMPI 1.10.0 with Intel-Ftn-compiler

2015-12-03 Thread Jeff Squyres (jsquyres)
On Nov 23, 2015, at 11:07 AM, michael.rach...@dlr.de wrote: > > In the meantime the administrators have installed (Thanks!) OpenMPI-1.10.1 > with Intel-16.0.0 on the cluster. > I have tested it with our code: It works. > The time spent for MPI-data transmission was the same as with >

Re: [OMPI users] [OMPI devel] Openmpi 1.10.1: BUG in orterun.c

2015-12-01 Thread Jeff Squyres (jsquyres)
On Dec 1, 2015, at 9:42 AM, Stefano Garzarella wrote: > > I noticed that the development repo (https://github.com/open-mpi/ompi) > doesn't have this bug, but the > release repo (https://github.com/open-mpi/ompi-release) has this bug and I > send a pull request to

Re: [OMPI users] help understand unhelpful ORTE error message

2015-11-30 Thread Jeff Squyres (jsquyres)
On Nov 24, 2015, at 9:31 AM, Dave Love wrote: > >> btw, we already use the force, thanks to the ob1 pml and the yoda spml > > I think that's assuming familiarity with something which leaves out some > people... FWIW, I agree: we use unhelpful names for components in

Re: [OMPI users] Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072

2015-11-20 Thread Jeff Squyres (jsquyres)
I'm in an airport right now and can't easily check, but instead of using mmap memory (which treats shared memory as a file), you could tell Open MPI to use SYSV shared memory. IIRC that isn't treated like a file. Look for a selection mechanism via an MCA param in the sm or vader BTLs -- run

Re: [OMPI users] Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072

2015-11-20 Thread Jeff Squyres (jsquyres)
vader/sm btl and disable them with a warning if value is too low ? Cheers, Gilles On Friday, November 20, 2015, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote: For what it's worth, that's Open MPI creating a chunk of shared memory for use with on-server

Re: [OMPI users] Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072

2015-11-20 Thread Jeff Squyres (jsquyres)
For what it's worth, that's Open MPI creating a chunk of shared memory for use with on-server communication. It shows up as a "file", but it's really shared memory. You can disable sm and/or vader, but your on-server message passing performance will be significantly lower. Is there a reason

Re: [OMPI users] [OMPI devel] Slides from the Open MPI SC'15 State of the Union BOF

2015-11-19 Thread Jeff Squyres (jsquyres)
It appears that the PDF that was originally posted was corrupted. Doh! The file has been fixed -- you should be able to download and open it correctly now: http://www.open-mpi.org/papers/sc-2015/ Sorry about that, folks! > On Nov 19, 2015, at 9:03 AM, Jeff Squyres (jsquyres) <

[OMPI users] Slides from the Open MPI SC'15 State of the Union BOF

2015-11-19 Thread Jeff Squyres (jsquyres)
Thanks to the over 100 people who came to the Open MPI State of the Union BOF yesterday. George Bosilca from U. Tennessee, Nathan Hjelm from Los Alamos National Lab, and I presented where we are with Open MPI development, and where we're going. If you weren't able to join us, feel free to

[OMPI users] Open MPI State of the Union BOF: this Wednesday!

2015-11-16 Thread Jeff Squyres (jsquyres)
George Bosilca, Nathan Hjelm, and I will be presenting the Open MPI State of the Union BOF at SC'15 in Austin, TX, USA, this Wednesday at 12:15pm US Central time. This is your last chance to pre-submit questions for us to discuss in the BOF:

Re: [OMPI users] Problems running 1.8.8 and compiling 1.10.1 on Redhat EL7

2015-11-06 Thread Jeff Squyres (jsquyres)
Both of these seem to be issues with libnl, which is a dependent library that Open MPI uses. Can you send all the information listed here: http://www.open-mpi.org/community/help/ > On Nov 6, 2015, at 5:44 PM, Saurabh T wrote: > > Hi, > > On Redhat Enterprise Linux

Re: [OMPI users] Unable to compile for libnumactl and libnumactl-devel

2015-10-30 Thread Jeff Squyres (jsquyres)
Oh, that's an interesting idea: perhaps the "bind to numa" is failing -- but perhaps "bind to socket" would work. Can you try: /opt/openmpi-1.10.0-gcc/bin/mpiexec -bind-to numa -n 4 hostname and /opt/openmpi-1.10.0-gcc/bin/mpiexec -bind-to socket -n 4 hostname > On Oct 30, 2015, at 12:02

Re: [OMPI users] Unable to compile for libnumactl and libnumactl-devel

2015-10-29 Thread Jeff Squyres (jsquyres)
Your Open MPI build looks good -- it seems to have found all the right libnuma stuff during configure, etc. The error message you're seeing indicates that the embedded hwloc is telling Open MPI that it doesn't have binding support, which would be unusual since you have both libnuma and

Re: [OMPI users] Unable to compile for libnumactl and libnumactl-devel

2015-10-29 Thread Jeff Squyres (jsquyres)
On Oct 29, 2015, at 9:26 AM, Fabian Wein wrote: > > Is there any hint in the configure output? Yes. Please send all the info listed here: http://www.open-mpi.org/community/help/ -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to:

Re: [OMPI users] Unable to compile for libnumactl and libnumactl-devel

2015-10-29 Thread Jeff Squyres (jsquyres)
+1 If you're compiling Open MPI from source, you need the -devel package so that the libnuma header files are installed (and therefore Open MPI [i.e., the hwloc embedded in Open MPI] can include those header files and then compile support for libnuma). > On Oct 29, 2015, at 7:01 AM, Ralph

Re: [OMPI users] Seg fault in MPI_FINALIZE

2015-10-22 Thread Jeff Squyres (jsquyres)
1.10.1 isn't released yet -- it's "very close" (just working on a few final issues), but not quite out the door yet. Stay tuned... > On Oct 22, 2015, at 2:26 PM, McGrattan, Kevin B. Dr. > wrote: > > OK, I guess I have to upgrade to 1.10.x. I think we have 1.10.0

Re: [OMPI users] MPI_Win_lock with MPI_MODE_NOCHECK

2015-10-21 Thread Jeff Squyres (jsquyres)
I filed this as https://github.com/open-mpi/ompi/issues/1049. Thanks for the bug report! > On Oct 21, 2015, at 8:23 AM, Sebastian Rettenberger > wrote: > > The title was actually not correct. I first thought that happens when using > multiple tasks/threads, but I could

[OMPI users] Open MPI State of the Union BOF @SC15

2015-10-20 Thread Jeff Squyres (jsquyres)
We're about T-1 month away from the Open MPI State of the Union BOF at SC'15. It's at 12:15pm on Wednesday, 18 Nov, 2015, in room 18C/D: http://sc15.supercomputing.org/schedule/event_detail?evid=bof107 As usual, since we only have an hour in the BOF, we like to capture some of your questions

Re: [OMPI users] Seg fault in MPI_FINALIZE

2015-10-16 Thread Jeff Squyres (jsquyres)
Kevin wait for 1.10.1 with the intel 16 compiler? > A bugfix for intel 16 has been committed with > fb49a2d71ed9115be892e8a22643d9a1c069a8f9. > (At least I am anxiously awaiting the 1.10.1 because I cannot get my builds > to complete successfully) > > > > 2015-10-16 1

Re: [OMPI users] Seg fault in MPI_FINALIZE

2015-10-16 Thread Jeff Squyres (jsquyres)
> On Oct 16, 2015, at 3:25 PM, McGrattan, Kevin B. Dr. > wrote: > > I cannot nail this down any better because this happens like every other > night, with about 1 out of a hundred jobs. Can anyone think of a reason why > the job would seg fault in MPI_FINALIZE, but

Re: [OMPI users] mpirun/mpiexec requires su

2015-10-16 Thread Jeff Squyres (jsquyres)
On Oct 15, 2015, at 2:58 PM, Brant Abbott wrote: > > If I use mpirun.openmpi everything works as normal. I suppose mpirun is > executing the MPICH version. I'm not entirely sure why when logged in as root > it behaves differently, but good enough for me to just use the

Re: [OMPI users] mpirun/mpiexec requires su

2015-10-15 Thread Jeff Squyres (jsquyres)
I think you're accidentally using MPICH, not Open MPI. Specifically, those are error messages from MPICH. Check your paths to ensure that mpif90 / mpirun / etc. are all the ones that you think you're executing. And then double check your LD_LIBRARY_PATH to ensure that libmpi is the library

[OMPI users] 1.10.1rc2 available

2015-10-08 Thread Jeff Squyres (jsquyres)
I don't want to pester everyone with all of our release candidates for 1.10.1, so this will likely be the last general announcement on the users and announcement lists before we release 1.10.1 (final). That being said, 1.10.1 rc2 is now available:

Re: [OMPI users] OpenMPI 1.8.8: Segfault when using non-blocking reduce operations with a user-defined operator

2015-10-07 Thread Jeff Squyres (jsquyres)
More specifically: the 1.10.x series was the follow-on to 1.8.8. v1.10.0 is available now; v1.10.1 will be available soon (we already have an rc for it; another rc is coming soon). > On Oct 7, 2015, at 7:30 AM, Gilles Gouaillardet > wrote: > > Georg, > >
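
A sketch of the pattern that triggered the report -- a user-defined operator used with a nonblocking reduction (operator and values hypothetical):

    #include <mpi.h>

    /* User-defined reduction: element-wise sum of doubles. */
    static void my_sum(void *in, void *inout, int *len, MPI_Datatype *dt)
    {
        double *a = in, *b = inout;
        (void)dt;
        for (int i = 0; i < *len; ++i)
            b[i] += a[i];
    }

    int main(int argc, char **argv)
    {
        double x = 1.0, r = 0.0;
        MPI_Op op;
        MPI_Request req;
        MPI_Init(&argc, &argv);
        MPI_Op_create(my_sum, 1 /* commutative */, &op);
        MPI_Ireduce(&x, &r, 1, MPI_DOUBLE, op, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Op_free(&op);
        MPI_Finalize();
        return 0;
    }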

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-07 Thread Jeff Squyres (jsquyres)
Is this something that needs to go into v1.10.1? If so, a PR needs to be filed ASAP. We were supposed to make the next 1.10.1 RC yesterday, but slipped to today due to some last second patches. > On Oct 7, 2015, at 4:32 AM, Gilles Gouaillardet wrote: > > Marcin, > > here

Re: [OMPI users] [Open MPI Announce] Open MPI v1.10.1rc1 release

2015-10-05 Thread Jeff Squyres (jsquyres)
On Oct 3, 2015, at 9:14 AM, Dimitar Pashov wrote: > > Hi, I have a pet bug causing silent data corruption here: >https://github.com/open-mpi/ompi/issues/965 > which seems to have a fix committed some time later. I've tested v1.10.1rc1 > now and it still has the

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-05 Thread Jeff Squyres (jsquyres)
I filed an issue to track this problem here: https://github.com/open-mpi/ompi/issues/978 > On Oct 5, 2015, at 1:01 PM, Ralph Castain wrote: > > Thanks Marcin. I think we have three things we need to address: > > 1. the warning needs to be emitted regardless of whether

[OMPI users] Open MPI v1.10.1rc1 release

2015-10-03 Thread Jeff Squyres (jsquyres)
Open MPI users -- We have just posted the first release candidate for the upcoming v1.10.1 bug fix release. We'd appreciate any testing and/or feedback that you may have on this release candidate: http://www.open-mpi.org/software/ompi/v1.10/ Thank you! Changes since v1.10.0: - Fix segv when

Re: [OMPI users] Problem using mpifort(Intel)

2015-10-01 Thread Jeff Squyres (jsquyres)
Excellent; thanks for the feedback. > On Oct 1, 2015, at 2:02 AM, Julien Bodart <julien.bod...@gmail.com> wrote: > > I have tried the last nightly build and it seems that nobody is complaining > now. > Thanks a lot, > > Julien > >> Date: Fri, 25 Sep 2015 1

Re: [OMPI users] libfabric/usnic does not compile in 2.x

2015-09-30 Thread Jeff Squyres (jsquyres)
On Sep 30, 2015, at 3:13 PM, marcin.krotkiewski wrote: > > Thank you for this clear explanation. I do not have True Scale on 'my' > machine, so unless Mellanox gets involved - no juice for me. > > Makes me wonder. libfabric is marketed as a next-generation

Re: [OMPI users] libfabric/usnic does not compile in 2.x

2015-09-30 Thread Jeff Squyres (jsquyres)
On Sep 30, 2015, at 11:19 AM, marcin.krotkiewski wrote: > > Thank you, and Jeff, for clarification. > > Before I bother you all more without the need, I should probably say I was > hoping to use libfabric/OpenMPI on an InfiniBand cluster. Somehow now I feel > I

Re: [OMPI users] send_request error with allocate

2015-09-30 Thread Jeff Squyres (jsquyres)
status_list,MPIdata%iErr) Per my prior email: what is the value of nMsg? If it's 2, you should probably be ok. > On 30 September 2015 at 12:42, Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote: > Put differently: > > - You have an array of N requests > - If you're only fil

Re: [OMPI users] understanding mpi_gather-mpi_gatherv

2015-09-30 Thread Jeff Squyres (jsquyres)
Gather requires that all processes contribute the same size message. Gatherv allows the root to specify a different size that will be supplied by each peer process. Note, too, that X1(iStart:iEnd) may well invoke a copy to copy just that portion of the array; that might hurt your performance
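
A sketch contrasting the two calls (contribution sizes hypothetical): each rank sends a different number of elements, which MPI_Gather cannot express but MPI_Gatherv can:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        /* Each rank contributes a different number of elements: rank+1. */
        int mycount = rank + 1;
        double *mine = malloc(mycount * sizeof(double));
        for (int i = 0; i < mycount; ++i)
            mine[i] = rank;
        int *counts = NULL, *displs = NULL;
        double *all = NULL;
        if (rank == 0) {
            counts = malloc(size * sizeof(int));
            displs = malloc(size * sizeof(int));
            int total = 0;
            for (int i = 0; i < size; ++i) {
                counts[i] = i + 1;    /* what each peer sends        */
                displs[i] = total;    /* where it lands at the root  */
                total += counts[i];
            }
            all = malloc(total * sizeof(double));
        }
        /* MPI_Gather would require mycount to be identical on all ranks. */
        MPI_Gatherv(mine, mycount, MPI_DOUBLE,
                    all, counts, displs, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }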

Re: [OMPI users] send_request error with allocate

2015-09-30 Thread Jeff Squyres (jsquyres)
; > This is what my code does. Probably, the use of send_request(:) as a vector > and the use of WAITALL is not correct, am I right? > > what do you suggest? > > Thanks a lot, > Diego > > > Diego > > > On 30 September 2015 at 12:42, Jeff Squyres (jsquyres) &l

Re: [OMPI users] libfabric/usnic does not compile in 2.x

2015-09-30 Thread Jeff Squyres (jsquyres)
On Sep 30, 2015, at 7:35 AM, Marcin Krotkiewski wrote: > > I am trying to compile the 2.x branch with libfabric support, but get this > error during configure: > > configure:100708: checking rdma/fi_ext_usnic.h presence > configure:100708: gcc -E

Re: [OMPI users] send_request error with allocate

2015-09-30 Thread Jeff Squyres (jsquyres)
_request(MPIdata%rank), > MPIdata%iErr) > > write(*,*) MPIdata%rank-1 > ENDIF > ! > ! > CALL MPI_WAITALL(nMsg,send_request,send_status_list,MPIdata%iErr) > CALL MPI_WAITALL(nMsg,recv_request,recv_status_list,MPIdata%iErr) > > Diego

Re: [OMPI users] send_request error with allocate

2015-09-29 Thread Jeff Squyres (jsquyres)
Idata%rank-1 > ENDIF > ! > ! > CALL MPI_WAITALL(nMsg,send_request,send_status_list,MPIdata%iErr) > CALL MPI_WAITALL(nMsg,recv_request,recv_status_list,MPIdata%iErr) > > Diego > > > On 29 September 2015 at 00:15, Jeff Squyres (jsquyres) &l

Re: [OMPI users] send_request error with allocate

2015-09-28 Thread Jeff Squyres (jsquyres)
Can you send a small reproducer program? > On Sep 28, 2015, at 4:45 PM, Diego Avesani wrote: > > Dear all, > > I have to use a send_request in a MPI_WAITALL. > Here the strange thing: > > If I use at the beginning of the SUBROUTINE: > > INTEGER :: send_request(3),

Re: [OMPI users] Problem using mpifort(Intel)

2015-09-25 Thread Jeff Squyres (jsquyres)
This problem was literally reported just the other day; it was partially fixed earlier today, the rest of the fix will be committed shortly. The Intel 2016 compiler suite changed something in how they handle the !GCC pragma (i.e., they didn't handle it at all before, and now they only partially

Re: [OMPI users] Problem using Open MPI 1.10.0 built with Intel compilers 16.0.0

2015-09-25 Thread Jeff Squyres (jsquyres)
tarballs, which you can build exactly like you build the real official release tarballs). Thanks for reporting the issue. > On Sep 24, 2015, at 4:55 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote: > > Intel apparently changed something in their 2016 compiler (compa

Re: [OMPI users] Problem using Open MPI 1.10.0 built with Intel compilers 16.0.0

2015-09-24 Thread Jeff Squyres (jsquyres)
i_integer,0,mpi_comm_world,ierr) > compiles but I don't get the good result. > > Thanks for your help, > > Fabrice > > > Le 24/09/2015 16:32, Jeff Squyres (jsquyres) a écrit : >> Yes -- typo -- it's not a problem with mpi_f08, it's a problem with the mpi

Re: [OMPI users] Problem using Open MPI 1.10.0 built with Intel compilers 16.0.0

2015-09-24 Thread Jeff Squyres (jsquyres)
gt; Jeff, > > I am not sure whether you made a typo or not ... > > the issue only occuex with f90 bindings (aka use mpi) > f08 bindings (aka use mpi_f08) works fine > > Cheers, > > Gilles > > On Thursday, September 24, 2015, Jeff Squyres (jsquyres) <jsquy...

Re: [OMPI users] Problem using Open MPI 1.10.0 built with Intel compilers 16.0.0

2015-09-24 Thread Jeff Squyres (jsquyres)
BTW, I created this Github issue to track the problem: https://github.com/open-mpi/ompi/issues/937 > On Sep 24, 2015, at 4:27 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote: > > I looked into the MPI_BCAST problem -- I think we (Open MPI) have a problem

Re: [OMPI users] Problem using Open MPI 1.10.0 built with Intel compilers 16.0.0

2015-09-24 Thread Jeff Squyres (jsquyres)
I looked into the MPI_BCAST problem -- I think we (Open MPI) have a problem with the mpi_f08 bindings and the Intel 2016 compilers. It looks like configure is choosing to generate a different pragma for Intel 2016 vs. Intel 2015 compilers, and that's causing a problem. Let me look into this a

Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-18 Thread Jeff Squyres (jsquyres)
On Sep 18, 2015, at 7:26 PM, Gilles Gouaillardet wrote: > > I built a similar environment with master and private ip and that does not > work. > my understanding is each tasks has two tcp btl (one per interface), > and there is currently no mechanism to tell that

Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-18 Thread Jeff Squyres (jsquyres)
Whoa; wait -- are you really using Open MPI v1.0? That's over 10 years old... Can you update to Open MPI v1.10? > On Sep 18, 2015, at 1:37 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote: > > Open MPI uses different heuristics depending on whether IP addre

Re: [OMPI users] Problem with using MPI in a Python extension

2015-09-17 Thread Jeff Squyres (jsquyres)
On Sep 17, 2015, at 11:44 AM, Joel Hermanns wrote: > > Thanks for the quick answer! Be sure to see Nick's answer, too -- mpi4py is a nice package. > I have a few questions now: > > 1. Are there any downsides of using —disable-dlopen? You won't be able to add or

Re: [OMPI users] Problem with using MPI in a Python extension

2015-09-17 Thread Jeff Squyres (jsquyres)
Short version: The easiest way to do this is to configure your Open MPI installation with --disable-dlopen. More detail: Open MPI uses a bunch of plugins for its functionality. When you dlopen libmpi in a private namespace (like Python does), and then libmpi tries to dlopen its plugins, the
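
In configure terms (install prefix hypothetical):

    ./configure --prefix=$HOME/openmpi --disable-dlopen
    make all install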

Re: [OMPI users] Contact?

2015-09-16 Thread Jeff Squyres (jsquyres)
Sorry for the trouble. FWIW, there actually is a real, live sysadmin at Indiana University who actually receives the webmaster emails; he usually forwards such emails to me. Ping me off-list and we can dig into why your NASA email address didn't work (i.e., I can ask the IU sysadmins to look

Re: [OMPI users] runtime MCA parameters

2015-09-16 Thread Jeff Squyres (jsquyres)
On Sep 16, 2015, at 8:22 AM, marcin.krotkiewski wrote: > > Thanks a lot, that looks right! Looks like some reading to do.. > > Do you know if in the OpenMPI implementation the MPI_T-interfaced MCA > settings are thread-local, or rank-local? By "rank local", I

[OMPI users] MPI-3.1 books now available

2015-09-14 Thread Jeff Squyres (jsquyres)
The MPI-3.1 standard is now available (at cost) in hardcover: http://blogs.cisco.com/performance/mpi-3-1-books-now-available-in-hardcover Enjoy. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/

Re: [OMPI users] As one MPI process executes MPI_Barrier(), other processes hang

2015-09-11 Thread Jeff Squyres (jsquyres)
Looks like your question was answered on StackOverflow. > On Sep 5, 2015, at 11:14 AM, Dhanashree N P wrote: > > Hello All, > > I have an MPI program for having multiple processes read from a file that > contains list of file names and based on the file names read - it

Re: [OMPI users] difference between OPENMPI e Intel MPI (DATATYPE)

2015-09-03 Thread Jeff Squyres (jsquyres)
On Sep 3, 2015, at 10:43 AM, Diego Avesani wrote: > > Dear Jeff, Dear all, > I normally use "USE MPI" > > This is the answer from the Intel HPC forum: > > If you are switching between intel and openmpi you must remember not to mix > environment. You might use modules to

Re: [OMPI users] difference between OPENMPI e Intel MPI (DATATYPE)

2015-09-02 Thread Jeff Squyres (jsquyres)
Can you reproduce the error in a small example? Also, try using "use mpi" instead of "include 'mpif.h'", and see if that turns up any errors. > On Sep 2, 2015, at 12:13 PM, Diego Avesani wrote: > > Dear Gilles, Dear all, > I have found the error. Some CPU has no

Re: [OMPI users] Multiple windows for the same communicator at thesame time

2015-08-31 Thread Jeff Squyres (jsquyres)
On Aug 27, 2015, at 11:55 AM, abhisek...@gmail.com wrote: > > I want to know if it is allowed in MPI one-sided communication to open > multiple windows simultaneously using the same communicator. Yes. > The standard does not seem to forbid it as far as I can see, but when I look > at the
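
A minimal sketch of the pattern being asked about (window sizes hypothetical): two windows open simultaneously over the same communicator:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        double a[64], b[64];
        MPI_Win win_a, win_b;
        MPI_Init(&argc, &argv);
        /* Two windows open at the same time over the same communicator. */
        MPI_Win_create(a, sizeof(a), sizeof(double), MPI_INFO_NULL,
                       MPI_COMM_WORLD, &win_a);
        MPI_Win_create(b, sizeof(b), sizeof(double), MPI_INFO_NULL,
                       MPI_COMM_WORLD, &win_b);
        /* ... one-sided operations may target either window ... */
        MPI_Win_free(&win_b);
        MPI_Win_free(&win_a);
        MPI_Finalize();
        return 0;
    }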

Re: [OMPI users] Why does make run autoconfig?

2015-08-25 Thread Jeff Squyres (jsquyres)
This typically happens in one of three cases: 1. You ran autogen.sh (and it didn't complete properly) --> If this is the problem, just rm the source tree, un-tar the tarball again, and just run configure; make install -- don't run autogen.sh. 2. You are building Open MPI on a network

Re: [OMPI users] Bug in ompi/errhandler/errcode.h (1.8.6)?

2015-08-20 Thread Jeff Squyres (jsquyres)
It's fixed in v1.10.0, which we hope to release Real Soon Now. > On Aug 14, 2015, at 9:30 AM, Åke Sandgren <ake.sandg...@hpc2n.umu.se> wrote: > > This problem still exists in 1.8.8 > > On 06/29/2015 05:37 PM, Jeff Squyres (jsquyres) wrote: >> Good

Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-14 Thread Jeff Squyres (jsquyres)
at is associated with the persistent request. > > Howard > > > 2015-08-14 11:21 GMT-06:00 Jeff Squyres (jsquyres) <jsquy...@cisco.com>: > Hmm. Oscar's not around to ask any more, but I'd be greatly surprised if he > had InfiniPath on his systems where he ran into
