Re: [OMPI users] Intermittent, somewhat architecture-dependent hang with Open MPI 1.8.1

2014-08-16 Thread Matt Thompson
es on a single machine, or spread across > multiple machines? > > Note that Open MPI 1.8.x binds each MPI process to a core by default, so > if you're oversubscribing the machine, it could be fairly disastrous...? > > > On Aug 14, 2014, at 1:29 PM, Matt Thompson <fort...@gmail.com

[OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-08-28 Thread Matt Thompson
Open MPI List, I recently encountered an odd bug with Open MPI 1.8.1 and GCC 4.9.1 on our cluster (reported on this list), and decided to try it with 1.8.2. However, we seem to be having an issue with Open MPI 1.8.2 and SLURM. Even weirder, Open MPI 1.8.2rc4 doesn't show the bug. And the bug is:

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-08-29 Thread Matt Thompson
command line and > let's see if any errors get reported. > > > On Aug 28, 2014, at 12:20 PM, Matt Thompson <fort...@gmail.com> wrote: > > Open MPI List, > > I recently encountered an odd bug with Open MPI 1.8.1 and GCC 4.9.1 on our > cluster (reported on this l

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-08-29 Thread Matt Thompson
" to the cmd line? > > > On Aug 29, 2014, at 4:22 AM, Matt Thompson <fort...@gmail.com> wrote: > > Ralph, > > For 1.8.2rc4 I get: > > (1003) $ > /discover/nobackup/mathomp4/MPI/gcc_4.9.1-openmpi_1.8.2rc4/bin/mpirun > --leave-session-attached --debug-daemon

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-08-31 Thread Matt Thompson
component tcp On Fri, Aug 29, 2014 at 3:18 PM, Ralph Castain <r...@open-mpi.org> wrote: > Rats - I also need "-mca plm_base_verbose 5" on there so I can see the cmd > line being executed. Can you add it? > > > On Aug 29, 2014, at 11:16 AM, Matt Thompson <fort...@gma

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-09-01 Thread Matt Thompson
r 1.8.2 code, rebuild, and try again? > > Much appreciate the help. Everyone's system is slightly different, and I > think you've uncovered one of those differences. > Ralph > > > > On Aug 31, 2014, at 6:25 AM, Matt Thompson <fort...@gmail.com> wrote: > > Ralph,

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-09-03 Thread Matt Thompson
you won't see the "hello world" output. > > The purpose of this test is that I want to see if OMPI is just totally > erring out and not even running your job (which is quite unlikely; OMPI > should be much more noisy when this happens), or whether we're simply not > see

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-09-03 Thread Matt Thompson
On Tue, Sep 2, 2014 at 8:38 PM, Jeff Squyres (jsquyres) wrote: > Matt: Random thought -- is your "srun" a shell script, perchance? (it > shouldn't be, but perhaps there's some kind of local override...?) > > Ralph's point on the call today is that it doesn't matter *how*

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-09-04 Thread Matt Thompson
you can see how it is affecting Open MPI's argument passage. Matt On Thu, Sep 4, 2014 at 8:04 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote: > On Sep 3, 2014, at 9:27 AM, Matt Thompson <fort...@gmail.com> wrote: > > > Just saw this, sorry. Our srun is indeed

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-09-04 Thread Matt Thompson
> > Ah, if it's perl, it might be easy. It might just be the difference > between system("...string...") and system(@argv). > > Sent from my phone. No type good. > > On Sep 4, 2014, at 8:35 AM, "Matt Thompson" <fort...@gmail.com> wrote: > > Jeff, >

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-09-04 Thread Matt Thompson
rolog=/) { > print("The --task-prolog option is unsupported at . Please " . > "contact the for assistance.\n"); > exit(1); > } else { > push(@command, $_); > } > } > system(@command); > > > > On Sep 4,
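The distinction raised in this thread, a wrapper that hands the command to the system as one string versus as a list of arguments, can be sketched outside Perl. The following is a minimal illustration in Python, not the site's actual wrapper: the helper names and the sample arguments are hypothetical, and it assumes a POSIX shell. The list form delivers each argument to the child intact, while the string form lets the shell split it again on whitespace. Whether this was the actual cause of the problem in the thread is not confirmed by the snippets above.

```python
import shlex
import subprocess
import sys

# Child program: reports how many arguments it received.
COUNT_ARGS = "import sys; print(len(sys.argv) - 1)"


def run_list(extra):
    """List form (like Perl's system(@command)): no shell re-parsing."""
    cmd = [sys.executable, "-c", COUNT_ARGS, *extra]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return int(result.stdout)


def run_string(extra):
    """String form (like Perl's system("...string...")): the shell re-splits it."""
    # The fixed prefix is quoted properly; the extra arguments are joined
    # naively with spaces, which is the mistake being illustrated.
    prefix = " ".join(shlex.quote(p) for p in [sys.executable, "-c", COUNT_ARGS])
    cmd = prefix + " " + " ".join(extra)
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True, check=True)
    return int(result.stdout)


# Hypothetical arguments; the second one contains a space.
extra = ["--label", "two words"]
print(run_list(extra))    # argument boundaries preserved
print(run_string(extra))  # "two words" is split into two arguments
```

On a POSIX system the list form reports two arguments and the string form reports three, which is the kind of change in argument passing that a wrapper around srun could introduce.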

Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-15 Thread Matt Thompson
g > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users > Link to this post: > http://www.open-mpi.org/community/lists/users/2015/09/27575.php > -- Matt Thompson Man Among Men Fulcrum of History

[OMPI users] Open MPI 1.10.0: Works on one Sandybridge Node, not on another: tcp_peer_send_blocking

2015-09-24 Thread Matt Thompson
where they say it's an error that can be seen: http://www.hpc.mcgill.ca/downloads/checkpointing_workshop/20150326%20-%20McGill%20-%20Checkpointing%20Techniques.pdf Any ideas for what this error can mean? -- Matt Thompson Man Among Men Fulcrum of History

Re: [OMPI users] Open MPI 1.10.0: Works on one Sandybridge Node, not on another: tcp_peer_send_blocking

2015-09-24 Thread Matt Thompson
o you could use it > :-) > > In which case, you need to remove the obstacle. You might check for > firewall, or check to see if multiple NICs are on the non-maia nodes (this > can sometimes confuse things, especially if someone put the NICs on the > same IP subnet) > > HT

Re: [OMPI users] Open MPI 1.10.0: Works on one Sandybridge Node, not on another: tcp_peer_send_blocking

2015-09-24 Thread Matt Thompson
Process 1 of 2 is on r509i2n17 Process 0 of 2 is on r509i2n17 So that is nice. Now the spin up if I have 8 or so nodes is rather...slow. But at this point I'll take working over efficient. Quick startup can come later. Matt > > > On Sep 24, 2015, at 8:56 AM, Matt Thompson <fort...@

[OMPI users] Help building/installing a working Open MPI 1.7.4 on OS X 10.9.2 with Free PGI Fortran

2014-03-18 Thread Matt Thompson
nore_tkr.0.dylib: No such file or directory > make[3]: *** [install-libLTLIBRARIES] Error 71 > make[2]: *** [install-am] Error 2 > make[1]: *** [install-recursive] Error 1 > make: *** [install-recursive] Error 1 Any ideas on how to overcome this? Thanks, Matt Thompson -- "And, isn't s

Re: [OMPI users] Help building/installing a working Open MPI 1.7.4 on OS X 10.9.2 with Free PGI Fortran

2014-03-20 Thread Matt Thompson
> That's a strange error. Can you confirm whether > ompi_buil_dir/ompi/mpi/fortran/use-mpi-ignore-tkr/.libs/libmpi_usempi_ignore_tkr.0.dylib > exists or not? > > Can you send all the info listed here: > > http://www.open-mpi.org/community/help/ > > > On Mar

Re: [OMPI users] Help building/installing a working Open MPI 1.7.4 on OS X 10.9.2 with Free PGI Fortran

2014-03-20 Thread Matt Thompson
> libmpi_usempi_ignore_tkr.la'/Users/fortran/MPI/openmpi_1.7.4-pgi_14.3-gcc/lib' > libtool: install: /usr/bin/install -c > .libs/libmpi_usempi_ignore_tkr.0.dylib > /Users/fortran/MPI/openmpi_1.7.4-pgi_14.3-gcc/lib/libmpi_usempi_ignore_tkr.0.dylib > install: .libs/libmpi_usempi_ignore_tkr.0.d

Re: [OMPI users] Help building/installing a working Open MPI 1.7.4 on OS X 10.9.2 with Free PGI Fortran

2014-03-24 Thread Matt Thompson
make it > more like OMPI's build system behavior. > > If you can replicate the error, then also try the second attached tarball: > it's the same project, but bootstrapped with the latest versions of GNU > Automake (the others are already the most recent): > > Automake 1.14.1 >

Re: [OMPI users] Help building/installing a working Open MPI 1.7.4 on OS X 10.9.2 with Free PGI Fortran

2014-03-24 Thread Matt Thompson
you know? On Mon, Mar 24, 2014 at 6:48 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com > wrote: > On Mar 24, 2014, at 6:34 PM, Matt Thompson <fort...@gmail.com> wrote: > > > Sorry for the late reply. The answer is: No, 1.14.1 has not fixed the > problem (and indeed, that's w

[OMPI users] Intermittent, somewhat architecture-dependent hang with Open MPI 1.8.1

2014-08-14 Thread Matt Thompson
x NX NY where NX*NY has to equal NPROCS and it's best to keep them even numbers. (There might be a few more restrictions and the code will die if you violate them.) Thanks, Matt Thompson -- Matt Thompson SSAI, Sr Software Test Engr NASA GSFC, Global Modeling and Assimilation Office Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771 Phone: 301-614-6712 Fax: 301-614-6246

[OMPI users] Help with Binding in 1.8.8: Use only second socket

2015-12-21 Thread Matt Thompson
it shouldn't need to use Socket 0. -- Matt Thompson Man Among Men Fulcrum of History

Re: [OMPI users] Help with Binding in 1.8.8: Use only second socket

2015-12-21 Thread Matt Thompson
on, Dec 21, 2015 at 10:51 AM, Ralph Castain <r...@open-mpi.org> wrote: > Try adding --cpu-set a,b,c,… where the a,b,c… are the core id’s of your > second socket. I’m working on a cleaner option as this has come up before. > > > On Dec 21, 2015, at 5:29 AM, Matt Thompson <fort.

Re: [OMPI users] statically linked OpenMPI 1.10.1 with PGI compilers

2015-12-31 Thread Matt Thompson
h --enable-shared, it would not compile ESMF correctly. I'm going to try and revisit it next week because I want Intel OpenMPI as shared so I can easily use Allinea MAP. I'll try and make a good report for you/ESMF. -- Matt Thompson Man Among Men Fulcrum of History

[OMPI users] Open MPI MPI-OpenMP Hybrid Binding Question

2016-01-06 Thread Matt Thompson
to help me learn? The man mpirun page is a bit formidable in the pinning part, so maybe I've missed an obvious answer. Matt -- Matt Thompson Man Among Men Fulcrum of History

Re: [OMPI users] Open MPI MPI-OpenMP Hybrid Binding Question

2016-01-06 Thread Matt Thompson
probably override anything that OpenMPI > sets. Can you try without? > > -erik > > On Wed, Jan 6, 2016 at 2:46 PM, Matt Thompson <fort...@gmail.com> wrote: > > Hello Open MPI Gurus, > > > > As I explore MPI-OpenMP hybrid codes, I'm trying to figure out how to d

Re: [OMPI users] Open MPI MPI-OpenMP Hybrid Binding Question

2016-01-06 Thread Matt Thompson
flags are duplicated (and strictly not needed), but I > provide them for easy testing changes. > Surely this is application dependent, but for my case it was performing > really well. > > > 2016-01-06 20:48 GMT+01:00 Erik Schnetter <schnet...@gmail.com>: > >> S

Re: [OMPI users] Open MPI MPI-OpenMP Hybrid Binding Question

2016-01-06 Thread Matt Thompson
wiki to map all the combinations of compiler+mpistack. Or pray the MPI Forum and OpenMP combine and I can just look in a Standard. :D Thanks, Matt -- Matt Thompson Man Among Men Fulcrum of History

[OMPI users] MPI, Fortran, and GET_ENVIRONMENT_VARIABLE

2016-01-15 Thread Matt Thompson
Is there an option to Open MPI that might do it? Or is this just something MPI doesn't do? Or is my Google-fu just too weak to figure out the right search-phrase to find the answer to this probable FAQ? Matt [1] Note, this might be unnecessary, but I got to the point where I wanted to see if I *could* do it, rather than *should*. -- Matt Thompson Man Among Men Fulcrum of History

Re: [OMPI users] MPI, Fortran, and GET_ENVIRONMENT_VARIABLE

2016-01-15 Thread Matt Thompson
7:02 AM, Jim Edwards <jedwa...@ucar.edu> wrote: > > > > On Fri, Jan 15, 2016 at 7:53 AM, Matt Thompson <fort...@gmail.com> wrote: > >> All, >> >> I'm not too sure if this is an MPI issue, a Fortran issue, or something >> else but I thought I'd ask the

Re: [OMPI users] MPI, Fortran, and GET_ENVIRONMENT_VARIABLE

2016-01-15 Thread Matt Thompson
switch as me?”. Hoping to have it later this year, perhaps in the summer. > > > On Jan 15, 2016, at 7:56 AM, Matt Thompson <fort...@gmail.com> wrote: > > Ralph, > > That doesn't help: > > (1004) $ mpirun -map-by node -np 8 ./hostenv.x | sort -g -k2 > Process0

[OMPI users] Issues Building Open MPI static with Intel Fortran 16

2016-01-22 Thread Matt Thompson
is a new OS (RHEL 7 instead of 6) so I can see issues possible. Anyone seen this before? As I said, the "usual" build way is just fine. Perhaps I need an extra RPM that isn't installed? I do have libnl-devel installed. -- Matt Thompson Man Among Men Fulcrum of History

Re: [OMPI users] Issues Building Open MPI static with Intel Fortran 16

2016-01-22 Thread Matt Thompson
> HI Matt, > > If you don't need oshmem, you could try again with --disable-oshmem added > to the config line > > Howard > > > 2016-01-22 12:15 GMT-07:00 Matt Thompson <fort...@gmail.com>: > >> All, >> >> I'm trying to duplicate an issue I had with

[OMPI users] Error with Open MPI 2.0.0: error obtaining device attributes for mlx5_0 errno says Cannot allocate memory

2016-07-13 Thread Matt Thompson
option "-v" Type 'ompi_info --help' for usage. I am asking our machine gurus about the Infiniband network per: https://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot -- Matt Thompson Man Among Men Fulcrum of History

Re: [OMPI users] Error with Open MPI 2.0.0: error obtaining device attributes for mlx5_0 errno says Cannot allocate memory

2016-07-13 Thread Matt Thompson
anager are you running? (e.g., OpenSM, a vendor-specific subnet manager, etc.) Mellanox UFM (OpenSM under the covers) -- Matt Thompson Man Among Men Fulcrum of History

[OMPI users] mpi_f08 Question: set comm on declaration error, and other questions

2016-08-19 Thread Matt Thompson
, Matt -- Matt Thompson ___ users mailing list users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] mpi_f08 Question: set comm on declaration error, and other questions

2016-08-19 Thread Matt Thompson
On Fri, Aug 19, 2016 at 2:55 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com > wrote: > On Aug 19, 2016, at 2:30 PM, Matt Thompson <fort...@gmail.com> wrote: > > > > I'm slowly trying to learn and transition to 'use mpi_f08'. So, I'm > writing various things an

Re: [OMPI users] mpi_f08 Question: set comm on declaration error, and other questions

2016-08-20 Thread Matt Thompson
On Fri, Aug 19, 2016 at 8:54 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com > wrote: > On Aug 19, 2016, at 6:32 PM, Matt Thompson <fort...@gmail.com> wrote: > > > > that the comm == MPI_COMM_WORLD evaluates to .TRUE.? I discovered that > once when I was printing

[OMPI users] Issues building Open MPI 2.0.1 with PGI 16.10 on macOS

2016-11-28 Thread Matt Thompson
13=0) since I'm not sure. But, no matter what, does anyone have thoughts on how to solve this? Thanks, Matt -- Matt Thompson

Re: [OMPI users] Issues building Open MPI 2.0.1 with PGI 16.10 on macOS

2016-11-28 Thread Matt Thompson
ev_uint64_t uint64_t > > > > #define ev_int64_t int64_t > > > > #elif defined(WIN32) > > > > #define ev_uint64_t unsigned __int64 > > > > #define ev_int64_t signed __int64 > > > > #elif _EVENT_SIZEOF_LONG_LONG == 8 > > > > #def

Re: [OMPI users] Issues building Open MPI 2.0.1 with PGI 16.10 on macOS

2016-11-30 Thread Matt Thompson
int16_t, ev_uint8_t >>> > >>> > * unsigned integer types of exactly 64, 32, 16, and 8 bits >>> > >>> > * respectively. >>> > >>> > *ev_int64_t, ev_int32_t, ev_int16_t, ev_int8_t >>> > >>> > *

Re: [OMPI users] Help with Open MPI 2.1.0 and PGI 16.10: Configure and C++

2017-03-24 Thread Matt Thompson
> to C++, > and you should instead focus on the Fortran issue. > > Cheers, > > Gilles > > On Thursday, March 23, 2017, Matt Thompson <fort...@gmail.com> wrote: > >> All, I'm hoping one of you knows what I might be doing wrong here. I'm >> try

[OMPI users] Issues with PGI 16.10, OpenMPI 2.1.0 on macOS: Fortran issues with hello world (running and dylib)

2017-03-24 Thread Matt Thompson
vironment problems. This failure appears to be an internal failure; > here's some additional information (which may only be relevant to an > Open MPI developer): > opal_shmem_base_select failed > --> Returned value -1 instead of OPAL_SUCCESS > ------

[OMPI users] Help with Open MPI 2.1.0 and PGI 16.10: Configure and C++

2017-03-22 Thread Matt Thompson
as well. I also tried passing in --enable-mpi-cxx, but that did nothing. Is this just a red herring? My real concern is with pgfortran/mpifort, but I thought I'd start with this. If this is okay, I'll move on and detail the fortran issues I'm having. Matt -- Matt Thompson Man Among Men Fulcrum

Re: [OMPI users] Compiler error with PGI: pgcc-Error-Unknown switch: -pthread

2017-04-03 Thread Matt Thompson
>>>> --prefix=/usr/pppl/pgi/17.3-pkgs/openmpi-1.10.3 \ > >>>>>>>>>> --disable-silent-rules \ > >>>>>>>>>> --enable-shared \ > >>>>>>>>>> --enable-static \ > >>>>>>>>>

[OMPI users] mpi_f08 interfaces in man3 pages?

2017-08-10 Thread Matt Thompson
-sided MPI combined with OpenMP/threads into our code. So far, Open MPI is the only stack we've tried where the code doesn't weirdly die in odd places, so I might be coming back here with more questions when we try to improve the performance/encounter problems. -- Matt Thompson Man Among Men Fulcrum

[OMPI users] Tuning vader for MPI_Wait Halt?

2017-06-05 Thread Matt Thompson
ed the memory maximum message bandwidth for large messages on some BTL network transports, such as openib, sm, and vader. - The vader BTL is now more efficient in terms of memory usage when using XPMEM. Thanks for any help, Matt -- Matt Thompson Man Among Men Fulcrum of History

Re: [OMPI users] Tuning vader for MPI_Wait Halt?

2017-06-07 Thread Matt Thompson
, Jun 5, 2017 at 1:00 PM, Nathan Hjelm <hje...@me.com> wrote: > Can you provide a reproducer for the hang? What kernel version are you > using? Is xpmem installed? > > -Nathan > > On Jun 05, 2017, at 10:53 AM, Matt Thompson <fort...@gmail.com> wrote: >

[OMPI users] 3.1.1 Bindings Change

2018-07-03 Thread Matt Thompson
OPENMPI somewhere. Matt -- Matt Thompson “The fact is, this is about us identifying what we do best and finding more ways of doing less of it better” -- Director of Better Anna Rampton

Re: [OMPI users] NAS benchmark

2018-02-03 Thread Matt Thompson
> make[1]: Leaving directory `/home/mahmood/Downloads/NPB3. > 3.1/NPB3.3-MPI/BT' > make: *** [bt] Error 2 > > > There is a good guide about that (https://www.technovelty.org/ > c/relocation-truncated-to-fit-wtf.html) but I don't know which compiler > flag should I fix to fix that. >

Re: [OMPI users] Help Getting Started with Open MPI and PMIx and UCX

2019-01-18 Thread Matt Thompson
On Fri, Jan 18, 2019 at 1:13 PM Jeff Squyres (jsquyres) via users < users@lists.open-mpi.org> wrote: > On Jan 18, 2019, at 12:43 PM, Matt Thompson wrote: > > > > With some help, I managed to build an Open MPI 4.0.0 with: > > We can discuss each of these

Re: [OMPI users] Help Getting Started with Open MPI and PMIx and UCX

2019-01-18 Thread Matt Thompson
? Perhaps I added too much to my configure line? Not enough? Thanks, Matt On Thu, Jan 17, 2019 at 11:10 AM Matt Thompson wrote: > Dear Open MPI Gurus, > > A cluster I use recently updated their SLURM to have support for UCX and > PMIx. These are names I've seen and heard often at SC BoFs

Re: [OMPI users] Help Getting Started with Open MPI and PMIx and UCX

2019-01-22 Thread Matt Thompson
r such as mpirun hostname (both with sbatch >> and salloc) >> - explicitly specify the network to be used for the wire-up. you can >> for example mpirun --mca oob_tcp_if_include 192.168.0.0/24 if this is >> the network subnet by which all the nodes (e.g. compute nodes and >>

Re: [OMPI users] Help Getting Started with Open MPI and PMIx and UCX

2019-01-23 Thread Matt Thompson
MAC > > > > *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of *Matt > Thompson > *Sent:* Tuesday, January 22, 2019 6:04 AM > *To:* Open MPI Users > *Subject:* Re: [OMPI users] Help Getting Started with Open MPI and PMIx > and UCX > > > > Wel

[OMPI users] Help Getting Started with Open MPI and PMIx and UCX

2019-01-17 Thread Matt Thompson
uster than the administrator end, so I tend to get lost in the detailed presentations, etc. I see online. Thanks, Matt -- Matt Thompson “The fact is, this is about us identifying what we do best and finding more ways of doing less of it better” -- Director of Better An