Re: [OMPI users] New to (Open)MPI

2016-09-02 Thread Dave Goodell (dgoodell)
Lachlan mentioned that he has "M Series" hardware, which, to the best of my 
knowledge, does not officially support usNIC.  It may not be possible to even 
configure the relevant usNIC adapter policy in UCSM for M Series 
modules/chassis.

Using the TCP BTL may be the only realistic option here.

-Dave

> On Sep 2, 2016, at 5:35 AM, Jeff Squyres (jsquyres)  
> wrote:
> 
> Greetings Lachlan.
> 
> Yes, Gilles and John are correct: on Cisco hardware, our usNIC transport is 
> the lowest latency / best HPC-performance transport.  I'm not aware of any 
> MPI implementation (including Open MPI) that has support for FC types of 
> transports (including FCoE).
> 
> I'll ping you off-list with some usNIC details.
> 
> 
>> On Sep 1, 2016, at 10:06 PM, Lachlan Musicman  wrote:
>> 
>> Hola,
>> 
>> I'm new to MPI and OpenMPI. Relatively new to HPC as well.
>> 
>> I've just installed a SLURM cluster and added OpenMPI for the users to take 
>> advantage of.
>> 
>> I'm just discovering that I have missed a vital part - the networking.
>> 
>> I'm looking over the networking options and from what I can tell we only 
>> have (at the moment) Fibre Channel over Ethernet (FCoE).
>> 
>> Is this a network technology that's supported by OpenMPI?
>> 
>> (system is running Centos 7, on Cisco M Series hardware)
>> 
>> Please excuse me if I have terms wrong or am missing knowledge. Am new to 
>> this.
>> 
>> cheers
>> Lachlan
>> 
>> 
>> --
>> The most dangerous phrase in the language is, "We've always done it this 
>> way."
>> 
>> - Grace Hopper
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 



Re: [OMPI users] Using POSIX shared memory as send buffer

2015-09-28 Thread Dave Goodell (dgoodell)
On Sep 27, 2015, at 1:38 PM, marcin.krotkiewski  
wrote:
> 
> Hello, everyone
> 
> I am struggling a bit with IB performance when sending data from a POSIX 
> shared memory region (/dev/shm). The memory is shared among many MPI 
> processes within the same compute node. Essentially, I see somewhat erratic 
> performance, and it seems that my code is roughly twice as slow as when 
> using a usual, malloced send buffer.

It may have to do with NUMA effects and the way you're allocating/touching your 
shared memory vs. your private (malloced) memory.  If you have a 
multi-NUMA-domain system (i.e., any 2+ socket server, and even some 
single-socket servers) then you are likely to run into this sort of issue.  The 
PCI bus on which your IB HCA communicates is almost certainly closer to one 
NUMA domain than the others, and performance will usually be worse if you are 
sending/receiving from/to a "remote" NUMA domain.

"lstopo" and other tools can sometimes help you get a handle on the situation, 
though I don't know if it knows how to show memory affinity.  I think you can 
find memory affinity for a process via "/proc/<pid>/numa_maps".  There's lots 
of info about NUMA affinity here: https://queue.acm.org/detail.cfm?id=2513149
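
As a concrete illustration (not your actual code; the segment name and size
below are made up), this is roughly how you could allocate a POSIX shared
memory region and deliberately "first touch" it from the rank that will be
doing the IB sends, so that Linux's default first-touch policy places the
pages on that rank's NUMA node:

✂
/* hedged sketch: POSIX shm allocation with a deliberate first touch */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_NAME "/my_sendbuf"   /* hypothetical segment name */
#define SHM_SIZE (64UL << 20)    /* 64 MiB, arbitrary */

int main(void)                   /* link with -lrt on older glibc */
{
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, SHM_SIZE) != 0) {
        perror("shm setup");
        return 1;
    }

    void *p = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Whichever process writes a page first decides which NUMA node backs
     * it, so do this from the rank (already bound near the HCA) that will
     * be posting the sends. */
    memset(p, 0, SHM_SIZE);

    shm_unlink(SHM_NAME);
    return 0;
}
✂

Whether that actually helps depends on where your ranks are bound, which the
affinity tools mentioned above can tell you.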

-Dave



Re: [OMPI users] Bug: Disabled mpi_leave_pinned for GPUDirect and InfiniBand during run-time caused by GCC optimizations

2015-06-08 Thread Dave Goodell (dgoodell)
On Jun 5, 2015, at 8:47 PM, Gilles Gouaillardet  
wrote:

> i did not use the term "pure" properly.
> 
> please read instead "posix_memalign is a function that does not modify any 
> user variable"
> that assumption is correct when there is no wrapper, and incorrect in our 
> case.

My suggestion is to try to create a small reproducer program that we can send 
to the GCC folks with the claim that we believe it to be a buggy optimization.  
Then we can see whether they agree and if not, how they defend that behavior.

We probably still need a workaround for now though, and the "volatile" approach 
seems fine to me.

-Dave




Re: [OMPI users] send and receive vectors + variable length

2015-01-09 Thread Dave Goodell (dgoodell)
On Jan 9, 2015, at 7:46 AM, Jeff Squyres (jsquyres)  wrote:

> Yes, I know examples 3.8/3.9 are blocking examples.
> 
> But it's morally the same as:
> 
> MPI_WAITALL(send_requests...)
> MPI_WAITALL(recv_requests...)
> 
> Strictly speaking, that can deadlock, too.  
> 
> In reality, it has far less chance of deadlocking than examples 3.8 and 3.9 
> (because you're likely within the general progression engine, and the 
> implementation will progress both the send and receive requests while in the 
> first WAITALL).  
> 
> But still, it would be valid for an implementation to *only* progress the 
> send requests -- and NOT the receive requests -- while in the first WAITALL.  
> Which makes it functionally equivalent to examples 3.8/3.9.

That's not true.  The implementation is required to make progress on all 
outstanding requests (assuming they can be progressed).  The following should 
not deadlock:

✂
for (...)  MPI_Isend(...)
for (...)  MPI_Irecv(...)
MPI_Waitall(send_requests...)
MPI_Waitall(recv_requests...)
✂
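
For what it's worth, here is a compilable version of that pattern (a simple
ring exchange; the buffer size and tag below are arbitrary):

✂
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int count = 1 << 20;
    double *sendbuf = calloc(count, sizeof(double));
    double *recvbuf = calloc(count, sizeof(double));
    int right = (rank + 1) % size;
    int left  = (rank + size - 1) % size;

    MPI_Request sreq, rreq;
    MPI_Isend(sendbuf, count, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &sreq);
    MPI_Irecv(recvbuf, count, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &rreq);

    /* Waiting on the send first is fine: the implementation must keep
     * progressing the outstanding receive while we sit in this call. */
    MPI_Waitall(1, &sreq, MPI_STATUSES_IGNORE);
    MPI_Waitall(1, &rreq, MPI_STATUSES_IGNORE);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
✂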

-Dave



Re: [OMPI users] mpi_wtime implementation

2014-11-24 Thread Dave Goodell (dgoodell)
On Nov 24, 2014, at 12:06 AM, George Bosilca  wrote:

> https://github.com/open-mpi/ompi/pull/285 is a potential answer. I would like 
> to hear Dave Goodell comment on this before pushing it upstream.
> 
>   George.

I'll take a look at it today.  My notification settings were messed up when you 
originally CCed me on the PR, so I didn't see this until now.

-Dave



Re: [OMPI users] OpenMPI 1.8.2 segfaults while 1.6.5 works?

2014-09-29 Thread Dave Goodell (dgoodell)
Looks like boost::mpi and/or your python "mpi" module might be creating a bogus 
argv array and passing it to OMPI's MPI_Init routine.  Note that argv is 
required by C99 to be terminated with a NULL pointer (that is, 
(argv[argc]==NULL) must hold).  See http://stackoverflow.com/a/3772826/158513.
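
If the argv array really is being built by hand somewhere, it needs to be
shaped roughly like this (the names here are made up):

✂
#include <mpi.h>

int main(void)
{
    /* C99 requires argv[argc] == NULL; OMPI's argv helpers (such as
     * opal_argv_join) walk the array until they hit that NULL. */
    static char *fake_argv[] = { "myprog", "foo", "bar", NULL };
    int fake_argc = 3;
    char **argv_p = fake_argv;

    MPI_Init(&fake_argc, &argv_p);
    MPI_Finalize();
    return 0;
}
✂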

-Dave

On Sep 29, 2014, at 1:34 PM, Ralph Castain  wrote:

> Afraid I cannot replicate a problem with singleton behavior in the 1.8 series:
> 
> 11:31:52  /home/common/openmpi/v1.8/orte/test/mpi$ ./hello foo bar
> Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0-23
> OMPI_MCA_orte_default_hostfile=/home/common/hosts
> OMPI_COMMAND=./hello
> OMPI_ARGV=foo bar
> OMPI_NUM_APP_CTX=1
> OMPI_FIRST_RANKS=0
> OMPI_APP_CTX_NUM_PROCS=1
> OMPI_MCA_orte_ess_num_procs=1
> 
> You can see that the OMPI_ARGV envar (which is the spot you flagged) is 
> correctly being set and there is no segfault. Not sure what your program may 
> be doing, though, so I'm not sure I've really tested your scenario.
> 
> 
> On Sep 29, 2014, at 10:55 AM, Ralph Castain  wrote:
> 
>> Okay, so regression-test.py is calling MPI_Init as a singleton, correct? 
>> Just trying to fully understand the scenario
>> 
>> Singletons are certainly allowed, if that's the scenario
>> 
>> On Sep 29, 2014, at 10:51 AM, Amos Anderson  
>> wrote:
>> 
>>> I'm not calling mpirun in this case because this particular calculation 
>>> doesn't use more than one processor. What I'm doing on my command line is 
>>> this:
>>> 
>>> /home/user/myapp/tools/python/bin/python test/regression/regression-test.py 
>>> test/regression/regression-jobs
>>> 
>>> and internally I check for rank/size. This command is executed in the 
>>> context of a souped up LD_LIBRARY_PATH. You can see the variable argv in 
>>> opal_argv_join is ending up with the last argument on my command line.
>>> 
>>> I suppose your question implies that mpirun is mandatory for executing 
>>> anything compiled with OpenMPI > 1.6 ?
>>> 
>>> 
>>> 
>>> On Sep 29, 2014, at 10:28 AM, Ralph Castain  wrote:
>>> 
 Can you pass us the actual mpirun command line being executed? Especially 
 need to see the argv being passed to your application.
 
 
 On Sep 27, 2014, at 7:09 PM, Amos Anderson  
 wrote:
 
> FWIW, I've confirmed that the segfault also happens with OpenMPI 1.7.5. 
> Also, I have some gdb output (from 1.7.5) for your perusal, including a 
> printout of some of the variables' values.
> 
> 
> 
> Starting program: /home/user/myapp/tools/python/bin/python 
> test/regression/regression-test.py test/regression/regression-jobs
> [Thread debugging using libthread_db enabled]
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x2bc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) at 
> argv.c:299
> 299   str_len += strlen(*p) + 1;
> (gdb) where
> #0  0x2bc8df1e in opal_argv_join (argv=0xa39398, delimiter=32) at 
> argv.c:299
> #1  0x2ab2ce4e in ompi_mpi_init (argc=2, argv=0xa39390, 
> requested=0, provided=0x7fffba98) at runtime/ompi_mpi_init.c:450
> #2  0x2ab63e39 in PMPI_Init (argc=0x7fffbb8c, 
> argv=0x7fffbb80) at pinit.c:84
> #3  0x2aaab7b965d6 in boost::mpi::environment::environment 
> (this=0xa3a1d0, argc=@0x7fffbb8c, argv=@0x7fffbb80, 
> abort_on_exception=true)
>at ../tools/boost/libs/mpi/src/environment.cpp:98
> #4  0x2aaabc7b311d in boost::mpi::python::mpi_init (python_argv=..., 
> abort_on_exception=true) at 
> ../tools/boost/libs/mpi/src/python/py_environment.cpp:60
> #5  0x2aaabc7b33fb in boost::mpi::python::export_environment () at 
> ../tools/boost/libs/mpi/src/python/py_environment.cpp:94
> #6  0x2aaabc7d5ab5 in boost::mpi::python::init_module_mpi () at 
> ../tools/boost/libs/mpi/src/python/module.cpp:44
> #7  0x2aaab792a2f2 in 
> boost::detail::function::void_function_ref_invoker0 void>::invoke (function_obj_ptr=...)
>at ../tools/boost/boost/function/function_template.hpp:188
> #8  0x2aaab7929e6b in boost::function0::operator() 
> (this=0x7fffc110) at 
> ../tools/boost/boost/function/function_template.hpp:767
> #9  0x2aaab7928f11 in boost::python::handle_exception_impl (f=...) at 
> ../tools/boost/libs/python/src/errors.cpp:25
> #10 0x2aaab792a54f in boost::python::handle_exception 
> (f=0x2aaabc7d5746 ) at 
> ../tools/boost/boost/python/errors.hpp:29
> #11 0x2aaab792a1d9 in boost::python::detail::(anonymous 
> namespace)::init_module_in_scope (m=0x2aaabc617f68, 
>init_function=0x2aaabc7d5746 ) 
> at ../tools/boost/libs/python/src/module.cpp:24
> #12 

Re: [OMPI users] importing to MPI data already in memory from another process

2014-06-27 Thread Dave Goodell (dgoodell)
On Jun 27, 2014, at 8:53 AM, Brock Palen  wrote:

> Is there a way to import/map memory from a process (data acquisition) such 
> that an MPI program could 'take' or see that memory?
> 
> We have a need to do data acquisition at the rate of .7 TB/s and need to do 
> some shuffles/computation on these data; some of the nodes are directly 
> connected to the device, and some will do processing. 
> 
> Here is the proposed flow:
> 
> * Data collector nodes runs process collecting data from device
> * Those nodes somehow pass the data to an MPI job running on these nodes and 
> a number of other nodes (CPU need for filtering is greater than what the 16 
> data nodes can provide).

For a non-MPI solution for intranode data transfer in this case, take a look at 
vmsplice(2):

http://man7.org/linux/man-pages/man2/vmsplice.2.html

Pay particular attention to the SPLICE_F_GIFT flag, which will allow you to 
simply give memory pages away to the MPI process, avoiding unnecessary data 
copies.  You would just need a pipe shared between the data collector process 
and the MPI process (and to be a bit careful with your memory 
allocation/management, since any page you gift away should probably come from 
mmap(2) directly).
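
A minimal sketch of the writer (data collector) side is below; error handling
is abbreviated and the sizes are arbitrary.  The MPI process would read(2) or
splice(2) from the other end of the shared pipe:

✂
#define _GNU_SOURCE              /* for vmsplice() and SPLICE_F_GIFT */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = 8 * (size_t)page;

    /* Pages you intend to gift should come straight from mmap(2), not from
     * malloc(3), so that giving them away cannot corrupt the allocator. */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    /* ... data acquisition fills buf here ... */

    int pipefd[2];               /* in reality, shared with the MPI process */
    if (pipe(pipefd) != 0) { perror("pipe"); return 1; }

    struct iovec iov = { .iov_base = buf, .iov_len = len };
    ssize_t n = vmsplice(pipefd[1], &iov, 1, SPLICE_F_GIFT);
    if (n < 0) { perror("vmsplice"); return 1; }
    printf("gifted %zd bytes into the pipe\n", n);

    /* After gifting, treat buf as gone; do not write to those pages again. */
    return 0;
}
✂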


Otherwise, as George mentioned, I would investigate converting your current 
data collector processes to also be MPI processes so that they can simply 
communicate the data to the rest of the cluster.

-Dave




Re: [OMPI users] OMPI 1.8.1 Deadlock in mpi_finalize with mpi_init_thread

2014-04-29 Thread Dave Goodell (dgoodell)
I don't know of any workaround.  I've created a ticket to track this, but it 
probably won't be very high priority in the short term:

https://svn.open-mpi.org/trac/ompi/ticket/4575

-Dave

On Apr 25, 2014, at 3:27 PM, Jamil Appa  wrote:

> 
>   Hi 
> 
> The following program deadlocks in mpi_finalize with OMPI 1.8.1 but works 
> correctly with OMPI 1.6.5
> 
> Is there a work around?
> 
>   Thanks
> 
>  Jamil
> 
> program mpiio
> use mpi
> implicit none
> integer(kind=4) :: iprov, fh, ierr
> call mpi_init_thread(MPI_THREAD_SERIALIZED, iprov, ierr)
> if (iprov < MPI_THREAD_SERIALIZED) stop 'mpi_init_thread'
> call mpi_file_open(MPI_COMM_WORLD, 'test.dat', &
> MPI_MODE_WRONLY + MPI_MODE_CREATE, MPI_INFO_NULL, fh, ierr)
> call mpi_file_close(fh, ierr)
> call mpi_finalize(ierr)
> end program mpiio
> 
> (gdb) bt
> #0  0x003155a0e054 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x003155a09388 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x003155a09257 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x77819f3c in ompi_attr_free_keyval () from 
> /gpfs/thirdparty/zenotech/home/jappa/apps6.4/lib/libmpi.so.1
> #4  0x77857be1 in PMPI_Keyval_free () from 
> /gpfs/thirdparty/zenotech/home/jappa/apps6.4/lib/libmpi.so.1
> #5  0x715b21f2 in ADIOI_End_call () from 
> /gpfs/thirdparty/zenotech/home/jappa/apps6.4/lib/openmpi/mca_io_romio.so
> #6  0x7781a325 in ompi_attr_delete_impl () from 
> /gpfs/thirdparty/zenotech/home/jappa/apps6.4/lib/libmpi.so.1
> #7  0x7781a4ec in ompi_attr_delete_all () from 
> /gpfs/thirdparty/zenotech/home/jappa/apps6.4/lib/libmpi.so.1
> #8  0x77832ad5 in ompi_mpi_finalize () from 
> /gpfs/thirdparty/zenotech/home/jappa/apps6.4/lib/libmpi.so.1
> #9  0x77b12e59 in pmpi_finalize__ () from 
> /gpfs/thirdparty/zenotech/home/jappa/apps6.4/lib/libmpi_mpifh.so.2
> #10 0x00400b64 in mpiio () at t.f90:10
> #11 0x00400b9a in main ()
> #12 0x00315561ecdd in __libc_start_main () from /lib64/libc.so.6
> #13 0x00400a19 in _start ()



Re: [OMPI users] mpirun runs in serial even I set np to several processors

2014-04-14 Thread Dave Goodell (dgoodell)
On Apr 14, 2014, at 12:15 PM, Djordje Romanic  wrote:

> When I start wrf with mpirun -np 4 ./wrf.exe, I get this:
> ---------------------------------------------
>  starting wrf task 0 of 1
>  starting wrf task 0 of 1
>  starting wrf task 0 of 1
>  starting wrf task 0 of 1
> ---------------------------------------------
> This indicates that it is not using 4 processors, but 1. 
> 
> Any idea what might be the problem? 

It could be that you compiled WRF with a different MPI implementation than you 
are using to run it (e.g., MPICH vs. Open MPI).

-Dave



Re: [OMPI users] usNIC point-to-point messaging module

2014-04-02 Thread Dave Goodell (dgoodell)
On Apr 2, 2014, at 12:57 PM, Filippo Spiga  wrote:

> I still do not understand why this keeps appearing...
> 
> srun: cluster configuration lacks support for cpu binding
> 
> Any clue?

I don't know what causes that message.  Ralph, any thoughts here?

-Dave



Re: [OMPI users] usNIC point-to-point messaging module

2014-04-01 Thread Dave Goodell (dgoodell)
On Apr 1, 2014, at 12:13 PM, Filippo Spiga  wrote:

> Dear Ralph, Dear Jeff,
> 
> I've just recompiled the latest Open MPI 1.8. I added 
> "--enable-mca-no-build=btl-usnic" to configure, but the message still appears. 
> Here is the output of "--mca btl_base_verbose 100" (truncated immediately after 
> the application starts)

Jeff's on vacation, so I'll see if I can help here.

Try deleting all the files in "$PREFIX/lib/openmpi/", where "$PREFIX" is the 
value you passed to configure with "--prefix=".  If you did not pass a value, 
then it is "/usr/local".  Then reinstall (with "make install" in the OMPI build 
tree).

What I think is happening is that you still have an "mca_btl_usnic.so" file 
leftover from the last time you installed OMPI (before passing 
"--enable-mca-no-build=btl-usnic").  So OMPI is using this shared library and 
you get exactly the same problem.

-Dave



Re: [OMPI users] Problem building OpenMPI 1.8 on RHEL6

2014-04-01 Thread Dave Goodell (dgoodell)
On Apr 1, 2014, at 10:26 AM, "Blosch, Edwin L"  wrote:

> I am getting some errors building 1.8 on RHEL6.  I tried autoreconf as 
> suggested, but it failed for the same reason.  Is there a minimum version of 
> m4 required that is newer than that provided by RHEL6?

Don't run "autoreconf" by hand; make sure to run the "./autogen.sh" script that 
is packaged with OMPI.  It will also check your versions and warn you if they 
are out of date.

Do you need to build OMPI from the SVN source?  Or would a (pre-autogen'ed) 
release tarball work for you?

-Dave




Re: [OMPI users] trying to use personal copy of 1.7.4

2014-03-12 Thread Dave Goodell (dgoodell)
Perhaps there's an RPATH issue here?  I don't fully understand the structure of 
Rmpi, but is there both an app and a library (or two separate libraries) that 
are linking against MPI?

I.e., what we want is:

app ----------> ~ross/OMPI
  \                  ^
   \                 |
    --> library -----+

But what we're getting is:

app ----------> /usr/OMPI
  \
   --> library ----> ~ross/OMPI


If one of them was first linked against the /usr/OMPI and managed to get an 
RPATH then it could override your LD_LIBRARY_PATH.

-Dave

On Mar 12, 2014, at 5:39 AM, Jeff Squyres (jsquyres)  wrote:

> Generally, all you need to ensure that your personal copy of OMPI is used is 
> to set the PATH and LD_LIBRARY_PATH to point to your new Open MPI 
> installation.  I do this all the time on my development cluster (where I have 
> something like 6 billion different installations of OMPI available... mmm... 
> should probably clean that up...)
> 
> export LD_LIBRARY_PATH=path_to_my_ompi/lib:$LD_LIBRARY_PATH
> export PATH=path-to-my-ompi/bin:$PATH
> 
> It should be noted that:
> 
> 1. you need to *prefix* your PATH and LD_LIBRARY_PATH with these values
> 2. you need to set these values in a way that will be picked up on all 
> servers that you use in your job.  The safest way to do this is in your shell 
> startup files (e.g., $HOME/.bashrc or whatever is relevant for your shell).
> 
> See http://www.open-mpi.org/faq/?category=running#run-prereqs, 
> http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path, and 
> http://www.open-mpi.org/faq/?category=running#mpirun-prefix.
> 
> Note the --prefix option that is described in the 3rd FAQ item I cited -- 
> that can be a bit easier, too.
> 
> 
> 
> On Mar 12, 2014, at 2:51 AM, Ross Boylan  wrote:
> 
>> I took the advice here and built a personal copy of the current openmpi,
>> to see if the problems I was having with Rmpi were a result of the old
>> version on the system.
>> 
>> When I do ldd on the relevant libraries (Rmpi.so is loaded dynamically
>> by R) everything looks fine; path references that should be local are.
>> But when I run the program and do lsof it shows that both the system and
>> personal versions of key libraries are opened.
>> 
>> First, does anyone know which library will actually be used, or how to
>> tell which library is actually used, in this situation?  I'm running on
>> Linux (Debian squeeze).
>> 
>> Second, is there some way to prevent the wrong/old/system libraries from
>> being loaded?
>> 
>> FWIW I'm still seeing the old misbehavior when I run this way, but, as I
>> said, I'm really not sure which libraries are being used.  Since Rmpi
>> was built against the new/local ones, I think the fact that it doesn't
>> crash means I really am using the new ones.
>> 
>> Here are highlights of lsof on the process running R:
>> COMMAND   PID  USER  FD   TYPE  DEVICE  SIZE/OFF       NODE  NAME
>> R       17634  ross  cwd  DIR    254,2     12288  150773764  /home/ross/KHC/sunbelt
>> R       17634  ross  rtd  DIR      8,1      4096          2  /
>> R       17634  ross  txt  REG      8,1      5648    3058294  /usr/lib/R/bin/exec/R
>> R       17634  ross  DEL  REG      8,1             2416718  /tmp/openmpi-sessions-ross@n100_0/60429/1/shared_mem_pool.n100
>> R       17634  ross  mem  REG      8,1    335240    3105336  /usr/lib/openmpi/lib/libopen-pal.so.0.0.0
>> R       17634  ross  mem  REG      8,1    304576    3105337  /usr/lib/openmpi/lib/libopen-rte.so.0.0.0
>> R       17634  ross  mem  REG      8,1    679992    3105332  /usr/lib/openmpi/lib/libmpi.so.0.0.2
>> R       17634  ross  mem  REG      8,1     93936    2967826  /usr/lib/libz.so.1.2.3.4
>> R       17634  ross  mem  REG      8,1     10648    3187256  /lib/libutil-2.11.3.so
>> R       17634  ross  mem  REG      8,1     32320    2359631  /usr/lib/libpciaccess.so.0.10.8
>> R       17634  ross  mem  REG      8,1     33368    2359338  /usr/lib/libnuma.so.1
>> R       17634  ross  mem  REG    254,2    979113  152045740  /home/ross/install/lib/libopen-pal.so.6.1.0
>> R       17634  ross  mem  REG      8,1    183456    2359592  /usr/lib/libtorque.so.2.0.0
>> R       17634  ross  mem  REG    254,2   1058125  152045781  /home/ross/install/lib/libopen-rte.so.7.0.0
>> R       17634  ross  mem  REG      8,1     49936    2359341  /usr/lib/libibverbs.so.1.0.0
>> R       17634  ross  mem  REG    254,2   2802579  152045867  /home/ross/install/lib/libmpi.so.1.3.0
>> R       17634  ross  mem  REG    254,2    106626  152046481  /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so
>> 
>> So libmpi, libopen-pal, and libopen-rte all are opened in two versions and 
>> two locations.
>> 
>> Thanks.
>> Ross Boylan
>> 