Re: [OMPI users] New libmpi.so dependency on libibverbs.so?

2016-02-02 Thread Number Cruncher
You will get warnings (such as "cannot dlopen mca_btl_openib.so") on these nodes unless you specify --mca btl ^openib on these nodes. I think it would be much easier if libibverbs.so.1 were available on all your nodes, including those with no InfiniBand hardware. Cheers, Gilles On 2/2/2016 2:

[OMPI users] New libmpi.so dependency on libibverbs.so?

2016-02-01 Thread Number Cruncher
Having compiled various recent Open MPI sources with the same "configure" options, I've noticed that the "libmpi.so" shared library from 1.10.1 now itself depends directly on libibverbs.so.1. Previously (in 1.10.0, for example) only plugins such as mca_btl_openib.so depended on it. E.g. readelf -

Re: [OMPI users] Strange "All-to-All" behavior

2013-04-30 Thread Number Cruncher
Sorry, I seem to have misread your post. You're not actually invoking MPI_Alltoall or MPI_Alltoallv. Please disregard my last post. Simon. On 26/04/2013 23:14, Stephan Wolf wrote: Hi, I have encountered really bad performance when all the nodes send data to all the other nodes. I use Isen

Re: [OMPI users] Strange "All-to-All" behavior

2013-04-30 Thread Number Cruncher
This sounds a bit like the All_to_allv algorithm change I complained about when 1.6.1 was released. Original post: http://www.open-mpi.org/community/lists/users/2012/11/20722.php Everything waits for "rank 0" observation: http://www.open-mpi.org/community/lists/users/2013/01/21219.php Does s

[OMPI users] All_to_allv algorithm patch

2013-02-04 Thread Number Cruncher
I'll try running this by the mailing list again, before resigning myself to maintaining this privately. I've looked in detail at the current two MPI_Alltoallv algorithms and wanted to raise a couple of ideas. Firstly, the new default "pairwise" algorithm. * There is no optimisation for spa
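To make the first point concrete, a minimal sketch of a pairwise exchange that skips empty message pairs could look like the following (a hypothetical illustration, not the Open MPI implementation or the attached patch):

    // Sketch of a pairwise all-to-allv exchange with a "no-op" shortcut for
    // empty messages. Hypothetical code, for illustration only.
    #include <mpi.h>

    void pairwise_alltoallv_sparse(const void *sendbuf, const int *sendcounts,
                                   const int *sdispls, MPI_Datatype sendtype,
                                   void *recvbuf, const int *recvcounts,
                                   const int *rdispls, MPI_Datatype recvtype,
                                   MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        MPI_Aint lb, sext, rext;
        MPI_Type_get_extent(sendtype, &lb, &sext);
        MPI_Type_get_extent(recvtype, &lb, &rext);

        for (int step = 0; step < size; ++step) {
            int sendto   = (rank + step) % size;         // partner we send to this step
            int recvfrom = (rank - step + size) % size;  // partner we receive from

            // Skip the step entirely when both directions are empty: the
            // sparse/empty-message optimisation discussed above.
            if (sendcounts[sendto] == 0 && recvcounts[recvfrom] == 0)
                continue;

            MPI_Sendrecv((const char *)sendbuf + sdispls[sendto] * sext,
                         sendcounts[sendto], sendtype, sendto, 0,
                         (char *)recvbuf + rdispls[recvfrom] * rext,
                         recvcounts[recvfrom], recvtype, recvfrom, 0,
                         comm, MPI_STATUS_IGNORE);
        }
    }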

Re: [OMPI users] MPI_Alltoallv performance regression 1.6.0 to 1.6.1

2013-01-24 Thread Number Cruncher
I've looked in more detail at the current two MPI_Alltoallv algorithms and wanted to raise a couple of ideas. Firstly, the new default "pairwise" algorithm. * There is no optimisation for sparse/empty messages, compared to the old basic "linear" algorithm. * The attached "pairwise-nop" patch add

Re: [OMPI users] MPI_Alltoallv performance regression 1.6.0 to 1.6.1

2012-12-22 Thread Number Cruncher
e your applications. George. On Dec 21, 2012, at 13:25, Number Cruncher <number.crunc...@ntlworld.com> wrote: I completely understand there's no "one size fits all", and I appreciate that there are workarounds to the change in behaviour. I'm only trying t

Re: [OMPI users] MPI_Alltoallv performance regression 1.6.0 to 1.6.1

2012-12-21 Thread Number Cruncher
Rechen- und Kommunikationszentrum der RWTH Aachen Seffenter Weg 23, D 52074 Aachen (Germany) -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Number Cruncher Sent: Wednesday, December 19, 2012 5:31 PM To: Open MPI Users Subject: Re: [OMPI users]

Re: [OMPI users] MPI_Alltoallv performance regression 1.6.0 to 1.6.1

2012-12-19 Thread Number Cruncher
or quick online tests). http://www.open-mpi.org/faq/?category=tuning#setting-mca-params We 'tune' our Open MPI by setting environment variables. Best, Paul Kapinos On 12/19/12 11:44, Number Cruncher wrote: Having run some more benchmarks, the new default is *really* bad for our
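The environment-variable approach maps directly onto the regression being discussed, since the algorithm used by the "tuned" collectives can be selected per run. A small sketch follows; the exact MCA parameter names and values are an assumption here and should be verified with ompi_info for your Open MPI version:

    // Sketch: selecting an alltoallv algorithm via MCA environment variables
    // set before MPI_Init, as an alternative to mpirun --mca flags or a
    // parameter file. Parameter names/values are assumptions -- check them
    // with ompi_info for your installation.
    #include <mpi.h>
    #include <cstdlib>   // setenv (POSIX)

    int main(int argc, char **argv)
    {
        // Roughly equivalent to:
        //   mpirun --mca coll_tuned_use_dynamic_rules 1
        //          --mca coll_tuned_alltoallv_algorithm 1 ...
        setenv("OMPI_MCA_coll_tuned_use_dynamic_rules", "1", 1);
        setenv("OMPI_MCA_coll_tuned_alltoallv_algorithm", "1", 1); // assumed: 1 = basic linear

        MPI_Init(&argc, &argv);
        // ... application code; MPI_Alltoallv now uses the selected algorithm ...
        MPI_Finalize();
        return 0;
    }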

Re: [OMPI users] MPI_Alltoallv performance regression 1.6.0 to 1.6.1

2012-12-19 Thread Number Cruncher
ance Computing RWTH Aachen University, Center for Computing and Communication Rechen- und Kommunikationszentrum der RWTH Aachen Seffenter Weg 23, D 52074 Aachen (Germany) -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Number Cruncher

[OMPI users] MPI_Alltoallv performance regression 1.6.0 to 1.6.1

2012-11-15 Thread Number Cruncher
I've noticed a very significant (100%) slowdown for MPI_Alltoallv calls as of version 1.6.1. * This is most noticeable for high-frequency exchanges over 1Gb Ethernet where process-to-process message sizes are fairly small (e.g. 100kbyte) and much of the exchange matrix is sparse. * 1.6.1 releas
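For reference, the calling pattern in question is an ordinary MPI_Alltoallv in which most of the count matrix is zero; a minimal, self-contained sketch (the neighbour-only pattern and message size are made up for illustration):

    // Sketch of a "sparse" MPI_Alltoallv: only neighbouring ranks exchange
    // data; every other entry of the exchange matrix is zero. Illustrative only.
    #include <mpi.h>
    #include <vector>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int chunk = 12500;   // 12500 doubles, roughly 100 kbytes per message
        std::vector<int> sendcounts(size, 0), recvcounts(size, 0);
        std::vector<int> sdispls(size, 0), rdispls(size, 0);

        // Exchange only with the left/right neighbours (the sparse case).
        int left  = (rank - 1 + size) % size;
        int right = (rank + 1) % size;
        sendcounts[left] = sendcounts[right] = chunk;
        recvcounts[left] = recvcounts[right] = chunk;

        int soff = 0, roff = 0;
        for (int i = 0; i < size; ++i) {
            sdispls[i] = soff; soff += sendcounts[i];
            rdispls[i] = roff; roff += recvcounts[i];
        }

        std::vector<double> sendbuf(soff, double(rank)), recvbuf(roff);
        MPI_Alltoallv(sendbuf.data(), sendcounts.data(), sdispls.data(), MPI_DOUBLE,
                      recvbuf.data(), recvcounts.data(), rdispls.data(), MPI_DOUBLE,
                      MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }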

Re: [OMPI users] memcpy overlap in ompi_ddt_copy_content_same_ddt and glibc 2.12

2010-11-11 Thread Number Cruncher
On 11/11/10 10:56, Jed Brown wrote: On Thu, Nov 11, 2010 at 11:45, Number Cruncher <number.crunc...@ntlworld.com> wrote: Having just replaced the memcpy with Linus' safe forward-copy version from https://bugzilla.redhat.com/show_bug.cgi?id=638477#c38 I can report n

Re: [OMPI users] memcpy overlap in ompi_ddt_copy_content_same_ddt and glibc 2.12

2010-11-11 Thread Number Cruncher
On 10/11/10 21:17, Jed Brown wrote: I think any software that ignores the ISO warning "If copying takes place between objects that overlap, the behavior is undefined" needs fixing. Absolutely, it is incorrect and should be fixed. Having just replaced the memcpy with Linus' safe

[OMPI users] memcpy overlap in ompi_ddt_copy_content_same_ddt and glibc 2.12

2010-11-10 Thread Number Cruncher
Just some observations from a concerned user with a temperamental Open MPI program (1.4.3): Fedora 14 (just released) includes glibc-2.12, which has optimized versions of memcpy, including one that copies backward. http://sourceware.org/git/?p=glibc.git;a=commitdiff;h=6fb8cbcb58a29fff73eb2101b34caa19a7f
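The situation that trips this up is an in-place shift within a single buffer: exactly where an overlapping memcpy is undefined, and where a backward-copying implementation produces a different (wrong) result from the traditional forward copy. A minimal sketch, not Open MPI code:

    // Shifting the tail of a buffer down to its start. The destination
    // [0, total-offset) and the source [offset, total) overlap whenever the
    // shift is smaller than the data being moved, so memcpy is undefined
    // behaviour here and a backward-copying memcpy can corrupt the result.
    // memmove is specified to handle overlap and is the safe replacement.
    #include <cstring>
    #include <cstddef>

    void shift_down(char *buf, std::size_t total, std::size_t offset)
    {
        // memcpy(buf, buf + offset, total - offset);    // undefined if regions overlap
        std::memmove(buf, buf + offset, total - offset);  // well-defined with overlap
    }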

Re: [OMPI users] Mimicking timeout for MPI_Wait

2009-12-08 Thread Number Cruncher
Whilst MPI has traditionally been run on dedicated hardware, the rise of cheap multicore CPUs makes it very attractive for ISVs such as ourselves (http://www.cambridgeflowsolutions.com/) to build a *single* executable that can be run in batch mode on a dedicated cluster *or* interactively on a
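One portable way to approximate the timeout in the thread subject, at the cost of a little latency, is to poll with MPI_Test and sleep between polls rather than calling MPI_Wait; a sketch (the helper name and the 1 ms poll interval are arbitrary choices, not an Open MPI feature):

    // Sketch: emulate a timed wait by polling MPI_Test with a short sleep,
    // so an unmatched request does not spin a core at 100%.
    #include <mpi.h>
    #include <chrono>
    #include <thread>

    // Returns true if 'req' completed before 'timeout' expired.
    bool wait_with_timeout(MPI_Request &req, std::chrono::milliseconds timeout)
    {
        const auto deadline = std::chrono::steady_clock::now() + timeout;
        for (;;) {
            int done = 0;
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);
            if (done)
                return true;
            if (std::chrono::steady_clock::now() >= deadline)
                return false;   // caller decides: keep waiting, cancel, report, ...
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }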

Re: [OMPI users] MPI and C++ - now Send and Receive of Classes and STL containers

2009-07-06 Thread Number Cruncher
I strongly suggest you take a look at boost::mpi, http://www.boost.org/doc/libs/1_39_0/doc/html/mpi.html It handles serialization transparently and has some great natural extensions to the MPI C interface for C++, e.g. bool global = all_reduce(comm, local, logical_and()); This sets "global"
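A fuller version of that one-liner, assuming Boost.MPI is available and linked against boost_mpi and boost_serialization, might look like:

    // Sketch: Boost.MPI all_reduce of a per-rank flag into a global result.
    #include <boost/mpi.hpp>
    #include <functional>
    #include <iostream>

    int main(int argc, char **argv)
    {
        boost::mpi::environment  env(argc, argv);   // handles MPI_Init / MPI_Finalize
        boost::mpi::communicator comm;              // defaults to MPI_COMM_WORLD

        bool local = true;   // e.g. "did my local part of the computation converge?"

        // true only if 'local' was true on every rank
        bool global = boost::mpi::all_reduce(comm, local, std::logical_and<bool>());

        if (comm.rank() == 0)
            std::cout << std::boolalpha << "global = " << global << '\n';
        return 0;
    }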

Re: [OMPI users] top question

2009-06-03 Thread Number Cruncher
Jeff Squyres wrote: We get this question so much that I really need to add it to the FAQ. :-\ Open MPI currently always spins for completion for exactly the reason that Scott cites: lower latency. Arguably, when using TCP, we could probably get a bit better performance by blocking and allow

Re: [OMPI users] Bogus memcpy or bogus valgrind record

2009-04-06 Thread Number Cruncher
I'd like to add my concern to the thread at http://www.open-mpi.org/community/lists/users/2009/03/8661.php that the latest 1.3 series produces far too much memory-checker noise. We use Valgrind extensively during debugging, and although I'm using the latest snapshot (1.3.2a1r20901) and latest

[OMPI users] orterun returns zero when it fails to find an executable

2008-12-08 Thread Number Cruncher
I notice that bug ticket #954 https://svn.open-mpi.org/trac/ompi/ticket/954 has the very issue I'm encountering: I want to know when mpirun fails because of a missing executable during some automated tests. At the moment, nearly 2 years after the bug was reported, orterun/mpirun still returns

Re: [OMPI users] overlapping memcpy in ompi_coll_tuned_allgather_intra_bruck

2008-02-04 Thread Number Cruncher
ing the overall performance. Totally agree. The vast majority of OpenMPI stuff uses memcpy fine. It would just be a local bug fix. Can I volunteer? Regards, Simon Thanks, George. On Jan 30, 2008, at 9:41 AM, Number Cruncher wrote: I'm getting many "Source and destination

[OMPI users] overlapping memcpy in ompi_coll_tuned_allgather_intra_bruck

2008-01-30 Thread Number Cruncher
I'm getting many "Source and destination overlap in memcpy" errors when running my application on an odd number of procs. I believe this is because the Allgather collective is using Bruck's algorithm and doing a shift on the buffer as a finalisation step (coll_tuned_allgather.c): tmprecv = (char