Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-12 Thread Gus Correa
On 11/12/2014 05:45 PM, Reuti wrote: On 12.11.2014 at 17:27, Reuti wrote: On 11.11.2014 at 02:25, Ralph Castain wrote: Another thing you can do is (a) ensure you built with --enable-debug, and then (b) run it with -mca oob_base_verbose 100 (without the tcp_if_include option) so we can

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-12 Thread Gilles Gouaillardet
Could you please send the output of netstat -nr on both the head and compute nodes? No problem obfuscating the IP of the head node; I am only interested in the netmasks and routes. Ralph Castain wrote: > >> On Nov 12, 2014, at 2:45 PM, Reuti wrote: >> >>
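A minimal way to gather what Gilles is asking for, assuming passwordless ssh from the head node and using compute-01-01 as a placeholder compute node name:

   $ netstat -nr                      # routing table and netmasks on the head node
   $ ssh compute-01-01 netstat -nr    # the same on a compute node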

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-12 Thread Ralph Castain
> On Nov 12, 2014, at 2:45 PM, Reuti wrote: > > On 12.11.2014 at 17:27, Reuti wrote: > >> On 11.11.2014 at 02:25, Ralph Castain wrote: >> >>> Another thing you can do is (a) ensure you built with --enable-debug, and >>> then (b) run it with -mca oob_base_verbose

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-12 Thread Reuti
On 12.11.2014 at 17:27, Reuti wrote: > On 11.11.2014 at 02:25, Ralph Castain wrote: > >> Another thing you can do is (a) ensure you built with --enable-debug, and >> then (b) run it with -mca oob_base_verbose 100 (without the tcp_if_include >> option) so we can watch the connection handshake

Re: [OMPI users] mmaped memory and openib btl.

2014-11-12 Thread Emmanuel Thomé
Yes, I confirm. Thanks for saying that this is the supposed behaviour. In the binary, the code goes to munmap@plt, which goes to libc, not to libopen-pal.so. libc is 2.13-38+deb7u1. I'm a total noob at GOT/PLT relocations. What is the mechanism which should make the opal relocation win over

Re: [OMPI users] mmaped memory and openib btl.

2014-11-12 Thread Jeff Squyres (jsquyres)
FWIW, munmap is *supposed* to be intercepted. Can you confirm that when your application calls munmap, it doesn't make a call to libopen-pal.so? It should be calling this (1-line) function: - /* intercept munmap, as the user can give back memory that way as well. */ OPAL_DECLSPEC int
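One way to confirm where a munmap call actually resolves is the dynamic linker's binding trace; a sketch, with ./a.out standing in for the real application:

   $ LD_DEBUG=bindings ./a.out 2>&1 | grep munmap
   # the "binding file ... to ..." lines show whether munmap binds to libc or to libopen-pal.so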

Re: [OMPI users] mmaped memory and openib btl.

2014-11-12 Thread Nathan Hjelm
You could just disable leave pinned: -mca mpi_leave_pinned 0 -mca mpi_leave_pinned_pipeline 0 This will fix the issue but may reduce performance. Not sure why the munmap wrapper is failing to execute but this will get you running. -Nathan Hjelm HPC-5, LANL On Wed, Nov 12, 2014 at 05:08:06PM
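Applied to an mpirun command line, the workaround above looks roughly like this (process count and program name are placeholders):

   $ mpirun -np 4 -mca mpi_leave_pinned 0 -mca mpi_leave_pinned_pipeline 0 ./a.out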

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-12 Thread Reuti
On 11.11.2014 at 02:25, Ralph Castain wrote: > Another thing you can do is (a) ensure you built with --enable-debug, and then > (b) run it with -mca oob_base_verbose 100 (without the tcp_if_include > option) so we can watch the connection handshake and see what it is doing. > The
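A sketch of the debug recipe being quoted here, assuming a from-source build; the install prefix, hostfile, and program name are placeholders:

   $ ./configure --enable-debug --prefix=$HOME/ompi-debug
   $ make -j4 install
   $ mpirun -np 2 -hostfile hosts -mca oob_base_verbose 100 ./a.out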

[OMPI users] 1.8.4 release delayed

2014-11-12 Thread Ralph Castain
Hi folks, Those of you following the mailing lists probably know that we had hoped to release 1.8.4 last Friday, but were unable to do so. We currently have a couple of issues pending resolution, and our developers are badly “crunched” by final prep for Supercomputing. We then will hit the US

Re: [OMPI users] mmaped memory and openib btl.

2014-11-12 Thread Emmanuel Thomé
As far as I have been able to understand while looking at the code, it very much seems that Joshua pointed out the exact cause of the issue. munmap'ing a virtual address space region does not evict it from mpool_grdma->pool->lru_list. If a later mmap happens to return the same address (a priori

Re: [OMPI users] 1.8.4

2014-11-12 Thread Ralph Castain
I was going to send something out to the list today anyway - will do so now. > On Nov 12, 2014, at 6:58 AM, Jeff Squyres (jsquyres) > wrote: > > On Nov 12, 2014, at 9:53 AM, Ray Sheppard wrote: > >> Thanks, and sorry to blast my little note out to the

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-12 Thread Ralph Castain
> On Nov 12, 2014, at 7:15 AM, Dave Love wrote: > > Ralph Castain writes: > >> You might also add the --display-allocation flag to mpirun so we can >> see what it thinks the allocation looks like. If there are only 16 >> slots on the node, it seems

[OMPI users] Question on tuning OFED kernel parameters for Mellanox 56G IB

2014-11-12 Thread Liu Jianyu
Hi, I'm trying to find the correct settings for the OFED kernel parameters for the cluster. Each node has 32G RAM, runs Red Hat Enterprise Linux Server release 6.4 (Santiago), OFED 2.1.192, OpenMPI 1.6.5, and Mellanox Technologies MT27500 Family [ConnectX-3] with 56G active. lsmod
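For ConnectX-3 with the mlx4 driver, the parameters most often discussed in this context are log_num_mtt and log_mtts_per_seg of the mlx4_core module, which bound how much memory can be registered; a commonly cited rule of thumb is to allow at least twice the physical RAM. The values below are only an illustration sized for 32G RAM and 4 KiB pages (2^21 * 2^3 * 4096 bytes = 64 GB), not a recommendation taken from this thread:

   $ echo "options mlx4_core log_num_mtt=21 log_mtts_per_seg=3" | sudo tee /etc/modprobe.d/mlx4_core.conf
   $ # reload the mlx4_core module (or reboot) for the new values to take effect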

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-12 Thread Dave Love
"SLIM H.A." writes: > Dear Reuti and Ralph > > Below is the output of the run for openmpi 1.8.3 with this line > > mpirun -np $NSLOTS --display-map --display-allocation --cpus-per-proc 1 $exe -np is redundant with tight integration unless you're using fewer than NSLOTS

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-12 Thread Dave Love
Reuti writes: >> If so, I’m wondering if that NULL he shows in there is the source of the >> trouble. The parser doesn’t look like it would handle that very well, though >> I’d need to test it. Is that NULL expected? Or is the NULL not really in the >> file? > > I

Re: [OMPI users] oversubscription of slots with GridEngine

2014-11-12 Thread Dave Love
Ralph Castain writes: > You might also add the --display-allocation flag to mpirun so we can > see what it thinks the allocation looks like. If there are only 16 > slots on the node, it seems odd that OMPI would assign 32 procs to it > unless it thinks there is only 1 node in

Re: [OMPI users] 1.8.4

2014-11-12 Thread Jeff Squyres (jsquyres)
On Nov 12, 2014, at 9:53 AM, Ray Sheppard wrote: > Thanks, and sorry to blast my little note out to the list. I guess your mail > address is now aliased to the mailing list in my mail client. :-) No worries; I'm sure this is a question on other people's minds, too. -- Jeff

Re: [OMPI users] 1.8.4

2014-11-12 Thread Ray Sheppard
Thanks, and sorry to blast my little note out to the list. I guess your mail address is now aliased to the mailing list in my mail client. Ray On 11/12/2014 9:41 AM, Jeff Squyres (jsquyres) wrote: We have 2 critical issues left that need fixing (a THREAD_MULTIPLE/locking issue and a shmem

Re: [OMPI users] 1.8.4

2014-11-12 Thread Jeff Squyres (jsquyres)
We have 2 critical issues left that need fixing (a THREAD_MULTIPLE/locking issue and a shmem issue). There's active work progressing on both. I think we'd love to say it would be ready by SC, but I know that a lot of us -- myself included -- are fighting to meet our own SC deadlines. Ralph

[OMPI users] 1.8.4

2014-11-12 Thread Ray Sheppard
Hi Jeff, Sorry to bother you directly, but do you know when y'all will release the stable version of 1.8.4? I have users asking for it and really would like to build it for them before I leave for SC. But, either way, it would be great to be able to help manage their expectations. Thanks.

Re: [OMPI users] mpirun fails across nodes

2014-11-12 Thread Jeff Squyres (jsquyres)
Do you have firewalling enabled on either server? See this FAQ item: http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems On Nov 12, 2014, at 4:57 AM, Syed Ahsan Ali wrote: > Dear All > > I need your advice. While trying to run mpirun job
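On RHEL/CentOS-style nodes of that era, a hedged first check for the firewall issue the FAQ describes could be (run on both servers; the exact commands depend on the distribution):

   $ sudo iptables -L -n          # list the active firewall rules
   $ sudo service iptables stop   # temporarily disable, for a test run only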

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-12 Thread Reuti
On 11.11.2014 at 02:12, Gilles Gouaillardet wrote: > Hi, > > IIRC there were some bug fixes between 1.8.1 and 1.8.2 in order to really use > all the published interfaces. > > by any chance, are you running a firewall on your head node? Yes, but only for the interface to the outside world.
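One way to keep a firewalled external interface out of the picture is to restrict Open MPI to the internal network; a sketch, with eth1 standing in for the cluster-internal interface:

   $ mpirun -np 16 -hostfile hosts \
         -mca oob_tcp_if_include eth1 \
         -mca btl_tcp_if_include eth1 ./a.out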

[OMPI users] mpirun fails across nodes

2014-11-12 Thread Syed Ahsan Ali
Dear All, I need your advice. While trying to run an mpirun job across nodes I get the following error. It seems that the two nodes, i.e., compute-01-01 and compute-01-06, are not able to communicate with each other, although the nodes can see each other via ping. [pmdtest@pmd ERA_CLM45]$ mpirun -np 16 -hostfile
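Before digging into the error itself, a common sanity check is to launch a trivial non-MPI command across the same two hosts; a sketch, with a hostfile that lists compute-01-01 and compute-01-06:

   $ mpirun -np 2 -hostfile hostlist hostname

If that already fails, the problem is in the runtime wire-up (for example a firewall) rather than in the application.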

[OMPI users] [ICCS/Alchemy] Call for Papers: Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems

2014-11-12 Thread CUDENNEC Loic
Please accept our apologies if you receive multiple copies of this CfP. ALCHEMY Workshop 2015: Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems. Held in