Re: [OMPI users] MPI_Comm_accept()

2017-03-13 Thread Adam Sylvester
Bummer - thanks for the update.  I will revert to 1.10.x for now
then.  Should I file a bug report for this on GitHub or elsewhere?  Or if
there's already an open issue for this, can you point me to it so I can
keep track of when it's fixed?  Any rough estimate, calendar-wise, for when
you expect this to be fixed?

Thanks.

On Mon, Mar 13, 2017 at 10:45 AM, r...@open-mpi.org  wrote:

> You should consider it a bug for now - it won’t work in the 2.0 series,
> and I don’t think it will work in the upcoming 2.1.0 release. Probably will
> be fixed after that.
>
>
> On Mar 13, 2017, at 5:17 AM, Adam Sylvester  wrote:
>
> As a follow-up, I tried this with Open MPI 1.10.4 and this worked as
> expected (the port formatting looks really different):
>
> $ mpirun -np 1 ./server
> Port name is 1286733824.0;tcp://10.102.16.135:43074+1286733825.0;tcp://10.102.16.135::300
> Accepted!
>
> $ mpirun -np 1 ./client "1286733824.0;tcp://10.102.16.135:43074+1286733825.0;tcp://10.102.16.135::300"
> Trying with '1286733824.0;tcp://10.102.16.135:43074+1286733825.0;tcp://10.102.16.135::300'
> Connected!
>
> I've found some other posts of users asking about similar things regarding
> the 2.x release - is this a bug?
>
> On Sun, Mar 12, 2017 at 9:38 PM, Adam Sylvester  wrote:
>
>> I'm using Open MPI 2.0.2 on RHEL 7.  I'm trying to use MPI_Open_port() /
>> MPI_Comm_accept() / MPI_Comm_connect().  My use case is that I'll have two
>> processes running on two machines that don't initially know about each
>> other (i.e. I can't do the typical mpirun with a list of IPs); eventually I
>> think I may need to use ompi-server to accomplish what I want, but for now
>> I'm trying to test this out running two processes on the same machine with
>> some toy programs.
>>
>> server.cpp creates the port, prints it, and waits for a client to accept
>> using it:
>>
>> #include <mpi.h>
>> #include <iostream>
>>
>> int main(int argc, char** argv)
>> {
>> MPI_Init(NULL, NULL);
>>
>> char myport[MPI_MAX_PORT_NAME];
>> MPI_Comm intercomm;
>>
>> MPI_Open_port(MPI_INFO_NULL, myport);
>> std::cout << "Port name is " << myport << std::endl;
>>
>> MPI_Comm_accept(myport, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
>>
>> std::cout << "Accepted!" << std::endl;
>>
>> MPI_Finalize();
>> return 0;
>> }
>>
>> client.cpp takes in this port on the command line and tries to connect to
>> it:
>>
>> #include <mpi.h>
>> #include <iostream>
>> #include <string>
>>
>> int main(int argc, char** argv)
>> {
>> MPI_Init(NULL, NULL);
>>
>> MPI_Comm intercomm;
>>
>> const std::string name(argv[1]);
>> std::cout << "Trying with '" << name << "'" << std::endl;
>> MPI_Comm_connect(name.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF,
>> &intercomm);
>>
>> std::cout << "Connected!" << std::endl;
>>
>> MPI_Finalize();
>> return 0;
>> }
>>
>> I run the server first:
>> $ mpirun ./server
>> Port name is 2720137217.0:595361386
>>
>> Then a second later I run the client:
>> $ mpirun ./client 2720137217.0:595361386
>> Trying with '2720137217.0:595361386'
>>
>> Both programs hang for a while and then eventually time out.  I have a
>> feeling I'm misunderstanding something and doing something dumb, but from
>> all the examples I've seen online it seems like this should work.
>>
>> Thanks for the help.
>> -Adam
>>
>
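
As an aside on the ompi-server route mentioned in the quoted message: one way that route is commonly set up is to publish the port under a service name and have the client look it up, so the raw port string never has to be copied by hand.  The following is only a rough sketch under that assumption - the service name "ompi_demo" is an arbitrary illustrative choice, both mpirun invocations would have to be pointed at a running ompi-server instance (e.g. via mpirun's --ompi-server option), and on the 2.0.x series it would presumably hit the same bug discussed in this thread.

// Server-side sketch: open a port, publish it under a service name,
// and accept a connection on it.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    char port[MPI_MAX_PORT_NAME];
    MPI_Comm intercomm;

    MPI_Open_port(MPI_INFO_NULL, port);
    // Register the port with the name server so clients can find it by name.
    MPI_Publish_name("ompi_demo", MPI_INFO_NULL, port);

    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
    std::cout << "Accepted!" << std::endl;

    MPI_Unpublish_name("ompi_demo", MPI_INFO_NULL, port);
    MPI_Close_port(port);
    MPI_Finalize();
    return 0;
}

// Client-side sketch: look the port up by service name and connect to it.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    char port[MPI_MAX_PORT_NAME];
    MPI_Comm intercomm;

    MPI_Lookup_name("ompi_demo", MPI_INFO_NULL, port);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
    std::cout << "Connected!" << std::endl;

    MPI_Finalize();
    return 0;
}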

Re: [OMPI users] Lustre support uses deprecated include.

2017-03-13 Thread Edgar Gabriel
Thank you for the report; it is on my to-do list. I will try to get the
configure logic to recognize which file to use later on; that should
hopefully be done for the 2.0.3 and 2.1.1 series.
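
For what it's worth, a minimal sketch of what the guarded include might end up looking like once such a configure check exists - HAVE_LUSTRE_LUSTREAPI_H is a hypothetical macro name here, and the actual test and naming are up to the maintainers:

/* Sketch only: prefer the newer header when the (hypothetical) configure
 * check found it, otherwise fall back to the older, deprecated one. */
#if defined(HAVE_LUSTRE_LUSTREAPI_H)
#include <lustre/lustreapi.h>
#else
#include <lustre/liblustreapi.h>
#endif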


Thanks

Edgar


On 3/13/2017 8:55 AM, Åke Sandgren wrote:

Hi!

The Lustre support in ompi/mca/fs/lustre/fs_lustre.h is using a
deprecated include.

#include <lustre/liblustreapi.h>

is deprecated in newer Lustre versions (at least from 2.8 on) and

#include <lustre/lustreapi.h>

should be used instead.





Re: [OMPI users] MPI_Comm_accept()

2017-03-13 Thread r...@open-mpi.org
You should consider it a bug for now - it won’t work in the 2.0 series, and I 
don’t think it will work in the upcoming 2.1.0 release. Probably will be fixed 
after that.


> On Mar 13, 2017, at 5:17 AM, Adam Sylvester  wrote:
> 
> As a follow-up, I tried this with Open MPI 1.10.4 and this worked as expected 
> (the port formatting looks really different):
> 
> $ mpirun -np 1 ./server
> Port name is 1286733824.0;tcp://10.102.16.135:43074+1286733825.0;tcp://10.102.16.135::300
> Accepted!
> 
> $ mpirun -np 1 ./client "1286733824.0;tcp://10.102.16.135:43074+1286733825.0;tcp://10.102.16.135::300"
> Trying with '1286733824.0;tcp://10.102.16.135:43074+1286733825.0;tcp://10.102.16.135::300'
> Connected!
> 
> I've found some other posts of users asking about similar things regarding 
> the 2.x release - is this a bug?
> 
> On Sun, Mar 12, 2017 at 9:38 PM, Adam Sylvester  wrote:
> I'm using Open MPI 2.0.2 on RHEL 7.  I'm trying to use MPI_Open_port() /
> MPI_Comm_accept() / MPI_Comm_connect().  My use case is that I'll have two
> processes running on two machines that don't initially know about each other
> (i.e. I can't do the typical mpirun with a list of IPs); eventually I think I
> may need to use ompi-server to accomplish what I want, but for now I'm trying
> to test this out running two processes on the same machine with some toy
> programs.
> 
> server.cpp creates the port, prints it, and waits for a client to accept 
> using it:
> 
> #include <mpi.h>
> #include <iostream>
> 
> int main(int argc, char** argv)
> {
> MPI_Init(NULL, NULL);
> 
> char myport[MPI_MAX_PORT_NAME];
> MPI_Comm intercomm;
> 
> MPI_Open_port(MPI_INFO_NULL, myport);
> std::cout << "Port name is " << myport << std::endl;
> 
> MPI_Comm_accept(myport, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
> 
> std::cout << "Accepted!" << std::endl;
> 
> MPI_Finalize();
> return 0;
> }
> 
> client.cpp takes in this port on the command line and tries to connect to it:
> 
> #include <mpi.h>
> #include <iostream>
> #include <string>
> 
> int main(int argc, char** argv)
> {
> MPI_Init(NULL, NULL);
> 
> MPI_Comm intercomm;
> 
> const std::string name(argv[1]);
> std::cout << "Trying with '" << name << "'" << std::endl;
> MPI_Comm_connect(name.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF, 
> &intercomm);
> 
> std::cout << "Connected!" << std::endl;
> 
> MPI_Finalize();
> return 0;
> }
> 
> I run the server first:
> $ mpirun ./server
> Port name is 2720137217.0:595361386
> 
> Then a second later I run the client:
> $ mpirun ./client 2720137217.0:595361386
> Trying with '2720137217.0:595361386'
> 
> Both programs hang for a while and then eventually time out.  I have a feeling
> I'm misunderstanding something and doing something dumb, but from all the
> examples I've seen online it seems like this should work.
> 
> Thanks for the help.
> -Adam
> 

[OMPI users] Lustre support uses deprecated include.

2017-03-13 Thread Åke Sandgren
Hi!

The Lustre support in ompi/mca/fs/lustre/fs_lustre.h is using a
deprecated include.

#include <lustre/liblustreapi.h>

is deprecated in newer Lustre versions (at least from 2.8 on) and

#include <lustre/lustreapi.h>

should be used instead.

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se


Re: [OMPI users] openib/mpi_alloc_mem pathology

2017-03-13 Thread Paul Kapinos

Nathan,
unfortunately '--mca memory_linux_disable 1' does not help with this issue - it
does not change the behaviour at all.
Note that the pathological behaviour is present in Open MPI 2.0.2 as well as in
1.10.x, and only nodes with an Intel OmniPath (OPA) network are affected.


The known workaround is to disable the InfiniBand fallback with '--mca btl
^tcp,openib' on nodes with an OPA network. (On IB nodes, the same tweak led to a
5% performance improvement on single-node jobs; but obviously disabling IB on
nodes connected via IB is not a solution for multi-node jobs, huh).
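
For anyone who wants to check whether their own installation is affected, here is a minimal sketch of the kind of allocate/free loop being discussed - the iteration count and buffer size are arbitrary illustrative values:

// Sketch of a micro-benchmark: time a tight MPI_Alloc_mem/MPI_Free_mem loop.
// The pathology discussed in this thread shows up as this loop being far
// slower when the openib btl is loaded than when it is excluded.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    const int iterations = 10000;     // arbitrary
    const MPI_Aint bytes = 1 << 20;   // 1 MiB per allocation, arbitrary

    const double t0 = MPI_Wtime();
    for (int i = 0; i < iterations; ++i)
    {
        void* buf = NULL;
        MPI_Alloc_mem(bytes, MPI_INFO_NULL, &buf);
        MPI_Free_mem(buf);
    }
    const double t1 = MPI_Wtime();

    std::cout << "avg alloc/free pair: "
              << (t1 - t0) / iterations * 1e6 << " us" << std::endl;

    MPI_Finalize();
    return 0;
}

Running it once with and once without '--mca btl ^tcp,openib' (or, on 1.10.x, with '--mca memory_linux_disable 1') should make any difference visible.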



On 03/07/17 20:22, Nathan Hjelm wrote:

If this is with 1.10.x or older run with --mca memory_linux_disable 1. There is 
a bad interaction between ptmalloc2 and psm2 support. This problem is not 
present in v2.0.x and newer.

-Nathan


On Mar 7, 2017, at 10:30 AM, Paul Kapinos  wrote:

Hi Dave,



On 03/06/17 18:09, Dave Love wrote:
I've been looking at a new version of an application (cp2k, for what
it's worth) which is calling mpi_alloc_mem/mpi_free_mem, and I don't


Welcome to the club! :o)
In our measurements we see some 70% of the time spent in 'mpi_free_mem'... and a
15x performance loss when using Open MPI vs. Intel MPI. So it goes.

https://www.mail-archive.com/users@lists.open-mpi.org//msg30593.html



think it did so in the previous version I looked at.  I found on an
IB-based system it's spending about half its time in those allocation
routines (according to its own profiling) -- a tad surprising.

It turns out that's due to some pathological interaction with openib,
and just having openib loaded.  It shows up on a single-node run iff I
don't suppress the openib btl, and doesn't for multi-node PSM runs iff I
suppress openib (on a mixed Mellanox/Infinipath system).


we're lucky - our issue is on the Intel OmniPath (OPA) network (and we will junk
the IB hardware in the near future, I think) - so we disabled the IB transport
fallback:
--mca btl ^tcp,openib

For single-node jobs this will likely also help on plain IB nodes. (You can
disable IB if you do not use it.)



Can anyone say why, and whether there's a workaround?  (I can't easily
diagnose what it's up to as ptrace is turned off on the system
concerned, and I can't find anything relevant in archives.)

I had the idea to try libfabric instead for multi-node jobs, and that
doesn't show the pathological behaviour iff openib is suppressed.
However, it requires ompi 1.10, not 1.8, which I was trying to use.




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915





--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




Re: [OMPI users] MPI_Comm_accept()

2017-03-13 Thread Adam Sylvester
As a follow-up, I tried this with Open MPI 1.10.4 and this worked as
expected (the port formatting looks really different):

$ mpirun -np 1 ./server
Port name is 1286733824.0;tcp://10.102.16.135:43074+1286733825.0;tcp://10.102.16.135::300
Accepted!

$ mpirun -np 1 ./client "1286733824.0;tcp://10.102.16.135:43074+1286733825.0;tcp://10.102.16.135::300"
Trying with '1286733824.0;tcp://10.102.16.135:43074+1286733825.0;tcp://10.102.16.135::300'
Connected!

I've found some other posts of users asking about similar things regarding
the 2.x release - is this a bug?

On Sun, Mar 12, 2017 at 9:38 PM, Adam Sylvester  wrote:

> I'm using Open MPI 2.0.2 on RHEL 7.  I'm trying to use MPI_Open_port() /
> MPI_Comm_accept() / MPI_Comm_connect().  My use case is that I'll have two
> processes running on two machines that don't initially know about each
> other (i.e. I can't do the typical mpirun with a list of IPs); eventually I
> think I may need to use ompi-server to accomplish what I want, but for now
> I'm trying to test this out running two processes on the same machine with
> some toy programs.
>
> server.cpp creates the port, prints it, and waits for a client to accept
> using it:
>
> #include <mpi.h>
> #include <iostream>
>
> int main(int argc, char** argv)
> {
> MPI_Init(NULL, NULL);
>
> char myport[MPI_MAX_PORT_NAME];
> MPI_Comm intercomm;
>
> MPI_Open_port(MPI_INFO_NULL, myport);
> std::cout << "Port name is " << myport << std::endl;
>
> MPI_Comm_accept(myport, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
>
> std::cout << "Accepted!" << std::endl;
>
> MPI_Finalize();
> return 0;
> }
>
> client.cpp takes in this port on the command line and tries to connect to
> it:
>
> #include <mpi.h>
> #include <iostream>
> #include <string>
>
> int main(int argc, char** argv)
> {
> MPI_Init(NULL, NULL);
>
> MPI_Comm intercomm;
>
> const std::string name(argv[1]);
> std::cout << "Trying with '" << name << "'" << std::endl;
> MPI_Comm_connect(name.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF,
> &intercomm);
>
> std::cout << "Connected!" << std::endl;
>
> MPI_Finalize();
> return 0;
> }
>
> I run the server first:
> $ mpirun ./server
> Port name is 2720137217.0:595361386
>
> Then a second later I run the client:
> $ mpirun ./client 2720137217.0:595361386
> Trying with '2720137217.0:595361386'
>
> Both programs hang for a while and then eventually time out.  I have a
> feeling I'm misunderstanding something and doing something dumb, but from
> all the examples I've seen online it seems like this should work.
>
> Thanks for the help.
> -Adam
>

Re: [OMPI users] "No objects of the specified type were found on at least one node"

2017-03-13 Thread Angel de Vicente
Brice Goglin  writes:

> Ok, that's a very old kernel on a very old POWER processor, it's
> expected that hwloc doesn't get much topology information, and it's
> then expected that OpenMPI cannot apply most binding policies.

Just in case it adds anything: I tried with an older Open MPI version
(1.10.6), and I cannot get it to work either, but the message is
different:

,----
| --
| No objects of the specified type were found on at least one node:
|   
|   Type: Socket
|   Node: s01c1b08
| 
| The map cannot be done as specified.
| --
`----


-- 
Ángel de Vicente
http://www.iac.es/galeria/angelv/  
