Re: [OMPI users] NVCC mpi.h: error: attribute "__deprecated__" does not take arguments

2012-06-18 Thread Rolf vandeVaart
Dmitry:

It turns out that by default in Open MPI 1.7, configure enables warnings for 
deprecated MPI functionality.  In Open MPI 1.6, these warnings were disabled by 
default.
That explains why you would not see this issue in the earlier versions of Open 
MPI.

I assume that gcc must have added support for __attribute__((__deprecated__)) 
and then later on __attribute__((__deprecated__(msg))), and that your version of 
gcc supports both of these.  (My version of gcc, 4.5.1, does not support the msg 
argument in the attribute.)

The version of nvcc you have does not support the "msg" argument so everything 
blows up.
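
For reference, here is a minimal C illustration of the two forms involved (my
sketch, not code from mpi.h; the identifiers and the message text are made up):

    int old_fn(void) __attribute__((__deprecated__));
    int new_fn(void) __attribute__((__deprecated__("use something else")));

A compiler that predates the message variant accepts the first declaration but
rejects the second with exactly the "attribute does not take arguments" error
shown in your log.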

I suggest you configure with --disable-mpi-interface-warning, which will prevent 
any of the deprecated attributes from being used, and then things should work 
fine.

Let me know if this fixes your problem.

Rolf

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Rolf vandeVaart
Sent: Monday, June 18, 2012 11:00 AM
To: Open MPI Users
Cc: Олег Рябков
Subject: Re: [OMPI users] NVCC mpi.h: error: attribute "__deprecated__" does 
not take arguments

Hi Dmitry:
Let me look into this.

Rolf

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Dmitry N. Mikushin
Sent: Monday, June 18, 2012 10:56 AM
To: Open MPI Users
Cc: Олег Рябков
Subject: Re: [OMPI users] NVCC mpi.h: error: attribute "__deprecated__" does 
not take arguments

Yeah, definitely. Thank you, Jeff.

- D.
2012/6/18 Jeff Squyres:
On Jun 18, 2012, at 10:41 AM, Dmitry N. Mikushin wrote:

> No, I'm configuring with gcc, and for openmpi-1.6 it works with nvcc without 
> a problem.
Then I think Rolf (from Nvidia) should figure this out; I don't have access to 
nvcc.  :-)

> Actually, nvcc always meant to be more or less compatible with gcc, as far as 
> I know. I'm guessing in case of trunk nvcc is the source of the issue.
>
> And with ./configure CC=nvcc etc. it won't build:
> /home/dmikushin/forge/openmpi-trunk/opal/mca/event/libevent2019/libevent/include/event2/util.h:126:2:
>  error: #error "No way to define ev_uint64_t"
You should complain to Nvidia about that.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/





[OMPI users] Using OpenMPI on a network

2012-06-18 Thread VimalMathew
So I configured and compiled a simple MPI program.

Now the issue is when I try to do the same thing on my computer on a
corporate network, I get this error:

 

C:\OpenMPI\openmpi-1.6\installed\bin>mpiexec MPI_Tutorial_1.exe


--

Open RTE was unable to open the hostfile:

C:\OpenMPI\openmpi-1.6\installed\bin/../etc/openmpi-default-hostfile

Check to make sure the path and filename are correct.


--

[SOUMIWHP5003567:01884] [[37936,0],0] ORTE_ERROR_LOG: Not found in file
C:\OpenM

PI\openmpi-1.6\orte\mca\ras\base\ras_base_allocate.c at line 200

[SOUMIWHP5003567:01884] [[37936,0],0] ORTE_ERROR_LOG: Not found in file
C:\OpenM

PI\openmpi-1.6\orte\mca\plm\base\plm_base_launch_support.c at line 99

[SOUMIWHP5003567:01884] [[37936,0],0] ORTE_ERROR_LOG: Not found in file
C:\OpenM

PI\openmpi-1.6\orte\mca\plm\process\plm_process_module.c at line 996

 

What network settings should I be using? I'm sure this is because of the
network, because when I unplug the network cable I get the error message
shown below.

 

Thanks,

Vimal

 

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Damien
Sent: Friday, June 15, 2012 3:15 PM
To: Open MPI Users
Subject: Re: [OMPI users] Building MPI on Windows

 

OK, that's what "orte_rml_base_select failed" means: no TCP connection.
But you should be able to make OpenMPI & mpiexec work without a network
if you're just running in local memory.  There's probably a runtime
parameter to set, but I don't know what it is.  Maybe Jeff or Shiqing can
weigh in with what that is.

Damien

On 15/06/2012 1:10 PM, vimalmat...@eaton.com wrote: 

Just figured it out.

The only thing different between when it ran yesterday and today was that I
was connected to a network. So I connected my laptop to a network and it
worked again.

 

Thanks for all your help, Damien!

I'm sure I'm gonna get stuck more along the way so hoping you can help.

 

--

Vimal

 

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Damien
Sent: Friday, June 15, 2012 2:57 PM
To: Open MPI Users
Subject: Re: [OMPI users] Building MPI on Windows

 

Hmmm.  Two things.  Can you run helloworldMPI.exe on its own?  It
should output "Number of threads = 1, My rank = 0".
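
For reference, a minimal hello world along those lines (my sketch, not the
attached project; error checking omitted):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int size, rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("Number of threads = %d, My rank = %d\n", size, rank);
        MPI_Finalize();
        return 0;
    }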

Also, can you post the output of ompi_info ?  I think you might still
have some path mixups.  A successful OpenMPI build with this simple
program should just work.

If you still have the other OpenMPIs installed from the binaries, you
might want to try uninstalling all of them and rebooting.  Also if you
rebuilt OpenMPI and helloworldMPI with VS 2010, make sure that
helloworldMPI is actually linked to those VS2010 OpenMPI libs by setting
the right lib path in the Linker options.  Linking to VS2008 libs and
trying to run with VS2010 dlls/exes could cause problems too.

Damien   

On 15/06/2012 11:44 AM, vimalmat...@eaton.com wrote: 

Hi Damien,

 

I installed MS Visual Studio 2010 and tried the whole procedure again
and it worked!

That's the great news.

Now the bad news is that I'm trying to run the program again using
mpiexec and it won't!

 

I get these error messages: 

orte_rml_base_select failed

orte_ess_set_name failed, with a bunch of text saying it could be due to
configuration or environment problems and will make sense only to an
OpenMPI developer.

 

Help!

 

--

Vimal

 

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Damien
Sent: Thursday, June 14, 2012 4:55 PM
To: Open MPI Users
Subject: Re: [OMPI users] Building MPI on Windows

 

You did build the project, right?  The helloworldMPI.exe is in the Debug
directory?

On 14/06/2012 1:49 PM, vimalmat...@eaton.com wrote: 

No luck.

Output:

 

Microsoft Windows [Version 6.1.7601]

Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

 

C:\Users\...>cd "C:\Users\C9995799\Downloads\helloworldMPI\Debug"

 

C:\Users\...\Downloads\helloworldMPI\Debug>mpiexec -n 2
helloworldMPI.exe


--

mpiexec was unable to launch the specified application as it could not
find an e

xecutable:

 

Executable: helloworldMPI.exe

Node: SOUMIWHP5003567

 

while attempting to start process rank 0.


--

2 total processes failed to start

 

C:\Users\...\Downloads\helloworldMPI\Debug>

 

--

Vimal

 

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Damien
Sent: Thursday, June 14, 2012 3:38 PM
To: Open MPI Users
Subject: Re: [OMPI users] Building MPI on Windows

 

Here's a MPI Hello World project based on your code.  It runs fine on my
machine.  You'll need to change the include and lib paths as we
discussed before to match your paths, and copy those bin files over to
the Debug directory.

Run it with 

Re: [OMPI users] Executions in two different machines

2012-06-18 Thread Jeff Squyres
On Jun 18, 2012, at 11:45 AM, Harald Servat wrote:

>> 2. The two machines need to be able to open TCP connections to each other on 
>> random ports.
> 
> That will be harder. Do both machines need to open TCP connections to
> random ports, or just one? 


Both.

To be specific: there's two layers that open TCP sockets to each other.  The 
run-time system (i.e., mpirun and its friends) opens control channels between 
nodes.  There *is* a predictable pattern upon which nodes open TCP sockets to 
which other nodes, but you shouldn't count on it (because we change it over 
time).

Then the MPI layer opens TCP sockets for MPI messaging.  The pattern of who 
opens TCP sockets to whom depends on the app, because OMPI opens sockets upon 
the first send (and that may be racy, depending on your application).

So it's best not to assume and just allow random TCP sockets from any machines 
that will be involved in the computation.

BTW, there have been a few discussions here in the past about how to configure 
iptables properly to allow this.  No one has quite gotten it right; our advice 
has always just been to disable iptables.  However, if you come up with a 
configuration solution that allows it to work properly -- and I'm *sure* that 
such a configuration exists; I'm just betting that no one with the proper 
willpower / experience has set their mind to figuring it out -- please let us 
know what it is so that we can add it to the FAQ.

Thanks!

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Executions in two different machines

2012-06-18 Thread Harald Servat
On Monday, 18 June 2012 at 11:39 -0400, Jeff Squyres wrote:
> On Jun 18, 2012, at 11:12 AM, Harald Servat wrote:
> 
> > Thank you, Jeff. Now with the following commands it starts, but it gets
> > blocked before starting. Maybe this is a problem of firewalls? Do I need
> > both M1 and M2 to be able to log into the other machine through ssh?
> 
> I'm not sure what you mean by "blocked" -- do you mean that it hangs and does 
> nothing after seeming to start?

Yes, that's it.

> 
> If so, then yes, you need at least the two following things to be true:
> 
> 1. You need to be able to ssh to between your machines without manually 
> entering a password or passphrase.

Uhmmm... I'm trying to solve that by opening port 22.

> 2. The two machines need to be able to open TCP connections to each other on 
> random ports.
> 

That will be harder. Do both machines need to open TCP connections to
random ports, or just one? 

Thank you.




Re: [OMPI users] Executions in two different machines

2012-06-18 Thread Jeff Squyres
On Jun 18, 2012, at 11:12 AM, Harald Servat wrote:

> Thank you, Jeff. Now with the following commands it starts, but it gets
> blocked before starting. Maybe this is a problem of firewalls? Do I need
> both M1 and M2 to be able to log into the other machine through ssh?

I'm not sure what you mean by "blocked" -- do you mean that it hangs and does 
nothing after seeming to start?

If so, then yes, you need at least the two following things to be true:

1. You need to be able to ssh to between your machines without manually 
entering a password or passphrase.

2. The two machines need to be able to open TCP connections to each other on 
random ports.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] NVCC mpi.h: error: attribute "__deprecated__" does not take arguments

2012-06-18 Thread Rolf vandeVaart
Hi Dmitry:
Let me look into this.

Rolf

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Dmitry N. Mikushin
Sent: Monday, June 18, 2012 10:56 AM
To: Open MPI Users
Cc: Олег Рябков
Subject: Re: [OMPI users] NVCC mpi.h: error: attribute "__deprecated__" does 
not take arguments

Yeah, definitely. Thank you, Jeff.

- D.
2012/6/18 Jeff Squyres:
On Jun 18, 2012, at 10:41 AM, Dmitry N. Mikushin wrote:

> No, I'm configuring with gcc, and for openmpi-1.6 it works with nvcc without 
> a problem.
Then I think Rolf (from Nvidia) should figure this out; I don't have access to 
nvcc.  :-)

> Actually, nvcc always meant to be more or less compatible with gcc, as far as 
> I know. I'm guessing in case of trunk nvcc is the source of the issue.
>
> And with ./configure CC=nvcc etc. it won't build:
> /home/dmikushin/forge/openmpi-trunk/opal/mca/event/libevent2019/libevent/include/event2/util.h:126:2:
>  error: #error "No way to define ev_uint64_t"
You should complain to Nvidia about that.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Executions in two different machines

2012-06-18 Thread Jeff Squyres
On Jun 18, 2012, at 10:45 AM, Harald Servat wrote:

> # $HOME/aplic/openmpi/1.6/bin/mpirun -np 1 -host
> localhost ./init_barrier_fini : -x
> LD_LIBRARY_PATH=/home/Computational/harald/aplic/openmpi/1.6/lib
> -prefix /home/Computational/harald/aplic/openmpi/1.6/ -x
> PATH=/home/Computational/harald/aplic/openmpi/1.6/bin -np 1 -host
> M2 /home/Computational/harald/tests/mpi/multi-machine/init_barrier_fini

Try without using the absolute pathname to mpirun -- it reacts differently if 
you specify the absolute pathname vs. just "mpirun".

Also, if you setup your .bashrc's right, then you don't need the -x 
LD_LIBRARY_PATH... clause.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] NVCC mpi.h: error: attribute "__deprecated__" does not take arguments

2012-06-18 Thread Dmitry N. Mikushin
Yeah, definitely. Thank you, Jeff.

- D.

2012/6/18 Jeff Squyres 

> On Jun 18, 2012, at 10:41 AM, Dmitry N. Mikushin wrote:
>
> > No, I'm configuring with gcc, and for openmpi-1.6 it works with nvcc
> without a problem.
>
> Then I think Rolf (from Nvidia) should figure this out; I don't have
> access to nvcc.  :-)
>
> > Actually, nvcc always meant to be more or less compatible with gcc, as
> far as I know. I'm guessing in case of trunk nvcc is the source of the
> issue.
> >
> > And with ./configure CC=nvcc etc. it won't build:
> >
> /home/dmikushin/forge/openmpi-trunk/opal/mca/event/libevent2019/libevent/include/event2/util.h:126:2:
> error: #error "No way to define ev_uint64_t"
>
> You should complain to Nvidia about that.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>


Re: [OMPI users] NVCC mpi.h: error: attribute "__deprecated__" does not take arguments

2012-06-18 Thread Jeff Squyres
On Jun 18, 2012, at 10:41 AM, Dmitry N. Mikushin wrote:

> No, I'm configuring with gcc, and for openmpi-1.6 it works with nvcc without 
> a problem.

Then I think Rolf (from Nvidia) should figure this out; I don't have access to 
nvcc.  :-)

> Actually, nvcc always meant to be more or less compatible with gcc, as far as 
> I know. I'm guessing in case of trunk nvcc is the source of the issue.
> 
> And with ./configure CC=nvcc etc. it won't build:
> /home/dmikushin/forge/openmpi-trunk/opal/mca/event/libevent2019/libevent/include/event2/util.h:126:2:
>  error: #error "No way to define ev_uint64_t"

You should complain to Nvidia about that.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] NVCC mpi.h: error: attribute "__deprecated__" does not take arguments

2012-06-18 Thread Dmitry N. Mikushin
No, I'm configuring with gcc, and for openmpi-1.6 it works with nvcc
without a problem.
Actually, nvcc has always been meant to be more or less compatible with gcc, as
far as I know. I'm guessing that in the case of the trunk, nvcc is the source of
the issue.

And with ./configure CC=nvcc etc. it won't build:
/home/dmikushin/forge/openmpi-trunk/opal/mca/event/libevent2019/libevent/include/event2/util.h:126:2:
error: #error "No way to define ev_uint64_t"

Thanks,
- D.

2012/6/18 Jeff Squyres 

> Did you configure and build Open MPI with nvcc?
>
> I ask because Open MPI should auto-detect whether the underlying compiler
> can handle a message argument with the deprecated directive or not.
>
> You should be able to build Open MPI with:
>
>./configure CC=nvcc etc.
>make clean all install
>
> If you're building Open MPI with one compiler and then trying to compile
> with another (like the command line in your mail implies), all bets are off
> because Open MPI has tuned itself to the compiler that it was configured
> with.
>
>
>
>
> On Jun 18, 2012, at 10:20 AM, Dmitry N. Mikushin wrote:
>
> > Hello,
> >
> > With openmpi svn trunk as of
> >
> > Repository Root: http://svn.open-mpi.org/svn/ompi
> > Repository UUID: 63e3feb5-37d5-0310-a306-e8a459e722fe
> > Revision: 26616
> >
> > we are observing the following strange issue (see below). How do you
> think, is it a problem of NVCC or OpenMPI?
> >
> > Thanks,
> > - Dima.
> >
> > [dmikushin@tesla-apc mpitest]$ cat mpitest.cu
> > #include <mpi.h>
> >
> > __global__ void kernel() { }
> >
> > [dmikushin@tesla-apc mpitest]$ nvcc -I/opt/openmpi-trunk/include -c
> mpitest.cu
> > /opt/openmpi-trunk/include/mpi.h(365): error: attribute "__deprecated__"
> does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(374): error: attribute "__deprecated__"
> does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(382): error: attribute "__deprecated__"
> does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(724): error: attribute "__deprecated__"
> does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(730): error: attribute "__deprecated__"
> does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(736): error: attribute "__deprecated__"
> does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(790): error: attribute "__deprecated__"
> does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(791): error: attribute "__deprecated__"
> does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1049): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1070): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1072): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1074): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1145): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1149): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1151): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1345): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1347): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1484): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1507): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1510): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1515): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1525): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1527): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1589): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1610): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1612): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1614): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1685): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1689): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1691): error: attribute
> "__deprecated__" does not take arguments
> >
> > /opt/openmpi-trunk/include/mpi.h(1886): error: attribute
> "__deprecated__" does not take arguments
> >
> > 

Re: [OMPI users] NVCC mpi.h: error: attribute "__deprecated__" does not take arguments

2012-06-18 Thread Jeff Squyres
Did you configure and build Open MPI with nvcc?

I ask because Open MPI should auto-detect whether the underlying compiler can 
handle a message argument with the deprecated directive or not.

You should be able to build Open MPI with:

./configure CC=nvcc etc.
make clean all install

If you're building Open MPI with one compiler and then trying to compile with 
another (like the command line in your mail implies), all bets are off because 
Open MPI has tuned itself to the compiler that it was configured with.




On Jun 18, 2012, at 10:20 AM, Dmitry N. Mikushin wrote:

> Hello,
> 
> With openmpi svn trunk as of
> 
> Repository Root: http://svn.open-mpi.org/svn/ompi
> Repository UUID: 63e3feb5-37d5-0310-a306-e8a459e722fe
> Revision: 26616
> 
> we are observing the following strange issue (see below). How do you think, 
> is it a problem of NVCC or OpenMPI?
> 
> Thanks,
> - Dima.
> 
> [dmikushin@tesla-apc mpitest]$ cat mpitest.cu
> #include <mpi.h>
> 
> __global__ void kernel() { }
> 
> [dmikushin@tesla-apc mpitest]$ nvcc -I/opt/openmpi-trunk/include -c mpitest.cu
> /opt/openmpi-trunk/include/mpi.h(365): error: attribute "__deprecated__" does 
> not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(374): error: attribute "__deprecated__" does 
> not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(382): error: attribute "__deprecated__" does 
> not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(724): error: attribute "__deprecated__" does 
> not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(730): error: attribute "__deprecated__" does 
> not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(736): error: attribute "__deprecated__" does 
> not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(790): error: attribute "__deprecated__" does 
> not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(791): error: attribute "__deprecated__" does 
> not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1049): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1070): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1072): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1074): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1145): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1149): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1151): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1345): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1347): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1484): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1507): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1510): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1515): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1525): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1527): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1589): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1610): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1612): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1614): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1685): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1689): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1691): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1886): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(1888): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(2024): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(2047): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(2050): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(2055): error: attribute "__deprecated__" 
> does not take arguments
> 
> /opt/openmpi-trunk/include/mpi.h(2065): error: attribute "__deprecated__" 
> does not take arguments
> 

[OMPI users] NVCC mpi.h: error: attribute "__deprecated__" does not take arguments

2012-06-18 Thread Dmitry N. Mikushin
Hello,

With openmpi svn trunk as of

Repository Root: http://svn.open-mpi.org/svn/ompi
Repository UUID: 63e3feb5-37d5-0310-a306-e8a459e722fe
Revision: 26616

we are observing the following strange issue (see below). How do you think,
is it a problem of NVCC or OpenMPI?

Thanks,
- Dima.

[dmikushin@tesla-apc mpitest]$ cat mpitest.cu
#include <mpi.h>

__global__ void kernel() { }

[dmikushin@tesla-apc mpitest]$ nvcc -I/opt/openmpi-trunk/include -c
mpitest.cu
/opt/openmpi-trunk/include/mpi.h(365): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(374): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(382): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(724): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(730): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(736): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(790): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(791): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1049): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1070): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1072): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1074): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1145): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1149): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1151): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1345): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1347): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1484): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1507): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1510): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1515): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1525): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1527): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1589): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1610): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1612): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1614): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1685): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1689): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1691): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1886): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(1888): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(2024): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(2047): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(2050): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(2055): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(2065): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/mpi.h(2067): error: attribute "__deprecated__"
does not take arguments

/opt/openmpi-trunk/include/openmpi/ompi/mpi/cxx/comm.h(102): error:
attribute "__deprecated__" does not take arguments

/opt/openmpi-trunk/include/openmpi/ompi/mpi/cxx/win.h(90): error: attribute
"__deprecated__" does not take arguments

/opt/openmpi-trunk/include/openmpi/ompi/mpi/cxx/file.h(298): error:
attribute "__deprecated__" does not take arguments

41 errors detected in the compilation of
"/tmp/tmpxft_4a17_-4_mpitest.cpp1.ii".


Re: [OMPI users] Naming MPI_Spawn children

2012-06-18 Thread Ralph Castain
I believe you could resolve this by specifying the interfaces to use in the 
order you want them checked. In other words, you might try this:

-mca btl_tcp_if_include eth1,eth0

where eth1 is the NIC connecting the internal subnet in the cloud, and eth0 is 
the NIC connecting them to the Internet. I believe OMPI will check comm in that 
order, meaning that eth1 will get picked first.

Of course, that presumes something about the interfaces on your parent machine. 
It doesn't matter if eth1 doesn't exist - what matters is that one of those 
names is the right one to reach your cloud. If so, then this should help 
resolve your problem.


On Jun 17, 2012, at 10:01 PM, Jaison Paul Mulerikkal wrote:

> HI,
> 
> I'm running openmpi on Rackspace cloud over Internet using MPI_Spawn. IT 
> means,
> I run the parent on my PC and the children on Rackspace cloud machines.
> Rackspace provides direct IP addresses of the machines (no NAT), that is why 
> it
> is possible. 
> 
> Now, there is a communicator involving only the children and some 
> communications
> involve only communication between children (on Rackspace cloud, in this
> scenario). When we conducted experiments, we experienced more than expected
> delays in this operation - communication between children alone. 
> 
> My assumption is that openMPI is looking at the direct IP addresses at the
> hostfile and try to communicate between Rackspace children over Internet. 
> What I
> would want/expect is the Rackspace children communicate between themselves
> internally, using the internal Rackspace hostnames. Rackspace provide internal
> IP addresses. But if I use that in the hostfile at my home PC, the parent wont
> be able to access the children (there is a communicator involving parent and
> children).
> 
> Can I anyway tell openMPI to look into the internal IP addresses of Rackspace
> machines (another hostfile, may be) for the sub-group (communicator) involving
> Rackspace children? In that case we will get performance improvement, I guess.
> 
> Thanks in advance for your valuable suggestions.
> 
> Jaison
> Australian National University. 
> 
> 




Re: [OMPI users] MPI_Comm_spawn and exit of parent process.

2012-06-18 Thread Ralph Castain
One further point that I missed in my earlier note: if you are starting the 
parent as a singleton, then you are fooling yourself about the "without mpirun" 
comment. A singleton immediately starts a local daemon to act as mpirun so that 
comm_spawn will work. Otherwise, there is no way to launch the child processes.

So you might as well just launch the "child" job directly with mpirun - the 
result is exactly the same. If you truly want the job to use all the cores, one 
proc per core, and don't want to tell it the number of cores, then use the OMPI 
devel trunk where we have added support for such patterns. All you would have 
to do is:

mpirun -ppr 1:core --bind-to core ./my_app

and you are done.


On Jun 18, 2012, at 4:27 AM, TERRY DONTJE wrote:

> On 6/16/2012 8:03 AM, Roland Schulz wrote:
>> 
>> Hi,
>> 
>> I would like to start a single process without mpirun and then use 
>> MPI_Comm_spawn to start up as many processes as required. I don't want the 
>> parent process to take up any resources, so I tried to disconnect the inter 
>> communicator and then finalize mpi and exit the parent. But as soon as I do 
>> that the children exit too. Why is that? Can I somehow change that behavior? 
>> Or can I wait on the children to exit without the waiting taking up CPU time?
>> 
>> The reason I don't need the parent as soon as the children are spawned, is 
>> that I need one intra-communicator over all processes. And as far as I know 
>> I cannot join the parent and children to one intra-communicator. 
> You could use MPI_Intercomm_merge to create an intra-communicator out of the 
> groups in an inter-communicator and pass the inter-communicator you get back 
> from the MPI_Comm_spawn call.
> 
> --td
>> 
>> The purpose of the whole exercise is that I want that my program to use all 
>> cores of a node by default when executed without mpirun.
>> 
>> I have tested this with OpenMPI 1.4.5. A sample program is here: 
>> http://pastebin.com/g2XSZwvY . "Child finalized" is only printed with the 
>> sleep(2) in the parent not commented out.
>> 
>> Roland
>> 
>> -- 
>> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
>> 865-241-1537, ORNL PO BOX 2008 MS6309
>> 
>> 
> 
> -- 
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
> 
> 
> 



Re: [OMPI users] Executions in two different machines

2012-06-18 Thread Jeff Squyres
You might also want to set up your shell startup files on each machine to 
reflect the proper PATH and LD_LIBRARY_PATH.  E.g., if you have a different 
.bashrc on each machine, just have it set PATH and LD_LIBRARY_PATH properly *for 
that machine*.

To be clear: it's usually easiest to install OMPI to the same prefix on every 
machine, but there's no technical requirement from OMPI to do so.


On Jun 18, 2012, at 10:00 AM, Ralph Castain wrote:

> Try adding "-x LD_LIBRARY_PATH=" to your mpirun cmd line
> 
> 
> On Jun 18, 2012, at 7:11 AM, Harald Servat wrote:
> 
>> Hello list,
>> 
>> I'd like to use OpenMPI to execute an MPI application in two different
>> machines.
>> 
>> Up to now, I've configured and installed OpenMPI 1.6 in my two systems
>> (each on a different directory). When I execute binaries within a system
>> (in any) the application works well. However when I try to execute in
>> the two systems, it does not work, in fact it complains it cannot find
>> "orted". This is the command I try to run and its output
>> 
>> #  $HOME/aplic/openmpi/1.6/bin/mpirun -display-map --machinefile hosts
>> -np 2 /bin/date
>> 
>>    JOB MAP   
>> 
>> Data for node: M1Num procs: 1
>>  Process OMPI jobid: [6021,1] Process rank: 0
>> 
>> Data for node: M2Num procs: 1
>>  Process OMPI jobid: [6021,1] Process rank: 1
>> 
>> =
>> bash: /home/harald/aplic/openmpi/1.6/bin/orted: El fitxer o directori no
>> existeix
>> --
>> A daemon (pid 19598) died unexpectedly with status 127 while attempting
>> to launch so we are aborting.
>> 
>> There may be more information reported by the environment (see above).
>> 
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>> the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --
>> --
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --
>> 
>> My guess is that the spawn process cannot find orted in M2 because the
>> installation prefix of M1 and M2 differ. Is my guess correct? As I
>> cannot change the prefix of the two installation, how can I tell mpirun
>> to look for orted in a different place? After looking at the
>> documentation, I've tried with --prefix and --launch-agent without
>> success.
>> 
>> Thank you very much in advance.
>> 
>> 
>> 
>> 
>> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Executions in two different machines

2012-06-18 Thread Ralph Castain
Try adding "-x LD_LIBRARY_PATH=" to your mpirun cmd line


On Jun 18, 2012, at 7:11 AM, Harald Servat wrote:

> Hello list,
> 
>  I'd like to use OpenMPI to execute an MPI application in two different
> machines.
> 
>  Up to now, I've configured and installed OpenMPI 1.6 in my two systems
> (each on a different directory). When I execute binaries within a system
> (in any) the application works well. However when I try to execute in
> the two systems, it does not work, in fact it complains it cannot find
> "orted". This is the command I try to run and its output
> 
> #  $HOME/aplic/openmpi/1.6/bin/mpirun -display-map --machinefile hosts
> -np 2 /bin/date
> 
>    JOB MAP   
> 
> Data for node: M1 Num procs: 1
>   Process OMPI jobid: [6021,1] Process rank: 0
> 
> Data for node: M2 Num procs: 1
>   Process OMPI jobid: [6021,1] Process rank: 1
> 
> =
> bash: /home/harald/aplic/openmpi/1.6/bin/orted: El fitxer o directori no
> existeix
> --
> A daemon (pid 19598) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
> 
> There may be more information reported by the environment (see above).
> 
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have
> the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --
> --
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --
> 
>  My guess is that the spawn process cannot find orted in M2 because the
> installation prefix of M1 and M2 differ. Is my guess correct? As I
> cannot change the prefix of the two installation, how can I tell mpirun
> to look for orted in a different place? After looking at the
> documentation, I've tried with --prefix and --launch-agent without
> success.
> 
> Thank you very much in advance.
> 
> 
> 
> 
> 



[OMPI users] Executions in two different machines

2012-06-18 Thread Harald Servat
Hello list,

  I'd like to use OpenMPI to execute an MPI application in two different
machines.

  Up to now, I've configured and installed OpenMPI 1.6 on my two systems
(each in a different directory). When I execute binaries within a single system
(either one), the application works well. However, when I try to execute across
the two systems, it does not work; in fact, it complains that it cannot find
"orted". This is the command I try to run and its output:

#  $HOME/aplic/openmpi/1.6/bin/mpirun -display-map --machinefile hosts
-np 2 /bin/date

    JOB MAP   

 Data for node: M1  Num procs: 1
Process OMPI jobid: [6021,1] Process rank: 0

 Data for node: M2  Num procs: 1
Process OMPI jobid: [6021,1] Process rank: 1

 =
bash: /home/harald/aplic/openmpi/1.6/bin/orted: El fitxer o directori no
existeix
--
A daemon (pid 19598) died unexpectedly with status 127 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have
the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--

  My guess is that the spawn process cannot find orted in M2 because the
installation prefix of M1 and M2 differ. Is my guess correct? As I
cannot change the prefix of the two installations, how can I tell mpirun
to look for orted in a different place? After looking at the
documentation, I've tried with --prefix and --launch-agent without
success.

Thank you very much in advance.







Re: [OMPI users] MPI_Comm_spawn and exit of parent process.

2012-06-18 Thread TERRY DONTJE

On 6/16/2012 8:03 AM, Roland Schulz wrote:

Hi,

I would like to start a single process without mpirun and then use 
MPI_Comm_spawn to start up as many processes as required. I don't want 
the parent process to take up any resources, so I tried to disconnect 
the inter communicator and then finalize mpi and exit the parent. But 
as soon as I do that the children exit too. Why is that? Can I somehow 
change that behavior? Or can I wait on the children to exit without 
the waiting taking up CPU time?


The reason I don't need the parent as soon as the children are 
spawned, is that I need one intra-communicator over all processes. And 
as far as I know I cannot join the parent and children to one 
intra-communicator.
You could use MPI_Intercomm_merge to create an intra-communicator out of 
the groups in an inter-communicator and pass the inter-communicator you 
get back from the MPI_Comm_spawn call.
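
A minimal sketch of that pattern (mine, not something from the OMPI tests; the
child count of 4 is arbitrary and error handling is omitted):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm inter, all;
        int is_parent;

        MPI_Init(&argc, &argv);
        MPI_Comm_get_parent(&inter);
        is_parent = (inter == MPI_COMM_NULL);

        if (is_parent)
            /* spawn 4 copies of this same binary as children */
            MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 4, MPI_INFO_NULL, 0,
                           MPI_COMM_SELF, &inter, MPI_ERRCODES_IGNORE);

        /* high = 0 on the parent side, 1 on the children, so the parent
           keeps rank 0 in the merged intra-communicator */
        MPI_Intercomm_merge(inter, is_parent ? 0 : 1, &all);

        /* ... use "all" as an ordinary intra-communicator ... */

        MPI_Comm_free(&all);
        MPI_Finalize();
        return 0;
    }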


--td


The purpose of the whole exercise is that I want that my program to 
use all cores of a node by default when executed without mpirun.


I have tested this with OpenMPI 1.4.5. A sample program is here: 
http://pastebin.com/g2XSZwvY . "Child finalized" is only printed with 
the sleep(2) in the parent not commented out.


Roland

--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov 
865-241-1537, ORNL PO BOX 2008 MS6309




--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] An idea about a semi-automatic optimized parallel I/O with Open MPI

2012-06-18 Thread Xuan Wang
Hi Jeff,

Thank you very much for your suggestions. They help us a lot. We will 
reconsider the whole model more in details.

I am thinking of separating the whole process into two different kinds of 
processes. One is the system process and the other is an MPI process, which is 
invoked (how to invoke it is also a key issue here) by the system process.

I will reconstruct/redesign it and then post it for discussion.

Thank you very much indeed.

Best Regards!

- Original Message -
From: "Jeff Squyres" 
To: "Open MPI Users" 
Cc: zih-siox-de...@groups.tu-dresden.de
Sent: Friday, June 15, 2012 1:13:12 PM
Subject: Re: [OMPI users] An idea about a semi-automatic optimized 
parallel I/O with Open MPI

There's nothing that says that your daemons have to be MPI processes.  They 
could be proper system-level daemons that live "forever", for example.  You 
might not be able to speak to them via MPI easily (e.g., you may need to use 
TCP sockets or some other network transport), but this is fairly common 
client/server stuff.

Or you could use MPI_COMM_CONNECT / MPI_COMM_ACCEPT to connect to them, if 
they're long-lived MPI jobs.  That gets a bit more complicated; I'm not sure 
how well we've tested having a "server"-like MPI process around with repeated 
connects/disconnects for long periods of time.  The failure model may also not 
be attractive (i.e., if your MPI job MPI connects to the server, but then 
segv's -- it might likely take down the server, too).
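
For what it's worth, a minimal sketch of that connect/accept pattern (my
illustration, not SIOX or OMPI code; the out-of-band exchange of the port name
is hand-waved through argv, and error handling is omitted):

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        char port[MPI_MAX_PORT_NAME];
        MPI_Comm other;

        MPI_Init(&argc, &argv);
        if (argc > 1 && strcmp(argv[1], "server") == 0) {
            MPI_Open_port(MPI_INFO_NULL, port);
            printf("port: %s\n", port);   /* hand this string to the client somehow */
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &other);
        } else if (argc > 1) {
            strncpy(port, argv[1], MPI_MAX_PORT_NAME);
            MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &other);
        } else {
            MPI_Finalize();
            return 1;
        }
        /* "other" is now an inter-communicator between the two jobs */
        MPI_Comm_disconnect(&other);
        MPI_Finalize();
        return 0;
    }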

That being said, any additional monitoring or querying of what to do will add 
overhead.  Hence, your techniques might be useful for large IO operations, for 
example.  Or every Nth small operation.  It's just a question of balancing the 
overhead of the query/reply with the expected duration of the IO operation(s) 
that you intend to perform.  I suspect that you can probably model this 
overhead and then refine it with some empirical data.  I suspect that you can 
supplement such a query scheme with some level of caching on the rationale that 
if an app does IO pattern X once, it might need to do it multiple times.  So if 
you query for it once, you can just keep that result around for a little while 
in case the same pattern comes up again -- you won't necessarily need to query 
for it again.

The tricky part is that everyone's HPC setup is different -- the 
exact/empirical overhead of the query/reply will likely be highly tied to a 
user's particular hardware, software stack, network setup, back-end filesystem, 
etc.

One more thing -- to expand on what Ralph said, there's two kinds of typical 
MPI IO optimizations that are possible:

1. one-time optimizations during startup: this is usually effected via OMPI's 
MCA parameters, and is only performed when OMPI's MPI IO subsystem is 
initialized.  This is typically the first time an app calls MPI_FILE_OPEN.

2. MPI_Info hints that are passed in to each MPI IO operation.  These obviously 
are much more flexible and timely (since they can be passed in to each 
operation, and/or to MPI_FILE_OPEN).  
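
A hedged sketch of the second style (mine; the hint names shown are the usual
ROMIO collective-buffering hints and may or may not be honored by a particular
build or filesystem):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_File fh;
        MPI_Info info;

        MPI_Init(&argc, &argv);
        MPI_Info_create(&info);
        MPI_Info_set(info, "cb_buffer_size", "16777216");  /* 16 MB collective buffer */
        MPI_Info_set(info, "romio_cb_write", "enable");    /* force collective writes */

        MPI_File_open(MPI_COMM_WORLD, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
        /* ... MPI_File_write_at_all() etc. ... */
        MPI_File_close(&fh);
        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }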

More below.


On Jun 13, 2012, at 10:32 AM, Xuan Wang wrote:

> Hi Ralph,
> 
> you are right, the monitor hurts the performance if the monitor has to send 
> the monitoring results to the data warehouse during the execution of the I/O 
> operation. But actually, we don't need this parallel execution for the 
> monitor. The monitor only need to gather the information such as the duration 
> of read/write, the bandwidth, the number of used processes, the algorithm ID 
> and some other small information. This information can be stored as txt file 
> and sent to the data warehouse when the file system is free, which means the 
> monitor sends information to the data warehouse "OFFLINE".

You might find that IO -- particularly parallel IO -- is a *highly* complex 
issue with many different variables involved.  Specifically: it may be better 
to save as much meta data as possible, not just number of bytes and number of 
processes (for example).  Position in the file may also be relevant (e.g., you 
may be able to infer the file systems caching algorithms from information like 
this, which will have a significant effect on overall performance).  And so on.

> I will try to find out the impact of comm_spawn to the whole performance. 
> Besides starting another MPI process to monitoring the performance, is there 
> any possibility to integrating a monitoring function within the MPI process 
> or even within the MPI I/O operation? That means we start one MPI process, 
> there are multiple threads for I/O operations, in which one thread is in 
> charge of monitoring. Will that hurt a lot of performance? If necessary, the 
> Open MPI source code has to be overwritten for this purpose.

Sure, you can do this.  If you spawn off multiple threads and they block most 
of the time (and I really mean *most* of the time), that's no big deal.  Or you 
could interpose your 

[OMPI users] Naming MPI_Spawn children

2012-06-18 Thread Jaison Paul Mulerikkal
Hi,

I'm running openmpi on Rackspace cloud over the Internet using MPI_Spawn. It means
I run the parent on my PC and the children on Rackspace cloud machines.
Rackspace provides direct IP addresses of the machines (no NAT), which is why it
is possible. 

Now, there is a communicator involving only the children and some communications
involve only communication between children (on Rackspace cloud, in this
scenario). When we conducted experiments, we experienced more than expected
delays in this operation - communication between children alone. 

My assumption is that openMPI is looking at the direct IP addresses in the
hostfile and tries to communicate between Rackspace children over the Internet. What I
would want/expect is for the Rackspace children to communicate between themselves
internally, using the internal Rackspace hostnames. Rackspace provides internal
IP addresses. But if I use those in the hostfile at my home PC, the parent won't
be able to access the children (there is a communicator involving parent and
children).

Can I anyway tell openMPI to look into the internal IP addresses of Rackspace
machines (another hostfile, may be) for the sub-group (communicator) involving
Rackspace children? In that case we will get performance improvement, I guess.

Thanks in advance for your valuable suggestions.

Jaison
Australian National University.