[OMPI users] Prioritization of --mca btl openib,tcp,self

2010-11-22 Thread Parallel Scientific
This is a follow-up to an earlier question: I'm trying to understand how --mca 
btl prioritizes its choice of connectivity.  Going back to my original 
network, there are actually two networks in play.  A point-to-point 
Infiniband network that looks like this (with two fabrics):

A(port 1)(opensm)-->B
A(port 2)(opensm)-->C

The original question asked whether there was a way to avoid B and C being 
unable to talk to each other if I were to run

mpirun  -host A,B,C --mca btl openib,self -d /mnt/shared/apps/myapp

"At least one pair of MPI processes are unable to reach each other for
MPI communications." ...

There is an additional network, though: an ethernet management network that 
connects to all nodes.  If our program could retrieve the ranks from the nodes 
using TCP and then shift to openib, that would be interesting.  And if I run

mpirun  -host A,B,C --mca btl openib,tcp,self -d /mnt/shared/apps/myapp

The program does, in fact, run cleanly.

But the question I have now is: does MPI "choose" tcp when tcp can reach all 
nodes and then always use tcp, or will it fall back to openib where it can?

So ... more succinctly:
Given a list of btls, such as openib,tcp,self, where a program can only 
broadcast over tcp but individual operations between nodes can occur over 
openib, will mpirun use the first interconnect that works for each operation, 
or will it permanently settle on whichever one the broadcast phase worked on?

And, as a follow-up, can I turn off the attempt to broadcast to touch all nodes?
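
One way to observe the selection, assuming a 1.4.x/1.5 build: raising the BTL 
verbosity makes each process report which BTL modules it opens and uses, e.g.

mpirun -host A,B,C --mca btl openib,tcp,self --mca btl_base_verbose 30 /mnt/shared/apps/myapp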

Paul Monday


[OMPI users] How to avoid abort when calling MPI_Finalize without calling MPI_File_close?

2010-11-22 Thread James Overfelt
Hello,

I have a small test case where a file created with MPI_File_open
is still open at the time MPI_Finalize is called.  In the actual
program there are lots of open files and it would be nice to avoid the
resulting "Your MPI job will now abort." by either having MPI_Finalize
close the files or honor the error handler and return an error code
without an abort.

  I've tried with Open MPI 1.4.3 and 1.5 with the same results.
Attached are the configure, compile, and source files; the whole
program follows.

Any help would be appreciated,
Dr. James Overfelt
Sandia National Laboratories




#include <iostream>
#include "mpi.h"
using namespace std;
int main (int argc, char *argv[])
{
  MPI_Init(&argc, &argv);
  MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  MPI_Info info;
  MPI_Info_create(&info);
  MPI_File fh;
  if (const int stat = MPI_File_open
      (MPI_COMM_SELF, "File_Test.txt", MPI_MODE_WRONLY |
       MPI_MODE_CREATE, info, &fh))
    cerr << "Error in MPI_File_open " << stat << endl;
  /* fh is deliberately left open; MPI_Finalize aborts here. */
  return MPI_Finalize();
}
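
A minimal workaround sketch (not part of the original test case): closing the
handle before MPI_Finalize avoids the abort entirely:

  if (MPI_File_close(&fh) != MPI_SUCCESS)
    cerr << "Error in MPI_File_close" << endl;
  MPI_Info_free(&info);
  return MPI_Finalize();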

Re: [OMPI users] possible mismatch between MPI_Iprobe and MPI_Recv?

2010-11-22 Thread Jeff Squyres
On Nov 21, 2010, at 1:46 PM, Riccardo Murri wrote:

> I'm using code like this:
> 
>  MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
>  if(flag) {
>int size;
>    MPI_Get_count(&status, MPI_BYTE, &size);
>void* row = xmalloc(size);
>/* ... */
>    MPI_Recv(row, size, MPI_BYTE,
>             status.MPI_SOURCE, status.MPI_TAG, MPI_COMM_WORLD,
>             &recv_status);
>  /* ... */
>  }
> 
> Question: is it possible that, in the time my program progresses from
> MPI_Iprobe() to MPI_Recv(), another message has arrived that matches
> the MPI_Recv() but is not the one originally matched by
> MPI_Iprobe()?  (e.g., a shorter one)

No.  MPI guarantees matching order.  Given that you're receiving on a very 
specific signature (i.e., the source and tag from the successful probe), 
messages will be received in order.

As long as there's no other possible MPI_RECV on that signature between your 
MPI_IPROBE-that-returns-flag==true and the MPI_RECV shown above, then your 
MPI_RECV should be receiving the message that MPI_IPROBE returned flag==true 
for.

> In particular, could it be that the size of the message actually
> received by MPI_Recv() does not match `size` (the variable)?

No.

> In case a shorter message (different from the one initially matched)
> was received, can I get the actual message size via a new call to
> MPI_Get_count(&recv_status ...)?

You certainly can, but you should not need to -- they should be the same size.
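
For the record, a sketch of how one could assert this in the code above
(recv_status is the status filled in by the MPI_Recv; assert() needs
<assert.h>):

  int recv_size;
  MPI_Get_count(&recv_status, MPI_BYTE, &recv_size);
  assert(recv_size == size);  /* holds because MPI guarantees matching order */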

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] launching the mpi runtime

2010-11-22 Thread Jeff Squyres
On Nov 22, 2010, at 11:52 AM, Hicham Mouline wrote:

> I now have an application with a GUI. The GUI launches the calculations
> in-process, serially.
> No MPI involved. Fine. The objective is to parallelize.

Gotcha.

> I want to keep the GUI (Windows) as the control to start calcs and display
> results.
> 
> The GUI could be the master process of the mpi processes.
> That's bad because the executable image has deps on the GUI library and
> there's no need for all the mpi processes (the same executable) to have
> anything to do with the display.

Sounds reasonable.  You likely want to have (at least) 2 executables, then: the 
GUI and the compute worker.

> Besides, I have a Win box and a couple of Linux boxes, and Open MPI cannot
> mix both in the same group of MPI processes.

Sadly true.  There has been (very) little demand for this, so we haven't spent 
much (any) time on making it work.

> Therefore, I guess I need to separate the GUI binary from the mpi-processes
> binary and have the GUI process talk to the "master" mpi process (on Linux)
> for calc requests.
> 
> I was hoping I wouldn't have to write custom code to do that.

MPI doesn't necessarily mean SPMD -- you can certainly have the GUI call 
MPI_INIT and then call MPI_COMM_SPAWN to launch a different executable to do 
the compute work.
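
A minimal sketch of that pattern, assuming a worker executable named
"compute_worker" (the name and the count of 4 workers are placeholders):

  /* In the GUI process, after MPI_Init: */
  MPI_Comm workers;
  MPI_Comm_spawn("compute_worker", MPI_ARGV_NULL, 4 /* maxprocs */,
                 MPI_INFO_NULL, 0 /* root */, MPI_COMM_SELF,
                 &workers, MPI_ERRCODES_IGNORE);
  /* Calc requests and results then flow over the "workers"
     intercommunicator. */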

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Multiple Subnet MPI Fail

2010-11-22 Thread Paul Monday (Parallel Scientific)
Thanks for the quick response ... I've been thinking about this today and tried 
a few things on my mini CentOS cluster ...

To use the tcp btl I will have to set up a bridge on A with ib0 and ib1 
participating in it; then the tcp btl could be used as you suggest.  
Unfortunately, the obvious solution, bridge-utils on CentOS, does not support 
Infiniband adapters.

This is now straying from MPI into a networking issue ... any ideas on 
bridging at the IP-over-IB tier in a cluster would be greatly appreciated.  
This must be a solved problem, but I'm not having much luck with Google and 
the archives.

Paul Monday



On Nov 22, 2010, at 7:46 AM, Terry Dontje wrote:

> You're gonna have to use a protocol that can route through a machine; OFED 
> User Verbs (i.e., openib) does not do this.  The only way I know of to do this 
> via OMPI is with the tcp btl.
> 
> --td
> 
> On 11/22/2010 09:28 AM, Paul Monday (Parallel Scientific) wrote:
>> 
>> We've been using OpenMPI in a switched environment with success, but we've 
>> moved to a point-to-point environment to do some work.  Some of the nodes 
>> cannot talk directly to one another, sort of like this with computers A, B, C, 
>> with A having two ports: 
>> 
>> A(1)(opensm)-->B 
>> A(2)(opensm)-->C 
>> 
>> B is not connected to C in any way. 
>> 
>> When we try to run our OpenMPI program we are receiving: 
>> At least one pair of MPI processes are unable to reach each other for 
>> MPI communications.  This means that no Open MPI device has indicated 
>> that it can be used to communicate between these processes.  This is 
>> an error; Open MPI requires that all MPI processes be able to reach 
>> each other.  This error can sometimes be the result of forgetting to 
>> specify the "self" BTL. 
>> 
>>   Process 1 ([[1581,1],5]) is on host: pg-B 
>>   Process 2 ([[1581,1],0]) is on host: pg-C 
>>   BTLs attempted: openib self sm 
>> 
>> Your MPI job is now going to abort; sorry. 
>> 
>> 
>> I hope I'm not being overly naive, but is there a way to join the subnets at 
>> the MPI layer?  It seems like IP over IB would be too high up the stack. 
>> 
>> Paul Monday 
>> ___ 
>> users mailing list 
>> us...@open-mpi.org 
>> http://www.open-mpi.org/mailman/listinfo.cgi/users 
> 
> 
> -- 
> 
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] out of memory in io_romio_ad_nfs_read.c

2010-11-22 Thread Rob Latham
On Wed, Nov 17, 2010 at 11:23:29AM +0100, Zak wrote:
> Dear
> I'm getting the following error during I/O:
> 
> "out of memory in io_romio_ad_nfs_read.c, line 156"
> 
> Does anyone know how I can solve this issue when reading the file?

That's odd.  How many processors?  Can you tell me if it happens with
other file systems?  If this is a read-only workload, you can prefix
the file name with 'ufs:' and take a different code path inside the
library.
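
For example (a sketch; the path is a placeholder):

  MPI_File fh;
  /* The "ufs:" prefix selects ROMIO's generic Unix filesystem driver
     instead of the NFS-specific one. */
  MPI_File_open(MPI_COMM_WORLD, "ufs:/path/to/datafile",
                MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);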

P.S. I know this won't be very helpful to you, but parallel I/O to an
NFS volume, while supported, is not going to perform very well and
will likely, despite the library's best efforts, give you incorrect
results.

==rob


-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


Re: [OMPI users] launching the mpi runtime

2010-11-22 Thread Hicham Mouline
I now have an application with a GUI. The GUI launches the calculations
in-process, serially.
No MPI involved. Fine. The objective is to parallelize.

I want to keep the GUI (Windows) as the control to start calcs and display
results.

The GUI could be the master process of the mpi processes.
That's bad because the executable image has deps on the GUI library and
there's no need for all the mpi processes (the same executable) to have
anything to do with the display.

Besides, I have a Win box and a couple of Linux boxes, and Open MPI cannot
mix both in the same group of MPI processes.

Therefore, I guess I need to separate the GUI binary from the mpi-processes
binary and have the GUI process talk to the "master" mpi process (on Linux)
for calc requests.

I was hoping I wouldn't have to write custom code to do that.


> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Jeff Squyres
> Sent: 22 November 2010 15:53
> To: Open MPI Users
> Subject: Re: [OMPI users] launching the mpi runtime
> 
> Other than MPI_COMM_SPAWN[_MULTIPLE], we don't expose the underlying
> run-time to MPI applications.
> 
> There is a whole programmatic interface for the layer under MPI (the
> Open MPI Runtime Environment -- ORTE), though.  We don't advise mixing
> ORTE calls in MPI applications, but it is certainly feasible to use
> ORTE for non-MPI things (some of the OMPI community member
> organizations do so).
> 
> What are you trying to do?
> 
> 
> On Nov 18, 2010, at 11:37 AM, David Zhang wrote:
> 
> > you could spawn more processes from currently running processes.
> >
> > On Thu, Nov 18, 2010 at 3:05 AM, Hicham Mouline 
> wrote:
> > Hi,
> >
> > One typically uses mpirun to launch a set of mpi processes.
> >
> > Is there some programmatic interface to launching the runtime and
> having the process that launched the runtime become part of the list
> of mpi processes?
> >
> >
> > Regards,
> >
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >
> > --
> > David Zhang
> > University of California, San Diego
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] SYSTEM CPU with OpenMPI 1.4.3

2010-11-22 Thread Jeff Squyres
Thanks for tracking that down!

Here's where it was removed:

https://svn.open-mpi.org/trac/ompi/changeset/23434

and then it was later applied to the v1.4 branch in r23448.

I'll double check back with Ralph (the ORTE guy), but I don't think that this 
change matters.


On Nov 18, 2010, at 5:19 AM, tmish...@jcity.maeda.co.jp wrote:

> Hi,
> 
> I found that ./openmpi-1.4.3/ompi/runtime/ompi_mpi_init.c was changed:
> the call to opal_progress_event_users_decrement() was deleted, as below.
> 
> $ diff openmpi-1.4.2/ompi/runtime/ompi_mpi_init.c openmpi-1.4.3/ompi/runtime/ompi_mpi_init.c
> 813,819d812
> < /* Undo ORTE calling opal_progress_event_users_increment() during
> < opal_progress_event_users_decrement();
> <
> 
> I confirmed that this change causes the increased SYSTEM CPU usage and the
> performance decrease.  Is it safe to replace the routine with the older one?
> 
> Anyway, if it's just a mistake, please fix it in the next version.
> 
> Regards,
> Tetsuya Mishima
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] mpi-io, fortran, going crazy... (ADENDA)

2010-11-22 Thread Jeff Squyres
On Nov 17, 2010, at 6:58 AM, Ricardo Reis wrote:

> Thanks for the explanation.  Then this should be updated in the spec, no?

You have no idea.  :-)

The MPI Forum has been debating about exactly this issue for over a year.  It 
turns out to be a surprisingly complex, subtle issue (i.e., it's not easy to 
just "upgrade" the type used to pass counts around in MPI functions).  

The Forum has not resolved this issue yet; a small subset of the issues are 
described in the SC MPI Forum BOF slides that were presented last week.  Rich 
Graham is going to post those slides on the web somewhere, but I don't think he 
has posted them yet.

As Gus points out, the workaround is to use MPI datatypes so that your count 
can still be "low" (i.e., still positive for an INTEGER or int), even though 
the total amount of data being written/sent/received/whatever is larger.  MPI 
should do the Right Thing under the covers in this case.
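
For example, a sketch of the datatype workaround (buf, dest, and tag as in
your code; the 1 MiB chunk size is arbitrary): to move 4 GiB, describe it as
4096 elements of a contiguous 1 MiB type so the int count stays small:

  MPI_Datatype mib;
  MPI_Type_contiguous(1 << 20, MPI_BYTE, &mib);  /* one element = 1 MiB */
  MPI_Type_commit(&mib);
  MPI_Send(buf, 4096, mib, dest, tag, MPI_COMM_WORLD);  /* 4 GiB total */
  MPI_Type_free(&mib);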

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Infiniband problem, kernel mismatch

2010-11-22 Thread Jeff Squyres
On Nov 18, 2010, at 7:03 PM, HeeJin Kim wrote:

> I'm using a Mellanox InfiniBand network card and trying to run it with openmpi.
> The problem is that I can connect and communicate between nodes, but I'm not 
> sure whether it is in a correct state or not.
> 
> I have two versions of openmpi: one is compiled with mca-btl-openib and the 
> other without btl-openib.  (I checked it in ompi_info.)

You should ensure that the version of Open MPI available on all nodes is the 
same exact version.

> And my jobs run well using the openmpi build without btl-openib, 
> but when I run exactly the same job using the build with btl-openib, I get 
> the following error.
> 
>mlx4: There is a mismatch between the kernel and the userspace libraries: 
> Kernel does not support XRC. Exiting.

At first blush, this sounds like a problem with your OFED installation.  You 
should contact your IB vendor for assistance.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] boost + openmpi link errors: missing openmpi libraries

2010-11-22 Thread Jeff Squyres
This looks like something the boost people will have to support you with -- we 
don't know anything about their installer.

Sorry!  :(


On Nov 19, 2010, at 2:40 PM, Hicham Mouline wrote:

> Hello, sorry for cross-posting.  I've built openmpi 1.4.3 on win32 and 
> generated only 4 release libs:
> 3,677,712 libmpi.lib
>   336,466 libmpi_cxx.lib
>   758,686 libopen-pal.lib
> 1,307,592 libopen-rte.lib
> 
> 
> 
> I've installed the BoostPro 1.44 MPI library with the installer, but I have 
> link errors:
> 
> 
> 1>libboost_mpi-vc90-mt-1_44.lib(broadcast.obj) : error LNK2001: unresolved 
> external symbol _MPI_Bcast@20
> 
> 
> Is BoostPro's MPI lib built against Open MPI or another MPI implementation?
> 
> 
> regards,
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] launching the mpi runtime

2010-11-22 Thread Jeff Squyres
Other than MPI_COMM_SPAWN[_MULTIPLE], we don't expose the underlying run-time 
to MPI applications.

There is a whole programmatic interface for the layer under MPI (the Open MPI 
Runtime Environment -- ORTE), though.  We don't advise mixing ORTE calls in MPI 
applications, but it is certainly feasible to use ORTE for non-MPI things (some 
of the OMPI community member organizations do so).

What are you trying to do?


On Nov 18, 2010, at 11:37 AM, David Zhang wrote:

> you could spawn more processes from currently running processes.
> 
> On Thu, Nov 18, 2010 at 3:05 AM, Hicham Mouline  wrote:
> Hi,
> 
> One typically uses mpirun to launch a set of mpi processes.
> 
> Is there some programmatic interface to launching the runtime and having 
> the process that launched the runtime become part of the list of mpi 
> processes?
> 
>  
> Regards,
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> 
> -- 
> David Zhang
> University of California, San Diego
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Multiple Subnet MPI Fail

2010-11-22 Thread Terry Dontje
You're gonna have to use a protocol that can route through a machine; 
OFED User Verbs (i.e., openib) does not do this.  The only way I know of to 
do this via OMPI is with the tcp btl.
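
For example, something along these lines (a sketch; the interface names are
placeholders) forces the tcp btl and pins it to the IPoIB interfaces:

mpirun -host A,B,C --mca btl tcp,self --mca btl_tcp_if_include ib0,ib1 ./myapp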


--td

On 11/22/2010 09:28 AM, Paul Monday (Parallel Scientific) wrote:
We've been using OpenMPI in a switched environment with success, but 
we've moved to a point-to-point environment to do some work.  Some of 
the nodes cannot talk directly to one another, sort of like this with 
computers A, B, C, with A having two ports:


A(1)(opensm)-->B
A(2)(opensm)-->C

B is not connected to C in any way.

When we try to run our OpenMPI program we are receiving:
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[1581,1],5]) is on host: pg-B
  Process 2 ([[1581,1],0]) is on host: pg-C
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.


I hope I'm not being overly naive, but is there a way to join the 
subnets at the MPI layer?  It seems like IP over IB would be too high 
up the stack.


Paul Monday
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com





[OMPI users] Multiple Subnet MPI Fail

2010-11-22 Thread Paul Monday (Parallel Scientific)
We've been using OpenMPI in a switched environment with success, but 
we've moved to a point-to-point environment to do some work.  Some of 
the nodes cannot talk directly to one another, sort of like this with 
computers A, B, C, with A having two ports:


A(1)(opensm)-->B
A(2)(opensm)-->C

B is not connected to C in any way.

When we try to run our OpenMPI program we are receiving:
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[1581,1],5]) is on host: pg-B
  Process 2 ([[1581,1],0]) is on host: pg-C
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.


I hope I'm not being overly naive, but is there a way to join the 
subnets at the MPI layer?  It seems like IP over IB would be too high up 
the stack.


Paul Monday


Re: [OMPI users] win32 and linux64

2010-11-22 Thread Hicham Mouline
No way! That is so limiting.

Are you aware of any MPI implementation that can span both Windows and
Linux?

regards,


Hi Hicham, 

Unfortunately, it's not possible to run over both windows and linux. 

Regards, 
Shiqing 

--
Shiqing Fan  http://www.hlrs.de/people/fan
High Performance Computing   Tel.: +49 711 685 87234
   Center Stuttgart (HLRS)Fax.: +49 711 685 65832
Address:Allmandring 30   email: fan_at_[hidden]
70569 Stuttgart


Re: [OMPI users] win32 and linux64

2010-11-22 Thread Shiqing Fan

Hi Hicham,

Unfortunately, it's not possible to run over both windows and linux.


Regards,
Shiqing

On 2010-11-22 10:04 AM, Hicham Mouline wrote:

Hello

Is it possible to run an openmpi application over 2 hosts, win32 and linux64?

I ran this from the win box

mpirun -np 2 --hetero --host localhost,host2 .\Test1.exe

and the error was:
[:04288] This feature hasn't been implemented yet.
[:04288] Could not connect to namespace cimv2 on node host2.
Error code =-2147023174

Obviously I need to provide the linux binary as well, on host2.

Is this at all possible?

regards,

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
--
Shiqing Fan  http://www.hlrs.de/people/fan
High Performance Computing   Tel.: +49 711 685 87234
  Center Stuttgart (HLRS)Fax.: +49 711 685 65832
Address:Allmandring 30   email: f...@hlrs.de
70569 Stuttgart



[OMPI users] win32 and linux64

2010-11-22 Thread Hicham Mouline
Hello

Is it possible to run an openmpi application over 2 hosts, win32 and linux64?

I ran this from the win box
> mpirun -np 2 --hetero --host localhost,host2 .\Test1.exe

and the error was:
[:04288] This feature hasn't been implemented yet.
[:04288] Could not connect to namespace cimv2 on node host2.
Error code =-2147023174

Obviously I need to provide the linux binary as well, on host2.

Is this at all possible?

regards,