Re: [OMPI devel] [devel-core] Trunk borked?

2007-08-06 Thread Ralph Castain

On 8/6/07 5:55 PM, "Jeff Squyres"  wrote:

> On Aug 6, 2007, at 2:33 PM, Ralph H Castain wrote:
> 
>>> This is probably my fault somehow;
>> 
>> Isn't everything??  :-)
> 
> I believe that there is an official OMPI rule about this, yes.  ;-)
> 
>>> I can look into this but not
>>> immediately.  I'm guessing this is related to the IOF fix that I put
>>> in last week sometime.  If you can deal without io from the
>>> COMM_SPAWN children for a little while, I can look at it in a few
>>> days...
>> 
>> No problem, really - just wanted to ensure someone was aware of it.
> 
> Can you do me a favor and file a ticket about this and assign it to me?

Will do so on Tues morning...





Re: [OMPI devel] MPI_Win_get_group

2007-08-06 Thread Jeff Squyres

On Aug 6, 2007, at 2:42 PM, Lisandro Dalcin wrote:


>>> having to call XXX.Free() for every
>>> object i get from a call like XXX.Get_something() is really an
>>> unnecessary pain.
>> 
>> Gotcha.
>> 
>> But I don't see why this means that you need to know if an MPI handle
>> points to an intrinsic object or not...?
> 
> Because many predefined, intrinsic objects cannot (or should not be
> able to) be freed, according to the standard.


I understand that.  :-)  But why would you call XXX.Free() on an  
intrinsic object?  If you're instantiating an MPI handle, you know  
that it's a user-created object and therefore you should MPI free it,  
right?  If you didn't instantiate it, then it's not a user-defined  
object, and therefore you shouldn't MPI free it.


If it's a question of trying to have a generic destructor (pardon me
-- I know next to nothing about python) for your MPI handle classes,
you can have a private member flag in your handle class indicating
whether the underlying MPI handle is intrinsic or not.  Have a
special constructor for instantiating the global / intrinsic objects
(e.g., for MPI_INT) that sets this flag to "true"; have all other
constructors set it to "false".  In the destructor, you check this
flag and know whether you should call the corresponding MPI free
function (assuming you solve issues surrounding deadlock, etc.).
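
A minimal sketch of that flag-in-the-handle idea, in C rather than python
(illustrative only; the wrapper type and function names below are invented
for the example and are not Open MPI or mpi4py code):

#include <mpi.h>
#include <stdlib.h>

/* Hypothetical handle wrapper: record at construction time whether the
 * underlying MPI handle is intrinsic (predefined), and have the generic
 * destructor call the MPI free function only for user-created handles. */
typedef struct {
    MPI_Datatype handle;
    int is_intrinsic;          /* 1 for predefined handles such as MPI_INT */
} datatype_obj;

static datatype_obj *datatype_wrap(MPI_Datatype h, int is_intrinsic)
{
    datatype_obj *obj = malloc(sizeof(*obj));
    obj->handle = h;
    obj->is_intrinsic = is_intrinsic;
    return obj;
}

/* The "destructor": frees the MPI handle only when it was user-created. */
static void datatype_destroy(datatype_obj *obj)
{
    if (!obj->is_intrinsic && obj->handle != MPI_DATATYPE_NULL)
        MPI_Type_free(&obj->handle);
    free(obj);
}

Wrapping MPI_INT would pass is_intrinsic = 1, while wrappers returned by
type-constructor calls would pass 0, so the destructor never tries to free
a predefined handle.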



>> Yes and no.  As the author of the C++ bindings chapter in MPI-2, I
>> have a pretty good idea why we didn't do this.  :-)
> 
> Please do not misunderstand me.  The C++ bindings are almost perfect for
> me.  The only thing I object to a bit is the open door for dangling
> references.  Anyway, this is a minor problem.  And the C++ bindings are
> my source of inspiration for my python wrappers, as they are really
> good for me.


Good!  That's exactly what they were intended to be.  :-)

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [devel-core] Trunk borked?

2007-08-06 Thread Jeff Squyres

On Aug 6, 2007, at 2:33 PM, Ralph H Castain wrote:


>> This is probably my fault somehow;
> 
> Isn't everything??  :-)

I believe that there is an official OMPI rule about this, yes.  ;-)

>> I can look into this but not
>> immediately.  I'm guessing this is related to the IOF fix that I put
>> in last week sometime.  If you can deal without io from the
>> COMM_SPAWN children for a little while, I can look at it in a few
>> days...
> 
> No problem, really - just wanted to ensure someone was aware of it.

Can you do me a favor and file a ticket about this and assign it to me?

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] problem with system() call and openib - blocks send/recv

2007-08-06 Thread Jeff Squyres

On Aug 6, 2007, at 8:08 AM, Bill Wichser wrote:


> Now I have another issue, which we fixed, with ROMIO/PVFS2/and openmpi
> 1.2.3.  It seems that ROMIO support is way behind in openmpi and what we
> did was basically copy the stuff from mpich2, apply the pvfs2 romio
> patch and our problems went away.
> 
> Should I post this to the developer's list, or is this too something
> which you folks are aware of and will address before the next release?

This is the developer's list.  ;-)

We have an upgrade of ROMIO scheduled, but not until the 1.3 series.  :-(


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [RFC] Upgrade to newer libtool 2.1 snapshot

2007-08-06 Thread Jeff Squyres
I forgot to mention: Brian and I chatted today and we have both been  
building the 1.2 branch with AM 1.10 / AC 2.61 for a long time and  
it's been fine.


So we're throwing it on the to-do list to upgrade the nightly tarball  
generation process to AM 1.10 / AC 2.61.


As for upgrading the Libtool 2.1 that we use -- maybe we'll wait for  
the upcoming xlc fixes (if there's no urgent need to upgrade and the  
xlc stuff might be in within the next few weeks).  :-)



On Aug 6, 2007, at 2:48 PM, Ralf Wildenhues wrote:


> Hello Jeff,
> 
> * Jeff Squyres wrote on Mon, Aug 06, 2007 at 04:27:59PM CEST:
>> On Aug 5, 2007, at 3:41 PM, Ralf Wildenhues wrote:
>> 
>>>> WHY: https://svn.open-mpi.org/trac/ompi/ticket/982 is fixed by newer
>>>> Libtool snapshots (e.g., 1.2444 2007/04/10 is what I have installed
>>>> at Cisco).
> [...]
>>> Asking because I don't think the bug was consciously fixed in Libtool;
>>> only a test was added to expose the issue.  I'll put it on my list of
>>> things to look at.
> [...]
>> FWIW, note that we are applying this patch to the generated
>> aclocal.m4 (in all versions -- it appears to apply cleanly with a
>> little fuzz on the exact line numbering):
> 
> Ahh, yes, that was the patch that fixed the problem (rather than the
> Autoconf upgrade), I remember now.  Thanks for searching!
> 
>> --- aclocal.m4.old  2007-04-20 15:18:48.0 -0700
>> +++ aclocal.m4  2007-04-20 15:18:59.0 -0700
>> @@ -5311,7 +5311,7 @@
>> # Commands to make compiler produce verbose output that lists
>> # what "hidden" libraries, object files and flags are used when
>> # linking a shared library.
>> -  output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP "\-L"'
>> +  output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP "\-L" | tail -n 1'
>> 
>>   else
>> GXX=no
>> 
>> This fixes the problem for us (we stole it from a libtool mailing
>> list post from a long time ago).  If this could be applied to the
>> Libtool development trunk, that would be great...  :-)
> 
> The patch has two issues.  First a simple one, it should be
>   sed -n '$p'
> instead of `tail -n 1', for portability.  Second, and more importantly,
> I remember having tested the patch on some but not all compilers that I
> know pretend to be g++ at times (icpc, pathCC?, pgCC?).  I hope none
> of them (nor g++ either) get the idea of splitting long output lines of
> `$CXX -v' with backslash-newline.
> 
> Anyway, I'll put on my list throwing out another test round for the
> patch.
> 
> Cheers,
> Ralf
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [RFC] Upgrade to newer libtool 2.1 snapshot

2007-08-06 Thread Jeff Squyres

On Aug 6, 2007, at 2:48 PM, Ralf Wildenhues wrote:


>> This fixes the problem for us (we stole it from a libtool mailing
>> list post from a long time ago).  If this could be applied to the
>> Libtool development trunk, that would be great...  :-)
> 
> The patch has two issues.  First a simple one, it should be
>   sed -n '$p'
> instead of `tail -n 1', for portability.  Second, and more importantly,
> I remember having tested the patch on some but not all compilers that I
> know pretend to be g++ at times (icpc, pathCC?, pgCC?).  I hope none
> of them (nor g++ either) get the idea of splitting long output lines of
> `$CXX -v' with backslash-newline.

FWIW, I've been regression testing OMPI with this patch against GNU,
Intel 9.0, 9.1, pathscale, and the PGI compilers and it's been
fine...  (of course, that doesn't imply anything about future
behavior ;-) ).

> Anyway, I'll put on my list throwing out another test round for the
> patch.

Thanks!

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Startup failure on mixed IPv4/IPv6 environment (oob tcp bug?)

2007-08-06 Thread Brian Barrett

On Aug 6, 2007, at 3:05 PM, dispan...@sobel.ls.la wrote:


> * Brian Barrett  [06.08.2007 18:09]:
>> On Aug 5, 2007, at 3:05 PM, dispan...@sobel.ls.la wrote:
>> 
>> Can you try the attached patch?  It's pretty close to what you've
>> suggested, but should eliminate one corner case that you could, in
>> theory, run into with your solution.  You are using a nightly tarball
>> from the trunk, correct?
> 
> The patch works, thank you! I'm using the trunk.

Thanks!  I've committed the patch to our trunk -- it'll be in the
nightly tarballs starting tonight.



Brian


Re: [OMPI devel] Startup failure on mixed IPv4/IPv6 environment (oob tcp bug?)

2007-08-06 Thread dispanser
* Brian Barrett  [06.08.2007 18:09]:
> On Aug 5, 2007, at 3:05 PM, dispan...@sobel.ls.la wrote:
> 
> Can you try the attached patch?  It's pretty close to what you've  
> suggested, but should eliminate one corner case that you could, in  
> theory, run into with your solution.  You are using a nightly tarball  
> from the trunk, correct?

The patch works, thank you! I'm using the trunk.

thanks,

Thomas

--
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany


Re: [OMPI devel] [RFC] Upgrade to newer libtool 2.1 snapshot

2007-08-06 Thread Ralf Wildenhues
Hello Jeff,

* Jeff Squyres wrote on Mon, Aug 06, 2007 at 04:27:59PM CEST:
> On Aug 5, 2007, at 3:41 PM, Ralf Wildenhues wrote:
> 
> >> WHY: https://svn.open-mpi.org/trac/ompi/ticket/982 is fixed by newer
> >> Libtool snapshots (e.g., 1.2444 2007/04/10 is what I have installed
> >> at Cisco).
[...]
> > Asking because I don't think the bug was consciously fixed in Libtool;
> > only a test was added to expose the issue.  I'll put it on my list of
> > things to look at.
[...]
> FWIW, note that we are applying this patch to the generated  
> aclocal.m4 (in all versions -- it appears to apply cleanly with a  
> little fuzz on the exact line numbering):

Ahh, yes, that was the patch that fixed the problem (rather than the
Autoconf upgrade), I remember now.  Thanks for searching!

> --- aclocal.m4.old  2007-04-20 15:18:48.0 -0700
> +++ aclocal.m4  2007-04-20 15:18:59.0 -0700
> @@ -5311,7 +5311,7 @@
> # Commands to make compiler produce verbose output that lists
> # what "hidden" libraries, object files and flags are used when
> # linking a shared library.
> -  output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP "\-L"'
> +  output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP "\-L" | tail -n 1'
> 
>   else
> GXX=no
> 
> This fixes the problem for us (we stole it from a libtool mailing  
> list post from a long time ago).  If this could be applied to the  
> Libtool development trunk, that would be great...  :-)

The patch has two issues.  First a simple one, it should be 
  sed -n '$p'
instead of `tail -n 1', for portability.  Second, and more importantly,
I remember having tested the patch on some but not all compilers that I
know pretend to be g++ at times (icpc, pathCC?, pgCC?).  I hope none
of them (nor g++ either) get the idea of splitting long output lines of
`$CXX -v' with backslash-newline.

Anyway, I'll put on my list throwing out another test round for the
patch.

Cheers,
Ralf


Re: [OMPI devel] MPI_Win_get_group

2007-08-06 Thread Lisandro Dalcin
On 8/1/07, Jeff Squyres  wrote:
> On Jul 31, 2007, at 6:43 PM, Lisandro Dalcin wrote:
> > having to call XXX.Free() for every
> > object i get from a call like XXX.Get_something() is really an
> > unnecessary pain.
>
> Gotcha.
>
> But I don't see why this means that you need to know if an MPI handle
> points to an intrinsic object or not...?

Because many predefined, intrinsic objects cannot (or should not be
able to) be freed, according to the standard.

> > Many things in MPI are LOCAL (datatypes, groups, predefined
> > operations) and in general destroying them for user-space is
> > guaranteed by MPI to not conflict with system(MPI)-space and
> > communication (i.e. if you create a derived datatype for using it in
> > the construction of another derived datatype, you can safely free the
> > first).
> >
> > Well, for all those LOCAL objects, I could implement automatic
> > deallocation of handles for Python (for Comm, Win, and File, that is
> > not so easy, as freeing them is a collective operation AFAIK, and
> > automatically freeing them can lead to deadlocks).
>
> This is a difficult issue -- deadlocks for removing objects that are
> collective actions.  It's one of the reasons the Forum decided not to
> have the C++ bindings automatically free handles when they go out of
> scope.

And that was a really good and natural decision.

> > Sorry for the long mail. In short, many things in MPI are not clearly
> > designed for languages other than C and Fortran. Even in C++
> > specification, there are things that are unacceptable, like the
> > open-door to the problem of having dangling references, which could be
> > avoided with negligible cost.
>
> Yes and no.  As the author of the C++ bindings chapter in MPI-2, I
> have a pretty good idea why we didn't do this.  :-)

Please do not misunderstand me.  The C++ bindings are almost perfect for
me.  The only thing I object to a bit is the open door for dangling
references.  Anyway, this is a minor problem.  And the C++ bindings are
my source of inspiration for my python wrappers, as they are really
good for me.

> The standard is meant to be as simple, straightforward,
> and cross-language as possible (and look where it is!  Imagine if we
> had tried to make a real class library -- it would have led to even
> more corner cases and imprecision in the official standard).

Well, I have to completely agree with you. And as I said before, the
corner cases are really few, compared to the number of (rather
orthogonal) features provided in MPI. And I guess all this is going
to be solved with minor clarifications/corrections in MPI-2.1.



-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
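
A short illustration of the "LOCAL objects" point made in the thread above
(a sketch only, not code from the thread): a user-created datatype that was
only needed to build another datatype can be freed immediately, and the
enclosing datatype remains valid.

#include <mpi.h>

/* MPI_Type_free is a local operation; freeing 'pair' here does not affect
 * 'row', which was constructed from it. */
void build_row_type(MPI_Datatype *row)
{
    MPI_Datatype pair;

    MPI_Type_contiguous(2, MPI_DOUBLE, &pair);   /* intermediate, user-created */
    MPI_Type_contiguous(4, pair, row);           /* 'row' is built from 'pair' */
    MPI_Type_free(&pair);                        /* safe: 'row' stays usable   */
    MPI_Type_commit(row);
}

The caller is still responsible for eventually freeing 'row' -- which is
exactly the bookkeeping that the Python wrappers discussed above try to
automate.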



Re: [OMPI devel] [devel-core] Trunk borked?

2007-08-06 Thread Ralph H Castain



On 8/6/07 1:51 PM, "Jeff Squyres"  wrote:

> On Aug 6, 2007, at 11:49 AM, Ralph H Castain wrote:
> 
>> 1. if everything is being done on localhost, I do not see any of
>> the IO from
>> the child process. Mpirun executes and completes cleanly, however.
>> Because
>> the spawn'd child terminates so quickly, I haven't been able to
>> positively
>> confirm it is actually running - though I have some indication that
>> it is.
> 
> This is probably my fault somehow;

Isn't everything??  :-)

> I can look into this but not
> immediately.  I'm guessing this is related to the IOF fix that I put
> in last week sometime.  If you can deal without io from the
> COMM_SPAWN children for a little while, I can look at it in a few
> days...

No problem, really - just wanted to ensure someone was aware of it.

> 
>> 2. if running on multiple hosts, I see the output from the child
>> processes,
>> but mpirun "hangs" in MPI_Comm_disconnect. A ctrl-C is able to kill
>> the
>> entire job.
> 
> I can't comment on this one...

Could be related - let's fix the first and see if the second goes away.

Thanks
Ralph

> 
>> Any ideas on what might have happened? This was all working not
>> that long
>> ago...can't swear to an r-level at the moment, but am hoping
>> someone has an
>> idea before I start having to blindly work backwards to find out
>> what broke
>> it.
>> 
>> Thanks
>> Ralph
>> 
>> 
>> ___
>> devel-core mailing list
>> devel-c...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel-core
> 




Re: [OMPI devel] [devel-core] Trunk borked?

2007-08-06 Thread Jeff Squyres

On Aug 6, 2007, at 11:49 AM, Ralph H Castain wrote:

> 1. if everything is being done on localhost, I do not see any of the IO from
> the child process. Mpirun executes and completes cleanly, however. Because
> the spawn'd child terminates so quickly, I haven't been able to positively
> confirm it is actually running - though I have some indication that it is.

This is probably my fault somehow; I can look into this but not
immediately.  I'm guessing this is related to the IOF fix that I put
in last week sometime.  If you can deal without io from the
COMM_SPAWN children for a little while, I can look at it in a few
days...

> 2. if running on multiple hosts, I see the output from the child processes,
> but mpirun "hangs" in MPI_Comm_disconnect. A ctrl-C is able to kill the
> entire job.

I can't comment on this one...

> Any ideas on what might have happened? This was all working not that long
> ago...can't swear to an r-level at the moment, but am hoping someone has an
> idea before I start having to blindly work backwards to find out what broke
> it.
> 
> Thanks
> Ralph
> 
> ___
> devel-core mailing list
> devel-c...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel-core



--
Jeff Squyres
Cisco Systems



[OMPI devel] Trunk borked?

2007-08-06 Thread Ralph H Castain
Yo all

I've been playing with the trunk today and found it appears to be broken for
comm_spawn. I'm getting two types of errors, perhaps related:

1. if everything is being done on localhost, I do not see any of the IO from
the child process. Mpirun executes and completes cleanly, however. Because
the spawn'd child terminates so quickly, I haven't been able to positively
confirm it is actually running - though I have some indication that it is.

2. if running on multiple hosts, I see the output from the child processes,
but mpirun "hangs" in MPI_Comm_disconnect. A ctrl-C is able to kill the
entire job.

Any ideas on what might have happened? This was all working not that long
ago...can't swear to an r-level at the moment, but am hoping someone has an
idea before I start having to blindly work backwards to find out what broke
it.

Thanks
Ralph
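
For readers following the thread, the failing scenario is roughly the
following (a hedged sketch of a parent that spawns children and then
disconnects; it is not the actual test code, and the self-spawn setup is
made up for illustration):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, children;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* parent job: spawn two copies of this same binary */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&children);   /* reported to hang on multi-host runs (item 2) */
    } else {
        /* spawned child: its stdout is routed through the IOF (item 1) */
        printf("hello from the spawned job\n");
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}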




Re: [OMPI devel] Startup failure on mixed IPv4/IPv6 environment (oob tcp bug?)

2007-08-06 Thread Brian Barrett

On Aug 5, 2007, at 3:05 PM, dispan...@sobel.ls.la wrote:

> I fixed the problem by setting the peer_state to MCA_OOB_TCP_CONNECTING
> after creating the socket, which works for me.  I'm not sure if this is
> always correct, though.

Can you try the attached patch?  It's pretty close to what you've
suggested, but should eliminate one corner case that you could, in
theory, run into with your solution.  You are using a nightly tarball
from the trunk, correct?


Thanks,

Brian



oob_ipv6.diff
Description: Binary data


Re: [OMPI devel] [RFC] Upgrade to newer libtool 2.1 snapshot

2007-08-06 Thread Jeff Squyres

On Aug 5, 2007, at 3:41 PM, Ralf Wildenhues wrote:


>> WHAT: Upgrade to a newer Libtool 2.1 nightly snapshot (we are
>> currently using 1.2362 2007/01/23) for making OMPI tarballs.
>> 
>> WHY: https://svn.open-mpi.org/trac/ompi/ticket/982 is fixed by newer
>> Libtool snapshots (e.g., 1.2444 2007/04/10 is what I have installed
>> at Cisco).
> 
> Is it?  If so, then I would like to know why (config.log outputs for
> both would be nice).  Could have been an Autoconf update instead.
> Asking because I don't think the bug was consciously fixed in Libtool;
> only a test was added to expose the issue.  I'll put it on my list of
> things to look at.


While gathering data for this reply, I realized that you are exactly  
right: it's not the difference in the versions of Libtool that is the  
problem, it's the difference in versions of Autoconf (the OMPI v1.2  
nightly tarball uses AC 2.59, the OMPI trunk nightly tarball uses AC  
2.61, I use AC 2.61 in my development copies).


So I'll change my RFC and send it around again to upgrade the version  
of AC that we're using in the 1.2 tarball.  There may be some second- 
order effects of doing this; I'll chat with Brian about it (he  
watches this stuff much more closely than me).


FWIW, note that we are applying this patch to the generated  
aclocal.m4 (in all versions -- it appears to apply cleanly with a  
little fuzz on the exact line numbering):


--- aclocal.m4.old  2007-04-20 15:18:48.0 -0700
+++ aclocal.m4  2007-04-20 15:18:59.0 -0700
@@ -5311,7 +5311,7 @@
   # Commands to make compiler produce verbose output that lists
   # what "hidden" libraries, object files and flags are used when
   # linking a shared library.
-  output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP "\-L"'
+  output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP "\-L" | tail -n 1'

 else
   GXX=no

This fixes the problem for us (we stole it from a libtool mailing  
list post from a long time ago).  If this could be applied to the  
Libtool development trunk, that would be great...  :-)





>> Plus, it's a newer version, so it's better, right?  ;-)
> 
> FWIW, a patch applied today fixes a regression introduced on 2007-05-08
> and reported by Brian.
> 
> Cheers,
> Ralf
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] problem with system() call and openib - blocks send/recv

2007-08-06 Thread Bill Wichser
Thanks Jeff!  I looked and didn't find anything but sure enough, there 
it is!


Now I have another issue, which we fixed, with ROMIO/PVFS2/and openmpi 
1.2.3.  It seems that ROMIO support is way behind in openmpi and what we 
did was basically copy the stuff from mpich2, apply the pvfs2 romio 
patch and our problems went away.


Should I post this to the developer's list, or is this too something 
which you folks are aware of and will address before the next release?


Bill

Jeff Squyres wrote:

> Bill --
> 
> Check out http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork.
> 
> To my knowledge, RHEL4 has not yet received a hotfix that will allow
> fork() with OpenFabrics verbs applications when memory is still
> registered in the parent.





Re: [OMPI devel] problem with system() call and openib - blocks send/recv

2007-08-06 Thread Gleb Natapov
On Mon, Aug 06, 2007 at 09:53:20AM -0400, Bill Wichser wrote:
> We have run across an issue, probably more related to openib than to 
> openmpi but don't know how to resolve.
> 
> Linux kernel - 2.6.9-55.0.2.ELsmp x86_64
fork (and thus system()) is not supported by openib in this kernel.
To get system() working you need kernel 2.6.12 at least. To get fork()
somewhat working you need kernel 2.6.16 (or 2.6.17, I don't remember exactly)
and libibverbs-1.1.

--
Gleb.
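
For reference, on the newer kernel plus libibverbs-1.1 combination Gleb
mentions, an application can ask the verbs library to make fork() usable by
calling ibv_fork_init() before any InfiniBand resources are created (i.e.,
before MPI_Init when using openib).  A minimal sketch -- illustrative only,
and of no help on the 2.6.9 kernel discussed above:

#include <stdio.h>
#include <mpi.h>
#include <infiniband/verbs.h>

int main(int argc, char **argv)
{
    /* Request fork-safe handling of registered memory (libibverbs >= 1.1). */
    if (ibv_fork_init() != 0)
        fprintf(stderr, "ibv_fork_init() failed; fork()/system() may be unsafe\n");

    MPI_Init(&argc, &argv);
    /* ... communication and system()/fork() calls ... */
    MPI_Finalize();
    return 0;
}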


Re: [OMPI devel] problem with system() call and openib - blocks send/recv

2007-08-06 Thread Jeff Squyres

Bill --

Check out http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork.

To my knowledge, RHEL4 has not yet received a hotfix that will allow  
fork() with OpenFabrics verbs applications when memory is still  
registered in the parent.



On Aug 6, 2007, at 7:53 AM, Bill Wichser wrote:

We have run across an issue, probably more related to openib than  
to openmpi but don't know how to resolve.


Linux kernel - 2.6.9-55.0.2.ELsmp x86_64
libibverbs-1.0.4-7

openmpi - it doesn't matter - 1.1.5 and 1.2.3 both fail.

When the sample code is run across IB nodes, using the IB  
interface, the receive just hangs whenever a system call is  
issued.  Removing this system call removes the hang.  Running  
across the nodes over TCP removes the hang.  Running on a single  
node removes the hang.  Only when using the IB interface do we have  
this hang.


So the simple solution is "don't do this" but apparently something  
deeper is involved and who knows where it will pop up again.


Thanks,
Bill

ps - sample code compiled using mpicc, built with gcc.  You'll need
a test.dat file for the system("cp") command.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

char All[4840];
int ThisTask;
int NTask;

int main(int argc, char **argv)
{
  int task;
  int nothing;
  MPI_Status status;

  int errorFlag = 0;
  int sysstatus;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &ThisTask);
  MPI_Comm_size(MPI_COMM_WORLD, &NTask);
#if 1
  if (ThisTask == 0) {
    printf("Task %d cmd run\n", ThisTask);
    sysstatus = system("cp test.dat test2.dat");
    printf("Task %d cmd status %d\n", ThisTask, sysstatus);
  }
#else
  if (ThisTask == 0) {
    sleep(60);
  }
#endif

  if (ThisTask == 0) {
    printf("Task 0 Wait Loop START\n");
    for (task = 1; task < NTask; task++) {
      printf("Task %d Recv START\n", task);
      MPI_Recv(&nothing, sizeof(nothing), MPI_BYTE, task, 0, MPI_COMM_WORLD,
               &status);
      printf("Task %d Recv END\n", task);
    }
    printf("Task 0 Wait Loop END\n");
  }
  else {
    printf("Task %d Send START\n", ThisTask);
    MPI_Send(&nothing, sizeof(nothing), MPI_BYTE, 0, 0, MPI_COMM_WORLD);
    printf("Task %d Send END\n", ThisTask);
  }

  printf("Task %d Finalize START\n", ThisTask);
  MPI_Finalize();               /* clean up & finalize MPI */
  printf("Task %d Finalize END\n", ThisTask);

  return 0;
}
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



[OMPI devel] using google-perftools for hunting memory leaks

2007-08-06 Thread Sven Stork
Dear all,

while hunting for memory leaks I found the google performance tools quite
useful. The included memory manager has a feature for checking for memory
leaks. Unlike other tools, you can use this feature without any recompilation
and still get a nice call graph locating the root of the memory allocation (see
attachment). As it might also be interesting for other people, I wanted to
mention it. Here is the link to the homepage:

http://goog-perftools.sourceforge.net

Cheers,
  Sven


pprof6154.0.pdf
Description: Adobe PDF document


[OMPI devel] problem with system() call and openib - blocks send/recv

2007-08-06 Thread Bill Wichser
We have run across an issue, probably more related to openib than to 
openmpi but don't know how to resolve.


Linux kernel - 2.6.9-55.0.2.ELsmp x86_64
libibverbs-1.0.4-7

openmpi - it doesn't matter - 1.1.5 and 1.2.3 both fail.

When the sample code is run across IB nodes, using the IB interface, the 
receive just hangs whenever a system call is issued.  Removing this 
system call removes the hang.  Running across the nodes over TCP removes 
the hang.  Running on a single node removes the hang.  Only when using 
the IB interface do we have this hang.


So the simple solution is "don't do this" but apparently something 
deeper is involved and who knows where it will pop up again.


Thanks,
Bill

ps - sample code compiled using mpicc, built with gcc.  You'll need a 
test.dat file for the system("cp") command.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

char All[4840];
int ThisTask;
int NTask;

int main(int argc, char **argv)
{
  int task;
  int nothing;
  MPI_Status status;

  int errorFlag = 0;
  int sysstatus;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &ThisTask);
  MPI_Comm_size(MPI_COMM_WORLD, &NTask);
#if 1
  if (ThisTask == 0) {
    printf("Task %d cmd run\n", ThisTask);
    sysstatus = system("cp test.dat test2.dat");
    printf("Task %d cmd status %d\n", ThisTask, sysstatus);
  }
#else
  if (ThisTask == 0) {
    sleep(60);
  }
#endif

  if (ThisTask == 0) {
    printf("Task 0 Wait Loop START\n");
    for (task = 1; task < NTask; task++) {
      printf("Task %d Recv START\n", task);
      MPI_Recv(&nothing, sizeof(nothing), MPI_BYTE, task, 0, MPI_COMM_WORLD,
               &status);
      printf("Task %d Recv END\n", task);
    }
    printf("Task 0 Wait Loop END\n");
  }
  else {
    printf("Task %d Send START\n", ThisTask);
    MPI_Send(&nothing, sizeof(nothing), MPI_BYTE, 0, 0, MPI_COMM_WORLD);
    printf("Task %d Send END\n", ThisTask);
  }

  printf("Task %d Finalize START\n", ThisTask);
  MPI_Finalize();               /* clean up & finalize MPI */
  printf("Task %d Finalize END\n", ThisTask);

  return 0;
}