Re: [MTT devel] [MTT svn] svn:mtt-svn r1176

2008-04-04 Thread Jeff Squyres

Um -- yeah, probably.  :-)

But there's also likely no harm in leaving them there.  :-)


On Apr 4, 2008, at 4:29 PM, Ethan Mallove wrote:

I like the "all" keyword. Are these no longer needed?

 _mpi_get_names()
 _mpi_install_names()
 _test_get_names()
 _test_build_names()

-Ethan

On Fri, Apr/04/2008 03:31:07PM, jsquy...@osl.iu.edu wrote:

Author: jsquyres
Date: 2008-04-04 15:31:07 EDT (Fri, 04 Apr 2008)
New Revision: 1176
URL: https://svn.open-mpi.org/trac/mtt/changeset/1176

Log:
Allow "all" keyword in mpi_get, test_get, and test_build fields.

Text files modified:
   trunk/CHANGES                | 5 +
   trunk/lib/MTT/MPI/Install.pm | 9 ++---
   trunk/lib/MTT/Test/Build.pm  | 9 ++---
   trunk/lib/MTT/Test/Run.pm    | 6 --
   4 files changed, 21 insertions(+), 8 deletions(-)

Modified: trunk/CHANGES
==============================================================================
--- trunk/CHANGES   (original)
+++ trunk/CHANGES   2008-04-04 15:31:07 EDT (Fri, 04 Apr 2008)
@@ -50,6 +50,11 @@
- _details() - pass arbitrary values from test run sections
   to the mpi details section, indexed by string

+- Allow mpi_get_name, test_get_name, and test_build_name fields to
+  accept the special value "all", meaning that they'll use all
+  corresponding sections that are found (vs. needing to list every
+  section explicitly)
+
- Added export for MTT_TEST_EXECUTABLE, may be used for clean up after
  mpi process : pkill -9 $MTT_TEST_EXECUTABLE


Modified: trunk/lib/MTT/MPI/Install.pm
==============================================================================

--- trunk/lib/MTT/MPI/Install.pm    (original)
+++ trunk/lib/MTT/MPI/Install.pm    2008-04-04 15:31:07 EDT (Fri, 04 Apr 2008)

@@ -199,7 +199,8 @@

            # This is only warning about the INI file; we'll see
            # if we find meta data for the MPI get later
-           if (!$ini_full->SectionExists("mpi get: $mpi_get_name")) {
+           if ($mpi_get_name ne "all" &&
+               !$ini_full->SectionExists("mpi get: $mpi_get_name")) {
                Warning("Warning: MPI Get section \"$mpi_get_name\" does not seem to exist in the INI file\n");
            }

@@ -207,7 +208,8 @@
            # skip it.  Don't issue a warning because command line
            # parameters may well have dictated to skip this MPI
            # get section.
-           if (!exists($MTT::MPI::sources->{$mpi_get_name})) {
+           if ($mpi_get_name ne "all" &&
+               !exists($MTT::MPI::sources->{$mpi_get_name})) {
                Debug("Have no sources for MPI Get \"$mpi_get_name\", skipping\n");
                next;
            }
@@ -217,7 +219,8 @@

            # For each MPI source
            foreach my $mpi_get_key (keys(%{$MTT::MPI::sources})) {
-               if ($mpi_get_key eq $mpi_get_name) {
+               if ($mpi_get_name eq "all" ||
+                   $mpi_get_key eq $mpi_get_name) {

                    # For each version of that source
                    my $mpi_get = $MTT::MPI::sources->{$mpi_get_key};


Modified: trunk/lib/MTT/Test/Build.pm
==============================================================================

--- trunk/lib/MTT/Test/Build.pm     (original)
+++ trunk/lib/MTT/Test/Build.pm     2008-04-04 15:31:07 EDT (Fri, 04 Apr 2008)

@@ -120,7 +120,8 @@

            # This is only warning about the INI file; we'll see
            # if we find meta data for the test get later
-           if (!$ini_full->SectionExists("test get: $test_get_name")) {
+           if ($test_get_name ne "all" &&
+               !$ini_full->SectionExists("test get: $test_get_name")) {
                Warning("Test Get section \"$test_get_name\" does not seem to exist in the INI file\n");
            }

@@ -128,14 +129,16 @@
            # skip it.  Don't issue a warning because command line
            # parameters may well have dictated to skip this
            # section.
-           if (!exists($MTT::Test::sources->{$test_get_name})) {
+           if ($test_get_name ne "all" &&
+               !exists($MTT::Test::sources->{$test_get_name})) {
                Debug("Have no sources for Test Get \"$test_get_name\", skipping\n");
                next;
            }

            # Find the matching test source
            foreach my $test_get_key (keys(%{$MTT::Test::sources})) {
-               if ($test_get_key eq $test_get_name) {
+               if ($test_get_name eq "all" ||
+                   $test_get_key eq $test_get_name) {
                    my $test_get = $MTT::Test::sources->{$test_get_key};

                    # For each MPI source

Modified: trunk/lib/MTT/Test/Run.pm
==============================================================================
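
For illustration, a hypothetical MTT INI fragment using the new keyword
(section and field spellings here are examples only; check the MTT docs
for the exact names):

# Hypothetical fragment: "all" expands to every matching section found,
# instead of listing each section explicitly.
[MPI install: all installs]
mpi_get = all

[Test build: all builds]
test_get = all

[Test run: all runs]
test_build = all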

Re: [MTT devel] Launch scaling data in MTT

2008-04-04 Thread Jeff Squyres
MTT probably could gather this data -- some of it was wall-clock
execution time; other parts were data extracted from stdout.


Ralph -- is this interesting / useful to you?

On Apr 4, 2008, at 4:56 PM, Ethan Mallove wrote:

I was looking at the graphs posted at
https://svn.open-mpi.org/trac/ompi/wiki/ORTEScalabilityTesting,
and noted that MTT could gather this data if it could just
graph on the "duration" column. Would this be useful? E.g.,

 http://www.open-mpi.org/mtt/index.php?do_redir=582

Note: the duration timings currently round to the second,
but the "duration" column uses an interval type that has
microsecond precision, if we needed it.

-Ethan



--
Jeff Squyres
Cisco Systems



[OMPI devel] Build failure on FreeBSD 7

2008-04-04 Thread Karol Mroz
Hello everyone... it's been some time since I posted here. I pulled the 
latest svn revision (18079) and had some trouble building Open MPI on a 
FreeBSD 7 machine (i386).


Make failed when compiling opal/event/kqueue.c. It appears that FreeBSD 
needs sys/types.h, sys/ioctl.h, termios.h and libutil.h included in 
order to reference openpty(). I added ifdefs/includes for these header 
files to kqueue.c and managed to build. Note that I also tried the 
latest nightly tarball; that build actually succeeded without any 
changes. Has anyone experienced this type of behavior? A colleague of 
mine mentioned it could be a FreeBSD autotools issue.


Although builds were successful (with modification for the svn build, 
and without modification for the nightly tarball), I tried running a 
simple app locally with 2 processes over the TCP BTL that does a 
non-blocking send/recv. The app simply hung. After attaching gdb to one 
of the 2 processes, the console (not gdb) reported the following:


[warn] kq_init: detected broken kqueue (failed add); not using error 4 (Interrupted system call)

: Interrupted system call
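
For reference, a minimal sketch of the kind of test described above --
not Karol's actual program, just an assumed 2-process non-blocking
exchange, run with something like "mpirun -np 2 -mca btl tcp,self ./a.out":

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, sendbuf, recvbuf;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    sendbuf = rank;
    /* Post the receive first, then the send, then wait on both */
    MPI_Irecv(&recvbuf, 1, MPI_INT, 1 - rank, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&sendbuf, 1, MPI_INT, 1 - rank, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    printf("rank %d received %d\n", rank, recvbuf);
    MPI_Finalize();
    return 0;
}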

I'm including the diff of kqueue.c here for completeness. If anyone 
requires any further information, please let me know.


Thanks.
--
Karol

Index: opal/event/kqueue.c
===
--- opal/event/kqueue.c (revision 18079)
+++ opal/event/kqueue.c (working copy)
@@ -52,7 +52,17 @@
 #ifdef HAVE_UTIL_H
 #include <util.h>
 #endif
+#ifdef HAVE_SYS_IOCTL_H
+#include <sys/ioctl.h>
+#endif
+#ifdef HAVE_LIBUTIL_H
+#include <libutil.h>
+#endif
+#ifdef HAVE_TERMIOS_H
+#include <termios.h>
+#endif

+
 /* Some platforms apparently define the udata field of struct kevent as
  * intptr_t, whereas others define it as void*.  There doesn't seem to be an
  * easy way to tell them apart via autoconf, so we need to use OS macros. */


Re: [MTT devel] [MTT svn] svn:mtt-svn r1176

2008-04-04 Thread Ethan Mallove
I like the "all" keyword. Are these no longer needed?

  _mpi_get_names()
  _mpi_install_names()
  _test_get_names()
  _test_build_names()

-Ethan

On Fri, Apr/04/2008 03:31:07PM, jsquy...@osl.iu.edu wrote:
> Author: jsquyres
> Date: 2008-04-04 15:31:07 EDT (Fri, 04 Apr 2008)
> New Revision: 1176
> URL: https://svn.open-mpi.org/trac/mtt/changeset/1176
> 
> Log:
> Allow "all" keyword in mpi_get, test_get, and test_build fields.
> 
> Text files modified: 
>    trunk/CHANGES                | 5 +
>    trunk/lib/MTT/MPI/Install.pm | 9 ++---
>    trunk/lib/MTT/Test/Build.pm  | 9 ++---
>    trunk/lib/MTT/Test/Run.pm    | 6 --
>    4 files changed, 21 insertions(+), 8 deletions(-)
> 
> Modified: trunk/CHANGES
> ==
> --- trunk/CHANGES (original)
> +++ trunk/CHANGES 2008-04-04 15:31:07 EDT (Fri, 04 Apr 2008)
> @@ -50,6 +50,11 @@
>  - _details() - pass arbitrary values from test run sections
> to the mpi details section, indexed by string
>  
> +- Allow mpi_get_name, test_get_name, and test_build_name fields to
> +  accept the special value "all", meaning that they'll use all
> +  corresponding sections that are found (vs. needing to list every
> +  section explicitly)
> +
>  - Added export for MTT_TEST_EXECUTABLE, may be used for clean up after
>mpi process : pkill -9 $MTT_TEST_EXECUTABLE
>  
> 
> Modified: trunk/lib/MTT/MPI/Install.pm
> ==
> --- trunk/lib/MTT/MPI/Install.pm  (original)
> +++ trunk/lib/MTT/MPI/Install.pm  2008-04-04 15:31:07 EDT (Fri, 04 Apr 
> 2008)
> @@ -199,7 +199,8 @@
>  
>  # This is only warning about the INI file; we'll see
>  # if we find meta data for the MPI get later
> -if (!$ini_full->SectionExists("mpi get: $mpi_get_name")) {
> +if ($mpi_get_name ne "all" &&
> +!$ini_full->SectionExists("mpi get: $mpi_get_name")) {
>  Warning("Warning: MPI Get section \"$mpi_get_name\" does 
> not seem to exist in the INI file\n");
>  }
>  
> @@ -207,7 +208,8 @@
>  # skip it.  Don't issue a warning because command line
>  # parameters may well have dictated to skip this MPI
>  # get section.
> -if (!exists($MTT::MPI::sources->{$mpi_get_name})) {
> +if ($mpi_get_name ne "all" && 
> +!exists($MTT::MPI::sources->{$mpi_get_name})) {
>  Debug("Have no sources for MPI Get \"$mpi_get_name\", 
> skipping\n");
>  next;
>  }
> @@ -217,7 +219,8 @@
>  
>  # For each MPI source
>  foreach my $mpi_get_key (keys(%{$MTT::MPI::sources})) {
> -if ($mpi_get_key eq $mpi_get_name) {
> +if ($mpi_get_name eq "all" ||
> +$mpi_get_key eq $mpi_get_name) {
>  
>  # For each version of that source
>  my $mpi_get = $MTT::MPI::sources->{$mpi_get_key};
> 
> Modified: trunk/lib/MTT/Test/Build.pm
> ==
> --- trunk/lib/MTT/Test/Build.pm   (original)
> +++ trunk/lib/MTT/Test/Build.pm   2008-04-04 15:31:07 EDT (Fri, 04 Apr 
> 2008)
> @@ -120,7 +120,8 @@
>  
>  # This is only warning about the INI file; we'll see
>  # if we find meta data for the test get later
> -if (!$ini_full->SectionExists("test get: $test_get_name")) {
> +if ($test_get_name ne "all" &&
> +!$ini_full->SectionExists("test get: $test_get_name")) {
>  Warning("Test Get section \"$test_get_name\" does not 
> seem to exist in the INI file\n");
>  }
>  
> @@ -128,14 +129,16 @@
>  # skip it.  Don't issue a warning because command line
>  # parameters may well have dictated to skip this
>  # section.
> -if (!exists($MTT::Test::sources->{$test_get_name})) {
> +if ($test_get_name ne "all" &&
> +!exists($MTT::Test::sources->{$test_get_name})) {
>  Debug("Have no sources for Test Get \"$test_get_name\", 
> skipping\n");
>  next;
>  }
>  
>  # Find the matching test source
>  foreach my $test_get_key (keys(%{$MTT::Test::sources})) {
> -if ($test_get_key eq $test_get_name) {
> +if ($test_get_name eq "all" ||
> +$test_get_key eq $test_get_name) {
>  

Re: [OMPI devel] MPI_Comm_connect/Accept

2008-04-04 Thread Ralph H Castain
Okay, I have a partial fix in there now. You'll have to use -mca routed
unity as I still need to fix it for routed tree.

Couple of things:

1. I fixed the --debug flag so it automatically turns on the debug output
from the data server code itself. Now ompi-server will tell you when it is
accessed.

2. Remember, we added an MPI_Info key that specifies whether you want the data
stored locally (on your own mpirun) or globally (on the ompi-server). If you
specify nothing, there is a precedence built into the code that defaults to
"local". So you have to tell us that this data is to be published "global"
if you want to connect multiple mpiruns.
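
In code, point 2 looks something like the sketch below. The info key
name is an assumption based on ompi/mca/pubsub/orte/pubsub_orte.c
("ompi_global_scope"); verify against the source, as Ralph suggests.

/* Hedged sketch: publish a port with global scope so another mpirun
 * can look it up via ompi-server.  The "ompi_global_scope" key is an
 * assumption -- check ompi/mca/pubsub/orte/pubsub_orte.c. */
#include <mpi.h>

void publish_global(const char *service, char *port)
{
    MPI_Info info;

    MPI_Info_create(&info);
    MPI_Info_set(info, "ompi_global_scope", "true");  /* default is "local" */
    MPI_Open_port(MPI_INFO_NULL, port);   /* port: MPI_MAX_PORT_NAME chars */
    MPI_Publish_name(service, info, port);
    MPI_Info_free(&info);
}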

I believe Jeff wrote all that up somewhere - could be in an email thread,
though. Been too long ago for me to remember... ;-) You can look it up in
the code though as a last resort - it is in
ompi/mca/pubsub/orte/pubsub_orte.c.

Ralph
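
For reference, a plausible corrected sequence for the transcript quoted
below, per the -ompi-server syntax Ralph describes (the URI file path is
hypothetical):

xterm1$ ompi-server --report-uri /tmp/ompi-server.uri
xterm2$ mpirun -mca routed unity -ompi-server file:/tmp/ompi-server.uri \
            -np 1 mpi_accept_test
xterm3$ mpirun -mca routed unity -ompi-server file:/tmp/ompi-server.uri \
            -np 1 simple_connect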



On 4/4/08 12:55 PM, "Ralph H Castain"  wrote:

> Well, something got borked in here - will have to fix it, so this will
> probably not get done until next week.
> 
> 
> On 4/4/08 12:26 PM, "Ralph H Castain"  wrote:
> 
>> Yeah, you didn't specify the file correctly...plus I found a bug in the code
>> when I looked (out-of-date a little in orterun).
>> 
>> I am updating orterun (commit soon) and will include a better help message
>> about the proper format of the orterun cmd-line option. The syntax is:
>> 
>> -ompi-server uri
>> 
>> or -ompi-server file:filename-where-uri-exists
>> 
>> Problem here is that you gave it a uri of "test", which means nothing. ;-)
>> 
>> Should have it up-and-going soon.
>> Ralph
>> 
>> On 4/4/08 12:02 PM, "Aurélien Bouteiller"  wrote:
>> 
>>> Ralph,
>>> 
>>> I've not been very successful at using ompi-server. I tried this :
>>> 
>>> xterm1$ ompi-server --debug-devel -d --report-uri test
>>> [grosse-pomme.local:01097] proc_info: hnp_uri NULL
>>> daemon uri NULL
>>> [grosse-pomme.local:01097] [[34900,0],0] ompi-server: up and running!
>>> 
>>> 
>>> xterm2$ mpirun -ompi-server test -np 1 mpi_accept_test
>>> Port name:
>>> 2285895681.0;tcp://192.168.0.101:50065;tcp://192.168.0.150:50065:300
>>> 
>>> xterm3$ mpirun -ompi-server test  -np 1 simple_connect
>>> --
>>> Process rank 0 attempted to lookup from a global ompi_server that
>>> could not be contacted. This is typically caused by either not
>>> specifying the contact info for the server, or by the server not
>>> currently executing. If you did specify the contact info for a
>>> server, please check to see that the server is running and start
>>> it again (or have your sys admin start it) if it isn't.
>>> 
>>> --
>>> [grosse-pomme.local:01122] *** An error occurred in MPI_Lookup_name
>>> [grosse-pomme.local:01122] *** on communicator MPI_COMM_WORLD
>>> [grosse-pomme.local:01122] *** MPI_ERR_NAME: invalid name argument
>>> [grosse-pomme.local:01122] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> --
>>> 
>>> 
>>> 
>>> The server code Open_port, and then PublishName. Looks like the
>>> LookupName function cannot reach the ompi-server. The ompi-server in
>>> debug mode does not show any output when a new event occurs (like when
>>> the server is launched). Is there something wrong in the way I use it ?
>>> 
>>> Aurelien
>>> 
>>> Le 3 avr. 08 à 17:21, Ralph Castain a écrit :
 Take a gander at ompi/tools/ompi-server - I believe I put a man page
 in
 there. You might just try "man ompi-server" and see if it shows up.
 
 Holler if you have a question - not sure I documented it very
 thoroughly at
 the time.
 
 
 On 4/3/08 3:10 PM, "Aurélien Bouteiller" 
 wrote:
 
> Ralph,
> 
> 
> I am using trunk. Is there a documentation for ompi-server ? Sounds
> exactly like what I need to fix point 1.
> 
> Aurelien
> 
> Le 3 avr. 08 à 17:06, Ralph Castain a écrit :
>> I guess I'll have to ask the basic question: what version are you
>> using?
>> 
>> If you are talking about the trunk, there no longer is a "universe"
>> concept
>> anywhere in the code. Two mpiruns can connect/accept to each other
>> as long
>> as they can make contact. To facilitate that, we created an "ompi-
>> server"
>> tool that is supposed to be run by the sys-admin (or a user, doesn't
>> matter
>> which) on the head node - there are various ways to tell mpirun
>> how to
>> contact the server, or it can self-discover it.
>> 
>> I have tested publish/lookup pretty thoroughly and it seems to
>> work. I
>> haven't spent much time testing connect/accept except via
>> comm_spawn, which
>> seems to be working. Since that uses the same mechanism, I would
>> have
>> 

Re: [OMPI devel] MPI_Comm_connect/Accept

2008-04-04 Thread Ralph H Castain
Well, something got borked in here - will have to fix it, so this will
probably not get done until next week.


On 4/4/08 12:26 PM, "Ralph H Castain"  wrote:

> Yeah, you didn't specify the file correctly...plus I found a bug in the code
> when I looked (out-of-date a little in orterun).
> 
> I am updating orterun (commit soon) and will include a better help message
> about the proper format of the orterun cmd-line option. The syntax is:
> 
> -ompi-server uri
> 
> or -ompi-server file:filename-where-uri-exists
> 
> Problem here is that you gave it a uri of "test", which means nothing. ;-)
> 
> Should have it up-and-going soon.
> Ralph
> 
> On 4/4/08 12:02 PM, "Aurélien Bouteiller"  wrote:
> 
>> Ralph,
>> 
>> I've not been very successful at using ompi-server. I tried this :
>> 
>> xterm1$ ompi-server --debug-devel -d --report-uri test
>> [grosse-pomme.local:01097] proc_info: hnp_uri NULL
>> daemon uri NULL
>> [grosse-pomme.local:01097] [[34900,0],0] ompi-server: up and running!
>> 
>> 
>> xterm2$ mpirun -ompi-server test -np 1 mpi_accept_test
>> Port name:
>> 2285895681.0;tcp://192.168.0.101:50065;tcp://192.168.0.150:50065:300
>> 
>> xterm3$ mpirun -ompi-server test  -np 1 simple_connect
>> --
>> Process rank 0 attempted to lookup from a global ompi_server that
>> could not be contacted. This is typically caused by either not
>> specifying the contact info for the server, or by the server not
>> currently executing. If you did specify the contact info for a
>> server, please check to see that the server is running and start
>> it again (or have your sys admin start it) if it isn't.
>> 
>> --
>> [grosse-pomme.local:01122] *** An error occurred in MPI_Lookup_name
>> [grosse-pomme.local:01122] *** on communicator MPI_COMM_WORLD
>> [grosse-pomme.local:01122] *** MPI_ERR_NAME: invalid name argument
>> [grosse-pomme.local:01122] *** MPI_ERRORS_ARE_FATAL (goodbye)
>> --
>> 
>> 
>> 
>> The server code Open_port, and then PublishName. Looks like the
>> LookupName function cannot reach the ompi-server. The ompi-server in
>> debug mode does not show any output when a new event occurs (like when
>> the server is launched). Is there something wrong in the way I use it ?
>> 
>> Aurelien
>> 
>> Le 3 avr. 08 à 17:21, Ralph Castain a écrit :
>>> Take a gander at ompi/tools/ompi-server - I believe I put a man page
>>> in
>>> there. You might just try "man ompi-server" and see if it shows up.
>>> 
>>> Holler if you have a question - not sure I documented it very
>>> thoroughly at
>>> the time.
>>> 
>>> 
>>> On 4/3/08 3:10 PM, "Aurélien Bouteiller" 
>>> wrote:
>>> 
 Ralph,
 
 
 I am using trunk. Is there a documentation for ompi-server ? Sounds
 exactly like what I need to fix point 1.
 
 Aurelien
 
 Le 3 avr. 08 à 17:06, Ralph Castain a écrit :
> I guess I'll have to ask the basic question: what version are you
> using?
> 
> If you are talking about the trunk, there no longer is a "universe"
> concept
> anywhere in the code. Two mpiruns can connect/accept to each other
> as long
> as they can make contact. To facilitate that, we created an "ompi-
> server"
> tool that is supposed to be run by the sys-admin (or a user, doesn't
> matter
> which) on the head node - there are various ways to tell mpirun
> how to
> contact the server, or it can self-discover it.
> 
> I have tested publish/lookup pretty thoroughly and it seems to
> work. I
> haven't spent much time testing connect/accept except via
> comm_spawn, which
> seems to be working. Since that uses the same mechanism, I would
> have
> expected connect/accept to work as well.
> 
> If you are talking about 1.2.x, then the story is totally different.
> 
> Ralph
> 
> 
> 
> On 4/3/08 2:29 PM, "Aurélien Bouteiller" 
> wrote:
> 
>> Hi everyone,
>> 
>> I'm trying to figure out how complete is the implementation of
>> Comm_connect/Accept. I found two problematic cases.
>> 
>> 1) Two different programs are started in two different mpirun. One
>> makes accept, the second one use connect. I would not expect
>> MPI_Publish_name/Lookup_name to work because they do not share the
>> HNP. Still I would expect to be able to connect by copying (with
>> printf-scanf) the port_name string generated by Open_port;
>> especially
>> considering that in Open MPI, the port_name is a string containing
>> the
>> tcp address and port of the rank 0 in the server communicator.
>> However, doing so results in "no route to host" and the connecting
>> application aborts. Is the problem 

Re: [OMPI devel] Affect of compression on modex and launch messages

2008-04-04 Thread Edgar Gabriel
Actually, we used LZO a looong time ago with PACX-MPI; it was indeed 
faster than zlib. Our findings at that time were however similar to what 
George mentioned, namely that a benefit from compression was only visible 
when the network latency was really high (e.g., multiple ms)...


Thanks
Edgar

Roland Dreier wrote:

 > Based on some discussion on this list, I integrated a zlib-based compression
 > ability into ORTE. Since the launch message sent to the orteds and the modex
 > between the application procs are the only places where messages of any size
 > are sent, I only implemented compression for those two exchanges.
 > 
 > I have found virtually no benefit to the compression. Essentially, the
 > overhead consumed in compression/decompressing the messages pretty much
 > balances out any transmission time differences. However, I could only test
 > this for 64 nodes, 8ppn, so perhaps there is some benefit at larger sizes.

A faster compression library might change the balance... eg LZO
(http://www.oberhumer.com/opensource/lzo) might be worth a look although
I'm not an expert on all the compression libraries that are out there.

 - R.


--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI devel] Affect of compression on modex and launch messages

2008-04-04 Thread Jeff Squyres

LZO looks cool, but it's unfortunately GPL (Open MPI is BSD).  Bummer.

On Apr 4, 2008, at 2:29 PM, Roland Dreier wrote:
Based on some discussion on this list, I integrated a zlib-based compression
ability into ORTE. Since the launch message sent to the orteds and the modex
between the application procs are the only places where messages of any size
are sent, I only implemented compression for those two exchanges.

I have found virtually no benefit to the compression. Essentially, the
overhead consumed in compression/decompressing the messages pretty much
balances out any transmission time differences. However, I could only test
this for 64 nodes, 8ppn, so perhaps there is some benefit at larger sizes.


A faster compression library might change the balance... eg LZO
(http://www.oberhumer.com/opensource/lzo) might be worth a look although
I'm not an expert on all the compression libraries that are out there.

- R.



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Affect of compression on modex and launch messages

2008-04-04 Thread Roland Dreier
 > Based on some discussion on this list, I integrated a zlib-based compression
 > ability into ORTE. Since the launch message sent to the orteds and the modex
 > between the application procs are the only places where messages of any size
 > are sent, I only implemented compression for those two exchanges.
 > 
 > I have found virtually no benefit to the compression. Essentially, the
 > overhead consumed in compression/decompressing the messages pretty much
 > balances out any transmission time differences. However, I could only test
 > this for 64 nodes, 8ppn, so perhaps there is some benefit at larger sizes.

A faster compression library might change the balance... eg LZO
(http://www.oberhumer.com/opensource/lzo) might be worth a look although
I'm not an expert on all the compression libraries that are out there.

 - R.


Re: [OMPI devel] Affect of compression on modex and launch messages

2008-04-04 Thread George Bosilca

Ralph,

There are several studies on compression and data exchange. A few years 
ago we integrated such a mechanism (adaptive compression of 
communication) into one of the projects here at ICL (called GridSolve). 
The idea was to optimize the network traffic for sending large matrices 
used for computation from a server to specific workers. Under some (few) 
circumstances it can improve the network traffic, and according to the 
main author it does no harm in the worst case. However, it is still 
unclear whether there is any benefit when the data is reasonably small 
(which is the case in ORTE).


The project is hosted at http://www.loria.fr/~ejeannot/adoc/adoc.html. 
It's a simple drop-in for read/write, so it is fairly simple to 
integrate. On the author's web page you can find publications about this 
approach that highlight its performance.


  george.

PS: A reference to the paper is available on ACM:

E. Jeannot, B. Knutsson, M. Bjorkmann.
Adaptive Online Data Compression, in: High Performance Distributed
Computing (HPDC'11), Edinburgh, Scotland, IEEE, July 2002.



On Apr 4, 2008, at 12:52 PM, Ralph H Castain wrote:

Hello all

Based on some discussion on this list, I integrated a zlib-based compression
ability into ORTE. Since the launch message sent to the orteds and the modex
between the application procs are the only places where messages of any size
are sent, I only implemented compression for those two exchanges.

I have found virtually no benefit to the compression. Essentially, the
overhead consumed in compressing/decompressing the messages pretty much
balances out any transmission time differences. However, I could only test
this for 64 nodes, 8ppn, so perhaps there is some benefit at larger sizes.

Even though my test size wasn't very big, I did try forcing the worst-case
scenario. I included all available BTLs, and ran the OOB over Ethernet.
Although there was some difference, it wasn't appreciable - easily within
the variations I see on this rather unstable machine.

I invite you to try it yourself. You can get a copy of the code via:

hg clone http://www.open-mpi.org/hg/hgwebdir.cgi/rhc/gather

You will need to configure with LIBS=-lz.

Compression is normally turned "off". You can turn it on by setting:

-mca orte_compress 1

You can also adjust the level of compression:

-mca orte_compress_level [1-9]

If you don't specify the level and select compression, the level will
default to 1. From my tests, this seemed a good compromise. The other levels
provided some small amount of better compression, but took longer.

With compression "on", you will get output telling you the original size of
the message and its compressed size so you can see what was done.

Please let me know what you find out. I would like to reach a decision as to
whether or not compression is worthwhile.

Thanks
Ralph
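
As a standalone illustration of the trade-off being measured -- not the
ORTE integration itself -- a minimal zlib sketch (compile with -lz):

/* Compress a buffer at level 1 and report original vs. compressed
 * size, mimicking the output Ralph describes. */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
    unsigned char src[4096], dest[8192];
    uLongf dest_len = sizeof(dest);

    memset(src, 'x', sizeof(src));   /* stand-in for a launch/modex blob */
    if (compress2(dest, &dest_len, src, sizeof(src), 1) != Z_OK) {
        fprintf(stderr, "compress2 failed\n");
        return 1;
    }
    printf("original %lu bytes, compressed %lu bytes\n",
           (unsigned long)sizeof(src), (unsigned long)dest_len);
    return 0;
}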








Re: [OMPI devel] MPI_Comm_connect/Accept

2008-04-04 Thread Aurélien Bouteiller

Ralph,

I've not been very successful at using ompi-server. I tried this :

xterm1$ ompi-server --debug-devel -d --report-uri test
[grosse-pomme.local:01097] proc_info: hnp_uri NULL
daemon uri NULL
[grosse-pomme.local:01097] [[34900,0],0] ompi-server: up and running!


xterm2$ mpirun -ompi-server test -np 1 mpi_accept_test
Port name:
2285895681.0;tcp://192.168.0.101:50065;tcp://192.168.0.150:50065:300

xterm3$ mpirun -ompi-server test  -np 1 simple_connect
--
Process rank 0 attempted to lookup from a global ompi_server that
could not be contacted. This is typically caused by either not
specifying the contact info for the server, or by the server not
currently executing. If you did specify the contact info for a
server, please check to see that the server is running and start
it again (or have your sys admin start it) if it isn't.

--
[grosse-pomme.local:01122] *** An error occurred in MPI_Lookup_name
[grosse-pomme.local:01122] *** on communicator MPI_COMM_WORLD
[grosse-pomme.local:01122] *** MPI_ERR_NAME: invalid name argument
[grosse-pomme.local:01122] *** MPI_ERRORS_ARE_FATAL (goodbye)
--



The server code calls Open_port and then Publish_name. It looks like the
Lookup_name function cannot reach the ompi-server. The ompi-server in
debug mode does not show any output when a new event occurs (like when
the server is launched). Is there something wrong in the way I use it?


Aurelien

Le 3 avr. 08 à 17:21, Ralph Castain a écrit :
Take a gander at ompi/tools/ompi-server - I believe I put a man page in
there. You might just try "man ompi-server" and see if it shows up.

Holler if you have a question - not sure I documented it very thoroughly
at the time.


On 4/3/08 3:10 PM, "Aurélien Bouteiller"   
wrote:



Ralph,


I am using trunk. Is there a documentation for ompi-server ? Sounds
exactly like what I need to fix point 1.

Aurelien

Le 3 avr. 08 à 17:06, Ralph Castain a écrit :

I guess I'll have to ask the basic question: what version are you
using?

If you are talking about the trunk, there no longer is a "universe"
concept
anywhere in the code. Two mpiruns can connect/accept to each other
as long
as they can make contact. To facilitate that, we created an "ompi-
server"
tool that is supposed to be run by the sys-admin (or a user, doesn't
matter
which) on the head node - there are various ways to tell mpirun how to
contact the server, or it can self-discover it.

I have tested publish/lookup pretty thoroughly and it seems to work. I
haven't spent much time testing connect/accept except via comm_spawn,
which seems to be working. Since that uses the same mechanism, I would
have expected connect/accept to work as well.

If you are talking about 1.2.x, then the story is totally different.

Ralph



On 4/3/08 2:29 PM, "Aurélien Bouteiller" 
wrote:


Hi everyone,

I'm trying to figure out how complete the implementation of
Comm_connect/Accept is. I found two problematic cases.

1) Two different programs are started in two different mpiruns. One
makes accept, the second one uses connect. I would not expect
MPI_Publish_name/Lookup_name to work because they do not share the
HNP. Still I would expect to be able to connect by copying (with
printf-scanf) the port_name string generated by Open_port; especially
considering that in Open MPI, the port_name is a string containing the
tcp address and port of the rank 0 in the server communicator.
However, doing so results in "no route to host" and the connecting
application aborts. Is the problem related to an explicit check of the
universes on the accept HNP? Do I expect too much from the MPI
standard? Is it because my two applications do not share the same
universe? Should we (re)add the ability to use the same universe for
several mpiruns?

2) The second issue is when the program sets up a port, and then accepts
multiple clients on this port. Everything works fine for the first
client, and then accept stalls forever when waiting for the second
one. My understanding of the standard is that it should work: 5.4.2
states "it must call MPI_Open_port to establish a port [...] it must
call MPI_Comm_accept to accept connections from clients". I understand
that for one MPI_Open_port I should be able to manage several MPI
clients. Am I understanding the standard correctly here, and should we
fix this?

Here is a copy of the non-working code for reference.

/*
* Copyright (c) 2004-2007 The Trustees of the University of
Tennessee.
* All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
*
* $HEADER$
*/
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
   char port[MPI_MAX_PORT_NAME];
   int rank;
   int np;


   MPI_Init(&argc, &argv);
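
A minimal sketch of the multiple-accept pattern from point 2 above --
one MPI_Open_port, several MPI_Comm_accept calls -- which, per the
standard text quoted there, should work (NCLIENTS is arbitrary here):

/* Server sketch: accept NCLIENTS connections on a single port.
 * Reportedly works for the first client and stalls on the second. */
#include <mpi.h>
#include <stdio.h>

#define NCLIENTS 2

int main(int argc, char *argv[])
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm clients[NCLIENTS];
    int i;

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port);
    printf("Port name: %s\n", port);

    for (i = 0; i < NCLIENTS; i++) {
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &clients[i]);
    }
    for (i = 0; i < NCLIENTS; i++)
        MPI_Comm_disconnect(&clients[i]);

    MPI_Close_port(port);
    MPI_Finalize();
    return 0;
}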

Re: [OMPI devel] init_thread + spawn error

2008-04-04 Thread Tim Prins
Thanks for the report. As Ralph indicated, the threading support in Open 
MPI is not good right now, but we are working to make it better.


I have filed a ticket (https://svn.open-mpi.org/trac/ompi/ticket/1267) 
so we do not lose track of this issue, and attached a potential fix to 
the ticket.


Thanks,

Tim

Joao Vicente Lima wrote:

Hi,
I am getting an error calling init_thread and comm_spawn in this code:

#include "mpi.h"
#include 

int
main (int argc, char *argv[])
{
  int provided;
  MPI_Comm parentcomm, intercomm;

  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  MPI_Comm_get_parent (&parentcomm);

  if (parentcomm == MPI_COMM_NULL)
{
  printf ("spawning ... \n");
  MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1,
                  MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm,
                  MPI_ERRCODES_IGNORE);
  MPI_Comm_disconnect (&intercomm);
}
  else
  {
printf ("child!\n");
MPI_Comm_disconnect (&parentcomm);
  }

  MPI_Finalize ();
  return 0;
}

and the error is:

spawning ...
opal_mutex_lock(): Resource deadlock avoided
[localhost:18718] *** Process received signal ***
[localhost:18718] Signal: Aborted (6)
[localhost:18718] Signal code:  (-6)
[localhost:18718] [ 0] /lib/libpthread.so.0 [0x2b6e5d9fced0]
[localhost:18718] [ 1] /lib/libc.so.6(gsignal+0x35) [0x2b6e5dc3b3c5]
[localhost:18718] [ 2] /lib/libc.so.6(abort+0x10e) [0x2b6e5dc3c73e]
[localhost:18718] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c9560ff]
[localhost:18718] [ 4] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c95601d]
[localhost:18718] [ 5] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c9560ac]
[localhost:18718] [ 6] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c956a93]
[localhost:18718] [ 7] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c9569dd]
[localhost:18718] [ 8] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c95797d]
[localhost:18718] [ 9]
/usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_proc_unpack+0x1ec)
[0x2b6e5c957dd9]
[localhost:18718] [10]
/usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b6e607f05cf]
[localhost:18718] [11]
/usr/local/mpi/ompi-svn/lib/libmpi.so.0(MPI_Comm_spawn+0x459)
[0x2b6e5c98ede9]
[localhost:18718] [12] ./spawn1(main+0x7a) [0x400ae2]
[localhost:18718] [13] /lib/libc.so.6(__libc_start_main+0xf4) [0x2b6e5dc28b74]
[localhost:18718] [14] ./spawn1 [0x4009d9]
[localhost:18718] *** End of error message ***
opal_mutex_lock(): Resource deadlock avoided
[localhost:18719] *** Process received signal ***
[localhost:18719] Signal: Aborted (6)
[localhost:18719] Signal code:  (-6)
[localhost:18719] [ 0] /lib/libpthread.so.0 [0x2b9317a17ed0]
[localhost:18719] [ 1] /lib/libc.so.6(gsignal+0x35) [0x2b9317c563c5]
[localhost:18719] [ 2] /lib/libc.so.6(abort+0x10e) [0x2b9317c5773e]
[localhost:18719] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b93169710ff]
[localhost:18719] [ 4] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b931697101d]
[localhost:18719] [ 5] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b93169710ac]
[localhost:18719] [ 6] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b9316971a93]
[localhost:18719] [ 7] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b93169719dd]
[localhost:18719] [ 8] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b931697297d]
[localhost:18719] [ 9]
/usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_proc_unpack+0x1ec)
[0x2b9316972dd9]
[localhost:18719] [10]
/usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b931a80b5cf]
[localhost:18719] [11]
/usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b931a80dad7]
[localhost:18719] [12] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b9316977207]
[localhost:18719] [13]
/usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Init_thread+0x166)
[0x2b93169b8622]
[localhost:18719] [14] ./spawn1(main+0x25) [0x400a8d]
[localhost:18719] [15] /lib/libc.so.6(__libc_start_main+0xf4) [0x2b9317c43b74]
[localhost:18719] [16] ./spawn1 [0x4009d9]
[localhost:18719] *** End of error message ***
--
mpirun noticed that process rank 0 with PID 18719 on node localhost
exited on signal 6 (Aborted).
--

If I change MPI_Init_thread to MPI_Init, everything works.
Any suggestions?
The attachments contain my ompi_info (r18077) and config.log.

thanks in advance,
Joao.



