Re: [MTT devel] [MTT svn] svn:mtt-svn r1176
Um -- yeah, probably. :-) But there's also likely no harm in leaving them
there. :-)

On Apr 4, 2008, at 4:29 PM, Ethan Mallove wrote:

> I like the "all" keyword.
>
> Are these no longer needed?
>
>     _mpi_get_names()
>     _mpi_install_names()
>     _test_get_names()
>     _test_build_names()
>
> -Ethan
>
> On Fri, Apr/04/2008 03:31:07PM, jsquy...@osl.iu.edu wrote:
>> Author: jsquyres
>> Date: 2008-04-04 15:31:07 EDT (Fri, 04 Apr 2008)
>> New Revision: 1176
>> URL: https://svn.open-mpi.org/trac/mtt/changeset/1176
>>
>> Log:
>> Allow "all" keyword in mpi_get, test_get, and test_build fields.
>>
>> Text files modified:
>>    trunk/CHANGES                |  5 +
>>    trunk/lib/MTT/MPI/Install.pm |  9 ++---
>>    trunk/lib/MTT/Test/Build.pm  |  9 ++---
>>    trunk/lib/MTT/Test/Run.pm    |  6 --
>>    4 files changed, 21 insertions(+), 8 deletions(-)
>>
>> Modified: trunk/CHANGES
>> ==============================================================================
>> --- trunk/CHANGES (original)
>> +++ trunk/CHANGES 2008-04-04 15:31:07 EDT (Fri, 04 Apr 2008)
>> @@ -50,6 +50,11 @@
>>  - _details() - pass arbitrary values from test run sections
>>    to the mpi details section, indexed by string
>>
>> +- Allow mpi_get_name, test_get_name, and test_build_name fields to
>> +  accept the special value "all", meaning that they'll use all
>> +  corresponding sections that are found (vs. needing to list every
>> +  section explicitly)
>> +
>>  - Added export for MTT_TEST_EXECUTABLE, may be used for clean up after
>>    mpi process : pkill -9 $MTT_TEST_EXECUTABLE
>>
>> Modified: trunk/lib/MTT/MPI/Install.pm
>> ==============================================================================
>> --- trunk/lib/MTT/MPI/Install.pm (original)
>> +++ trunk/lib/MTT/MPI/Install.pm 2008-04-04 15:31:07 EDT (Fri, 04 Apr 2008)
>> @@ -199,7 +199,8 @@
>>
>>          # This is only warning about the INI file; we'll see
>>          # if we find meta data for the MPI get later
>> -        if (!$ini_full->SectionExists("mpi get: $mpi_get_name")) {
>> +        if ($mpi_get_name ne "all" &&
>> +            !$ini_full->SectionExists("mpi get: $mpi_get_name")) {
>>              Warning("Warning: MPI Get section \"$mpi_get_name\" does
>>                  not seem to exist in the INI file\n");
>>          }
>>
>> @@ -207,7 +208,8 @@
>>          # skip it.  Don't issue a warning because command line
>>          # parameters may well have dictated to skip this MPI
>>          # get section.
>> -        if (!exists($MTT::MPI::sources->{$mpi_get_name})) {
>> +        if ($mpi_get_name ne "all" &&
>> +            !exists($MTT::MPI::sources->{$mpi_get_name})) {
>>              Debug("Have no sources for MPI Get \"$mpi_get_name\", skipping\n");
>>              next;
>>          }
>>
>> @@ -217,7 +219,8 @@
>>
>>          # For each MPI source
>>          foreach my $mpi_get_key (keys(%{$MTT::MPI::sources})) {
>> -            if ($mpi_get_key eq $mpi_get_name) {
>> +            if ($mpi_get_name eq "all" ||
>> +                $mpi_get_key eq $mpi_get_name) {
>>
>>                  # For each version of that source
>>                  my $mpi_get = $MTT::MPI::sources->{$mpi_get_key};
>>
>> Modified: trunk/lib/MTT/Test/Build.pm
>> ==============================================================================
>> --- trunk/lib/MTT/Test/Build.pm (original)
>> +++ trunk/lib/MTT/Test/Build.pm 2008-04-04 15:31:07 EDT (Fri, 04 Apr 2008)
>> @@ -120,7 +120,8 @@
>>
>>          # This is only warning about the INI file; we'll see
>>          # if we find meta data for the test get later
>> -        if (!$ini_full->SectionExists("test get: $test_get_name")) {
>> +        if ($test_get_name ne "all" &&
>> +            !$ini_full->SectionExists("test get: $test_get_name")) {
>>              Warning("Test Get section \"$test_get_name\" does not
>>                  seem to exist in the INI file\n");
>>          }
>>
>> @@ -128,14 +129,16 @@
>>          # skip it.  Don't issue a warning because command line
>>          # parameters may well have dictated to skip this
>>          # section.
>> -        if (!exists($MTT::Test::sources->{$test_get_name})) {
>> +        if ($test_get_name ne "all" &&
>> +            !exists($MTT::Test::sources->{$test_get_name})) {
>>              Debug("Have no sources for Test Get \"$test_get_name\", skipping\n");
>>              next;
>>          }
>>
>>          # Find the matching test source
>>          foreach my $test_get_key (keys(%{$MTT::Test::sources})) {
>> -            if ($test_get_key eq $test_get_name) {
>> +            if ($test_get_name eq "all" ||
>> +                $test_get_key eq $test_get_name) {
>>                  my $test_get = $MTT::Test::sources->{$test_get_key};
>>
>>                  # For each MPI source
>>
>> Modified: trunk/lib/MTT/Test/Run.pm
>> ==============================================================================
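Concretely, the new keyword lets an INI file enable every matching section
at once. A sketch of what that might look like follows; the mpi_get,
test_get, and test_build field names and the "all" value come from the
commit log, while the section names themselves are made up for
illustration:

    # Hypothetical MTT INI fragment using the new "all" value
    [MPI install: my-install]
    mpi_get = all            # use every "MPI get" section that was found

    [Test build: my-builds]
    test_get = all           # build every "Test get" section that was found

    [Test run: my-runs]
    test_build = all         # run against every "Test build" section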
Re: [MTT devel] Launch scaling data in MTT
MTT probably could gather this data -- some of it was wall-clock execution
time; the rest was data extracted from stdout. Ralph -- is this
interesting / useful to you?

On Apr 4, 2008, at 4:56 PM, Ethan Mallove wrote:

> I was looking at the graphs posted at
> https://svn.open-mpi.org/trac/ompi/wiki/ORTEScalabilityTesting, and
> noted that MTT could gather this data if it could just graph on the
> "duration" column. Would this be useful? E.g.,
> http://www.open-mpi.org/mtt/index.php?do_redir=582
>
> Note: the duration timings currently round to the second, but the
> "duration" column uses an interval type that has microsecond precision,
> if we needed it.
>
> -Ethan

-- 
Jeff Squyres
Cisco Systems
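The interval column type suggests a PostgreSQL backend, so a query feeding
such a graph might look roughly like the sketch below. Only the "duration"
column is named in the thread; the table name and the np column are
hypothetical:

    -- EXTRACT(EPOCH FROM ...) turns an interval into fractional seconds,
    -- preserving the microsecond precision Ethan mentions.
    SELECT np, AVG(EXTRACT(EPOCH FROM duration)) AS launch_seconds
    FROM test_run            -- hypothetical table name
    GROUP BY np
    ORDER BY np;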
[OMPI devel] Build failure on FreeBSD 7
Hello everyone... it's been some time since I posted here.

I pulled the latest svn revision (18079) and had some trouble building
Open MPI on a FreeBSD 7 machine (i386). Make failed when compiling
opal/event/kqueue.c. It appears that FreeBSD needs sys/types.h,
sys/ioctl.h, termios.h, and libutil.h included in order to reference
openpty(). I added ifdefs/includes for these header files to kqueue.c and
managed to build.

Note that I also tried the latest nightly tarball. The tarball build
actually succeeded without any changes. Curious if anyone has experienced
this type of behavior? A colleague of mine mentioned it could be a FreeBSD
autotools issue?

Although the builds were successful (with modification for the svn build,
and without modification for the nightly tarball), I tried running a
simple app locally with 2 processes using the TCP BTL that does a
non-blocking send/recv. The app simply hung. After attaching gdb to one of
the 2 processes, the console output (not gdb) reported the following:

    [warn] kq_init: detected broken kqueue (failed add); not using
    error 4 (Interrupted system call) : Interrupted system call

I'm including the diff of kqueue.c here for completeness. If anyone
requires any further information, please let me know. Thanks.

-- Karol

Index: opal/event/kqueue.c
===================================================================
--- opal/event/kqueue.c (revision 18079)
+++ opal/event/kqueue.c (working copy)
@@ -52,7 +52,17 @@
 #ifdef HAVE_UTIL_H
 #include <util.h>
 #endif
+#ifdef HAVE_SYS_IOCTL_H
+#include <sys/ioctl.h>
+#endif
+#ifdef HAVE_LIBUTIL_H
+#include <libutil.h>
+#endif
+#ifdef HAVE_TERMIOS_H
+#include <termios.h>
+#endif
+
 /* Some platforms apparently define the udata field of struct kevent as
  * intptr_t, whereas others define it as void*. There doesn't seem to be an
  * easy way to tell them apart via autoconf, so we need to use OS macros. */
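Note that the HAVE_SYS_IOCTL_H / HAVE_LIBUTIL_H / HAVE_TERMIOS_H guards in
the patch only fire if configure actually probes for those headers. The
autoconf side would be something like the sketch below; where exactly Open
MPI's configure machinery should carry this check is a separate question:

    # configure.ac sketch: defines HAVE_<HEADER>_H for each header found
    AC_CHECK_HEADERS([sys/types.h sys/ioctl.h termios.h libutil.h util.h])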
Re: [MTT devel] [MTT svn] svn:mtt-svn r1176
I like the "all" keyword. Are these no longer needed? _mpi_get_names() _mpi_install_names() _test_get_names() _test_build_names() -Ethan On Fri, Apr/04/2008 03:31:07PM, jsquy...@osl.iu.edu wrote: > Author: jsquyres > Date: 2008-04-04 15:31:07 EDT (Fri, 04 Apr 2008) > New Revision: 1176 > URL: https://svn.open-mpi.org/trac/mtt/changeset/1176 > > Log: > Allow "all" keyword in mpi_get, test_get, and test_build fields. > > Text files modified: >trunk/CHANGES| 5 + > >trunk/lib/MTT/MPI/Install.pm | 9 ++--- > >trunk/lib/MTT/Test/Build.pm | 9 ++--- > >trunk/lib/MTT/Test/Run.pm| 6 -- > >4 files changed, 21 insertions(+), 8 deletions(-) > > Modified: trunk/CHANGES > == > --- trunk/CHANGES (original) > +++ trunk/CHANGES 2008-04-04 15:31:07 EDT (Fri, 04 Apr 2008) > @@ -50,6 +50,11 @@ > - _details() - pass arbitrary values from test run sections > to the mpi details section, indexed by string > > +- Allow mpi_get_name, test_get_name, and test_build_name fields to > + accept the special value "all", meaning that they'll use all > + corresponding sections that are found (vs. needing to list every > + section explicitly) > + > - Added export for MTT_TEST_EXECUTABLE, may be used for clean up after >mpi process : pkill -9 $MTT_TEST_EXECUTABLE > > > Modified: trunk/lib/MTT/MPI/Install.pm > == > --- trunk/lib/MTT/MPI/Install.pm (original) > +++ trunk/lib/MTT/MPI/Install.pm 2008-04-04 15:31:07 EDT (Fri, 04 Apr > 2008) > @@ -199,7 +199,8 @@ > > # This is only warning about the INI file; we'll see > # if we find meta data for the MPI get later > -if (!$ini_full->SectionExists("mpi get: $mpi_get_name")) { > +if ($mpi_get_name ne "all" && > +!$ini_full->SectionExists("mpi get: $mpi_get_name")) { > Warning("Warning: MPI Get section \"$mpi_get_name\" does > not seem to exist in the INI file\n"); > } > > @@ -207,7 +208,8 @@ > # skip it. Don't issue a warning because command line > # parameters may well have dictated to skip this MPI > # get section. > -if (!exists($MTT::MPI::sources->{$mpi_get_name})) { > +if ($mpi_get_name ne "all" && > +!exists($MTT::MPI::sources->{$mpi_get_name})) { > Debug("Have no sources for MPI Get \"$mpi_get_name\", > skipping\n"); > next; > } > @@ -217,7 +219,8 @@ > > # For each MPI source > foreach my $mpi_get_key (keys(%{$MTT::MPI::sources})) { > -if ($mpi_get_key eq $mpi_get_name) { > +if ($mpi_get_name eq "all" || > +$mpi_get_key eq $mpi_get_name) { > > # For each version of that source > my $mpi_get = $MTT::MPI::sources->{$mpi_get_key}; > > Modified: trunk/lib/MTT/Test/Build.pm > == > --- trunk/lib/MTT/Test/Build.pm (original) > +++ trunk/lib/MTT/Test/Build.pm 2008-04-04 15:31:07 EDT (Fri, 04 Apr > 2008) > @@ -120,7 +120,8 @@ > > # This is only warning about the INI file; we'll see > # if we find meta data for the test get later > -if (!$ini_full->SectionExists("test get: $test_get_name")) { > +if ($test_get_name ne "all" && > +!$ini_full->SectionExists("test get: $test_get_name")) { > Warning("Test Get section \"$test_get_name\" does not > seem to exist in the INI file\n"); > } > > @@ -128,14 +129,16 @@ > # skip it. Don't issue a warning because command line > # parameters may well have dictated to skip this > # section. 
> -if (!exists($MTT::Test::sources->{$test_get_name})) { > +if ($test_get_name ne "all" && > +!exists($MTT::Test::sources->{$test_get_name})) { > Debug("Have no sources for Test Get \"$test_get_name\", > skipping\n"); > next; > } > > # Find the matching test source > foreach my $test_get_key (keys(%{$MTT::Test::sources})) { > -if ($test_get_key eq $test_get_name) { > +if ($test_get_name eq "all" || > +$test_get_key eq $test_get_name) { >
Re: [OMPI devel] MPI_Comm_connect/Accept
Okay, I have a partial fix in there now. You'll have to use "-mca routed
unity" as I still need to fix it for routed tree. Couple of things:

1. I fixed the --debug flag so it automatically turns on the debug output
   from the data server code itself. Now ompi-server will tell you when it
   is accessed.

2. Remember, we added an MPI_Info key that specifies if you want the data
   stored locally (on your own mpirun) or globally (on the ompi-server).
   If you specify nothing, there is a precedence built into the code that
   defaults to "local". So you have to tell us that this data is to be
   published "global" if you want to connect multiple mpiruns. I believe
   Jeff wrote all that up somewhere - could be in an email thread, though.
   Been too long ago for me to remember... ;-) You can look it up in the
   code though as a last resort - it is in
   ompi/mca/pubsub/orte/pubsub_orte.c.

Ralph

On 4/4/08 12:55 PM, "Ralph H Castain" wrote:

> Well, something got borked in here - will have to fix it, so this will
> probably not get done until next week.
>
> On 4/4/08 12:26 PM, "Ralph H Castain" wrote:
>
>> Yeah, you didn't specify the file correctly... plus I found a bug in
>> the code when I looked (out-of-date a little in orterun).
>>
>> I am updating orterun (commit soon) and will include a better help
>> message about the proper format of the orterun cmd-line option. The
>> syntax is:
>>
>>     -ompi-server uri
>>
>> or
>>
>>     -ompi-server file:filename-where-uri-exists
>>
>> Problem here is that you gave it a uri of "test", which means
>> nothing. ;-)
>>
>> Should have it up-and-going soon.
>> Ralph
>>
>> On 4/4/08 12:02 PM, "Aurélien Bouteiller" wrote:
>>
>>> Ralph,
>>>
>>> I've not been very successful at using ompi-server. I tried this:
>>>
>>>     xterm1$ ompi-server --debug-devel -d --report-uri test
>>>     [grosse-pomme.local:01097] proc_info: hnp_uri NULL
>>>     daemon uri NULL
>>>     [grosse-pomme.local:01097] [[34900,0],0] ompi-server: up and running!
>>>
>>>     xterm2$ mpirun -ompi-server test -np 1 mpi_accept_test
>>>     Port name:
>>>     2285895681.0;tcp://192.168.0.101:50065;tcp://192.168.0.150:50065:300
>>>
>>>     xterm3$ mpirun -ompi-server test -np 1 simple_connect
>>>     ----------------------------------------------------------------------
>>>     Process rank 0 attempted to lookup from a global ompi_server that
>>>     could not be contacted. This is typically caused by either not
>>>     specifying the contact info for the server, or by the server not
>>>     currently executing. If you did specify the contact info for a
>>>     server, please check to see that the server is running and start
>>>     it again (or have your sys admin start it) if it isn't.
>>>     ----------------------------------------------------------------------
>>>     [grosse-pomme.local:01122] *** An error occurred in MPI_Lookup_name
>>>     [grosse-pomme.local:01122] *** on communicator MPI_COMM_WORLD
>>>     [grosse-pomme.local:01122] *** MPI_ERR_NAME: invalid name argument
>>>     [grosse-pomme.local:01122] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>     ----------------------------------------------------------------------
>>>
>>> The server code does Open_port and then Publish_name. It looks like
>>> the Lookup_name function cannot reach the ompi-server. The ompi-server
>>> in debug mode does not show any output when a new event occurs (like
>>> when the server is launched). Is there something wrong in the way I
>>> use it?
>>>
>>> Aurelien
>>>
>>> Le 3 avr. 08 à 17:21, Ralph Castain a écrit :
>>>
>>>> Take a gander at ompi/tools/ompi-server - I believe I put a man page
>>>> in there. You might just try "man ompi-server" and see if it shows
>>>> up.
>>>>
>>>> Holler if you have a question - not sure I documented it very
>>>> thoroughly at the time.
>>>>
>>>> On 4/3/08 3:10 PM, "Aurélien Bouteiller" wrote:
>>>>
>>>>> Ralph,
>>>>>
>>>>> I am using trunk. Is there documentation for ompi-server? Sounds
>>>>> exactly like what I need to fix point 1.
>>>>>
>>>>> Aurelien
>>>>>
>>>>> Le 3 avr. 08 à 17:06, Ralph Castain a écrit :
>>>>>
>>>>>> I guess I'll have to ask the basic question: what version are you
>>>>>> using?
>>>>>>
>>>>>> If you are talking about the trunk, there no longer is a
>>>>>> "universe" concept anywhere in the code. Two mpiruns can
>>>>>> connect/accept to each other as long as they can make contact. To
>>>>>> facilitate that, we created an "ompi-server" tool that is supposed
>>>>>> to be run by the sys-admin (or a user, doesn't matter which) on
>>>>>> the head node - there are various ways to tell mpirun how to
>>>>>> contact the server, or it can self-discover it.
>>>>>>
>>>>>> I have tested publish/lookup pretty thoroughly and it seems to
>>>>>> work. I haven't spent much time testing connect/accept except via
>>>>>> comm_spawn, which seems to be working. Since that uses the same
>>>>>> mechanism, I would have
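For reference, requesting "global" publication from client code would look
roughly like the sketch below. The "ompi_global_scope" info key is a
recollection of what pubsub_orte.c checks, not something confirmed in this
thread, so verify it against the source as Ralph suggests:

    /* Sketch: publish a port name to the global (ompi-server) scope.
     * The info key name is an assumption -- check pubsub_orte.c. */
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        char port[MPI_MAX_PORT_NAME];
        MPI_Info info;

        MPI_Init(&argc, &argv);
        MPI_Open_port(MPI_INFO_NULL, port);
        MPI_Info_create(&info);
        MPI_Info_set(info, "ompi_global_scope", "true");
        MPI_Publish_name("my_service", info, port);  /* hypothetical name */
        /* ... accept connections here, then clean up ... */
        MPI_Unpublish_name("my_service", info, port);
        MPI_Info_free(&info);
        MPI_Close_port(port);
        MPI_Finalize();
        return 0;
    }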
Re: [OMPI devel] MPI_Comm_connect/Accept
Well, something got borked in here - will have to fix it, so this will
probably not get done until next week.

On 4/4/08 12:26 PM, "Ralph H Castain" wrote:

> Yeah, you didn't specify the file correctly... plus I found a bug in the
> code when I looked (out-of-date a little in orterun).
>
> I am updating orterun (commit soon) and will include a better help
> message about the proper format of the orterun cmd-line option. The
> syntax is:
>
>     -ompi-server uri
>
> or
>
>     -ompi-server file:filename-where-uri-exists
>
> Problem here is that you gave it a uri of "test", which means
> nothing. ;-)
>
> Should have it up-and-going soon.
> Ralph
>
> On 4/4/08 12:02 PM, "Aurélien Bouteiller" wrote:
>
>> Ralph,
>>
>> I've not been very successful at using ompi-server. I tried this:
>>
>>     xterm1$ ompi-server --debug-devel -d --report-uri test
>>     [grosse-pomme.local:01097] proc_info: hnp_uri NULL
>>     daemon uri NULL
>>     [grosse-pomme.local:01097] [[34900,0],0] ompi-server: up and running!
>>
>>     xterm2$ mpirun -ompi-server test -np 1 mpi_accept_test
>>     Port name:
>>     2285895681.0;tcp://192.168.0.101:50065;tcp://192.168.0.150:50065:300
>>
>>     xterm3$ mpirun -ompi-server test -np 1 simple_connect
>>     ----------------------------------------------------------------------
>>     Process rank 0 attempted to lookup from a global ompi_server that
>>     could not be contacted. This is typically caused by either not
>>     specifying the contact info for the server, or by the server not
>>     currently executing. If you did specify the contact info for a
>>     server, please check to see that the server is running and start
>>     it again (or have your sys admin start it) if it isn't.
>>     ----------------------------------------------------------------------
>>     [grosse-pomme.local:01122] *** An error occurred in MPI_Lookup_name
>>     [grosse-pomme.local:01122] *** on communicator MPI_COMM_WORLD
>>     [grosse-pomme.local:01122] *** MPI_ERR_NAME: invalid name argument
>>     [grosse-pomme.local:01122] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>     ----------------------------------------------------------------------
>>
>> The server code does Open_port and then Publish_name. It looks like the
>> Lookup_name function cannot reach the ompi-server. The ompi-server in
>> debug mode does not show any output when a new event occurs (like when
>> the server is launched). Is there something wrong in the way I use it?
>>
>> Aurelien
>>
>> Le 3 avr. 08 à 17:21, Ralph Castain a écrit :
>>
>>> Take a gander at ompi/tools/ompi-server - I believe I put a man page
>>> in there. You might just try "man ompi-server" and see if it shows up.
>>>
>>> Holler if you have a question - not sure I documented it very
>>> thoroughly at the time.
>>>
>>> On 4/3/08 3:10 PM, "Aurélien Bouteiller" wrote:
>>>
>>>> Ralph,
>>>>
>>>> I am using trunk. Is there documentation for ompi-server? Sounds
>>>> exactly like what I need to fix point 1.
>>>>
>>>> Aurelien
>>>>
>>>> Le 3 avr. 08 à 17:06, Ralph Castain a écrit :
>>>>
>>>>> I guess I'll have to ask the basic question: what version are you
>>>>> using?
>>>>>
>>>>> If you are talking about the trunk, there no longer is a "universe"
>>>>> concept anywhere in the code. Two mpiruns can connect/accept to
>>>>> each other as long as they can make contact. To facilitate that, we
>>>>> created an "ompi-server" tool that is supposed to be run by the
>>>>> sys-admin (or a user, doesn't matter which) on the head node -
>>>>> there are various ways to tell mpirun how to contact the server, or
>>>>> it can self-discover it.
>>>>>
>>>>> I have tested publish/lookup pretty thoroughly and it seems to
>>>>> work. I haven't spent much time testing connect/accept except via
>>>>> comm_spawn, which seems to be working. Since that uses the same
>>>>> mechanism, I would have expected connect/accept to work as well.
>>>>>
>>>>> If you are talking about 1.2.x, then the story is totally different.
>>>>>
>>>>> Ralph
>>>>>
>>>>> On 4/3/08 2:29 PM, "Aurélien Bouteiller" wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> I'm trying to figure out how complete the implementation of
>>>>>> Comm_connect/Accept is. I found two problematic cases.
>>>>>>
>>>>>> 1) Two different programs are started in two different mpiruns.
>>>>>> One calls accept, the second one calls connect. I would not expect
>>>>>> MPI_Publish_name/Lookup_name to work because they do not share the
>>>>>> HNP. Still, I would expect to be able to connect by copying (with
>>>>>> printf-scanf) the port_name string generated by Open_port;
>>>>>> especially considering that in Open MPI, the port_name is a string
>>>>>> containing the tcp address and port of the rank 0 in the server
>>>>>> communicator. However, doing so results in "no route to host" and
>>>>>> the connecting application aborts. Is the problem
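Combining Ralph's stated syntax with Aurélien's transcript, the corrected
sequence would presumably be the following (the URI file name is
arbitrary):

    # xterm1: have ompi-server write its contact URI to a file
    ompi-server --report-uri /tmp/ompi-server.uri

    # xterm2 / xterm3: point each mpirun at that file with the file: prefix
    mpirun -ompi-server file:/tmp/ompi-server.uri -np 1 mpi_accept_test
    mpirun -ompi-server file:/tmp/ompi-server.uri -np 1 simple_connect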
Re: [OMPI devel] Affect of compression on modex and launch messages
Actually, we used LZO a looong time ago with PACX-MPI; it was indeed
faster than zlib. Our findings at that time were, however, similar to what
George mentioned, namely that a benefit from compression was only visible
if the network latency was really high (e.g. multiple ms)...

Thanks
Edgar

Roland Dreier wrote:
>> Based on some discussion on this list, I integrated a zlib-based
>> compression ability into ORTE. Since the launch message sent to the
>> orteds and the modex between the application procs are the only places
>> where messages of any size are sent, I only implemented compression
>> for those two exchanges.
>>
>> I have found virtually no benefit to the compression. Essentially, the
>> overhead consumed in compressing/decompressing the messages pretty
>> much balances out any transmission time differences. However, I could
>> only test this for 64 nodes, 8ppn, so perhaps there is some benefit at
>> larger sizes.
>
> A faster compression library might change the balance... e.g. LZO
> (http://www.oberhumer.com/opensource/lzo) might be worth a look although
> I'm not an expert on all the compression libraries that are out there.
>
>  - R.

-- 
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
Re: [OMPI devel] Affect of compression on modex and launch messages
LZO looks cool, but it's unfortunately GPL (Open MPI is BSD). Bummer.

On Apr 4, 2008, at 2:29 PM, Roland Dreier wrote:

>> Based on some discussion on this list, I integrated a zlib-based
>> compression ability into ORTE. Since the launch message sent to the
>> orteds and the modex between the application procs are the only places
>> where messages of any size are sent, I only implemented compression
>> for those two exchanges.
>>
>> I have found virtually no benefit to the compression. Essentially, the
>> overhead consumed in compressing/decompressing the messages pretty
>> much balances out any transmission time differences. However, I could
>> only test this for 64 nodes, 8ppn, so perhaps there is some benefit at
>> larger sizes.
>
> A faster compression library might change the balance... e.g. LZO
> (http://www.oberhumer.com/opensource/lzo) might be worth a look although
> I'm not an expert on all the compression libraries that are out there.
>
>  - R.

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Affect of compression on modex and launch messages
> Based on some discussion on this list, I integrated a zlib-based
> compression ability into ORTE. Since the launch message sent to the
> orteds and the modex between the application procs are the only places
> where messages of any size are sent, I only implemented compression for
> those two exchanges.
>
> I have found virtually no benefit to the compression. Essentially, the
> overhead consumed in compressing/decompressing the messages pretty much
> balances out any transmission time differences. However, I could only
> test this for 64 nodes, 8ppn, so perhaps there is some benefit at larger
> sizes.

A faster compression library might change the balance... e.g. LZO
(http://www.oberhumer.com/opensource/lzo) might be worth a look although
I'm not an expert on all the compression libraries that are out there.

 - R.
Re: [OMPI devel] Affect of compression on modex and launch messages
Ralph,

There are several studies about compression and data exchange. A few
years ago we integrated such a mechanism (adaptive compression of
communication) in one of the projects here at ICL (called GridSolve). The
idea was to optimize the network traffic for sending the large matrices
used for computation from a server to specific workers. Under some (few)
circumstances it can improve the network traffic, and according to the
main author, in the worst case it doesn't harm. However, it is still
unclear that there is any benefit when the data is reasonably small
(which is the case in ORTE).

The project is hosted at http://www.loria.fr/~ejeannot/adoc/adoc.html.
It's a simple drop-in for read/write, so it is fairly simple to
integrate. On the author's webpage you can find some publications about
this approach and its performance.

  george.

PS: A reference to one of these papers is available on ACM:

  E. Jeannot, B. Knutsson, M. Bjorkmann. Adaptive Online Data Compression,
  in: High Performance Distributed Computing (HPDC'11), Edinburgh,
  Scotland, IEEE, July 2002.

On Apr 4, 2008, at 12:52 PM, Ralph H Castain wrote:

> Hello all
>
> Based on some discussion on this list, I integrated a zlib-based
> compression ability into ORTE. Since the launch message sent to the
> orteds and the modex between the application procs are the only places
> where messages of any size are sent, I only implemented compression for
> those two exchanges.
>
> I have found virtually no benefit to the compression. Essentially, the
> overhead consumed in compressing/decompressing the messages pretty much
> balances out any transmission time differences. However, I could only
> test this for 64 nodes, 8ppn, so perhaps there is some benefit at
> larger sizes.
>
> Even though my test size wasn't very big, I did try forcing the
> worst-case scenario. I included all available BTLs, and ran the OOB
> over Ethernet. Although there was some difference, it wasn't
> appreciable - easily within the variations I see on this rather
> unstable machine.
>
> I invite you to try it yourself. You can get a copy of the code via:
>
>     hg clone http://www.open-mpi.org/hg/hgwebdir.cgi/rhc/gather
>
> You will need to configure with LIBS=-lz.
>
> Compression is normally turned "off". You can turn it on by setting:
>
>     -mca orte_compress 1
>
> You can also adjust the level of compression:
>
>     -mca orte_compress_level [1-9]
>
> If you don't specify the level and select compression, the level will
> default to 1. From my tests, this seemed a good compromise. The other
> levels provided some small amount of better compression, but took
> longer.
>
> With compression "on", you will get output telling you the original
> size of the message and its compressed size so you can see what was
> done.
>
> Please let me know what you find out. I would like to reach a decision
> as to whether or not compression is worthwhile.
>
> Thanks
> Ralph
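Pulling the instructions from Ralph's quoted message together, a test
drive of the branch boils down to the following (the application name is a
placeholder):

    # fetch and build the experimental branch, linking in zlib
    hg clone http://www.open-mpi.org/hg/hgwebdir.cgi/rhc/gather
    ./configure LIBS=-lz && make install

    # compression is off by default; turn it on at the default level (1)
    mpirun -mca orte_compress 1 ./my_app

    # or pick an explicit zlib level, 1-9
    mpirun -mca orte_compress 1 -mca orte_compress_level 9 ./my_app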
Re: [OMPI devel] MPI_Comm_connect/Accept
Ralph,

I've not been very successful at using ompi-server. I tried this:

    xterm1$ ompi-server --debug-devel -d --report-uri test
    [grosse-pomme.local:01097] proc_info: hnp_uri NULL
    daemon uri NULL
    [grosse-pomme.local:01097] [[34900,0],0] ompi-server: up and running!

    xterm2$ mpirun -ompi-server test -np 1 mpi_accept_test
    Port name:
    2285895681.0;tcp://192.168.0.101:50065;tcp://192.168.0.150:50065:300

    xterm3$ mpirun -ompi-server test -np 1 simple_connect
    --------------------------------------------------------------------------
    Process rank 0 attempted to lookup from a global ompi_server that
    could not be contacted. This is typically caused by either not
    specifying the contact info for the server, or by the server not
    currently executing. If you did specify the contact info for a
    server, please check to see that the server is running and start
    it again (or have your sys admin start it) if it isn't.
    --------------------------------------------------------------------------
    [grosse-pomme.local:01122] *** An error occurred in MPI_Lookup_name
    [grosse-pomme.local:01122] *** on communicator MPI_COMM_WORLD
    [grosse-pomme.local:01122] *** MPI_ERR_NAME: invalid name argument
    [grosse-pomme.local:01122] *** MPI_ERRORS_ARE_FATAL (goodbye)
    --------------------------------------------------------------------------

The server code does Open_port and then Publish_name. It looks like the
Lookup_name function cannot reach the ompi-server. The ompi-server in
debug mode does not show any output when a new event occurs (like when
the server is launched). Is there something wrong in the way I use it?

Aurelien

Le 3 avr. 08 à 17:21, Ralph Castain a écrit :

> Take a gander at ompi/tools/ompi-server - I believe I put a man page in
> there. You might just try "man ompi-server" and see if it shows up.
>
> Holler if you have a question - not sure I documented it very
> thoroughly at the time.
>
> On 4/3/08 3:10 PM, "Aurélien Bouteiller" wrote:
>
>> Ralph,
>>
>> I am using trunk. Is there documentation for ompi-server? Sounds
>> exactly like what I need to fix point 1.
>>
>> Aurelien
>>
>> Le 3 avr. 08 à 17:06, Ralph Castain a écrit :
>>
>>> I guess I'll have to ask the basic question: what version are you
>>> using?
>>>
>>> If you are talking about the trunk, there no longer is a "universe"
>>> concept anywhere in the code. Two mpiruns can connect/accept to each
>>> other as long as they can make contact. To facilitate that, we
>>> created an "ompi-server" tool that is supposed to be run by the
>>> sys-admin (or a user, doesn't matter which) on the head node - there
>>> are various ways to tell mpirun how to contact the server, or it can
>>> self-discover it.
>>>
>>> I have tested publish/lookup pretty thoroughly and it seems to work.
>>> I haven't spent much time testing connect/accept except via
>>> comm_spawn, which seems to be working. Since that uses the same
>>> mechanism, I would have expected connect/accept to work as well.
>>>
>>> If you are talking about 1.2.x, then the story is totally different.
>>>
>>> Ralph
>>>
>>> On 4/3/08 2:29 PM, "Aurélien Bouteiller" wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I'm trying to figure out how complete the implementation of
>>>> Comm_connect/Accept is. I found two problematic cases.
>>>>
>>>> 1) Two different programs are started in two different mpiruns. One
>>>> calls accept, the second one calls connect. I would not expect
>>>> MPI_Publish_name/Lookup_name to work because they do not share the
>>>> HNP. Still, I would expect to be able to connect by copying (with
>>>> printf-scanf) the port_name string generated by Open_port;
>>>> especially considering that in Open MPI, the port_name is a string
>>>> containing the tcp address and port of the rank 0 in the server
>>>> communicator. However, doing so results in "no route to host" and
>>>> the connecting application aborts. Is the problem related to an
>>>> explicit check of the universes on the accept HNP? Do I expect too
>>>> much from the MPI standard? Is it because my two applications do
>>>> not share the same universe? Should we (re)add the ability to use
>>>> the same universe for several mpiruns?
>>>>
>>>> 2) The second issue is when the program sets up a port and then
>>>> accepts multiple clients on this port. Everything works fine for
>>>> the first client, and then accept stalls forever when waiting for
>>>> the second one. My understanding of the standard is that it should
>>>> work: 5.4.2 states "it must call MPI_Open_port to establish a port
>>>> [...] it must call MPI_Comm_accept to accept connections from
>>>> clients". I understand that for one MPI_Open_port I should be able
>>>> to manage several MPI clients. Am I understanding the standard
>>>> correctly here, and should we fix this?
>>>>
>>>> Here is a copy of the non-working code for reference.
>>>>
>>>> /*
>>>>  * Copyright (c) 2004-2007 The Trustees of the University of
>>>>  *                         Tennessee. All rights reserved.
>>>>  * $COPYRIGHT$
>>>>  *
>>>>  * Additional copyrights may follow
>>>>  *
>>>>  * $HEADER$
>>>>  */
>>>> #include <mpi.h>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>>
>>>> int main(int argc, char *argv[])
>>>> {
>>>>     char port[MPI_MAX_PORT_NAME];
>>>>     int rank;
>>>>     int np;
>>>>
>>>>     MPI_Init(&argc, &argv);
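For completeness, the "one Open_port, many accepts" pattern from case 2
reduces to the sketch below (Aurélien's full program is truncated above,
so this is a reconstruction of the pattern, not his code; two clients are
hard-coded, and by his report the second MPI_Comm_accept is where the
stall occurs):

    /* Server-side sketch: accept several clients on a single port */
    #include <mpi.h>
    #include <stdio.h>

    #define NCLIENTS 2

    int main(int argc, char *argv[])
    {
        char port[MPI_MAX_PORT_NAME];
        MPI_Comm clients[NCLIENTS];
        int i;

        MPI_Init(&argc, &argv);
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("Port name: %s\n", port);

        for (i = 0; i < NCLIENTS; i++)
            /* second iteration reportedly never returns */
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF,
                            &clients[i]);

        for (i = 0; i < NCLIENTS; i++)
            MPI_Comm_disconnect(&clients[i]);

        MPI_Close_port(port);
        MPI_Finalize();
        return 0;
    }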
Re: [OMPI devel] init_thread + spawn error
Thanks for the report. As Ralph indicated, the threading support in Open
MPI is not good right now, but we are working to make it better. I have
filed a ticket (https://svn.open-mpi.org/trac/ompi/ticket/1267) so we do
not lose track of this issue, and attached a potential fix to the ticket.

Thanks,
Tim

Joao Vicente Lima wrote:

> Hi,
>
> I am getting an error calling init_thread and comm_spawn in this code:
>
>     #include <mpi.h>
>     #include <stdio.h>
>
>     int main(int argc, char *argv[])
>     {
>         int provided;
>         MPI_Comm parentcomm, intercomm;
>
>         MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>         MPI_Comm_get_parent(&parentcomm);
>
>         if (parentcomm == MPI_COMM_NULL) {
>             printf("spawning ... \n");
>             MPI_Comm_spawn("./spawn1", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
>                            0, MPI_COMM_SELF, &intercomm,
>                            MPI_ERRCODES_IGNORE);
>             MPI_Comm_disconnect(&intercomm);
>         } else {
>             printf("child!\n");
>             MPI_Comm_disconnect(&parentcomm);
>         }
>
>         MPI_Finalize();
>         return 0;
>     }
>
> and the error is:
>
> spawning ...
> opal_mutex_lock(): Resource deadlock avoided
> [localhost:18718] *** Process received signal ***
> [localhost:18718] Signal: Aborted (6)
> [localhost:18718] Signal code: (-6)
> [localhost:18718] [ 0] /lib/libpthread.so.0 [0x2b6e5d9fced0]
> [localhost:18718] [ 1] /lib/libc.so.6(gsignal+0x35) [0x2b6e5dc3b3c5]
> [localhost:18718] [ 2] /lib/libc.so.6(abort+0x10e) [0x2b6e5dc3c73e]
> [localhost:18718] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c9560ff]
> [localhost:18718] [ 4] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c95601d]
> [localhost:18718] [ 5] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c9560ac]
> [localhost:18718] [ 6] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c956a93]
> [localhost:18718] [ 7] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c9569dd]
> [localhost:18718] [ 8] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b6e5c95797d]
> [localhost:18718] [ 9] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_proc_unpack+0x1ec) [0x2b6e5c957dd9]
> [localhost:18718] [10] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b6e607f05cf]
> [localhost:18718] [11] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(MPI_Comm_spawn+0x459) [0x2b6e5c98ede9]
> [localhost:18718] [12] ./spawn1(main+0x7a) [0x400ae2]
> [localhost:18718] [13] /lib/libc.so.6(__libc_start_main+0xf4) [0x2b6e5dc28b74]
> [localhost:18718] [14] ./spawn1 [0x4009d9]
> [localhost:18718] *** End of error message ***
> opal_mutex_lock(): Resource deadlock avoided
> [localhost:18719] *** Process received signal ***
> [localhost:18719] Signal: Aborted (6)
> [localhost:18719] Signal code: (-6)
> [localhost:18719] [ 0] /lib/libpthread.so.0 [0x2b9317a17ed0]
> [localhost:18719] [ 1] /lib/libc.so.6(gsignal+0x35) [0x2b9317c563c5]
> [localhost:18719] [ 2] /lib/libc.so.6(abort+0x10e) [0x2b9317c5773e]
> [localhost:18719] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b93169710ff]
> [localhost:18719] [ 4] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b931697101d]
> [localhost:18719] [ 5] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b93169710ac]
> [localhost:18719] [ 6] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b9316971a93]
> [localhost:18719] [ 7] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b93169719dd]
> [localhost:18719] [ 8] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b931697297d]
> [localhost:18719] [ 9] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_proc_unpack+0x1ec) [0x2b9316972dd9]
> [localhost:18719] [10] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b931a80b5cf]
> [localhost:18719] [11] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b931a80dad7]
> [localhost:18719] [12] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b9316977207]
> [localhost:18719] [13] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Init_thread+0x166) [0x2b93169b8622]
> [localhost:18719] [14] ./spawn1(main+0x25) [0x400a8d]
> [localhost:18719] [15] /lib/libc.so.6(__libc_start_main+0xf4) [0x2b9317c43b74]
> [localhost:18719] [16] ./spawn1 [0x4009d9]
> [localhost:18719] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 18719 on node localhost
> exited on signal 6 (Aborted).
> --------------------------------------------------------------------------
>
> If I change MPI_Init_thread to MPI_Init, it all works. Any suggestions?
>
> The attachments contain my ompi_info (r18077) and config.log.
>
> thanks in advance,
> Joao.