Re: [MTT users] [MTT bugs] [MTT] #212: Generic network locking server *REVIEW NEEDED*

2010-03-05 Thread Ethan Mallove
On Fri, Feb/19/2010 12:00:55PM, Ethan Mallove wrote:
> On Thu, Feb/18/2010 04:13:15PM, Jeff Squyres wrote:
> > On Feb 18, 2010, at 10:48 AM, Ethan Mallove wrote:
> > 
> > > To ensure there is never a collision between $a->{k} and $b->{k}, the
> > > user can have two MTT clients share a $scratch, but they cannot both
> > > > run the same INI section simultaneously.  I set up my scheduler to run
> > > batches of MPI get, MPI install, Test get, Test build, and Test run
> > > sections in parallel with successor INI sections dependent on their
> > > predecessor INI sections (e.g., [Test run: foo] only runs after [Test
> > > build: foo] completes).  The limitation stinks, but the current
> > > limitation is much worse: two MTT clients can't even run the same
> > > *phase* out of one $scratch.
> > 
> > > It might be a little nicer just to protect the user from
> > themselves -- if we ever detect a case where $a->{k} and $b->{k}
> > both exist and are not the same value, dump out everything to a file
> > and abort with an error message.  This is clearly an erroneous
> > situation, but running MTT in big parallel batches like this is a
> > worthwhile-but-complicated endeavor, and some people are likely to
> > get it wrong.  So we should at least detect the situation and fail
> > gracefully, rather than losing or corrupting results.
> > 
> > Make sense?
> 
> Yes.  I'll add this.

The check is there now.  Ready for review.

-Ethan

> 
> -Ethan
> 
> > 
> > > I originally wanted the .dump files to be completely safe, but MTT
> > > clients were getting locked out of the .dump files for way too long.
> > > E.g., MTT::MPI::LoadInstalls happens very early in client/mtt, and an
> > > hour could elapse before MTT::MPI::SaveInstalls is called in
> > > Install.pm.
> > 
> > Yep, if you lock from load->save, then that can definitely happen...
> > 
> > -- 
> > Jeff Squyres
> > jsquy...@cisco.com
> > 
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> > 
> > 
> > ___
> > mtt-users mailing list
> > mtt-us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
--- client/mtt  Mon Nov 09 14:38:09 2009 -0500
+++ client/mtt  Fri Mar 05 14:02:39 2010 -0500
@@ -498,6 +498,15 @@
 # execute on_start callback if exists
 _do_step($ini, "mtt", "before_mtt_start_exec");

+# Process setenv, unsetenv, prepend_path, and append_path
+
+my $config;
+$config->{setenv} = Value($ini, "mtt", "setenv");
+$config->{unsetenv} = Value($ini, "mtt", "unsetenv");
+$config->{prepend_path} = Value($ini, "mtt", "prepend_path");
+$config->{append_path} = Value($ini, "mtt", "append_path");
+my @save_env;
+ProcessEnvKeys($config, \@save_env);

 # Set the logfile, if specified

--- lib/MTT/Defaults.pm Mon Nov 09 14:38:09 2009 -0500
+++ lib/MTT/Defaults.pm Fri Mar 05 14:02:39 2010 -0500
@@ -42,7 +42,7 @@

 known_compiler_names => [ "gnu", "pgi", "ibm", "intel", "kai", "absoft",
   "pathscale", "sun", "microsoft", "none", "unknown" ],
-known_resource_manager_names => [ "slurm", "tm", "loadleveler", "n1ge",
+known_resource_manager_names => [ "slurm", "tm", "loadleveler", "sge",
   "alps", "none", "unknown" ],
 known_network_names => [ "tcp", "udp", "ethernet", "gm", "mx", "verbs",
  "udapl", "psm", "elan", "portals", "shmem",
--- lib/MTT/MPI.pm  Mon Nov 09 14:38:09 2009 -0500
+++ lib/MTT/MPI.pm  Fri Mar 05 14:02:39 2010 -0500
@@ -16,6 +16,8 @@

 use strict;
 use MTT::Files;
+use MTT::Messages;
+use MTT::Util;

 #--

@@ -28,10 +30,13 @@
 #--

 # Filename where list of MPI sources is kept
-my $sources_data_filename = "mpi_sources.dump";
+my $sources_data_filename = "mpi_sources";

 # Filename where list of MPI installs is kept
-my $installs_data_filename = "mpi_installs.dump";
+my $installs_data_filename = "mpi_installs";
+
+# Filename extension for all the Dumper data files
+my $data_filename_extension = "dump";

 #--

@@ -42,10 +47,15 @@
 # Explicitly delete anything that was there
 $MTT::MPI::sources = undef;

-# If the file exists, read it in
-my $data;
-MTT::Files::load_dumpfile("$dir/$sources_data_filename", \$data);
-$MTT::MPI::sources = $data->{VAR1};
+my @dumpfiles = glob("$dir/$sources_data_filename-*.$data_filename_extension");
+foreach my $dumpfile (@dumpfiles) {
+
+# If the file exists, read it in
+my $data;
+MTT::Files::load_dumpfile($dumpfile, \$data);
+  

Re: [MTT users] [MTT bugs] [MTT] #212: Generic network locking server *REVIEW NEEDED*

2010-02-18 Thread Jeff Squyres
On Feb 18, 2010, at 10:48 AM, Ethan Mallove wrote:

> To ensure there is never a collision between $a->{k} and $b->{k}, the
> user can have two MTT clients share a $scratch, but they cannot both
> run the same INI section simultaneously.  I set up my scheduler to run
> batches of MPI get, MPI install, Test get, Test build, and Test run
> sections in parallel with successor INI sections dependent on their
> predecessor INI sections (e.g., [Test run: foo] only runs after [Test
> build: foo] completes).  The limitation stinks, but the current
> limitation is much worse: two MTT clients can't even run the same
> *phase* out of one $scratch.

It might be a little nicer just to protect the user from themselves -- if 
we ever detect a case where $a->{k} and $b->{k} both exist and are not the same 
value, dump out everything to a file and abort with an error message.  This is 
clearly an erroneous situation, but running MTT in big parallel batches like 
this is a worthwhile-but-complicated endeavor, and some people are likely to 
get it wrong.  So we should at least detect the situation and fail gracefully, 
rather than losing or corrupting results.

Make sense?
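
For illustration only, the check described above could look roughly like the Perl sketch below. It is not MTT's actual code: the function names merge_checked and merge_or_abort, and the idea of comparing values through their Data::Dumper serialization, are assumptions made for this example.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper, not MTT's actual code: serialize a value so that
# nested structures can be compared as strings.
sub _canonical {
    my ($value) = @_;
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Indent   = 0;
    return Dumper($value);
}

# Merge $right into a copy of $left. Returns (\%merged, \@conflicts), where
# @conflicts lists every key present on both sides with different values.
sub merge_checked {
    my ($left, $right) = @_;
    my %merged = %$left;
    my @conflicts;
    foreach my $k (sort keys %$right) {
        if (exists $merged{$k}
            && _canonical($merged{$k}) ne _canonical($right->{$k})) {
            push @conflicts, $k;
            next;
        }
        $merged{$k} = $right->{$k};
    }
    return (\%merged, \@conflicts);
}

# On conflict, write both inputs to a file and abort with an error message.
sub merge_or_abort {
    my ($left, $right, $dump_path) = @_;
    my ($merged, $conflicts) = merge_checked($left, $right);
    if (@$conflicts) {
        open(my $fh, '>', $dump_path)
            or die "Cannot write $dump_path: $!\n";
        print $fh Dumper({ left => $left, right => $right });
        close($fh);
        die "Conflicting values for key(s): @$conflicts; "
          . "both inputs saved to $dump_path\n";
    }
    return $merged;
}
```

Keys that exist on both sides with identical values merge silently; only a genuine disagreement triggers the dump and the abort, so nothing is overwritten without the user knowing.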

> I originally wanted the .dump files to be completely safe, but MTT
> clients were getting locked out of the .dump files for way too long.
> E.g., MTT::MPI::LoadInstalls happens very early in client/mtt, and an
> hour could elapse before MTT::MPI::SaveInstalls is called in
> Install.pm.

Yep, if you lock from load->save, then that can definitely happen...
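
One way to avoid holding a lock for that long, and apparently the direction the patch in this thread takes with its glob over "mpi_sources-*.dump", is to let each client write only its own dump file and read everyone's files at load time. The sketch below is a guess at that shape, not the patch itself: the host-and-pid suffix and the names client_dumpfile, save_own, and load_all are assumptions.

```perl
use strict;
use warnings;
use Data::Dumper;
use Sys::Hostname qw(hostname);

# Hypothetical naming scheme: <base>-<host>-<pid>.dump. Only the
# "<base>-*.dump" glob pattern comes from the patch; the suffix is assumed.
sub client_dumpfile {
    my ($dir, $base) = @_;
    return sprintf("%s/%s-%s-%d.dump", $dir, $base, hostname(), $$);
}

# Each client writes only its own file, so no lock has to be held
# between load and save.
sub save_own {
    my ($dir, $base, $data) = @_;
    my $file = client_dumpfile($dir, $base);
    open(my $fh, '>', $file) or die "Cannot write $file: $!\n";
    print $fh Dumper($data);
    close($fh) or die "Cannot close $file: $!\n";
    return $file;
}

# Read every client's file and combine the top-level keys.
# Note: "do FILE" executes the file as Perl, so it is only suitable
# for dump files the clients wrote themselves.
sub load_all {
    my ($dir, $base) = @_;
    my %all;
    foreach my $file (sort glob("$dir/$base-*.dump")) {
        my $data = do $file;    # evaluates the "$VAR1 = {...};" text
        die "Cannot load $file: " . ($@ || $!) . "\n"
            unless ref($data) eq 'HASH';
        @all{ keys %$data } = values %$data;
    }
    return \%all;
}
```

As written, load_all lets a later file silently overwrite an earlier one when both define the same key, which is exactly the case the collision check discussed in this thread is meant to catch.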

-- 
Jeff Squyres
jsquy...@cisco.com
