Thanks I appreciate it.  As always with this sort of operation I will 
tread with all due caution.

-Paul Edmon-

On 06/06/2013 10:02 AM, Loris Bennett wrote:
> Hi Paul,
>
> Paul Edmon <[email protected]> writes:
>
>> If you could send me the script that would be appreciated.  Thanks.
>>
>> -Paul Edmon-
> So first off, I'm providing this code as used successfully by me, but,
> to quote the GPL:
>
>      This program is distributed in the hope that it will be useful,
>      but WITHOUT ANY WARRANTY; without even the implied warranty of
>      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> So if it deletes all your data, sets fire to your computer, or makes
> your house fall down, it's not my fault.
>
> You are going to have to make the following changes:
>
> 1. The relevant database tables all have the name of the cluster as part
>     of the table name.  So, since our cluster is called 'soroban', the
>     job information is stored in the table 'soroban_job_table'.  You
>     probably need to do
>
>     use slurm_acct_db;
>     show tables;
>
>     in MySQL to see what the correct names are.
>
> 2. The main program consists of a loop over all the jobs selected by the
>     SELECT statement right at the beginning of the program.  Then there
>     is a loop over all the corresponding steps. You then have to decide
>     which cases you want to deal with and how to deal with them.  This is
>     all contained in the horrible mess of 'if's and 'else's.
>
> 3. There is an argument to the program, maxjobid, used in the main
>     SELECT.  This is because I had to distinguish between jobs which
>     really were running and those which were not, but had no end_time.
>     So I only considered ID lower than the lowest I knew really was
>     running.  Only jobs with lower IDs are considered for modification.
>
> The program generates a file to make the changes, 'update.sql', and a
> file to undo the changes, 'update_undo.sql.  You can apply them with the
> following
>
> mysql -u root -p slurm_acct_db < /path/to/update.sql
>
> Note that if you run the program, apply the changes, then run the
> program again, you will be prompted whether you want to overwrite the
> existing files.  However, since you are going to be trying this out on a
> test system using a dump of the production database, it is not going to
> matter, right?
>
> If you have any problems, let me know.
>
> Cheers,
>
> Loris
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use Carp;
> use Getopt::Long;
> use Pod::Usage;
>
> use DBI;
> use DBD::mysql;
> use Term::ReadKey;
> use List::Util qw /max min/;
>
> ###############################################################################
> # Configuration
> ###############################################################################
>
> my $platform = 'mysql';
> my $database = 'slurm_acct_db';
> my $host     = 'localhost';
> my $port     = '3306';
> my $user     = 'root';
> my $dsn      = "dbi:$platform:$database:$host:$port";
> my %states_by_code =
>    (0 => 'PENDING',  1 => 'RUNNING',
>     2 => 'SUSPENDED',3 => 'COMPLETED',
>     4 => 'CANCELLED',5 => 'FAILED',
>     6 => 'TIMEOUT',  7 => 'NODE_FAIL',
>     8 => 'PREEMPTED');
> my %states_by_string = reverse %states_by_code;
>
> my $output_dir = './';
> my $update_file = $output_dir . 'update.sql';
> my $undo_file   = $output_dir . 'update_undo.sql';
>
> my @jobs_without_end_times;
> my @jobs_without_end_times_fixed;
> my @sql_new_values;
> my @sql_old_values;
>
> ###############################################################################
> # Options and Arguments
> ###############################################################################
>
> my $help;
> #my $max_jobid = 123400;
> my $max_jobid = 148213;
> my $std_time = 1; # Thu Jan  1 01:00:01 1970
>
> GetOptions('help|h'       => \$help,
>             'maxjobid|m=i' => \$max_jobid,
>            );
> if ($help) {
>    pod2usage(-verbose => 2);
>    exit;
> }
>
> ###############################################################################
> # Main Program
> ###############################################################################
>
> print "Enter " . $user. "'s $platform password: ";
> ReadMode 'noecho';
> my $password = ReadLine 0;
> chomp $password;
> ReadMode 'normal';
> print "\n";
>
> my $dbh = DBI->connect($dsn, $user, $password,
>                         {
>                          'RaiseError' => 1});
> my $select_jobs_stmt =
>    'select job_db_inx,id_job,' .
>    'time_start,time_end,state' .
>    ' from soroban_job_table' .
>    ' where id_job<' . $max_jobid .
>    ' and time_end=0';
> my $job_href =
>    $dbh->selectall_hashref($select_jobs_stmt,'id_job');
>
> # Loop over jobs
> # --------------
> my @job_ids = sort {$a <=> $b} keys(%{$job_href});
> foreach my $job_id (@job_ids) {
>    printf 
> '%12s%12s%12s%12s%12s','job_db_inx','Job/StepID','Start','End','State';
>    print "\n------------  ----------  ----------  ----------  ----------\n";
>    my @nonzero_step_start_times;
>    my @nonzero_step_end_times;
>    my $job = $job_href->{$job_id};
>    printf '%12d%12d%12d%12d%12s',$job->{job_db_inx},$job_id,
>      $job->{time_start},$job->{time_end},$states_by_code{$job->{'state'}};
>    print "\n";
>
>    my $select_steps_stmt =
>      'select job_db_inx,id_step,time_start,time_end,state' .
>        ' from soroban_step_table' .
>          ' where job_db_inx=' . $job->{job_db_inx};
>    my $step_href =
>      $dbh->selectall_hashref($select_steps_stmt,'job_db_inx');
>
>    # Loop over job steps
>    # -------------------
>    while (my ($step_db_inx, $step) = each %{$step_href}) {
>      my $step_id = $step->{id_step};
>      printf '%12d%12d%12d%12d%12s',$step_db_inx,$step_id,
>        $step->{time_start},$step->{time_end},
>          $states_by_code{$step->{state}};
>      print "\n";
>
>      # Collect nonzero step start/end times
>      if ( $step->{time_start} > 0 ) {
>        push(@nonzero_step_start_times,$step->{time_start});
>      }
>      if ( $step->{time_end} > 0 ) {
>        push(@nonzero_step_end_times,$step->{time_end});
>      }
>
>      # Solutions to inconsistencies
>      # ----------------------------
>      # 1. Jobs/step with RUNNING/PENDING should be set to FAILED
>      # 2. Jobs with a valid step_end_time use that
>      # 3. Jobs w/o  a valid step_end_time use job_start_time
>
>      # Check step start/end times
>      if ($step->{time_start} == 0) {
>        print "WARNING: Step start time = $step->{time_start}!\n" .
>          "NO SOLUTION DECIDED YET!\n";
>        exit;
>      }
>      if ($step->{time_end} == 0) {
>        print "Change step end time to step start time:" .
>          " $step->{time_end} ->  $step->{time_start}\n";
>        update_step(\@sql_old_values,$job->{job_db_inx},$step_id,
>                    'time_end',$step->{time_end});
>        update_step(\@sql_new_values,$job->{job_db_inx},$step_id,
>                    'time_end',$step->{time_start});
>
>      }
>      if ($step->{time_start} == 0 and $step->{time_end} == 0) {
>        print "Both step start/end times = 0!\n" .
>          "NO SOLUTION DECIDED YET!\n";
>        exit;
>      }
>
>      # Fix step state
>      # --------------
>      if ( $step->{state} == $states_by_string{PENDING} or
>           $step->{state} == $states_by_string{RUNNING}) {
>        if ( $job->{state} == $states_by_string{PENDING} or
>             $job->{state} == $states_by_string{RUNNING}) {
>          print "Change step state: $states_by_code{$step->{state}}" .
>            " -> FAILED\n";
>          update_step(\@sql_old_values,$job->{job_db_inx},$step_id,
>                      'state',$step->{state});
>          update_step(\@sql_new_values,$job->{job_db_inx},$step_id,
>                      'state',$states_by_string{FAILED});
>        } else {
>          print "Change step state to job state:" .
>            " $states_by_code{$step->{state}} -> 
> $states_by_code{$job->{state}}\n";
>          update_step(\@sql_old_values,$job->{job_db_inx},$step_id,
>                      'state',$step->{state});
>          update_step(\@sql_new_values,$job->{job_db_inx},$step_id,
>                      'state',$job->{state});
>        }
>      }
>    }
> # ----------------------
> # End of loop over steps
>
>    # Fix job state
>    # -------------
>    if ( $job->{state} == $states_by_string{PENDING} or
>         $job->{state} == $states_by_string{RUNNING}) {
>      print "Change job state from $states_by_code{$job->{state}}" .
>        " to FAILED\n";
>      update_job(\@sql_old_values,$job->{job_db_inx},'state',$job->{state});
>      update_job(\@sql_new_values,$job->{job_db_inx},'state',
>                 $states_by_string{FAILED});
>    }
>
>    # Fix job start/end_time
>    # ----------------
>    if ($job->{time_start} > 0 ) {                 # if start_time is non-0
>      if (scalar(@nonzero_step_end_times)<1) {     # & no non-0 step end_times
>        if (scalar(@nonzero_step_start_times)<1) { # & no non-0 step 
> start_times
>          # Set job_end_time to job_start_time
>          print "Change job end time to job start time:" .
>            " 0 -> $job->{time_start}\n";
>          push(@jobs_without_end_times,$job_id);
>          update_job(\@sql_old_values,$job->{job_db_inx},'time_end',0);
>          update_job(\@sql_new_values,$job->{job_db_inx},'time_end',
>                     $job->{time_start});
>        } else {
>          # Set job_end_time to max step_start_time
>          my $time = max(@nonzero_step_start_times);
>          print "Change job end time to max step start time:" .
>            " 0 -> $time\n";
>          update_job(\@sql_old_values,$job->{job_db_inx},'time_end',0);
>          update_job(\@sql_new_values,$job->{job_db_inx},'time_end',$time);
>        }
>      } else {
>        # otherwise use latest step end time for job
>        my $time = max(@nonzero_step_end_times);
>        print "Change job end time to last step end time:" .
>          " 0 -> $time\n";
>        update_job(\@sql_old_values,$job->{job_db_inx},'time_end',0);
>        update_job(\@sql_new_values,$job->{job_db_inx},'time_end',$time);
>      }
>    } else { # job start time == 0
>      if (scalar(@nonzero_step_start_times)<1) {
>        # Use now
>        print "Change job start/end time both to $std_time:" .
>          " 0 -> " . localtime($std_time) . "\n";
>        update_job(\@sql_old_values,$job->{job_db_inx},'time_start',0);
>        update_job(\@sql_new_values,$job->{job_db_inx},
>                   'time_start',$std_time);
>        update_job(\@sql_old_values,
>                   $job->{job_db_inx},
>                   'time_end',0);
>        update_job(\@sql_new_values,
>                   $job->{job_db_inx},
>                   'time_end',$std_time);
>      } else {
>        # otherwise use earliest/latest step start time for job
>        my $min_start_time = min(@nonzero_step_start_times);
>        print "Change job start time to first step start time:" .
>          "$job->{time_start} -> $min_start_time\n";
>        update_job(\@sql_old_values,
>                   $job->{job_db_inx},
>                   'time_start',0);
>        update_job(\@sql_new_values,
>                   $job->{job_db_inx},
>                   'time_start',$min_start_time);
>        my $max_start_time = max(@nonzero_step_start_times);
>        print "Change job end time to last step start time:" .
>          " $job->{time_end} -> $max_start_time\n";
>        update_job(\@sql_old_values,
>                   $job->{job_db_inx},
>                   'time_end',0);
>        update_job(\@sql_new_values,
>                   $job->{job_db_inx},
>                   'time_end',$max_start_time);
>      }
>    }
>    print "------------------------------------------------------------\n";
> }
> # ---------------------
> # End of loop over jobs
>
> print <<END
>
> Summary
> =======
>
> END
>    ;
> print "Total number of jobs w/o end time:      " .
>    scalar(keys(%{$job_href})) . "\n";
> print "Job w/o job and w/o step end time: " .
>    scalar(@jobs_without_end_times) . "\n";
> #foreach (@jobs_without_end_times) {
> #  print "\t$_\n";
> #}
>
> write_sql(\@sql_new_values,$update_file);
> write_sql(\@sql_old_values,$undo_file);
>
> $dbh->disconnect();
> exit;
>
> ###############################################################################
> # Subroutines
> ###############################################################################
>
> sub update_job {
>    my ($aref,$job_db_inx,$col,$val) = @_;
>    my $stmt =
>      'update soroban_job_table' .
>        ' set ' . $col . '=' . $val .
>          ' where job_db_inx=' . $job_db_inx;
>    push(@{$aref},$stmt);
> }
>
> sub update_step {
>    my ($aref,$job_db_inx,$id_step,$col,$val) = @_;
>    my $stmt =
>      'update soroban_step_table' .
>        ' set ' . $col . '=' . $val .
>          ' where job_db_inx=' . $job_db_inx .
>            ' and id_step=' . $id_step;
>    push(@{$aref},$stmt);
> }
>
> sub write_sql {
>    my ($sql_statements_aref,$file_name) = @_;
>    if (-r $file_name) {
>      print "$file_name exists. Overwrite? [y/n]: ";
>      my $ans = <STDIN>;
>      chomp $ans;
>      return if $ans ne 'y';
>    }
>    open(my $output,'>',$file_name)
>      or croak "Can't open $file_name: $!";
>    foreach my $sql (@$sql_statements_aref) {
>      print $output "$sql;\n";
>    }
>    close($output);
> }
>
> __END__
>
> ##############################################################################
> # POD
> ##############################################################################
>
> =head1 NAME
>
> fix_sacct_db.pl - fixes jobs/steps w/o end time
>
> =head1 SYNOPSIS
>
>     fix_sacct_db -m
>
> =head1 DESCRIPTION
>
> The program fixes errors in the sacct database, which result in the command
> 'sacct' showing jobs as running which have in fact come to an end.  All the 
> jobs
> with zero end time are selected and a suitable finite end time determined.
>
> =head1 OPTIONS
>
>     -h  --help   this help
>     -m  --maxid  max. Job ID considered
>
> =cut
>
> # Local Variables:
> # cperl-indent-level: 2
> # End:
>

Reply via email to