On Tue, Oct/06/2009 10:23:48AM, Ashley Pittman wrote:
>
> Further to the mail linked below, padb is able to perform diagnostics,
> including backtraces on hung jobs, and integrates well into automated
> testing environments.
Can padb get a backtrace from a non-debuggable MPI (e.g., not compiled
with -g)?
-Ethan
>
> The attached patch is a minimal change which should enable the
> functionality. I don't, however, have access to a working MTT
> installation to test it.
>
> http://www.open-mpi.org/community/lists/mtt-devel/2009/06/0415.php
>
> This will require a HEAD version of padb, at least r273 to allow it to
> accept the pid of mpirun rather than a jobid assigned by the underlying
> resource manager.
>
> Yours,
>
> Ashley,
>
> --
>
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
> Index: lib/MTT/DoCommand.pm
> ===
> --- lib/MTT/DoCommand.pm (revision 1322)
> +++ lib/MTT/DoCommand.pm (working copy)
> @@ -359,6 +359,7 @@
> }
> my $killed_status = undef;
> my $last_over = 0;
> +my $padb_output;
> while ($done > 0) {
> my $nfound = select($rout = $rin, undef, undef, $t);
> if (vec($rout, fileno(OUTread), 1) == 1) {
> @@ -410,6 +411,8 @@
> my $timeout_email_recipient =
> $MTT::Globals::Values->{docommand_timeout_notify_email};
> my $timeout_notify_timeout =
> $MTT::Globals::Values->{docommand_timeout_notify_timeout};
>
> + $padb_output = `padb --config-option rmgr=mpirun --full-report=$pid`;
> +
> if (defined($timeout_sentinel_file)) {
>
> # Email someone, if an email address has been specified
> @@ -493,6 +496,9 @@
> # Return an anonymous hash containing the relevant data
>
> $ret->{result_stdout} = join('', @out);
> +if ( defined $padb_output ) {
> + $ret->{result_stdout} .= "\n$padb_output";
> +}
> $ret->{result_stderr} = join('', @err),
> if (!$merge_output);
> return $ret;
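
For discussion, here is a rough sketch of one way the capture step in the patch could be factored out, so that the padb call avoids the shell and a failed or missing padb leaves result_stdout unchanged. The helper names (padb_command, capture_output, append_report) are hypothetical and are not part of MTT or padb; only the padb options are taken from the patch above. It is untested against a real MTT installation.

```perl
use strict;
use warnings;

# Hypothetical helper: build the padb argument list from the patch
# for a given mpirun pid. Only the options shown in the patch are used.
sub padb_command {
    my ($pid) = @_;
    return ('padb', '--config-option', 'rmgr=mpirun', "--full-report=$pid");
}

# Hypothetical helper: run a command without going through a shell and
# return its stdout, or undef if the command could not be started.
sub capture_output {
    my (@cmd) = @_;
    open(my $fh, '-|', @cmd) or return;
    local $/;    # slurp mode: read everything in one go
    my $out = <$fh>;
    close($fh);
    return $out;
}

# Hypothetical helper: append the report to the collected stdout only
# when there is something to append, mirroring the last hunk of the patch.
sub append_report {
    my ($stdout, $report) = @_;
    return $stdout unless defined $report && length $report;
    return "$stdout\n$report";
}
```

In DoCommand.pm the timeout branch would then do something like `$padb_output = capture_output(padb_command($pid));` and the return path would use `append_report(join('', @out), $padb_output)`.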